
Make Android self-hosting

 3 years ago
source link: http://landley.net/aboriginal/about.html

Ab Origine - Latin, "From the beginning".

  • Build the simplest linux system capable of compiling itself.
  • Cross compile it to every target supported by QEMU.
  • Boot it under QEMU (or real hardware).
  • Build/test everything else natively on target.

What is Aboriginal Linux?

Creating system images.

Aboriginal Linux is a shell script that builds the smallest/simplest linux system capable of rebuilding itself from source code. This currently requires seven packages: linux, busybox, uClibc, binutils, gcc, make, and bash. The results are packaged into a system image with shell scripts to boot it under QEMU. (It works fine on real hardware too.)

The build supports most architectures QEMU can emulate (x86, arm, powerpc, mips, sh4, sparc...). The build runs as a normal user (no root access required) and should run on any reasonably current distro, downloading and compiling its own prerequisites from source (including cross compilers).

The build is modular; each section can be bypassed or replaced if desired. The build offers a number of configuration options, but if you don't want to run the build yourself you can download binary system images to play with, built for each target with the default options.

(Note: the goal of the 2.0 release is to move from busybox, uClibc, and gcc/binutils to toybox, musl-libc, and llvm/lld.)

Using system images.

Each system image tarball contains a wrapper script ./run-emulator.sh which boots it to a shell prompt. (This requires the emulator QEMU to be installed on the host.) The emulated system's /dev/console is routed to stdin and stdout of the qemu process, so you can just type at it and log the output with "tee". Exiting the shell causes the emulator to shut down and exit.
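As a minimal sketch of the "log the output with tee" idea: the helper below runs a command (by default ./run-emulator.sh from the extracted tarball) and keeps a copy of everything it prints. The function name and the log file argument are made up for illustration.

```sh
#!/bin/sh
# Run a command and keep a copy of everything it prints.
# With no command given, fall back to the wrapper script from the tarball.
log_session() {
  logfile=$1
  shift
  [ $# -gt 0 ] || set -- ./run-emulator.sh
  # stdin passes straight through to the command, so you can still type at
  # the emulated console; stdout and stderr go to the terminal and the log.
  "$@" 2>&1 | tee "$logfile"
}
```

Run from the system image directory, "log_session boot.log" would boot the image and leave a transcript in boot.log. The exit status is that of tee, not of the emulator.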

The wrapper script ./dev-environment.sh calls run-emulator.sh with extra options to tell QEMU to allocate more memory, attach 2 gigabytes of persistent storage to /home in the emulated system, and to hook distcc up to the cross compiler to move the heavy lifting of compilation outside the emulator (if distccd and the appropriate cross compiler are available on the host system).

The wrapper script ./native-build.sh calls dev-environment.sh with a build control image attached to /mnt in the emulated system, allowing the init script to run /mnt/init instead of launching a shell prompt, providing fully automated native builds. The "static tools" (dropbear, strace) and "linux from scratch" (a chroot tarball) builds are run each release as part of testing, with the results uploaded to the website.

For more information, see Getting Started or the presentation slides Developing for non-x86 Targets using QEMU.

Downloading Aboriginal Linux

Prebuilt binary images are available for each target, based on the current Aboriginal Linux release. This includes cross compilers, native compilers, root filesystems suitable for chroot, and system images for use with QEMU.

The binary README describes each tarball. The release notes explain recent changes.

Even if you plan to build your own images from source code, you should probably start by familiarizing yourself with the (known working) binary releases.

Development

To build a system image for a target, download the Aboriginal Linux source code and run "./build.sh" with the name of the target to build (or with no arguments to list available targets). See the "config" file in the source for various environment variables you can export to control the build. See the source README for additional usage instructions, and the release notes for recent changes.
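If you want several targets, a small loop around build.sh is enough. The sketch below is not part of the project: BUILD_CMD is a hypothetical override used here so the loop can be tried without the source tree, and the target names are whatever "./build.sh" with no arguments lists.

```sh
#!/bin/sh
# Build each named target in turn, stopping at the first failure.
# BUILD_CMD is a hypothetical override for this sketch; it defaults to the
# ./build.sh script in the Aboriginal Linux source directory.
build_targets() {
  for target in "$@"; do
    echo "=== building $target"
    ${BUILD_CMD:-./build.sh} "$target" || {
      echo "build failed for $target" >&2
      return 1
    }
  done
}
```

Call it from the source directory with the target names you care about as arguments.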

Aboriginal Linux is a build system for creating bootable system images, which can be configured to run either on real hardware or under emulators (such as QEMU). It is intended to reduce or even eliminate the need for further cross compiling, by doing all the cross compiling necessary to bootstrap native development on a given target. (That said, most of what the build does is create and use cross compilers: we cross compile so you don't have to.)

The build system is implemented as a series of bash scripts which run to create the various binary images. The "build.sh" script invokes the other stages in the correct order, but the stages are designed to run individually. (Nothing build.sh itself does is actually important.)

Aboriginal Linux is designed as a series of orthogonal layers (the stages called by build.sh), to increase flexibility and minimize undocumented dependencies. Each layer can be either omitted or replaced with something else. The list of layers is in the source README.

The project maintains a development repository using the Mercurial source control system. This includes RSS feeds for each checkin and for new releases.

Questions about Aboriginal Linux should be addressed to the project's mailing list, or to the maintainer (rob at landley dot net) who has a blog that often includes notes about ongoing Aboriginal Linux development.

In addition to implementing the above, Aboriginal Linux tries to support a number of use cases:

  • Eliminate the need for cross compiling
  • Allow package maintainers to reproduce/fix bugs on more architectures
  • Automated cross-platform regression testing and portability auditing.
  • Use current vanilla packages, even on obscure targets.
  • Provide a minimal self-hosting development environment.
  • Cleanly separate layers
  • Document how to put together a development environment.

Eliminate the need for cross compiling

We cross compile so you don't have to: Moore's Law has made native compiling under emulation a reasonable approach to cross-platform support.

If you need to scale up development, Aboriginal Linux lets you throw hardware at the scalability problem instead of engineering time, using distcc acceleration and distributed package build clusters to compile entire distribution repositories on racks of cheap x86 cloud servers.

But using distcc to call outside the emulator to a cross compiler still acts like a native build. It does not reintroduce the complexities of cross compiling, such as keeping multiple compiler/header/library combinations straight, or preventing configure from confusing the system you build on with the system you deploy on.

Allow package developers and maintainers to reproduce and fix bugs on architectures they don't have access to or experience with.

Bug reports can include a link to a system image and a reproduction sequence (wget source, build, run this test). This provides the maintainer both a way to demonstrate the issue, and a native development environment in which to build and test their fix.

No special hardware is required for this, just an open source emulator (generally QEMU) and a system image to run under it. Use wget to fetch your source, then configure and make your package as normal using standard tool names (strip, ld, as, etc). You can even build and test on a laptop in an airplane without internet access (10.0.2.2 is qemu's alias for the host's 127.0.0.1).
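To illustrate the 10.0.2.2 alias: if something on the host serves files over HTTP on its loopback interface, the emulated system can fetch them without any outside network. The port number, the file path, and the assumption that a web server is running on the host are all placeholders for this sketch.

```sh
#!/bin/sh
# Inside the emulated system, 10.0.2.2 reaches the host's loopback interface,
# so a server listening on the host's 127.0.0.1 is visible at that address.
host_url() {
  port=$1
  path=$2
  # Strip one leading slash so the URL never contains "//" after the port.
  echo "http://10.0.2.2:${port}/${path#/}"
}

# Fetch a file from the host (port and path are supplied by the caller).
fetch_from_host() {
  wget "$(host_url "$1" "$2")"
}
```

For example, "fetch_from_host 8080 src/package.tar.gz" would request that file from a server on the host's port 8080, if one is running there.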

Automated cross-platform regression testing and portability auditing.

Aboriginal Linux lets you build the same package across multiple architectures, and run the result immediately inside the emulator. You can even set up a cron job to build and test regular repository snapshots of a package's development version automatically, and report regressions when they're fresh, when the developers remember what they did, and when there are few recent changes that may have introduced the bug.
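The reporting half of such a cron job can be very small. The sketch below assumes (this is not something the project specifies) that each run writes the names of the tests that passed to a file, one per line, and prints the tests that passed last time but not this time.

```sh
#!/bin/sh
# Print every line of the previous pass list that is missing from the
# current pass list. Both file names are supplied by the caller.
new_regressions() {
  prev=$1
  cur=$2
  awk -v cur="$cur" '
    # Load the current pass list first (works even if the file is empty).
    BEGIN { while ((getline line < cur) > 0) passed[line] = 1 }
    # Then print each previously passing test that no longer passes.
    !($0 in passed)
  ' "$prev"
}
```

A nightly job would rebuild, run the tests inside the emulator, call something like this on yesterday's and today's lists, and mail any output to the developers.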

Use current vanilla packages, even on obscure targets.

Nonstandard hardware often receives less testing than common desktop and server platforms, so regressions accumulate. This can lead to a vicious cycle where everybody sticks with private forks of old versions because making the new ones work is too much trouble, and the new ones don't work because nobody's testing and fixing them. The farther you fall behind, the harder it is to catch up again, but only the most recent version accepts new patches, so even the existing fixes don't go upstream. Worst of all, working in private forks becomes the accepted norm, and developers stop even trying to get their patches upstream.

Aboriginal Linux uses the same (current) package versions across all architectures, in as similar a configuration as possible, and with as few patches as we can get away with. We (intentionally) can't upgrade a package for one target without upgrading it for all of them, so we can't put off dealing with less-interesting targets.

This means any supported target stays up to date with current packages in unmodified "vanilla" form, providing an easy upgrade path to the next version and the ability to push your own changes upstream relatively easily.

Provide a minimal self-hosting development environment.

"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." - Antoine de Saint-Exupéry

Most build environments provide dozens of packages, ignoring the questions "do you actually need that?" and "what's it for?" in favor of offering rich functionality.

Aboriginal Linux provides the smallest, simplest starting point capable of rebuilding itself under itself, and of bootstrapping up to build arbitrarily complex environments (such as Linux From Scratch) by building and installing additional packages. (The one package we add which is not strictly required for this, distcc, is installed in its own subdirectory which is only optionally added to the $PATH.)

This minimalist approach makes it possible to regression test for environmental dependencies. Sometimes new releases of packages simply won't work without perl, or zlib, or some other dependency that previous versions didn't have, not because they meant to but because they were never tested in a build environment that didn't have them, so the dependency leaked in.

By providing a build environment that contains only the bare essentials (relying on you to build and install whatever else you need), Aboriginal Linux lets you document exactly what dependencies packages actually require, figure out what functionality the additional packages provide, and measure the costs and benefits of the extra code.

(Note: the command logging wrapper record-commands.sh can actually show which commands were used out of the $PATH when building any package.)
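One way to get that kind of log, shown here only as an illustration and not as how record-commands.sh itself is implemented, is to put a directory of tiny wrapper scripts at the front of the $PATH. Each wrapper appends its own name to a log file and then runs the real program. The function name and arguments are invented for this sketch.

```sh
#!/bin/sh
# Create logging wrappers in WRAPDIR for the named commands. Each wrapper
# appends the command name to LOGFILE, then execs the real program.
make_logging_wrappers() {
  wrapdir=$1
  logfile=$2
  shift 2
  mkdir -p "$wrapdir"
  for cmd in "$@"; do
    # Find the real program; skip shell builtins and anything not found.
    real=$(command -v "$cmd") || continue
    case $real in
      /*) ;;
      *) continue ;;
    esac
    printf '#!/bin/sh\necho %s >> "%s"\nexec "%s" "$@"\n' \
      "$cmd" "$logfile" "$real" > "$wrapdir/$cmd"
    chmod +x "$wrapdir/$cmd"
  done
}
```

After creating the wrappers, prepend the wrapper directory to the $PATH, run the package build, and the log file lists every wrapped command the build actually called. Use an absolute path for the log file, since the build may change directories.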

Cleanly separate layers.

The entire build is designed to let you use only the parts of it you want, and skip or replace the rest. The top level "build.sh" script calls other scripts in sequence, each of which is designed to work independently.

The only place package versions are mentioned is "download.sh"; the rest of the build is version-agnostic. All that script does is populate the "packages" directory, so if you want to provide your own packages you never need to run it.

The "host-tools.sh" script protects the build from variations in the host system, both by building known versions of command line tools (in build/host) and adjusting the $PATH to point only to that directory, and by unsetting all environment variables that aren't in a whitelist. If you want to use the host system's unfiltered environment instead, just skip running host-tools.sh.
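The same two ideas (a $PATH that points at one directory, and an environment filtered through a whitelist) can be sketched in a few lines with "env -i". This is not the code in host-tools.sh, and the KEEP list below is an assumption chosen for the example, not the project's actual whitelist.

```sh
#!/bin/sh
# Variables allowed through to the command (an assumed list for this sketch).
KEEP="HOME USER TERM"

# Run a command with an emptied environment: $PATH points only at TOOLS_DIR,
# and only the variables named in KEEP are passed through.
run_filtered() {
  tools_dir=$1
  shift
  for name in $KEEP; do
    # Only pass a variable through if it is actually set.
    if eval "[ \"\${$name+set}\" = set ]"; then
      eval "value=\$$name"
      set -- "$name=$value" "$@"
    fi
  done
  env -i PATH="$tools_dir" "$@"
}
```

Something like "run_filtered /path/to/build/host make" would then see only the tools in that directory and none of the host's stray environment variables. The command itself has to live in that directory too, because the restricted $PATH is used to find it.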

If you supply your own cross compilers in the $PATH (with the prefixes the given target expects), you can skip the simple-cross-compiler.sh command. Similarly you can provide your own simple root filesystem, your own native compiler, or your own kernel image. You can use your own script to package them if you like.
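Before skipping that stage, it is worth checking that a full set of prefixed tools really is reachable. A minimal sketch follows; the tool list is an assumption based on the standard tool names mentioned earlier, and the prefix is whatever your target expects.

```sh
#!/bin/sh
# Report whether a set of prefixed cross tools is on the $PATH.
# Returns 0 if all are found, 1 otherwise, naming each missing tool.
have_cross_tools() {
  prefix=$1
  missing=0
  for tool in cc ld as strip; do
    if ! command -v "${prefix}${tool}" >/dev/null 2>&1; then
      echo "missing: ${prefix}${tool}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

If the check succeeds for your target's prefix, the cross compiler stage can be skipped and the later stages should pick the tools up from the $PATH.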

Document how to put together a development environment.

The build system is designed to be readable. That's why it's written in Bash (rather than something more powerful like Python): so it can act as documentation. Each shell script collects the series of commands you need to run in order to configure, build, and install the appropriate packages, in the order you need to install them in to satisfy their dependencies.

The build is organized as a series of orthogonal stages. These are called in order from build.sh, but may be run (and understood) independently. Dependencies between them are kept to a minimum, and stages which depend on the output of previous stages document this at the start of the file.

The scripts are also extensively commented to explain why they do what they do, and there's design documentation on the website.

Now that the 1.0 release is out, what are the project's new goals?

  • Move from busybox, uclibc, and gcc/binutils to toybox, musl, and llvm.
  • Untangle distro build system hairballs into distinct layers.
  • Make Android self-hosting

Move from busybox, uclibc, and gcc/binutils to toybox, musl, and llvm (then qcc).

Now that we've got a simple development environment working, we can make it simpler by moving to better packages. Most of this project's new development effort is going into the upstream versions of those packages until they're ready for use here. In the meantime we're maintaining what works, but only really upgrading the kernel version and slowly switching from busybox to toybox one command at a time.

uClibc: The uClibc project's chronic development problems resulted in multiple year-long gaps between releases, and after the May 2012 release more than three years went by without a release, during which time musl-libc went from "git init" to a 1.0 release. At this point it doesn't matter if uClibc did get another release out: it's over, and musl is the more interesting project. (The main limitation of musl is lack of target support, but it's easy to port musl to new targets and very hard to clean up the mess uClibc has become.)

toybox: The maintainer of Aboriginal Linux used to maintain busybox, but left that project and went on to create toybox for reasons explained at length elsewhere (video, outline, merged into Android).

The toybox 1.0 release should include a shell capable of replacing bash, and may include a make implementation (or in qcc, below). This would eliminate two more packages currently used by Aboriginal Linux.

llvm: When gcc and binutils went GPLv3, Aboriginal Linux froze on the last GPLv2 releases, essentially maintaining its own fork of those projects. Several other projects did the same but most of those have since switched to llvm .

Unfortunately, configuring and building llvm is unnecessarily hard (among other things because it's not just implemented in C++ but the 2013 C++ spec, so you need gcc 4.7 or newer to bootstrap it), and nobody seems to have worked out how to Canadian cross native compilers out of it yet. But other alternatives like pcc or tinycc are both less capable and less actively developed; since the FSF fell on its sword with GPLv3, the new emerging standard is LLVM.

qcc: In the long run, we'd like to put together a new compiler, qcc, but won't have development effort to spare for it before toybox's 1.0 release. Its goal is to combine tinycc and QEMU's Tiny Code Generator into a single multicall binary toolchain (cc, ld, as, strip and so on in a single executable replacing both the gcc and binutils packages) that supports all the output formats QEMU can emulate. (As a single-pass compiler with no intermediate format it wouldn't optimize well, but could bootstrap a native compiler that would.)

Additional goals for qcc would be to absorb ccwrap.c, grow built-in distcc equivalent functionality, and add an updated rewrite of cfront to compile C++ code (and thus natively bootstrap LLVM).

Finishing the full development slate would bring the total number of Aboriginal Linux packages down to four: linux, toybox, musl, and qcc.

(Yes, reducing dependency on GPL software and avoiding GPLv3 entirely is a common theme of the above package switches; there's a reason for that: audio, outline, see also below.)

Untangle distro build system hairballs into distinct layers.

The goal here is to separate what packages you can build from where and how you can build them.

For years, Red Hat only built under Red Hat, Debian only built under Debian, even Gentoo assumed it was building under Gentoo. Building their packages required using their root filesystem, and the only way to get their root filesystem was by installing their package binaries built under their root filesystem. The circular nature of this process meant that porting an existing distribution to a new architecture, or making it use a new C library, was extremely difficult at best.

This led cross compiling build systems to add their own package builds ("the buildroot trap"), and wind up maintaining their own repository of package build recipes, configurations, and dependencies. Their few hundred packages never approached the tens of thousands in full distribution repositories, but the effort of maintaining and upgrading packages would come to dominate the project's development effort until developers left to form new projects and start the cycle over again.

This massive and perpetual reinventing of wheels is wasteful. Each of the proliferating build systems (buildroot, openembedded, yocto/meego/tizen, and many more) has its own set of supported boards and its own half-assed package repository, with no ability to mix and match.

The proper way to deal with this is to separate the layers so you can mix and match. Choice of toolchain (and C library), "board support" (kernel configuration, device tree, module selection), and package repository (which existing distro you want to use), all must become independent. Until these are properly separated, your choice of cross compiler limits what boards you can boot the result on (even if the binaries you're building would run in a chroot on that hardware), and either of those choices limits what packages you can install into the resulting system.

This means Aboriginal Linux needs to be able to build _just_ toolchains and provide them to other projects (done), and to accept external toolchains (implemented but not well tested; most other projects produce cross compilers but not native compilers).

It also needs build control images to automatically bootstrap a Debian, Fedora, or Gentoo chroot starting from the minimal development environment Aboriginal Linux creates (possibly through an intermediate Linux From Scratch build, followed by fixups to make debian/fedora/gentoo happy with the chroot). It must be able to do this on an arbitrary host, using the existing toolchain and C library in an architecture-agnostic way. (If the existing system is a musl libc built for a microblaze processor, the new chroot should be too.)

None of these distributions make it easy: it's not documented, and it breaks. Some distributions didn't think things through: Gentoo hardwires the list of supported architectures into every package in the repository, for no apparent reason. Adding a new architecture requires touching every package's metadata. Others are outright lazy; building an allnoconfig Red Hat Enterprise 6.2 kernel under SLES11p2 is kind of hilariously bad: "make clean" spits out an error because the code it added to detect compiler version (something upstream doesn't need) gets confused by "gcc 4.3", which has no .0 on the end so the patchlevel variable is blank. Even under Red Hat's own filesystem, "make allnoconfig" breaks on the first C file, and requires almost two dozen config symbols to be switched on to finish the compilation, because they never tested anything but the config they ship. Making something like that work on a Hexagon processor, or making their root filesystem work with a vanilla kernel, is a daunting task.
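The "gcc 4.3" failure is the kind of thing a default value avoids. As a minimal sketch (this is not Red Hat's detection code, just an illustration of the fix), a version parser can treat any missing component as 0 instead of leaving it blank:

```sh
#!/bin/sh
# Split a version such as "4.3" or "4.8.5" into major, minor and patchlevel,
# treating any missing component as 0. The body runs in a subshell so the
# IFS and globbing changes do not leak into the caller.
parse_version() (
  set -f
  IFS=.
  set -- $1
  echo "${1:-0} ${2:-0} ${3:-0}"
)
```

With this, "parse_version 4.3" prints "4 3 0", so a later numeric comparison on the patchlevel has something to compare against.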

