Introduction and Overview

Little Blue Linux (variously referred to here as LB Linux or LBL) is a basic but usable GNU/Linux operating system distribution that includes all the programs needed to rebuild itself from source code. It is most suitable for use on servers rather than desktop or laptop computers, because it lacks a graphical user interface; but it is as flexible and extensible as any other basic GNU/Linux system with C and C++ development tools installed.

Cross-Building Linux, or CBL, is this book. It outlines a step-by-step process you can follow to build a Little Blue Linux system entirely from source code. If you follow it precisely, what you wind up with is a Little Blue Linux system. If you modify the process, what you wind up with will be a variant or derivative of Little Blue Linux — perhaps you’ll like the result enough to give it a name and make it available to other people!

The aspect of all this that I focus the most on — the thing that is most important to me — is the process: the narrative that describes every necessary component of a GNU/Linux system, how they fit together, and how to bootstrap them all, starting from an existing GNU/Linux system, to create a new complete, minimal, self-hosted GNU/Linux system, entirely from source code.

The most important design goal is that the entire process should be as clear and transparent as possible. Ideally, it should be difficult to read through CBL without understanding how the resulting system works and how it was put together. (I realize that’s a pretty lofty objective, but you’ve got to have dreams.)

A secondary goal, almost as important as the first, is that I want to be sure that every piece of the final system was built from source code. That is, I want to be confident that none of the binary programs or libraries from the initial system — where you start the build — are simply copied to the resulting system. I want to be certain that everything has been rebuilt from the ground up. I’ll talk more about why that matters to me later.

It’s also desirable for the resulting system, Little Blue Linux, to be useful. But that utility is only a tertiary consideration! Mostly what I’m interested in is telling a story about how you can create a GNU/Linux system — the programs and libraries and configuration elements that comprise it — and how all those pieces fit together.

As it turns out, I do find that Little Blue Linux is an outstanding server system: since it has only the software packages that I really need, it has a minimal attack surface, and it’s trivial to rebuild everything with specific compiler optimizations for whatever hardware platform I want to run it on, because the CBL process is explicitly designed to be automated.

It would be strange if I didn’t find Little Blue Linux to be ideal for my purposes — after all, I’m making all the policy decisions about what components to use and how to configure them! That leads to yet another design goal for CBL: I would like to ease the path of anyone who wants to build their ideal system to do exactly that — perhaps by starting with CBL and modifying it to suit their own tastes, or perhaps by doing something entirely different.

If you have ideas about what your own ideal computer system should look like and how you want it to work, maybe CBL will serve as a starting point for that. Even if it does not, the most important thing about CBL is that it is a demonstration that there is no deep mystery or ancient magical lore involved in how GNU/Linux systems work — this book is not exactly short, but there is nothing really hard in it. Anyone who wants to build a custom system exactly to their own taste can do it! You just have to do some work.

1. About Little Blue Linux

Before we get into the details of how we’ll build the system, let’s talk a little bit about what you’ll wind up with once it’s complete. All GNU/Linux systems have a fair amount in common with each other, as well as a handful of differentiating factors; this is a brief overview of what those factors are in LB Linux. All of these are discussed more fully later on in the narrative.

1.1. S6-Based Init System

The init framework — which bootstraps the userspace environment and manages all the background "daemon" processes that are needed in a healthy GNU/Linux system — is based on the s6 suite of programs by Laurent Bercot. This uses a bunch of tiny little programs to manage the basic userspace of the system, rather than a few really big ones — looking at the process table on a basic LBL system, there are 38 s6-related processes running, adding up to a total of 6725 pages of RAM.^[1] In contrast, looking at a systemd-managed system (booted into single-user mode so that an absolutely minimal set of programs is running), there are only seven processes running — but they occupy a total of 23,480 pages of memory.

1.2. Package Users

Rather than maintaining a central database of installed packages, and providing specialized package-management tools to query that database to discover things like what files are part of a package or what package a specific file belongs to, LBL simply creates a separate user account for each package. The files installed by each package are owned by the package-specific user, so standard system utilities can be used to determine package information. To find out what package is responsible for a file, for example, you can just use ls -l; or you can find / -user … to list all the files and directories owned by a package.

1.3. Version Control For Configuration Files

I have, for years, had the habit of maintaining a version control repository of configuration files for the programs I use a lot — especially programs like bash and vim and tmux, whose configurations I have customized, in some cases extensively. A lot of people I know do this — it’s very handy to be able to start with a fresh-out-of-the-box operating system installation, and configure everything the way you like it just by checking out your normal configuration files from version control.

After I’d been doing that for a while, it occurred to me that exactly the same considerations apply for system configuration files — things like the configuration files for sshd and sudo and other such programs. Why not put all of those configuration files in a version control repository as well, so you have a record of what files changed, and when, and for what reason? On servers managed by more than one person, it can also be helpful to know who made those changes.

That’s part of what all Little Blue Linux systems have: a git repository for tracking changes to system configuration and policy files, accessed using a simple wrapper script.

1.4. Modern Versions of Everything

Sometimes, when I’ve been interested in using a program but am using a distribution like Debian or CentOS, I have run up against version constraints on dependencies. The current version of QEMU, say, has a feature I’d like to use, so I try to install it; but then it turns out it needs four or five libraries with a version later than what is provided in the distribution package repositories, and building modern versions of those libraries reveals other updates that they need — it’s frustrating!

Little Blue Linux is not always completely up-to-date, since new versions of packages are released all the time, but it’s reasonably close. A couple of times a year, I go through the packages that make up the base LBL system and update the blueprints to use the latest stable version of everything. So I’ve never run into a situation where I have to upgrade a bunch of other packages to be able to use a modern version of something else.

(To be completely honest, this isn’t as much of a selling point as it could be — because, although there are not ancient or obsolete versions of any package in LBL, there are a huge number of packages for which CBL simply has no blueprints, and you’ll have some work to do to use them at all. But it suits my purposes just fine!)

1.5. Cross-Architecture Build Process

As I mentioned in the Introduction and Overview, I want to be sure that every part of the final system was built from source code, and that no binary code was simply copied from the original host system to the final system. The best way I’ve thought of to make sure of this is to make sure that none of the code from the host system will actually work on the eventual LB Linux system that we are building here. We do that by building everything for a different kind of computer than the one we start the build on. The very first part of the CBL process constructs a cross-toolchain: a compiler and related tools that run on one type of computer — conventionally, this is referred to as the "host" system — but create programs and libraries that will work on some completely different type of computer architecture: the "target" system.

The CBL process itself breaks down naturally into two different parts: the part that you run on the host system, and the part that you run on the target system. I sometimes refer to those as the two "sides" of the CBL process, the "host side" and the "target side."

The idea of using a cross-toolchain, to start with one kind of computer and use it to build a system that works on a different kind of computer, is so fundamental to CBL that I named the process after it. Within that constraint, though, there is a lot of flexibility in how to perform the CBL build. The host can be a physical machine, like an Intel x86-architecture notebook computer or ARM-architecture chromebook, or it can be a virtual machine emulated by a program like QEMU. There are advantages and disadvantages to both options. The target, similarly, can be a physical computer or a QEMU virtual machine; and, again, there are benefits and drawbacks to both. Any of those combinations can work — you can use a physical computer as the host system, then move all the pieces you built there to a different physical computer to finish the build, or you can use a virtual machine as the host system and a different virtual machine as the target system (either or both of which can be emulated computers running a different architecture than the actual physical computer you’re using), or anything else you can think of.

The main benefits of using QEMU — for the host or target or both — are, first, that you don’t need to have a real computer with the architecture you want to use for that side of the CBL process; and, second, that it’s easy to automate the entire build process, since you don’t have to move a physical storage device from one computer to another, or press any actual power buttons, or anything like that.

The main disadvantages of using QEMU are, first, that emulated systems are a lot slower than real computer systems, so the build process takes a long time; and, second, QEMU is sometimes not as stable as a real computer. When running an ARM64-target emulator on a 64-bit Intel notebook computer, QEMU sometimes crashes during the final system glib build, for example. In many cases this appears to be caused by the limited system resources (especially memory) available on the emulated system. When using QEMU to emulate a computer (as opposed to using it to run a virtual machine of the same architecture as the host system), I primarily emulate ARM systems, because QEMU for ARM can emulate a computer with any amount of RAM by using the virt machine type.

In theory, you can follow the CBL process with any kind of computer as the host or target platform, as long as they are supported by the GNU toolchain programs and the Linux kernel, but it seems as though every different CPU architecture presents idiosyncrasies that require additional work to support. That means that if you go outside the host/target pairings that I use for developing and testing CBL, you will probably need to do some additional work.

The physical computer systems that I use are 64-bit x86 (aka Intel- or AMD-architecture) computers, 32- and 64-bit ARM systems, and 32-bit MIPS systems — because those are the types of computers I have handy.

1.6. About This Specific Build

The canonical form of Cross-Building Linux is a set of "blueprint" files that are available in a publicly-accessible git repository.^[2] If you’re reading this as a book or web page, it was produced from those blueprint files by the litbuild program,^[3] from repository commit 5c12463f9fb55511cc0c03be193a372e345dbbf6.

Every time litbuild produces the CBL book, it configures it for a specific type of build, with a particular kind of host system and target system. It also includes instructions on how to launch the target-side build in QEMU if those are relevant, and so on. If this book does not describe the type of build you want to do, all you need to do is obtain the CBL blueprint files and the litbuild program, set some environment variables as described in the Configuration Parameters and Default Values section, and use litbuild to generate a new version of the book.

For this CBL build, the host system is aarch64-unknown-linux-gnu and the target system is x86_64-cbl-linux-gnu. The final system will be called autointel.lblinux.org. Most of the work will be done in the /home/lbl/work/build directory, so the storage device where that directory lives should have enough free space for the build. Fifty gigabytes is probably enough; a hundred definitely is.

If any of that looks wrong — or if any of the other parameters in Configuration Parameters and Default Values are not set the way you want them to be — change the configuration and generate a new book!

1.7. How Long Is This Going To Take?

Building an entire operating system from source code is a lot of work, so it’s reasonable to wonder how much time you’ll be investing in the project. As with so much else in CBL, it depends. Emulators are much slower than real computers; slow CPUs are (obviously) slower than fast ones.

The default build — which uses a real 64-bit x86 computer for the host build and an emulated 64-bit ARM computer (running on the same physical computer) for the target build takes less than an hour for the host side and almost three days for the target side.

More commonly, I use the qemu-to-qemu build style,^[4] which does more work (it executes the Rebuild the Packages Whose Tests Were Skipped appendix after completing the base build), and the entire process takes about three and a half days. Repeating the same qemu-to-qemu build in the other direction, with an emulated ARM computer as the host system and a virtualized (but not emulated!) x86 computer as the target, the host side takes about twenty hours and the target side takes a little less than four; the full process including rebuilding packages with tests takes about twenty seven hours. (Those timings might be unnecessarily long, because I’m typically doing both builds at the same time, and using the computer for other things while they are running.)

If you want to get things to run more quickly, the best thing to do is to get a second computer with a different CPU architecture, so you can avoid emulation entirely. I have an Orange Pi 5 Plus with 32 GiB of RAM and an NVMe storage device;^[5] when I use it as the host system for an aarch64-to-x86_64 build, the entire qemu-to-qemu process takes about eight and a half hours, and when I use it as the target system for an x86_64-to-aarch64 build, the same process takes about sixteen hours. Excluding the extra work done by the qemu-to-qemu build, and just considering the basic CBL process, those take six and twelve hours, respectively. To me that seems pretty good, considering that we’re building a complete operating system from source code!

2. Configuration Parameters and Default Values

2.1. How configuration parameters work

The CBL build can be adjusted and tuned in a variety of ways using the configuration parameters described in this section. To override any parameter from its default value, you can set an environment variable with its name to a new value (or simply modify this blueprint).

The default values here are appropriate for a build taking place on a 64-bit Intel or AMD computer, building a 64-bit ARM target system, and using a QEMU-emulated virtual machine for the target system. (The defaults work regardless of whether the host system is a physical computer or QEMU virtual machine.)

Each sub-section below discusses a related set of configuration parameters.

2.2. Setting The Host And Target Architectures

These parameters determine what type of build is done and what options are used to control aspects of that build.

Parameter: HOST: Value: aarch64-unknown-linux-gnu (default: x86_64-unknown-linux-gnu)

As described in the overview, the CBL build process always uses a cross-toolchain, and produces a build of Little Blue Linux that will run on a different kind of computer than the initial host system. The HOST parameter should be set to the "triplet" of the system where the host side of CBL (that is, the cross-toolchain itself and the target-system "scaffolding" programs built using it) will be constructed. You can read more about "triplets" in the Constructing a GNU Cross-Toolchain section or in the documentation of GNU Autoconf (as of Autoconf 2.71, triplets are described in section 14).

The easiest way to get the correct value here is simply to run the script config.guess, found in the sources for GCC or Autoconf, on whatever computer or virtual machine you’re going to use for the host side of the CBL build.

Parameter: TARGET: Value: x86_64-cbl-linux-gnu (default: aarch64-cbl-linux-gnu)

TARGET is the other parameter used to control cross-compilation. It should be set to the triplet of the target system, whatever computer or virtual machine will be booted into using the scaffolding. The sample configuration files provided as part of CBL may be helpful in figuring out what you should set this to. We conventionally set the second component of the triplet to cbl when following the CBL process.

Parameter: TARGET_GCC_CONFIG: Value: not set (default: --enable-standard-branch-protection)

When configuring GCC for the target system, it may be useful or necessary to specify some set of flags. You can consult the GCC installation instructions and the gcc section for more details about the options available to you.

Parameter: HOST_GCC_CONFIG: Value: not set (default)

It’s sometimes useful or necessary to specify some set of configuration flags when configuring GCC for the host system (that is, for the GCC build done in Trustworthy Host-System Programs, if those are being built). This works just like TARGET_GCC_CONFIG, but for the initial native GCC.

Parameter: TARGET_GMP_CONFIG: Value: not set (default)

Similarly to GCC, the GMP library may need to have some extra configuration flags specified — so that it knows what ABI to build for, for example.

Parameter: KERNEL_ARCH: Value: x86 (default: arm64)

Different packages or programs have different ways of referring to CPU architectures. As mentioned earlier, the GNU toolchain refers to "triplets," which you can read about in Constructing a GNU Cross-Toolchain; the CPU architecture is the first component of the triplet. The Linux kernel has its own naming convention for CPU architectures, which in many (but not all) cases is the same as the CPU field in the triplet.

The default value for the KERNEL_ARCH parameter is an example where the naming convention differs. The GNU toolchain refers to the 64-bit ARM architecture as aarch64 (for "ARM Architecture, 64-bit words"), but the Linux kernel calls it arm64.

Another example is MIPS. The Linux kernel has a single architecture, mips, that is used for all MIPS variants (big-endian and little-endian, with both 32-bit and 64-bit word lengths), but each variant has a different value for the CPU component of the triplet: "mips," "mipsel", "mips64," and "mips64el."

Parameter: TARGET_EXPECTED_MACHINE_NAME: Value: Advanced Micro Devices X86-64 (default: AArch64)

To verify that the cross-toolchain is working as expected, CBL compiles a simple program and then inspects the resulting binary to see whether it is built for the correct kind of machine. If the binary doesn’t indicate that it’s built for the machine type specified by this parameter, the build will halt so you can inspect the situation and see what’s going on.

Parameter: KERNEL_CONFIG: Value: x86_64_defconfig (default: defconfig)

The Linux kernel has approximately a jillion different configuration elements. These determine the hardware devices and features that will be supported by the linux kernel produced by the build. Starting from scratch isn’t necessarily a good idea; luckily, we don’t have to do that, because the kernel is distributed with a set of default configuration files for every supported CPU architecture. This parameter sets the default configuration file that will be used as a starting point for the CBL build.

Parameter: KERNEL_TARGET: Value: bzImage (default: Image.gz)

The linux build process is sometimes inconsistent between CPU architectures. One place this manifests is the name of the kernel file that is produced by the build. It is sometimes vmlinux, sometimes vmlinuz, sometimes bzImage, sometimes Image.gz — As far as I can tell, it’s left to the discretion of whoever is maintaining that architecture within the kernel, and of course everyone has their own preferences.

Parameter: TARGET_SWAP_DEVICE: Value: /dev/sdb (default: /dev/vdb)

If the target-side build has a storage device available for virtual memory, it can be specified as TARGET_SWAP_DEVICE and will be used if such a device is found. If the device doesn’t exist, it won’t cause any problems, though; and if the target does exist but has a filesystem or partition table or anything like that on it, it won’t be touched.

Parameter: TARGET_SYSTEM_CFLAGS: Value: -O2 -fomit-frame-pointer -mtune=native (default: -O2 -fno-omit-frame-pointer)

When building the programs and libraries that will comprise the final system, it is generally desirable to set the CFLAGS (for C) and CXXFLAGS (for C++) environment variables to a common value so that optimization flags are used consistently across the entire system. TARGET_SYSTEM_CFLAGS provides a way to do that.

The presence of the -fno-omit-frame-pointer flag deserves some additional comment. A frame pointer is a pointer to a stack frame; if GCC is told to store frame pointers, it uses a specific CPU register to store a pointer to the stack frame when making function calls. That register is then unavailable for other purposes, which can make code larger and less efficient, but facilitates "unwinding" the stack. This can be useful when trying to diagnose exceptions. The default behavior of GCC was to include frame pointers until GCC 8, and then changed to omit frame pointers.

Since the default build style for CBL targets 64-bit ARM architecture, it is important to set the default CFLAGS to include frame pointers: The AArch64 ABI was designed with the presumption that frame pointers would always be present. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521 for some detail on this.

If you’re doing a CBL build for a different target, and you don’t plan to debug programs using gdb or something similar, you might want to omit -fno-omit-frame-pointer (and possibly add -fomit-frame-pointer) so that the resulting programs and libraries are a little bit smaller and faster.

I normally include an option like -mcpu=native so that everything on the final system is built with optimizations appropriate for the machine where the build is running. With GCC 14, though, I’ve seen that lead to issues in the final system GCC build, so I exclude that flag from the default value here.^[6]

Parameter: TARGET_SYSTEM_MAKEFLAGS: Value: -j10 (default: -j$(nproc))

The make program is used by most projects to automate their build and installation processes. Among many other things, make supports parallelism in builds: if you are using a computer with multiple CPUs and sufficient memory to support several simultaneous compiler processes, you can run make with a -jN command line option, or set the environment variable MAKEFLAGS to include such an option, and make will run up to N processes concurrently. This can speed up builds enormously!

For CBL, the degree of parallelism in build processes should not necessarily be the same on the host system and the target systems, because they may have very different hardware resources available. So to configure the number of concurrent build processes on the host system, simply set MAKEFLAGS as normal when running the host-side scripts. To configure the number of concurrent build processes on the target system, use this parameter!

For target builds that will run in a QEMU emulator, you may want to check whether QEMU supports multi-threaded emulation for the relevant guest architecture — this is the MTTCG feature that you can read about at https://wiki.qemu.org/Features/tcg-multithread. Unless MTTCG is supported by QEMU for that emulation target, there’s no point in specifying any parallelism higher than -j1, because QEMU won’t actually run multiple emulated CPUs simultaneously.

This parameter should generally align to the TARGET_QEMU_CPUCOUNT parameter, if the target will be a QEMU virtual machine. Because virtual machines can be run with any number of virtual CPUs, the default value for this parameter is simply $(nproc), which uses the nproc program from GNU coreutils to determine the current number of CPUs.

Parameter: ENABLE_TARGET_NETWORK: Value: true (default)

The default presumptions made by CBL are that the target system has a wired ethernet interface, that there is a DHCP server available, and that networking should be enabled when the target system is booted. If any of those isn’t the case, set ENABLE_TARGET_NETWORK to any value other than true — in that case, you’ll need to set up networking yourself after the build completes.

Parameter: ENABLE_TARGET_RNGD: Value: true (default)

Similarly, the default presumption for CBL systems is that there is some source of cryptographically-strong random numbers that can be used by the rngd program to keep the kernel’s entropy pool full. (See rng-tools if you don’t know what any of that means.) If that’s not the case, you can set ENABLE_TARGET_RNGD to some non-true value.

Parameter: TARGET_BRIDGE: Value: manual (default: qemu)

There’s more than one way to get from the host side of the CBL build process to the target side. Each of these is defined in a blueprint named target-bridge-NAME (where NAME can be whatever you like); the one that is used for a particular CBL build is the one named in this parameter.

The default CBL process uses the real host computer system for the first part of the build, and an emulated QEMU virtual machine for the target. That’s what the qemu target bridge does. Another option is to use QEMU virtual machines for both the host and target systems; the qemu-to-qemu-build blueprint describes one way to do that.

Parameter: TARGET_MACHINE: Value: normal (default)

Under most circumstances, nothing extraordinary or unusual needs to be done to support particular target devices. However, some single-board computers like the Raspberry Pi require extensive modifications to the Linux kernel to work properly. To support those computers without unnecessarily modifying the kernel for other systems, the blueprint for the Linux kernel has blocks that can be enabled based on the TARGET_MACHINE parameter.

If you want to build LB Linux for one of those devices, set this parameter appropriately.

2.3. Target System QEMU-related Parameters

Different parameters are required depending on whether the target system is a real computer or a virtual machine. The parameters in this section are used whenever the target system is a QEMU virtual machine.

If you’re not using a QEMU virtual machine for the target, most of these are irrelevant — with one exception, TARGET_QEMU_ARCH, as noted in its description.

Parameter: TARGET_QEMU_ARCH: Value: x86_64 (default: aarch64)

The most fundamental of the QEMU-related parameters is QEMU’s name for the target CPU architecture. Like the Linux kernel, the way that QEMU refers to machine architectures is often the same as the CPU field in the triplet — this is the case for 64-bit ARM machines, which both the GNU toolchain and QEMU call "aarch64" (for "ARM Architecture, 64-bit").

Even when the target system is going to be a physical computer rather than a virtual one, it’s important to specify the correct value for TARGET_QEMU_ARCH — QEMU is used to validate that the cross-toolchain works properly, even if it’s not used for the target-side build.

Parameter: TARGET_QEMU_MACHINE: Value: q35,accel=kvm,usb=off (default: virt)

For most architectures, QEMU can emulate a variety of different machines. This parameter lets you select from those. This selection relates to the kernel configuration you start with, which is specified with the KERNEL_CONFIG parameter.

The best documentation for what machine types are supported for different architectures is on the QEMU wiki: https://wiki.qemu.org/Documentation/Platforms is the top-level URL as of October 2023. You can also run the QEMU full-system-emulator program (qemu-system-x86_64 or qemu-system-aarch64 or whatever) with the argument -machine help to get a list of the machines it supports.

Parameter: TARGET_QEMU_CPU: Value: host (default: cortex-a76)

Similarly to the machine argument, QEMU can emulate a variety of CPUs; and you can get a list of the options here with -cpu help. In many cases you don’t really need to specify a CPU because there will be a default value that works fine, but this is not the case with the virt machine that is used in the default configuration.

If the target QEMU system will be run as a native virtual machine rather than an emulator — that is, if the actual computer is an x86_64 machine, and you’re doing a build in an x86_64 virtual machine so you can use the computer for other things while the build is running — you can specify -cpu host to tell QEMU not to emulate a processor at all, and simply act as a hypervisor. You’ll probably also need to specify some type of acceleration, like -accel kvm, in that case.

Parameter: TARGET_QEMU_CPUCOUNT: Value: 10 (default)

The QEMU full-system emulators can provide multiple CPU cores to the guest virtual machine. This may or may not actually be helpful in terms of performance — as mentioned earlier, when talking about TARGET_SYSTEM_MAKEFLAGS, this has historically only been helpful when QEMU is running as a hypervisor, not emulating a different machine architecture, because the TCG code generator that converts guest CPU instructions into host system instructions only ever operated in a single thread. Recent versions of QEMU have supported multi-threaded code generation (the "MTTCG" feature) for some architectures, which provides real parallelism within emulated machines. This speeds up builds dramatically for some packages, up to the limit of parallelism that QEMU emulated machines will accept.

Obviously, it’s a bad idea to set this parameter higher than the number of CPU cores that the system actually has!

Parameter: TARGET_QEMU_RAM_MB: Value: 32768 (default: 8192)

QEMU allows you to define the amount of RAM that will be made available to a virtual machine. The CBL process puts a fair amount of stress on the target system, and the amount of memory available to the compiler — especially for C++ builds — has a huge impact on the reliability of the process as a whole. The default value of 8 GiB works pretty well, but when the host system has more memory, I always give the target as much as I can, up to about 24 GiB.

The virt machine type, available when using the ARM emulators, allows as much memory as you’d like to allocate.

Parameter: TARGET_QEMU_DRIVE_PREFIX: Value: sd (default: vd)

The emulated hardware in QEMU virtual machines is not the same for all architectures and machines. Depending on what hardware is emulated, storage devices might show up as sd (SCSI) devices, hd (IDE or ATAPI) devices, vd (virtual) devices, or possibly something else entirely. This driver-defined prefix must be used when launching QEMU so that the Linux kernel can find the root filesystem, and is also used in QEMU builds when creating partition tables and filesystems.

The way drive prefixes are used in CBL correspond to a historical convention for the way that device files have been named: SCSI disks, for example, are referred to as /dev/sda, /dev/sdb, and so on (appending a letter to the drive prefix to refer to the whole volume); the device files for partitions on a disk simply add a partition number to the whole-volume block file, like /dev/sda1, /dev/sda2, and so on, for /dev/sda. This convention is not always used, though; the convention for NVMe storage is for the devices to be named /dev/nvme0n1, /dev/nvme1n1, and so on; and for partitions on those devices to be /dev/nvme0n1p1, /dev/nvem0n1p2, and so on.

That means that if you’re doing a CBL build using NVMe storage, or some other type of storage that uses a different naming convention than the historical one, you’ll need to tweak the blueprints that refer to the TARGET_QEMU_DRIVE_PREFIX parameters.

Parameter: TARGET_SERIAL_DEV: Value: ttyS0 (default: ttyAMA0)

When building the target system in a QEMU virtual machine, the normal graphics console provided by QEMU is not used. Instead, we take advantage of QEMU’s ability to map the standard input and standard output of the virtual machine process to a simulated serial device. The first serial device on most Linux systems is /dev/ttyS0, but for ARM computers it might show up as /dev/ttyAMA0 instead.

2.4. Target Boot Configuration

Similarly to TARGET_BRIDGE, there’s more than one way to make a GNU/Linux system bootable, and so there are multiple blueprints for doing that. These parameters are used to select which blueprint to use, and (for those that need additional parameters) configure how it should work.

Parameter: BOOTLOADER: Value: grub (default: manual)

The BOOTLOADER parameter selects the blueprint that will be used to set up the boot loader for the target system. The actual blueprint that will be used for this is named setup-bootloader- followed by this parameter; in the case of this specific build, for example, the blueprint is setup-bootloader-grub.

As with the target bridge, the default option here, manual, means you’re on your own when it comes to making the target system bootable — the setup-bootloader-manual blueprint does not do anything.

Parameter: BOOT_DEVICE: Value: /dev/sda (default: not set)

If a boot loader is being installed, it’s a good idea to set BOOT_DEVICE to the name of the block special device that will be used by the boot loader. For example, the GRUB boot loader is typically installed on the first sector (also known as the "boot sector") of a storage device, where it can be found and loaded by the BIOS.

2.5. Target system name and login details

A non-root user account is always created during the CBL process. These parameters control the details that will be used for that account. It’s almost certainly a good idea to override them with values you prefer!

Parameter: LOGIN: Value: lbl (default)

This parameter controls the login name for the non-root user. It’s a good idea to change this to your preferred login name. (The parameter name USER would be more idiomatic, but litbuild uses environment variables to override the default value for configuration parameters, and the bash shell always sets USER to the current user name — so using USER here would conflict with that.)

Parameter: LOGIN_FULL_NAME: Value: A Little Blue User (default)

The UNIX user database has a "comment" field that, for accounts used by actual human users, is conventionally used for the full name of the user. Again, it’s a good idea to change this to your own full name.

Parameter: DOMAIN_NAME: Value: lblinux.org (default: example.org)

A domain name — preferably one that you control, defaulted here to example.org — will be used in various places throughout system setup and configuration.

Parameter: HOST_NAME: Value: autointel (default: cbl)

The final target system will set its hostname to whatever is specified by this parameter (in the DOMAIN_NAME domain).

2.6. Directories Used For the Build Process

The rest of the parameters govern where different parts of the build artifacts will reside. You can set these however you like.

Parameter: QEMU_IMG_DIR: Value: /home/lbl/work (default: /tmp/cblwork)

When targeting a virtual machine, the disk image files used by the QEMU emulator will be created in this directory.

Parameter: CROSSTOOLS: Value: /home/lbl/work/crosstools (default: /tmp/cblwork/cross-tools)

This sets the directory into which the cross-toolchain will be installed.

Parameter: HOSTTOOLS: Value: /usr (default)

This sets the directory into which the Trustworthy Host-System Programs will be installed, if they are being built. If these are not needed, this can be left at the default value of /usr — or, if the host system has QEMU installed in a different location, whatever location that is.

Parameter: SYSROOT: Value: /home/lbl/work/sysroot (default: /tmp/cblwork/sysroot)

The "sysroot" framework is used for the cross-toolchain, and will contain the root filesystem that will be used by the target system. You can read much more about this in the various GCC sections. Initially, during the host stage of the CBL build, this will only contain a single subdirectory, /scaffolding. Absolutely everything else will be created in the target stage of the build.

Parameter: TARFILE_DIR: Value: /home/lbl/materials (default: /tmp/cbl-materials)

The source code for the software packages that make up the CBL system is distributed in files created by the tar program. tar stands for "Tape ARchive" — a term left over from bygone days, when magnetic tapes were the primary mechanism for moving large amounts of data from one system to another. Even though tapes aren’t commonly used any more, tar files, generally compressed, are still used as the primary distribution format for source code on UNIX-ish systems.

This parameter sets the location where CBL will look for all of the source package archives needed during the build.

Parameter: PATCH_DIR: Value: /home/lbl/materials (default: /tmp/cbl-materials)

Sometimes, the source code package that is distributed by a project needs to be modified or adjusted before it is built. This is generally done using the patch utility, and the files that specify the modifications that need to be made are called "patch files."

This parameter sets the location where CBL will look for all of the patch files needed during the build.

Parameter: WORK_SITE: Value: /home/lbl/work/build (default: /tmp/cblwork/build)

When building software from source code, you need to unpack the source code somewhere and then configure, build, and (sometimes) test it before installing it to its final destination directory. The WORK_SITE parameter specifies where all that activity will be done; the name comes from the construction metaphor that litbuild uses. It should be considered a transient or temporary directory, and can be deleted after the build is complete. The full CBL process can use around fifty gigabytes of storage space, so to be on the safe side, define WORK_SITE as some location with at least that much free space available.

Parameter: LOGFILE_DIR: Value: /home/lbl/work/logs (default: /tmp/cblwork/logs)

Everything printed to standard output and standard error throughout the CBL build process will be written to log files; if something goes wrong, the main way to figure out what happened is to look at the log files.

This parameter specifies the location where log files are written during the host side of the CBL process. Log files written during the target side of the build will be on the target system, of course.

Parameter: SCRIPT_DIR: Value: /home/lbl/work/scripts (default: /tmp/cblwork/scripts)

SCRIPT_DIR specifies the location where litbuild will write the bash scripts that automate the build story — the "tangle" side of the literate build system.

Parameter: DOCUMENT_DIR: Value: /home/lbl/work/docs (default: /tmp/cblwork/docs)

DOCUMENT_DIR specifies the location where litbuild will write an AsciiDoc document that tells the build story — the "weave" side of the literate build system.

3. Packages and How To Build Them

The GNU system is a collection of packages. Each package includes some set of programs, libraries, configuration files, and other data files of innumerable types; collectively, all those files make up the system.

This is a pretty basic concept and you should feel free to skip ahead if you already knew that!

When people talk about a Linux system (or, equivalently, a GNU/Linux system), they’re talking about a collection of software packages that have been assembled in a particular way, guided by some policy decisions about how to fit those things together. There are some common elements among all of these systems: they use the Linux kernel to manage hardware resources and provide services to userspace programs; there is some init program that runs those userspace programs (and, in many cases, keeps system programs running); there is a core set of userspace programs that you can reasonably expect to find on the system, like the bash shell and the core GNU utilities… aside from those basic elements, though, there is considerable variation in how different systems are set up, which packages are available, the mechanism used to set up additional programs, which init program is used, how the filesystem is arranged, all can vary widely from one system to another.

Many people and organizations provide easy-to-install distributions (sometimes called "distros") of those packages, policy decisions, and so on, and this is how the vast preponderance of GNU/Linux systems are set up — so much so that people generally talk about these systems in terms of which distribution was installed: RedHat, or Debian, or one of the hundreds of derivatives of those systems, or one of the newer independent distributions like Arch or Void Linux. There are a surprisingly large number of distributions! You can find a timeline of many of the distributions and the relationships between them on the Internet, perhaps here; if that link doesn’t work, try doing an Internet search for "GNU/Linux distribution timeline." Alternatively, spend some time looking at the https://distrowatch.com/ site — they track releases and other activity on a lot of distributions.

The CBL process describes a GNU/Linux distribution, as well: if you follow the process described in this book, you will wind up with a Little Blue Linux system. If you modify the CBL process, then you’ll wind up with your very own distribution — a derivative of Little Blue, just as Ubuntu is a derivative of Debian.

But I digress. All of these systems are, primarily, a collection of software packages, each of which is maintained and released by a person or project team. Generally, these packages are released by their respective project teams as tar archive files containing the package source code and other files. Most of the work that goes into making a GNU/Linux distribution is in taking those release archive files and setting them up as part of the system, then repackaging the result so that it’s easy for users of the distribution to set it up as well.

That process — setting up a new package so you can use it on your system — generally consists of four steps:

You configure the package for your specific system, with some set of configuration settings to control how it will be built and where its files will eventually wind up;
then you compile the package source code into executable programs and libraries;
then you run a suite of automated tests to verify that the program was built successfully and is working as expected; and, finally,
you install the package files into the system so that it is available for use.

Sometimes one or more of these stages is missing for a package — for example, a program may not have a test suite, or a package might be written in a language that is primarily interpreted, like perl or python, so there may not be a compile step — but that sequence of build stages is common enough that the instructions you’ll find here always frame the process of setting up packages in terms of those steps.

3.1. The GNU Build System

Many of the packages that constitute the basic Little Blue system — including almost all of the most fundamental components, like the C standard library and software construction tools — are part of the GNU system created and maintained by the Free Software Foundation. These packages are generally designed to be constructed using the GNU Build System, which includes the autotools for configuration and make for running all the commands that actually compile, test, and install the package.

So many of the packages built during the CBL process use this build system that it forms the default sequence of steps used to set up software packages. If you look at the source blueprints that define the CBL process, you may notice that package blueprints often omit some or all of the commands used to set up the package. That’s because the default sequence of steps can be used for them:

The package is configured using ./configure --prefix=/usr, to use default configuration settings for everything except the location where the package files will be installed (the default for this is usually /usr/local, for reasons we won’t go into here);
the package is compiled by simply running make, which causes the default target to be built;
the test suite is run with make check; and
the package files are installed with make install.

It’s common for packages to be set up using one or two commands that differ from the defaults, so sometimes you’ll see that there’s an explicit definition for the configuration commands, or the test commands, or something like that.

3.2. Executing the CBL Process Manually, Part One

CBL is designed and intended for automation — if you take the source blueprints for CBL, set environment variables for all the configuration parameters you want to override, and run the litbuild program on them, you’ll wind up with a shell script you can run to kick off at least the host side of the build; depending on which "target bridge" you use, the conclusion of that process might automatically kick off the entire second half of the build as well.

The reason for the focus on automated builds is simple: people are, generally, bad at being consistent. That means that any time you want to be able to repeat a process several times consistently and without mistakes, automation is a practical necessity — and I definitely want to be able to repeat the CBL process many times! I sometimes run through the CBL process multiple times in a week.

On the other hand… you may have a different preference. If, for whatever reason, you prefer to type all the commands yourself, or copy and paste them from a PDF or web browser, you can do that. This section has a few tips for that approach.

Several parts of CBL set environment variables that control or influence how things are built. Those enviroment variables are scoped by the section structure of the process, so (for example) when you start the Constructing a GNU Cross-Toolchain section, you should set all the environment variables defined in that section (and explicitly unset any variables that say they should not be defined at all) before you start building any of the packages in that section. But when you’ve finished that part of the process, you should start a new shell process without those variables set.

Aside from those environment variables, every section in CBL either has some commands to run — which are hopefully pretty straightforward — or sets up a package. If you’re following the process manually, here’s how to set up each package:

Unpack the source tarfile. The archive file should always unpack into a new directory; cd to that directory.
If the blueprint specifies any In-Tree Packages, you should unpack the source tarfile for each of those packages, and move the resulting directory to the location specified in the blueprint.
If the blueprint specifies any Patches, you should apply those to the source tree using git apply.
If the blueprint specifies a Build Directory, you should create it and cd to it before proceeding.
Supposing all of that worked without errors, proceed by running all the Configuration commands, then the Compilation commands, then (optionally) the Test commands, and finally the Installation commands.

If anything goes wrong at any stage… well, that will give you a taste of why it’s a lot of work to create and maintain a GNU/Linux distribution. Operating systems are complicated! Things break all the time, and you have to spend time and effort figuring out whether it’s a real issue that has to be addressed or an ephemeral problem that will go away if you just restart whatever process failed.

In some cases, blueprints in CBL define files that should be written to the system at some specific location. These are sometimes written piecemeal throughout the text of a section, interspersed with a narrative explaining why the file contains what it does, but at the end of each such section, you’ll always find the complete text of these files. These should just be copied verbatim to the system, before running any of the commands defined in that section.

This whole process is exactly what the scripts produced by litbuild do automatically, up until the package-users framework is installed. At that point, the process you should use to build packages manually will also change — and we’ll talk more about that when we get there!

4. Patches in Cross-Building Linux

In CBL, we strongly prefer to stay as close as possible to the latest stable released version of every package. Sometimes that’s not feasible or practical, though, for various reasons; in case you’re not familiar with the idea of "patch" files, I’ll talk a bit about what we do in those circumstances.

If you know all about patches, you might want to skip ahead!

UNIX systems have, since time immemorial,^[7] included a userspace program called diff, whose purpose is to find differences between two files or two directory trees. This is really handy in all kinds of circumstances, as you can probably imagine! Any time you want to know what changed in a file, as long as you have a copy of the original version, you can use diff to find exactly what lines are different between the old and new versions. diff also provides options to include lines of context around the changes, and… really, lots of other things; you can read all about its capabilities with man diff or info diff.

Larry Wall, best known for creating the perl programming language, wrote a program that is sort of the reciprocal or inverse of diff: the patch program. If you use diff to find all the differences between two sets of files, and save the diff output to a file, you can then take that output file and feed it to the patch program, and patch will take the first set of files and transform it into the second set of files.

This is really helpful when you want to distribute modifications to source code efficiently! Rather than creating an archive file with the entire modified source code directory, you can use diff to capture all of the differences between the original and modified versions of the source code; then you can distribute the output of the diff program however you need to. Anyone who has both the original version of the source code and the diff output can use patch to reproduce your modified version of the code.

Since the patch program is used to apply these changes, it’s common to refer to the output of diff as "patches" or "patch files," and it’s common to refer to the process of applying those files as "patching."

Note

In Cross-Building Linux, we don’t actually use the patch program to apply patches! The diff and patch programs can only handle changes to text files, not files in binary formats like images or sound files or things like that. The git version control system can handle patches that include changes to binary files, in addition to normal text diffs, and in some cases, the patches used in Cross-Building Linux include this style of binary diffs. These are created with git format-patch rather than diff, and applied with git apply rather than patch. For consistency, we apply all patches that way.

In CBL, we use a few different types of patches, described below. In most cases, we consider these patches to be a part of the CBL project itself, so they are maintained in a git repository you can find at https://codeberg.org/FreeSA/cbl-patches, as well as being available in the CBL file repository at http://files.freesa.org.

4.1. Miscellaneous tweaks or fixes

Sometimes, packages just don’t work the way that we would like them to. For example, when cross-compiling the GNU binutils for some host/target pairs, GCC issues a string truncation warning when compiling gas/config/tc-i386.c. This causes the binutils build to crash, since it’s set to treat all warnings as errors. Modifying a snprintf call to use the correct formatting spec (%hhx rather than %x) resolves the problem, but the binutils maintainers don’t want to simply make that change because of concerns that it might have a negative impact on some of the (many) architectures that binutils supports.

In this kind of situation, CBL applies a patch to make the necessary change.

4.2. Kernel configurations

The Linux kernel has to be configured to include support for whatever features and hardware device drivers are needed, and omit support for features and device drivers that are not desired. This is a pretty complicated process and can be hard to automate.

The configuration system provides a way to start with a named configuration rather than the default settings for an architecture; if this feature is used, a file from arch/*/configs is used to override the default settings.

CBL provides kernel configurations for some specific cases — for example, a configuration for kernels intended to run on EC2 instances in the Amazon Web Services cloud. These default configuration files are added to the Linux source tree via patches.

All of this is described more thoroughly in the Linux sections, so you can consult those for more information.

4.3. Gnulib updates

The GNU project maintains a repository of source code that is intended to be used in other software packages. This repository is called "Gnulib." However, unlike other packages referred to as "libraries," like libxml2 or libgcrypt, Gnulib is not compiled into a library of functions that are then dynamically linked into other programs at runtime. Rather, the expectation is that files from the Gnulib repository will be copied into the source tree of other projects.

This sometimes presents challenges.

An example of this is release 2.28 of the glibc package. This release of glibc removes some obsolete and deprecated header files — an example is libio.h — that are not a part of the C standard library but were part of earlier glibc releases. These header files are not referenced by Gnulib — but until sometime in 2018, they were. Since other packages, like m4 and gzip, still include those old versions of the Gnulib files, those packages won’t build on systems that use glibc 2.28 or later. At least, not until new versions of those packages, using modern versions of the Gnulib components, are released! And some packages are not released very often: as I write this, for example, it’s October of 2023, and the most recent version of m4 is 1.4.19, released in May of 2021 — over two years ago.

When I encounter this situation, my practice is simply to compose a patch by copying in the newest version of whatever Gnulib source files have compilation issues.

4.4. Branch-update patches

The most common — and least objectionable — type of patch used in CBL is the "branch-update" patch.

All large and complicated software packages, like GCC and the GNU C library, have bugs. Some of those bugs are, inevitably, severe. A common practice for project teams is to maintain bugfix branches in their source repositories for at least the most recent few releases of their package(s); as bugs are found and fixed, these changes are back-ported from the current main line of development to these bugfix branches.

It’s my practice to pull in all the updates from these bugfix branches from time to time, and apply those as patches so that the CBL system is as stable and bug-free as possible.

Any time you see a package with a patch called branch-update- along with a year-month-day datestamp in its name, that’s what it is: a compilation of all changes from the upstream project’s bugfix branch, as of whatever date is indicated by the patch file name.

Unlike other CBL patches, these branch-update patches are not tracked in the cbl-patches git repository, although they are present in the CBL file repository. The rationale for this is that it’s trivial to reproduce these patches from the upstream project’s version control repository; all you have to do is obtain that repository and then use a command like git diff glibc-2.28..remotes/origin/glibc-2.28/master to produce a current branch-update patch for glibc 2.28.

4.5. Vendored-Dependency Patches

Almost all packages depend on other packages in one way or another. The most obvious dependency for packages on GNU/Linux systems is the C standard library (in most cases glibc), which is needed by almost every C program. But most packages written in C require other packages in addition to the standard library. As examples, gcc depends on binutils, and cmake depends on curl, expat, zlib, bzip2, and xz.

In the C and C++ language ecosystem, there is no automated way to download and manage other packages, so there is a general tendency to avoid adding dependencies unless there is a compelling reason to do so. That is very convenient for the CBL approach of having a separate blueprint for each package, and explicitly naming dependencies in each blueprint so that litbuild can figure out the complete set of scripts (and documentation sections) it should write based on the blueprint named as its target.

The tools provided with many programming languages — golang, perl, python, and ruby all leap to mind — make it substantially easier to manage dependencies. For example, in a ruby program, it’s common practice to include a file called Gemfile that specifies all of the ruby packages needed by that program; the program bundler (which, conveniently, comes bundled with the ruby language) can be used to download and install all of those pacakges, along with any packages those dependencies rely on, as long as an Internet connection is available. As you might imagine, when a language makes it very easy to manage external dependencies, this sometimes results in projects that have a great many dependencies. In some cases, projects written in these languages depend on over a hundred other modules. This does not fit well with the CBL way of doing things: writing a blueprint for a new package that is actually desired or needed is not much of a burden, but it does not seem like a good use of time or energy to write a hundred or more additional blueprints for all of the other packages it needs. There are a number of alternative approaches that can be used in situations like this. Here, we’re going to discuss one of those approaches, which leverages the idea of dependency "vendoring."

Easily adding, downloading and installing dependencies is a great convenience when developing software or extending it to have new features. It becomes less convenient when integrating multiple programs — each of which may depend on some common subset of external packages, but possibly on different versions of those packages. It can also be frustrating (I speak from personal experience on this!) when automated systems expect to be able to download and install dependency packages at the time that software is being deployed, since the upstream source for those packages may happen to be unavailable at the precise time you want to deploy a new version. To work around these problems, languages that support this kind of easy dependency management generally provide a mechanism for copying the correct version of all external packages directly into a subdirectory of the project code. A common convention is to copy these external packages into a subdirectory called "vendor" (since the contents of that subdirectory were not produced locally, but rather obtained from external "vendors"), and the practice of doing this is referred to as "vendoring" dependencies.

For packages that have so many dependencies that it’s simply not reasonable to write blueprints for them all, a practice that I sometimes use is to vendor those dependencies so that a copy of them is present in the project source directory structure, then create a single "vendored-dependencies" patch for that modification. These patches are generally quite large! For example, the source code archive for version 5.30.0 of the image package is less than 500 kilobytes in size, compressed. A patch that adds all of its dependencies (of which there are a hundred and seventy one in total) is about five and a half megabytes — over eleven times larger than the package source code per se. This approach is not by any means elegant, but it is the least-bad approach I’ve found.

The Host-Side Build

5. Preparing For The Build

Before we can start the CBL build process, there are a couple of things we need to make sure of.

5.1. Required Host-System Programs

First, and most importantly, it’s important to make sure you have all the tools you need on the host system. Many GNU/Linux systems are missing one or more of the programs necessary for CBL, or provide a version of those programs that won’t work properly for one reason or another. So, although it’s not an intrinsic part of the build process per se, CBL includes the Trustworthy Host-System Programs appendix for building the programs that we’ve found to cause problems later on.

The basic set of requirements from the host system include modern versions of the GNU toolchain (GCC, binutils) and build system (make, the autotools, and so on), the QEMU emulator, and the lzip compression program. The file program is also necessary, and must be the same version that is used in the CBL process. If you’re unsure of whether you have everything you need, please do check the Trustworthy Host-System Programs appendix and make sure you have the things built there.

If you’re following the CBL process on a LB Linux system, this is probably unnecessary — all of the packages built in the host-prerequisites appendix are also built in the CBL process and are part of the basic LB Linux system. The only gotcha in that case is the file package, which can only be cross-compiled on a system that has the same version of file installed already. So if you’re using an LB Linux system but the version of file installed there is older than the one that the CBL process currently uses, you’ll need to upgrade that package before you can proceed.

5.2. Final Preparations For The Build

The whole first part of the CBL process — the part that runs on the host system — will need to find the trusted host system programs (if they were built) and the cross-toolchain programs, so we should make sure they are on the PATH. We also need to ensure that shared libraries installed as part of those packages can be found by those programs — if you’re not clear on what that last part means and don’t feel like being patient, you can skip ahead to A Word About The Dynamic Linker, where shared libraries are discussed.

Environment variable: PATH: /home/lbl/work/crosstools/bin:/usr/bin:$PATH
Environment variable: LD_LIBRARY_PATH: /usr/lib:$LD_LIBRARY_PATH

Litbuild provides a feature that allows the scripts it generates to be re-run if the build crashes partway through, by recording which scripts have completed successfully and skipping them in future runs. To activate this feature, we define an environment variable LITBUILDDBDIR.

Environment variable: LITBUILDDBDIR: /home/lbl/work/crosstools/litbuilddb

Now, at last, we’re ready to start the CBL build process.

Building a GNU/Linux system from the ground up is like constructing a building. At least, that’s the metaphor I had in mind while I was designing the CBL process and writing this book.

When constructing a building, it’s sometimes useful to start by assembling a scaffolding around the build site. Then you can climb onto the scaffolding and use it as a support and framework while you construct the building you actually want. Once the final building is complete, you can tear down and discard the scaffolding — it’s not important in and of itself, only as a means to an end.

That’s what we do in the CBL process: we use the cross-toolchain to construct a set of programs and libraries that we can boot into, as an ephemeral "scaffolding" framework from which we can build the actual target system. We build that scaffolding in such a way that it sits alongside the final system as we build it; the scaffolding components won’t conflict with anything that will eventually form part of the final Little Blue Linux system, because everything related to those components will be self-contained within a top-level /scaffolding directory. After the build is complete, we’ll delete /scaffolding.

6. Constructing a GNU Cross-Toolchain

This section describes how to build a GNU cross-toolchain that runs on computers with one CPU architecture — for example, x86_64 — and constructs programs that will run on a different CPU architecture, like MIPS or ARM. As usual in the CBL process, we’re going to use modern versions of all the toolchain components; in this case, that means:

Package Versions:

binutils 2.44
gcc 15.1.0
glibc 2.41
gmp 6.3.0
isl 0.27
linux 6.13.5
mpc 1.3.1
mpfr 4.2.2

6.1. Toolchain Basics

A "toolchain" is the set of all of the programs and libraries required to transform source code into executable programs that you can actually run. It’s called a "chain" because there are multiple programs involved: each program takes some kind of input file and produces some kind of output file, which then becomes the input for the next program in the chain. Each program is like a link in the chain. (When you think about it that way, it’s not really the best metaphor. It’s really a lot more like an assembly line! But "toolassemblyline" sounds awful.)

In this case, we are building a C and C++ toolchain: it will be able to construct executable programs from C and C++ source code.

The toolchain being built here consists of: a preprocessor, which handles include directives and macro calls and things like that; a compiler, which takes C or C++ source code and translates it into assembly language code; an assembler, which takes that assembly code and translates it into binary object code; and a linker, which combines object code produced by the assembler with additional object code contained in libraries and produces executable programs. The process also requires an implementation of the C and C++ standard libraries, which contain a large collection of functions that the linker uses when producing programs. In many cases, the implementation of those functions involves making system calls, which are basically functions provided by the operating system kernel; because the standard libraries make use of system calls, the process of building a toolchain also requires the kernel header files that specify what system calls are available and how to invoke them.

In the GNU toolchain, the preprocessor is cpp and the compilers are gcc (for C source code) and g++ (for C++ source code). All of these — along with the standard C++ library — are contained in the GNU Compiler Collection (gcc) distribution. The assembler and linker are as and ld, and are provided by the GNU binutils ("binary utilities") distribution. The C standard library is distributed separately as the glibc (short for "GNU libc") package, and the kernel header files are part of the Linux kernel distribution.

Since glibc is a rather large library — over a hundred megabytes of source code, producing shared library files that are in the megabytes — it is not very well-suited to building programs that must fit in a compact space such as the flash chip that holds the firmware in wireless routers. For those programs, an alternative C library, such as musl or uClibc, may be more appropriate. Toolchains using those C libraries can be produced using a variant of these instructions.

6.2. Cross-Toolchains

A "cross-toolchain" is much like a normal, or "native," toolchain. The difference is that a cross-toolchain runs on a computer of one type (the "host" system) but builds programs that will run on a different type (the "target" system). For the CBL project, for example, we use common Intel-processor computers to build software that will run on an ARM-architecture CPU. That way we can be reasonably certain that the final system really is entirely built from source, and no binary code was simply copied from the original build system to the target system: code from the build system simply can’t run on the target system, so if any binary code from the host system winds up on the target system, it won’t work at all.

Faux Cross-Toolchains

This same process can be used to build a "faux" cross-toolchain: a set of tools that runs on (for example) Intel computers and builds programs that will also run on Intel systems, but using the cross-toolchain build process instead of building a native toolchain.

That may seem pointless, but this can be a useful technique if you want to be pretty sure that you are building all system components from source code and that those components are entirely independent of the original build system. This is also a handy technique if you’re starting with a non-multilib 64-bit system and want to build 32-bit programs, or something of the sort.

You can also use this approach simply to test that the cross-toolchain build process works properly without needing a second computer or an emulator to execute programs built using it.

CBL is not designed to use "faux" cross-toolchains because I prefer to be really sure that everything is getting built entirely from source code.

To build a GNU cross-toolchain, we pass --host, --target, and --build options to the configure scripts (produced using the autotools programs of the GNU build system) of the toolchain components.

As you may recall from Configuration Parameters and Default Values, the values assigned to these options are called "target triplets," a term that also comes from the GNU build system. A triplet is a string with multiple components that are separated by hyphens. Historically, the triplet has had fields for CPU, manufacturer, and operating system (e.g., mipsel-pc-gnu for a little-endian MIPS CPU, PC hardware, and the GNU operating system); more commonly, these days, the OS field is subdivided into two fields, "kernel" and "system," so for all intents and purposes the target triplet winds up having four components: cpu-manufacturer-kernel-os (e.g., mipsel-pc-linux-gnu). That means that the term "target triplet" is kind of obsolete and misleading, and it would make more sense to refer to them as "target quadruplets." But sometimes history wins over accuracy and clarity.

The manufacturer field is basically free-form; in many cases it’s just set as pc for IBM PC-architecture systems, or left as unknown or none. In CBL we set the manufacturer to cbl, because it’s shorter than unknown and is distinctive: any time you see a triplet like arm-cbl-linux-gnu, you can be fairly confident it was produced using the CBL procedure.

The manufacturer field can be omitted entirely, but that makes the whole situation much more complex and ambiguous: for example, in the triplet arm-linux-gnu, is linux the manufacturer or is it part of the os field? The GNU build system includes a script called config.sub specifically to take triplet strings and figure out what they mean. My advice is: always specify triplets as cpu-manufacturer-kernel-os.

The other components of the triplet — cpu, kernel, and OS — sometimes trigger specific behavior, especially during GCC builds. It’s a good idea to review the "Host/Target specific installation notes for GCC" section of the GCC build instructions when choosing a triplet for your system.

The GNU build system includes a script, config.guess, that tries to figure out what the host triplet is. Generally, this should be used as the HOST configuration parameter in CBL. You can always find an up-to-date copy of config.guess, and the related script config.sub mentioned above, in the GCC source code distribution.

Different combinations of those configure directives, host, target, and build, are used for different toolchain components and at different points in the CBL process. What they mean is:

build: the system where the toolchain components are built
host: the system where the toolchain components will run
target: the system where the resulting artifacts will run

When build, host, and target are all different, it’s called a "Canadian Cross." We’re not sure why. In CBL, we don’t build any Canadian Crosses. During the CBL process, we:

build a native compiler unless we already have one that we can definitely trust; for this one, of course, we don’t need to specify host, build, or target;
use the trusted native compiler to build a cross-toolchain (at this point, build and host are both that of the initial system, and target is the target system type);
use that cross-toolchain to build a target-native toolchain as well as a collection of programs that will be needed on the target system (at this point, build is the initial system, and host and target are both the target system type); and finally
boot into the minimal target system userspace we constructed in the previous step, and use the target-native toolchain built there to construct a new, testable, native toolchain (for which we again don’t need to specify build, host, or target).

Once the cross-toolchain is built, most packages built using it will be configured with build set to the host computer and host set to the target computer.

6.3. The CBL Sysroot Toolchain

The cross-toolchain built here uses the sysroot framework. The idea behind a sysroot toolchain is simple: a directory on the host system (the "sysroot" directory) is set up to contain a subset of what will eventually become the root filesystem of the target system: /bin, /lib, /usr/lib, /etc, that sort of thing. Header files and libraries will be used only from the sysroot location, not from the normal host system locations.

This is fine for most cross-compiling purposes, but it’s not quite perfect for our purposes in CBL.

Remember that the whole point of this cross-toolchain is to build just enough of a GNU/Linux system that it can be booted — the kernel, and a minimal set of userspace programs that we’ll then use to construct the final system. Those components, the "scaffolding," won’t be used any longer than is necessary. This is because we don’t want to rely on the programs produced by the cross-toolchain. By preference, for any program that is distributed with a test suite, we want to run the test suite before we trust it works properly! When you’re cross-compiling programs, you can’t easily run the tests.

So once we have the target system booted, we only use the scaffolding to build the first parts of the final system. As we build those, we install them into the canonical filesystem locations where they belong: the standard directories /bin, /lib, /usr, and so on. To make that process as straightforward as possible, and avoid any interference between the scaffolding and the final system components, we want all the scaffolding to wind up in a /scaffolding subdirectory of the root filesystem. If all of the ephemeral stuff is self-contained within /scaffolding, then obviously it won’t conflict with the final system programs as we build and install them.

Setting up everything so that it’s self-contained within a non-standard location also makes it easier to ensure that our final system doesn’t still rely on any of the components there: once the full system build is complete, we can simply delete the /scaffolding directory and we’ll be left with just the Little Blue Linux system.

When building a toolchain, it’s important to simplify the execution environment as much as possible; any unnecessary compiler or linker flags can cause things to break. Don’t worry about optimizing anything for this stage, either: remember, everything we’re building at this point is ephemeral, and we’ll be throwing it away as soon as we can.

Environment variable: PATH: /home/lbl/work/crosstools/bin:$PATH
Environment variable: LC_ALL: POSIX
Environment variable: CFLAGS: (should not be set)
Environment variable: CXXFLAGS: (should not be set)
Environment variable: LDFLAGS: (should not be set)
Environment variable: LD_LIBRARY_PATH: /usr/lib

This is as good a point as any to discuss the dynamic linker and the way it works.

6.4. A Word About The Dynamic Linker

In most cases, executable programs on Linux systems are linked against shared libraries rather than static libraries. That means that programs don’t contain a copy of the binary code for library functions they invoke; instead, programs have references to those library functions, and have a list of the shared libraries that are expected to contain the implementation for those functions.

Whenever a program is executed, those references to library functions obviously must be resolved — that is, the shared libraries are searched for all the functions needed by the program (and, since those functions might call other library functions as well, libraries are searched for those functions as well, and so on recursively), and linked together so that all of the necessary functions are available. This resolution is done by the dynamic linker, also known as the dynamic loader and, sometimes, as the program interpreter. This is a program included with the GNU C library — or whatever other C library is being used, like musl — and is conventionally installed at /lib/ld.so or /lib/ld-linux.so.

The way that the dynamic linker finds shared library files is a bit complicated, and that complexity is at the root of a lot of the issues that can come up during the CBL build process, so let’s talk about it a bit!

6.4.1. The tl;dr

Here’s a summary of the basics, for anyone who doesn’t need all the grueling details:

There are a bunch of standard system directories that are always used when looking for shared library files — directories like /lib and /usr/lib. If you add a directory to /etc/ld.so.conf, it basically becomes one of those system directories.
If you have library files in a different directory, you can get the linker to look there by setting an environment variable, LD_LIBRARY_PATH.
If, when you’re building a program, you know that the program will need some shared libraries and those libraries will not be in one of the standard system locations, you can make those directories a part of the library search path for that program by giving ld the argument -rpath /whatever/dir:/another/dir. This sets an RPATH for the program, which overrides LD_LIBRARY_PATH: the RPATH will be used before the directories in LD_LIBRARY_PATH are checked.
If you want to do something like RPATH, but you want to make it easy for people who run the program to override the library search path, you can add the ld option --enable-new-dtags in addition to the -rpath option. This will cause ld to set a RUNPATH, which has a different precedence than RPATH; RUNPATH is checked for shared library files after LD_LIBRARY_PATH is checked.

This all matters for CBL because the build process for some of the necessary packages will result in programs that can’t find their shared libraries unless we use an RPATH or RUNPATH; and the build process for other packages sets an incorrect RPATH or RUNPATH that can cause problems unless we remove it.

Usually, ld isn’t executed directly by build processes, but is instead invoked by the gcc driver program. To get gcc to pass an option along to ld, you can give it the option -Wl, which should be followed by additional words separated by commas; gcc will replace the commas with spaces and pass the resulting arguments on to ld. So to set an RPATH of /some/dir and /another/dir, you can give gcc the argument -Wl,-rpath,/some/dir:/another/dir.

An alternative that might work is to set the environment variable LD_RUN_PATH to the desired RPATH before linking — the ld documentation suggests this will work, but I haven’t tried it.

6.4.2. The grueling details

This might make your eyes glaze over a bit.

A lot of this discussion pertains only to programs in the Executable and Linkable Format, commonly abbreviated as ELF. This is the only program format used in modern GNU/Linux systems, but you should be aware that there are other executable program formats, like a.out and COFF, and a lot of the details here don’t apply to those formats.

ELF programs consist of an ELF header followed by some number of segments of various types. If you’d like to examine the full structure of an ELF program, you can use the program readelf, which is part of the GNU binutils package.

When a program is executed, the Linux kernel looks in its .interp section to find the path for its interpreter — in our case, this will be the full path of ld.so (the dynamic linker). That interpreter then looks in dynamic segments of the program for the names of shared library files that it needs, and tries to find them. If any shared library can’t be found, the dynamic linker will terminate with an error.

There’s a bit of additional complexity: shared library files are also in ELF format and can also declare that they need shared libraries. This can result in a chain of library dependencies, potentially a lengthy one! When the dynamic linker is trying to find a library, its behavior is partly determined by which object is having its references resolved: the original program, or one of the libraries it depends on, or one of their libraries, and so on. Whichever ELF file is having its references resolved is called the "loading object" here.

The rest of this section describes the procedure used by the dynamic linker to locate shared library files.

Shared library names can contain slash characters, although they usually do not. (I don’t even know how to get ld to create a program with library dependencies that specify a full path.) If a shared library name does contain any slash characters, it is treated as a relative or absolute path and the dynamic linker will only look for the library at that path.

In the common case, when a shared library is specified without any slash characters, the dynamic linker looks for it in a variety of locations. The algorithm it uses is poorly documented and there’s a lot of contradictory information about it on the web.^[8] After looking around for quite a while, I found a helpful blog post on qt.io that pointed me to the _dl_map_object function in the glibc file elf/dl-load.c, and a review of that code revealed the following algorithm:

If the loading object has a RUNPATH, skip ahead to the LD_LIBRARY_PATH step. Otherwise:
1. Look in the RPATH of the loading object, if any.
2. Consider the thing that loaded the loading object. If it has a RUNPATH, skip ahead to the LD_LIBRARY_PATH step. If not, look in its RPATH.
3. Continue doing this recursively up the loading chain (skipping ahead if you find a RUNPATH, and otherwise looking for libraries in RPATH) until you reach the end of the loading chain. This will normally be the program being executed, but could also be a shared library loaded using the dlopen function.
Look in the LD_LIBRARY_PATH environment variable.
Look in the RUNPATH of the loading object, and once again look up through the loading chain until you reach the end.
Look in the locations found in /etc/ld.so.cache, which is generated from /etc/ld.so.conf using the ldconfig program.
Finally, look in the default directories defined when glibc was compiled; when using the standard build process, this means /lib and /usr/lib.

As soon as the shared library is found in any of those locations, that’s the file that will be used. If it’s not found in any of those locations, the dynamic linker will give up and crash with an error message. And, in case this was not obvious, in all the things with a name that contains PATH — LD_LIBRARY_PATH, RPATH, RUNPATH — you can specify any number of directories separated by colons, just like the PATH environment variable.

A historical note: RPATH has been around longer than RUNPATH. RUNPATH was implemented, along with all the logic about skipping RPATH when a RUNPATH is present, because people realized that when someone specifies an LD_LIBRARY_PATH it’s usually because they really do know what they want the dynamic linker to do, and it’s rude to override that desire at program compilation time.

When you give ld the -rpath option by itself, it just creates an RPATH in the resulting program or library. When you add the option --enable-new-dtags, it still creates an RPATH (in case you run the program with an old dynamic linker that doesn’t understand RUNPATH), but it also creates a RUNPATH so that modern dynamic linkers will ignore the RPATH and use LD_LIBRARY_PATH by preference.

As a reward for reading this far, here’s one last option you can use if none of the above suits your needs: before it does anything else, the dynamic linker loads any .o object files, or .so or .a library files, that are named in the environment variable LD_PRELOAD or in the /etc/ld.so.preload file. Any functions defined in those files are used in preference to any other function definitions found later in the process. That lets you override individual function definitions, if you want to override some part of the program without replacing an entire library with a different version.

6.5. binutils

Name	GNU binary utilities
Version	2.44
Project URL	http://www.gnu.org/software/binutils/
SCM URL	git://sourceware.org/git/binutils-gdb.git
Download URL	https://ftp.gnu.org/gnu/binutils/
Patches	`binutils-2.44-branch-updates-20250425.patch`

Name

GNU binary utilities

Version

2.44

Project URL

http://www.gnu.org/software/binutils/

SCM URL

git://sourceware.org/git/binutils-gdb.git

Download URL

https://ftp.gnu.org/gnu/binutils/

Patches

binutils-2.44-branch-updates-20250425.patch

6.5.1. Overview

The GNU binary utilities package contains a plethora of programs and libraries that can be used to produce, manipulate, and otherwise operate on (compiled and assembled) object files. I’ll briefly describe them all here, but don’t worry! Not only will there not be a test on any of this, you usually won’t even need to invoke any of these programs manually. The gcc driver program will invoke as and ld as necessary to do its work, and several of the other utility programs are similarly used internally during the build process of other system components but you won’t need to use them yourself.

The most important binutils programs are as, the assembler, which transforms assembly source code (.s files) into binary or "object" code (.o files); and ld, the loader or link editor (ld is usually just called the linker, but it’s hard to see why it’s called ld without knowing the other terms), which combines multiple object files into an executable program.

Starting with release 2.44, there are two separate distribution tar archives available: binutils and binutils-with-gold. In addition to the traditional ld linker, which uses the binary file descriptor (BFD) library, the "with gold" distribution also includes an additional linker called "gold." Gold doesn’t use BFD and only works for binaries that are in the "Executable and Linkable Format," aka "ELF". Since that’s the only binary format commonly used in LB Linux, it might be preferable to use gold. I’m not actually sure what the benefits are. The BFD linker is always available as ld.bfd, and the gold linker is available as ld.gold if it is present at all; additionally, one or the other of those programs has a hard link so it can also be executed simply as ld.

Caution

In CBL, we use the binutils-with-gold distribution; but because the naming convention we use for source code archives is to use the package name plus version number, we need to repackage the binutils-with-gold-X.YY tar file as binutils-X.YY. Since that file is also available at the package download location, this could easily be confusing!

The other programs in this package are:

addr2line translates program address locations to filenames and line numbers, which can be helpful during debugging.
ar can be used to create, modify, and extract files from archives.
c++filt converts the mangled function names found in compiled C++ programs back to the original un-mangled names.
gprof lets you run programs with instrumentation that tells you how much time is spent in different parts of the code, which can be helpful when optimizing programs.
nm lists symbols found in object files.
objcopy can translate object files to various alternative formats.
objdump is a disassembler; it can convert binary files into a canonical assembly language.
ranlib generates indexes for archive files.
readelf shows information about ELF-format object files.
size displays sections of an object or archive, along with their sizes.
strings prints out printable character sequences found in binary files.
strip discards symbols or other unnecessary data from object, library, or program files, which reduces their size but makes them much harder to debug.

There are a couple of shared libraries used by those programs and available to others, as well: libbfd and libopcodes. All of the utility programs are documented much more thoroughly in man pages and the binutils info file.

As I mentioned earlier, you don’t need to run any of those programs directly because the gcc driver program streamlines the simple case of building executables — to compile a "Hello, World" program, you just need to run gcc hello.c, and let the gcc driver program run as to assemble compiled source into object files, ld to link multiple object files together to produce an executable, and maybe other programs if it needs to for some reason.

The downside of using a driver program is that it can make complex builds (when, for example, specific options need to be passed to the assembler, compiler, and linker) a lot more complicated and fussy — as you’ll see, from time to time, during the CBL process.

Building From The Source Code Repository

The source code for this package is in the same repository as the GNU Debugger, gdb. I believe this is a historical artifact of the olden times, when the primary source code management system used for those and other packages was CVS, but, for whatever reason, it is still the case today.

If you want to build binutils from the git repository, it’s therefore a good idea to extract just the source code corresponding to that package from the combined repository. Conveniently, the repository contains a script that will do that for you.

After you’ve checked out the specific revision you want to build, simply run ./src-release.sh binutils-with-gold and you’ll wind up with a tar archive file binutils-$VERSION.tar in the top directory of the source tree.

You can similarly construct tarfiles for the gas, gdb, or sim packages by passing those arguments to the src-release.sh script. (I don’t know why you’d want to build a standalone gas package, since gas is the GNU assembler and is part of the binutils package. And I don’t know what sim is at all. But the options are there, regardless.) You can also build a binutils release archive that does not contain gold, by specifying the binutils package target.

Patch:

binutils-2.44-branch-updates-20250425.patch

The combined-source-repository situation described a moment ago also complicates the construction of branch-update patches a little bit: we always produce branch-update patches by running git diff commands against the source repository, and sometimes the resulting patch for binutils includes changes to files that are not part of the binutils distribution — which means they don’t apply cleanly. We adjust for this by applying the patch, fixing any issues we find as a result, and then using the corrected patch here.

6.5.2. binutils (gnu-cross-toolchain phase)

Build Directory	`../build-binutils-2`

Build Directory

../build-binutils-2

Binutils, like some other parts of the toolchain, should be built in a separate directory from the source code. As with other components that we build several times, CBL puts each distinct build in a separate location, so that each build is entirely independent of, and unaffected by, the other builds.

Build Directory: ../build-binutils-2

We build a cross-binutils using the "sysroot" framework (which you’ll read more about shortly). That framework isn’t particularly well-documented, but the important thing at this point is that we need to specify the configure options --with-sysroot and --with-build-sysroot to inform the build machinery of the desired sysroot directory.

All of the toolchain components should be installed in the same filesystem location. The CROSSTOOLS parameter lets you specify that location — in this case, /home/lbl/work/crosstools.

Configuration commands:

${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
  --build=aarch64-unknown-linux-gnu --host=aarch64-unknown-linux-gnu --target=x86_64-cbl-linux-gnu \
  --with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
  --disable-nls --enable-shared --disable-multilib \
  --with-lib-path=/home/lbl/work/sysroot/scaffolding/lib \
  --enable-64-bit-bfd
make configure-host

Some of the warning messages present in GCC 8 and later present problems when compiling the binutils. We can tweak the generated Makefiles so those warnings won’t be converted to errors.

Configuration commands:

sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' bfd/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' gas/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=format-overflow@' binutils/Makefile

Compilation commands:

make

Test commands:

(none)

Installation commands:

make install

6.6. gmp

Name	GNU Multiple Precision arithmetic library
Version	6.3.0
Project URL	https://gmplib.org/
SCM URL	https://gmplib.org/repo/
Download URL	https://gmplib.org/download/gmp/

Name

GNU Multiple Precision arithmetic library

Version

6.3.0

Project URL

https://gmplib.org/

SCM URL

https://gmplib.org/repo/

Download URL

https://gmplib.org/download/gmp/

6.6.1. Overview

GMP, the GNU Multi-Precision arithmetic library, is a component in — or perhaps it is more properly called a dependency of — the GNU toolchain. It allows arithmetic operations to be performed with levels of precision other than the standard integer and floating-point types. Applications can use GMP to provide arithmetic with thousands or millions of digits of precision if that’s what they need. GMP also provides support for rational-number arithmetic, as well as integer and floating-point.

If you’re really interested in high-precision floating-point arithmetic, you might want to look into MPFR rather than GMP! The GMP people say it’s much more complete.

GMP has been needed by the Fortran GCC front-end for some time, but starting with release 4.3.0 of GCC it is needed for C (and C++) as well.

MPC, another dependency of GCC, requires a GMP built with C++ support, so we need to specify that at configure time.

This package is often built in-tree as part of GCC, rather than separately — that’s especially true when the only reason you’re building GMP is because GCC requires it. When using an in-tree build, this blueprint is pretty much irrelevant. However, as of 2015-09-27, there’s an issue with in-tree builds of GMP in some cross-toolchain builds, so for CBL we build it separately.

6.6.2. gmp (gnu-cross-toolchain phase)

This is not necessary, because the system or host-prerequisites version of GMP can be used just fine by the cross-compiler. If things break down here and you haven’t built the Trustworthy Host-System Programs, you should probably do that.

Configuration commands:

(none)

Compilation commands:

(none)

Test commands:

(none)

Installation commands:

(none)

6.7. mpfr

Name	MPFR library
Version	4.2.2
Project URL	https://www.mpfr.org/
SCM URL	(unknown)
Download URL	https://www.mpfr.org/mpfr-current/#download
Dependencies	gmp

Name

MPFR library

Version

4.2.2

Project URL

https://www.mpfr.org/

SCM URL

(unknown)

Download URL

https://www.mpfr.org/mpfr-current/#download

Dependencies

gmp

6.7.1. Overview

MPFR is a library for arbitrary-precision floating-point arithmetic. It stands for "Multiple-Precision Floating-point Rounding," I think, but that’s not really clear from their site so I might be wrong. It uses GMP internally, but provides any level of precision (including very small precision) and provides the four rounding modes from the IEEE 754-1985 standard.

Like GMP, MPFR is a component in, or dependency of, the GNU toolchain. It has been needed by the Fortran GCC front-end for some time, but starting with release 4.3.0 of GCC, MPFR is needed for C and C++ as well. GCC uses MPFR to pre-calculate the result of some mathematical functions when those functions have constant arguments, and produces the same results regardless of the math library or floating point engine used on the runtime system. This occurs in what is called the GCC "middle-end," which is kind of a silly name since it’s not an end.

This package is often built in-tree as part of GCC, rather than separately. However, as of September 2015, in-tree builds of the dependencies have some issues in certain circumstances, so in CBL we step away from the in-tree build facility altogether.

6.7.2. mpfr (gnu-cross-toolchain phase)

As with GMP, this is not necessary because the system or host-prerequisites version of MPFR can be used.

Configuration commands:

(none)

Compilation commands:

(none)

Test commands:

(none)

Installation commands:

(none)

6.8. mpc

Name	GNU Multiple Precision Complex library
Version	1.3.1
Project URL	https://www.multiprecision.org/
SCM URL	(unknown)
Download URL	https://www.multiprecision.org/mpc/download.html
Dependencies	gmp, mpfr

Name

GNU Multiple Precision Complex library

Version

1.3.1

Project URL

https://www.multiprecision.org/

SCM URL

(unknown)

Download URL

https://www.multiprecision.org/mpc/download.html

Dependencies

gmp, mpfr

6.8.1. Overview

MPC is a C library for arbitrary-precision arithmetic on complex numbers providing correct rounding. It can be thought of an extension to the MPFR library.

Like GMP and MPFR, MPC is a component in, or dependency of, the GNU toolchain. I haven’t been able to find any description of what features of MPC are actually needed by the GNU toolchain, but MPC is a hard build-time dependency of GCC.

If your system already has MPC installed, this step can be skipped.

Like the other GCC library dependencies, this package is often built in-tree as part of GCC. As mentioned earlier, though, this sometimes introduces problems, so CBL doesn’t take advantage of the in-tree build machinery provided by GCC.

6.8.2. mpc (gnu-cross-toolchain phase)

As with GMP, this is not necessary if there is a system or host-prerequisites version of MPC available.

Configuration commands:

(none)

Compilation commands:

(none)

Test commands:

(none)

Installation commands:

(none)

6.9. isl

Name	Integer Set Library
Version	0.27
Project URL	http://isl.gforge.inria.fr/
SCM URL	git://repo.or.cz/isl.git
Download URL	https://libisl.sourceforge.io/
Dependencies	gmp

Name

Integer Set Library

Version

0.27

Project URL

http://isl.gforge.inria.fr/

SCM URL

git://repo.or.cz/isl.git

Download URL

https://libisl.sourceforge.io/

Dependencies

gmp

6.9.1. Overview

The project homepage describes ISL as a library for manipulating sets and relations of integer points bounded by linear constraints. I’m not sure what that means. It sounds like math.

Unless you have some specific reason to want ISL for one of your own projects, the main benefit it provides is as an optional dependency of GCC: the "graphite loop optimizations" in GCC (whatever those are) require ISL to be available.

6.9.2. isl (gnu-cross-toolchain phase)

As with GMP, this is not necessary if there is a system or host-prerequisites version of ISL available.

Configuration commands:

(none)

Compilation commands:

(none)

Test commands:

(none)

Installation commands:

(none)

6.10. linux

Name	Linux kernel
Version	6.13.5
Project URL	http://www.kernel.org/
SCM URL	https://git.kernel.org/
Download URL	http://www.kernel.org/
Patches	`linux-6.13.5-aws-ami-config-1.patch`

Name

Linux kernel

Version

6.13.5

Project URL

http://www.kernel.org/

SCM URL

https://git.kernel.org/

Download URL

http://www.kernel.org/

Patches

linux-6.13.5-aws-ami-config-1.patch

6.10.1. Overview

The kernel is Linux per se — the foundation of the operating system. Often, when people say "Linux," they mean the entire operating system that lets them use a computer; properly, though, Linux is just the kernel. Many — perhaps most — of the other components that make up the full operating system are pieces of the Free Software Foundation’s GNU project, which stands for "GNU’s Not UNIX"; almost all of the program-construction tools that form the foundation of the system are GNU components.

Linux or GNU/Linux?

This fact — that the foundation of the system consists of components from the GNU system — is the basis of the Free Software Foundation’s suggestion that the operating system as a whole should be called "GNU/Linux": it’s a combination of the software that makes up the GNU system and the Linux kernel.

That’s not a bad point, and so generally when I am speaking of the whole system I tend to refer to it either as GNU/Linux or sometimes as Linux/GNU (because, you know, symmetry). But this process is called "Cross-Building Linux" rather than "Cross-Building GNU/Linux" simply because I think it sounds nicer.

The job of the kernel — this is a critically important point and I repeat it a lot — is to initialize and manage all of the hardware on the computer (including CPUs, memory, and I/O devices) and provide services to userspace processes. That’s a big job, but it’s also a limited one! Everything you do with your computer is the responsibility of userspace processes.

The way that Linux performs its job — more generally, the way that UNIX kernels work — is conceptually very simple: when it is executed on a computer, Linux does some hardware initialization, mounts the root filesystem, and then starts a single userspace process. That process has process ID (PID) 1, and is conventionally called init.

The init process is responsible for starting all other userspace programs and getting the machine into a usable state; after the kernel starts init, it gets out of the way and waits for that process, or for other userspace programs, to request its services.

The complexity in the Linux kernel emerges mostly from the huge variety of hardware that it supports and the many layers of functionality that can be built into it. If you unpack the Linux source code, you’ll find that the whole thing adds up to about (as of the 6.6 kernel) 1477 megabytes in total. Of that, almost ten percent (144 megabytes) is specific to the twenty-six different CPU architectures that Linux supports, and about two-thirds (971 megabytes) is device drivers. That means that for any given system, a large majority of the code that makes up the Linux kernel won’t ever be used.

In CBL, the kernel is kept as pristine as possible. In most cases, the only patches we apply to it are intended to add new default configuration settings — which is more convenient than setting dozens of configuration options manually.

Patch:

linux-6.13.5-aws-ami-config-1.patch

6.10.2. linux (gnu-cross-toolchain phase)

The way that programs make requests of the kernel is by invoking kernel functions known as "system calls." The Linux kernel sources include header files that define all of the system calls it makes available to userspace programs.

Userspace programs don’t usually invoke system calls themselves (although they can, and some do). Instead, they invoke library functions — particularly the ones defined in the C standard library, which on GNU/Linux systems is usually the GNU libc, glibc, but might be musl or uClibc or something else altogether. Those library functions invoke system calls as needed to accomplish their work.

In this step, we’re not actually building the Linux kernel; all we’re doing is installing the header files that specify those system calls. These header files are then used by the C library and other userspace libraries and programs to invoke system calls as needed.

The Linux source tree also includes a number of other, private, header files — these define data structures and functions that should be used only within the kernel itself, and are not intended to be visible to userspace programs. The Makefile target we use here, headers_install, installs only the public header files, not these private internal headers.

The makefile target mrproper puts the source tree into a completely pristine state. The name is a reference to the Proctor & Gamble cleaning product that people in the United States know as "Mr. Clean"; in many parts of Europe, it is marketed under the brand name "Mr. Proper."

Configuration commands:

make mrproper

Compilation commands:

(none)

Test commands:

(none)

The install_headers target deletes the entire target directory before it does the installation. This is inconvenient, because we might have header files there that we want to retain (such as from the binutils build). To avoid any issues, we’re going to install the headers to a temporary location and copy them to the real target location from there.

Notice that we’re actually installing the headers into the scaffolding directory underneath the sysroot directory. That’s common to the entire host-side portion of CBL: Everything that will be conveyed to the target system is under the scaffolding location. That way, when the final system is completed, we can get rid of /scaffolding and be fairly confident that everything remaining has been built entirely from source and (if it continues to work) has no dependencies on the scaffolding programs and libraries.

Installation commands:

make ARCH=x86 INSTALL_HDR_PATH=_dest headers_install
install -dv /home/lbl/work/sysroot/scaffolding/include
cp -rv _dest/include/* /home/lbl/work/sysroot/scaffolding/include
rm -rf _dest

6.11. gcc

Name	GNU Compiler Collection
Version	15.1.0
Project URL	http://gcc.gnu.org/
SCM URL	git://gcc.gnu.org/git/gcc.git
Download URL	https://ftp.gnu.org/gnu/gcc/
Patches	`gcc-15.1.0-branch-updates-20250425.patch` `gcc-15.1.0-fix-relocation-headers-1.patch` `gcc-15.1.0-fix-missing-rpath-1.patch`
Dependencies	gmp (gnu-cross-toolchain phase), mpfr (gnu-cross-toolchain phase), mpc (gnu-cross-toolchain phase), isl (gnu-cross-toolchain phase), binutils (gnu-cross-toolchain phase)

Name

GNU Compiler Collection

Version

15.1.0

Project URL

http://gcc.gnu.org/

SCM URL

git://gcc.gnu.org/git/gcc.git

Download URL

https://ftp.gnu.org/gnu/gcc/

Patches

gcc-15.1.0-branch-updates-20250425.patch
gcc-15.1.0-fix-relocation-headers-1.patch
gcc-15.1.0-fix-missing-rpath-1.patch

Dependencies

gmp (gnu-cross-toolchain phase), mpfr (gnu-cross-toolchain phase), mpc (gnu-cross-toolchain phase), isl (gnu-cross-toolchain phase), binutils (gnu-cross-toolchain phase)

6.11.1. Overview

GCC is the GNU Compiler Collection. This is the single most important package in CBL, even including the Linux kernel itself: without a compiler, you can’t build any software. We use GCC to bootstrap everything, including itself.

Unfortunately, GCC is also probably the most complex package we need to build, and it has a very complex configuration and build process because it is tied so intricately to other packages. This is particularly true of glibc, the C standard library we use in CBL, which also has a complex build process of its own!

On the plus side, if you can get GCC built and working properly, you’re past the biggest hurdle of building a complete GNU/Linux system entirely from source.

The GCC installation process is documented in the "Installation" manual, found in the source distribution in the INSTALL directory. If you have problems getting GCC built, that’s an excellent resource for figuring out what’s going wrong. If you want to get a better understanding of the GCC configuration and build process — how it actually works — read the "Source Tree Structure and Build System" section of the GCC Internals document, found in the source distribution in gcc/doc/gccint.info.

The most important compilers in the collection, for purposes of bootstrapping a system, are the C and C++ compilers; but GCC also includes compilers for D, Fortran, Go, Ada, Objective-C, and probably other languages as well. And it supports a huge number of machine architectures! Since CBL relies absolutely on having support for at least two types of computer for every build, that support for many architectures is probably the most important aspect of GCC for our purposes.

6.11.2. `gcc`: The driver program

The program you usually invoke to build C programs, gcc, is not actually a compiler. It’s just a driver that knows how to invoke other programs, using rules called "spec strings" that tell it exactly what other programs it needs to invoke, and what command-line arguments it should provide, to turn source code into an executable program. (We’ll talk more specifically about spec strings in Adjusting the GCC specs.)

I find it really helpful to keep that in mind throughout toolchain construction, so I’ll expand on that point: gcc just invokes other programs. Some of those programs — the C preprocessor cpp, cc1 (the C compiler itself), and internal utility programs like collect2 (which is sort of a first-stage linker, used to set up calls to constructors and other initialization routines as a program starts to run) — are part of the GCC package. Others, like the as and ld programs, are distributed separately (in the case of as and ld, that package is GNU binutils).

To get from source code to an executable C program, gcc actually performs a series of steps:

First, it invokes cpp to pre-process the source, include header files, resolve macros, and things like that;^[9]
then it invokes cc1 to transform the pre-processed source into assembly language code;
then it invokes as to transform the assembly code into object code;
and finally it invokes ld to combine all the bits of object code, along with code from libraries, into an executable program.footnote[Again, this is conceptually true but lacks technical precision. In reality, gcc invokes collect2, which does some other stuff and then executes the actual ld program.

You can find out exactly what commands gcc is running by giving it the -v (for "verbose") command line argument. That’s a handy trick when things are going wrong and you’re not sure why!

The term "compilation" — which is the job of the compiler — actually refers only to the second of those steps: source code is compiled into assembly code. That means it’s not precisely correct to say that you’re "compiling" source code into an executable program! That’s a common conversational shorthand, but it masks the complete story about what’s going on. It would be more precise to say that you’re "pre-processing, compiling, assembling, and linking" source code to produce an executable program.

On the other hand, that’s a lot of words, and it’s seldom really important to keep the complete story in mind. This is probably why the shorthand term is so popular.

6.11.3. How C and C++ Programming Languages Work in GNU/Linux

Note

In Cross-Building Linux, when we first discuss the package that provides a programming language, we include a section like this one that describes some high-level details about how that language works in GNU/Linux systems generally, and in Little Blue Linux in particular. (Please keep in mind, though, that I am not an expert in every programming language, so some of these sections are very limited! If you want to know more about any language, it’s a good idea to go to the project homepage and seek more details there.) That seems a little strange in the case of C and C++, since those are so intrinsic to the system: the packages that make up the whole foundation of the system are implemented in C or C++, and a great deal of the base LB Linux system is present entirely to support them. But I think it’s a good idea to establish the convention now and maintain it consistently throughout.

Here, I’m focusing mainly on the C programming language. C and C++ are very closely related, to the point that the earliest C++ compiler didn’t produce binary object code at all; it compiled C++ programs into C source code, which was then compiled into binary code by a C compiler. A lot of the details about how C works apply exactly the same in C++; the differences are mostly things like the naming convention used for source files^[10] and the program used to invoke the compiler (g++ rather than gcc).

I’m not discussing other languages, like Objective-C or Fortran, at all, because I don’t know anything about how they work!

C source code files generally have the extension .c. Along with these, you’ll often find header files, typically with the extension .h; these contain function declarations, macro definitions, constant definitions, and other things like that. These files can contain preprocessor directives, which are evaluated and executed by the C preprocessor before the code is compiled. Commonly, C source files contain #include directives that cause the preprocessor to copy the content of header files into the C file as part of this evaluation.

In addition to the header files that are included as part of the source code for C packages, UNIX systems have system header files that can be included by any program that needs them. These files can typically be found in the /usr/include directory and its subdirectories.

In addition to executable programs, packages can install libraries. These are basically collections of related functions that can then be used in other programs. These libraries are commonly installed in the /lib and /usr/lib directories (and their subdirectories). Sometimes, libraries are installed into variants of those directories, which can be set up on systems that support more than one kind of CPU instructions. For example, the AMD64 CPUs that are used in modern Intel and AMD systems support both 64-bit and 32-bit instruction sets, and GCC can produce both 64-bit and 32-bit object code. If you have one of these computer systems, you may want to have both 64-bit and 32-bit programs. In that cases, 64-bit programs will need to be linked against 64-bit versions of libraries, and 32-bit programs will have to be linked against 32-bit versions, but both versions of the libraries will have the same filename. To allow these "multilib" systems, libraries will be installed in directories like /usr/lib32 and /usr/lib64 rather than simply /usr/lib.

In most cases, when a package installs a library in any of those directories, it also installs header files (into /usr/include) that specify how to call the functions in that library.

There are two different kinds of library files. Both are used when a program (or sometimes another library) are linked. Static libraries have the extension .a; when a program is linked against a static library, the binary code for methods invoked by that program is actually copied from the library into the program. Shared libraries, with the file extension .so, are the other kind. When a program is linked against a shared library, a reference to the library is written into the program, but the code for library functions is not. The benefit of this is that, no matter how many programs you have that use a common function like memcpy or printf, only one copy of the function code needs to be present on the system, in the shared library that defines them. Programs with such library references are called "dynamically linked," and when they are run, the program interpreter is responsible for tracking down the shared libraries it uses and finding the code for functions within them. We’ll talk more about how it does that later on!

One library present in the Little Blue Linux system is simply called "the C standard library." The functions in that library are defined as a part of the C language, and almost every C program uses some of those functions. (For example, the "Hello, World" program that is traditionally the first C program written by those learning the language uses the printf function defined in the standard library.)

This package does not include an implementation of the standard library.^[11] In the GNU system (and in Little Blue Linux), the standard library is provided by the glibc package — which also provides the program interpreter that resolves shared library references in dynamically-linked programs. Although glibc includes both static and shared versions of libraries, it does not really work well with statically-linked programs. So the vast majority of executable programs found on GNU/Linux systems are dynamically linked.

After being compiled and linked, executable C programs are typically installed into directories like /bin, /sbin, /usr/bin, and /usr/sbin. The same directories also contain all of the other programs that are part of the system, regardless of whether their source code is written in C.

Many of these topics are discussed in more detail elsewhere in CBL. The others are discussed in more detail somewhere, but you might have to do some searching! In any case, that’s a basic summary of how C programs work on LBL — and other GNU/Linux — systems.

Dependencies: gmp, mpfr, mpc, isl.

A few external dependencies were added to gcc in release 4.3: the GMP, MPFR, and MPC libraries are required for all compiler builds, and a few optimizations — the graphite loop optimizations — are only available if the Integer Set Library (ISL) is available. The graphite loop optimizations are not critically important, but there’s no reason not to include them if it’s not difficult to do.

Sources for the required — and optional — libraries can be included in the GCC sources in directories named gmp, isl, mpc, and mpfr; if they are, then they’ll be built automatically along with gcc. There’s actually a fairly large number of packages that the build machinery for GCC detects and incorporates into builds automatically. This is convenient, but restrictive: in-tree builds are only reliable when specific versions of the dependency libraries are present, and for CBL I prefer to use the latest stable release of everything.

There are also some cross-compilation scenarios in which the in-tree library builds actually do the wrong thing and produce a compiler that will not work right; that’s another reason that we avoid them here.

Patch:

gcc-15.1.0-fix-relocation-headers-1.patch

There’s a problem that was introduced in GCC 6.1 involving hard-coded paths to C++ header files. This prevents them from being found by the scaffolding compiler (which lives at a different filesystem location when the target system is booted). We can apply a patch to work around that.

Patch:

gcc-15.1.0-fix-missing-rpath-1.patch

When specifying locations for the dependency libraries (GMP, MPFR, etc), the specified library location should be used both at build time (with an -L linker directive) and at run time (with an -rpath directive). This is not always done by the normal GCC build process, but we can patch it easily to add that behavior.

Caution

The next few paragraphs discuss issues related to the way the dynamic linker works. If you’re not familiar with its operation, it might be a good idea to review the section A Word About The Dynamic Linker — or just skip ahead a bit, if you’re not interested in the details of a linking problem and the reason we apply this patch to work around it.

Even though the math libraries needed by GCC (GMP, ISL, MPC, and MPFR) have just been built here and GCC is configured to find them in their correct locations, some of the programs that make up GCC are built without an RPATH or RUNPATH, so the dynamic loader will look for those libraries at runtime in the normal host system library directories. This was filed as GCC bug 84153, but as of the last time I looked at it, the GCC maintainers don’t consider it a problem.

To be fair, this really is not usually much of a problem! The issue appears when the version of the library used in CBL is a major version later than the version installed on the host system. In that case, the GCC we’re building in this section won’t work because it won’t be able to find the version of the libraries it depends on.

This is the sort of situation that LD_LIBRARY_PATH is intended to resolve, but there’s a gotcha: in some of the host-scaffolding builds, the native toolchain is used to build some programs that are run as part of the build process itself, and the LD_LIBRARY_PATH set in the environment for the cross-build is unset when building those native programs.

That means that to get the this native gcc to be reliable, we need to set a RUNPATH or RPATH to tell the dynamic loader where to look for shared libraries. The way you normally do this is to set an LDFLAGS environment variable with a value like -Wl,-rpath,/usr/lib when running the configure script, so that ld would be told to build programs with that RPATH, but it turns out the GCC build doesn’t use the LDFLAGS linker arguments when constructing some of the programs that need the dependency libraries, like cc1, cc1plus, and lto1.

We can work around that by patching the configure script so that any time it’s given arguments like --with-gmp or --with-gmp-lib, the flags it collects will include a -Wl,-rpath option along with the -L option. That’s what this patch does.

6.11.4. gcc (gnu-cross-toolchain-minimal phase)

Environment	`CPPFLAGS_FOR_TARGET`: `--sysroot=/home/lbl/work/sysroot` `LDFLAGS_FOR_TARGET`: `--sysroot=/home/lbl/work/sysroot`
Build Directory	`../build-gcc-2`

Environment

CPPFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
LDFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot

Build Directory

../build-gcc-2

Dependencies: gmp (gnu-cross-toolchain phase), mpfr (gnu-cross-toolchain phase), mpc (gnu-cross-toolchain phase), isl (gnu-cross-toolchain phase), binutils (gnu-cross-toolchain phase).

Bootstrapping GCC as part of a cross-toolchain is tricky. You can build the compiler per se without a working standard library (libc), but that compiler won’t be able to produce executable programs: GCC can only create programs if it has access to a set of C runtime object files that it expects to be provided by the C library — and, realistically, it also needs a C library to compile programs because essentially all C programs invoke standard library functions.

Obviously, we can’t compile the C library without a compiler! So there’s a chicken-and-egg bootstrapping problem.

To work around that cycle of dependencies, we’re going to start by building just the parts of the compiler we really need at this point: a plain cross-compiler, and a minimal version of the libgcc support library that the compiler needs in order to function.^[12]

GCC can most easily be built as part of a cross-toolchain by using the "sysroot" framework — and, in fact, the GCC developers don’t support any other method for creating cross-toolchains. To perform a sysroot build, you specify the configure options --with-sysroot and --with-build-sysroot; and when building GCC, set the environment variables LDFLAGS_FOR_TARGET and CPPFLAGS_FOR_TARGET to specify --sysroot as well.

Environment variable: CPPFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
Environment variable: LDFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot

…At least, that’s what Carlos O’Donell said in a comment on GCC bug 35532, and it seems to work. The documentation on sysroot builds is not particularly easy to find — or at least, it wasn’t when this was written. (If you know where sysroot builds are documented, please tell me!)

The sysroot concept is pretty clever, in fact. The basic idea is, when you build a cross-toolchain, you give it a local directory path — a directory that exists on the build system — and tell the cross-toolchain that that local directory will eventually become the root filesystem directory on the target system. The cross-toolchain then knows to look for system header files and shared library files and so on in the sysroot location.

CBL uses the standard sysroot approach, in a slightly-nonstandard way: everything is installed not into the sysroot directory per se, but into a subdirectory of the sysroot called /scaffolding. That way, when we finally get booted into the target system, the root filesystem will be empty, except for the /scaffolding directory: everything we create after that, and install outside of /scaffolding, will become part of the final system. The /scaffolding directory itself, and everything in it, will then be deleted.

Depending on the target, GCC has a variety of options that control how it operates. Generally, these can all be specified with command-line arguments beginning with -m. For many of these options, default values can also be specified when GCC is being configured.

For example, for many targets, GCC can build programs with a variety of application binary interfaces (ABIs) — this is the machine-code-level interface between programs, libraries, and the operating system; it defines things like the way that registers are used when invoking functions.

An example of a CPU that supports multiple ABIs is the 64-bit x86 architecture, which is called x86_64 or amd64. As mentioned earlier, these processors can run programs in 64 bit mode (with 64-bit pointers and all AMD64 processor features enabled), 32 bit mode (32-bit pointers and only i686 processor features enabled — in this mode, fewer CPU registers are available to programs, for example), or a hybrid "x32" mode (32-bit pointers but all AMD64 processor features enabled). You can specify the ABI to which GCC will compile by using an -m32, -m64, or -mx32 command-line argument.

You can also override the default ABI when configuring GCC by specifying a --with-abi configure directive; or, for some target architectures, other options, like --with-multilib-list or --enable-targets or probably a combination of those things. The options available, and how they work, evolved organically over time, so the situation gets pretty confusing. For example, the way you tell an x86 GCC to generate code for the 64-bit ABI is to use the configuration directive --with-abi. There is also a runtime command line option -mabi, but it doesn’t override that selection; it only selects between the sysv calling convention used by UNIX-ish systems and the ms convention used by Microsoft Windows.

This confusion is characteristic of the GCC configuration and usage options: a lot of the options that are available, and their meaning, depends on what architecture is targeted by the compiler, and the same options or terms can mean very different things for different targets.

If you’re trying to figure out what options are available for a specific target architecture, you have a couple of options. The installation manual lists some of them, mostly in the "configuration" section. The GCC manual also has information in section 3.18 ("Machine-Dependent Options" — if you’ve built a set of trusted host tools specifically for the CBL build, the correct info file can be found at /usr/share/info/gcc.info). You can also look at the configuration script, gcc/config.gcc, to see what options are accepted for each type of target. And, finally, after building GCC for the target architecture, you can run gcc --help=target to see what options are available and what values they can have; and you can also find the actual compiler, cc1 (for C) or cc1plus (for C++) under the libexec/gcc/x86_64-cbl-linux-gnu/$VERSION directory, and run it with the --help command-line argument to see what options are enabled and disabled.

In addition to the selection of ABI, for many target types GCC can be instructed to optimize for specific CPUs or CPU families. The command-line arguments that control this behavior, like the ABI arguments, are target-specific — look for -mtune, -march, -mcpu, and things like that.

In CBL, any set of these configure directives can be specified for the cross-toolchain in the TARGET_GCC_CONFIG parameter. It’s generally a good idea to specify default values for all of the options supported by the target platform. (If you don’t want to set any options at all, you can set the parameter to an empty string.)

GCC is generally built with a large number of libraries included. Some of those fail in some circumstances — for example, x86 CPUs can’t build the libquadmath library when the C library being used is uClibc (or, at least, that was the case some time ago, the last time I tried); and the libsanitizer library fails to build when compiling for 64-bit Sparc machines. The easiest way to work around those problems at this stage is simply to disable those libraries, and, considering that the cross-toolchain we’re building here is just going to be used to build the ephemeral scaffolding programs, that seems like a reasonable approach here. So if any libraries cause the build to fail, try adding an appropriate --disable directive to TARGET_GCC_CONFIG.

Build Directory: ../build-gcc-2

You might notice that, in this build, we’re using the same host-system builds of the various arithmetic dependency libraries as we used for the host-prerequisite GCC (or the same ones that are used for the native system GCC, if you’ve skipped the host-prerequisites). It’s totally unnecessary to build them again for this GCC — the cross-compiler we’re building here will only run on the host system, so the target architecture is irrelevant to it.

To build the minimal libgcc, we specify the configuration options --without-headers and --with-newlib. This is a bit sloppy — the first of those options is the only one that should be necessary — but the last time I tried building a minimal compiler without the newlib directive, it didn’t work.

Once this minimal compiler is finished, we’ll use it to build the C library; and then we can use the files provided by the C library to go back and build a full, functional GCC compiler.

Configuration commands:

${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
  --build=aarch64-unknown-linux-gnu --host=aarch64-unknown-linux-gnu --target=x86_64-cbl-linux-gnu \
  --with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
  --disable-decimal-float --disable-libgomp --disable-libmudflap \
  --disable-libssp --disable-multilib --disable-nls --disable-shared \
  --disable-threads --enable-languages=c,c++ --with-newlib \
  --without-headers \
  --with-gmp=/usr --with-mpfr=/usr \
  --with-mpc=/usr --with-isl=/usr

The only targets we build at this point are all-gcc, which produces the plain compiler, and all-target-libgcc, which produces the minimal libgcc needed by that compiler.

Compilation commands:

make all-gcc all-target-libgcc

Test commands:

(none)

Installation commands:

make install-gcc install-target-libgcc

6.12. glibc

Name	GNU C standard library
Version	2.41
Project URL	https://sourceware.org/glibc/
SCM URL	git://sourceware.org/git/glibc.git
Download URL	https://ftp.gnu.org/gnu/libc/
Patches	`glibc-2.41-branch-updates-20250425.patch`
Dependencies	linux, gcc (gnu-cross-toolchain-minimal phase)

Name

GNU C standard library

Version

2.41

Project URL

https://sourceware.org/glibc/

SCM URL

git://sourceware.org/git/glibc.git

Download URL

https://ftp.gnu.org/gnu/libc/

Patches

glibc-2.41-branch-updates-20250425.patch

Dependencies

linux, gcc (gnu-cross-toolchain-minimal phase)

6.12.1. Overview

glibc is the C standard library produced as part of the GNU project. It contains an implementation for all of the functions that are assumed to be available in C programs — like printf and so on. It also provides a dynamic loader, which almost all programs use to find shared libraries at runtime, and a few miscellaneous utility programs.

Since all C and C++ userspace programs (except the Linux kernel itself) link against the C library, glibc is by far the most deeply-embedded component of a GNU/Linux system. Upgrading the Linux kernel can be a relatively trivial operation compared to upgrading the C library installed on a computer.

There are alternative C libraries that can be used instead of glibc. However, glibc is the C standard library used by the vast majority of GNU/Linux systems; using an alternative library like musl or uClibc-ng as the primary C library may cause problems somewhere down the line.

6.12.2. glibc (gnu-cross-toolchain phase)

Build Directory	`../build-glibc-1`

Build Directory

../build-glibc-1

Dependencies: linux, gcc (gnu-cross-toolchain-minimal phase).

This builds a sysroot libc (for the target architecture), configured to be installed into the /scaffolding subdirectory of the sysroot.

That might be a little bit opaque, so let’s break it down a little bit. As mentioned earlier, the sysroot framework is all about setting up a path on the host system that will eventually become the root filesystem on the target system. And, also as mentioned earlier, the goal in the first stage of the CBL build process is to set up a kernel and minimal userspace for the target system, with the entirety of that userspace contained in a /scaffolding subdirectory rather than using the conventional filesystem paths (like /bin and /lib and the /usr directory structure and all the rest of that stuff you’re used to seeing).

What we’re building here is the libc that’s going to be used to build all the scaffolding programs and libraries — the stuff that we’re cross-compiling — so we want it to be contained in the scaffolding directory, and found by the scaffolding programs at runtime in that location. Hence, it’s a sysroot libc, with an extra prefix to move it from its usual normal /lib and /usr/lib directories to the /scaffolding/lib directory.

That means we configure it with a prefix of /scaffolding, but then when we install it we tell it that the root of the installation location is the sysroot directory /home/lbl/work/sysroot.

Simple, right?

If you ever want to set up a completely standard sysroot toolchain, by the way, it works pretty much the same way as this but you specify --prefix=/usr and --with-headers=$SYSROOT_PATH/usr/include. There is some magic in the glibc configuration or build machinery related to the --prefix directive: if you specify a prefix of /usr, the bits of glibc that are conventionally installed in /lib will be put there, rather than in /usr/lib.

Build Directory: ../build-glibc-1

Since this glibc is built for the target machine architecture, a number of tests run by the configure script won’t work right. The way we work around that, for glibc as with everything else that uses the GNU build system, is by setting the correct values in a config.cache ahead of time.

An option that we’re not using here is --enable-kernel, which can limit the amount of compatibility code built into glibc to support old Linux kernel versions. Since the glibc being built here is temporary and will be discarded in its entirety, saving a little bit of space here is kind of pointless. That option also makes the build more fragile, since the kernel version that will be checked by the code is the host system kernel; we don’t want to make any more presumptions about the host system than we must.

Configuration commands:

echo "libc_cv_forced_unwind=yes" > config.cache
echo "libc_cv_c_cleanup=yes" >> config.cache
echo "libc_cv_gnu89_inline=yes" >> config.cache
echo "libc_cv_ctors_header=yes" >> config.cache
echo "libc_cv_ssp=no" >> config.cache
echo "libc_cv_ssp_strong=no" >> config.cache
BUILD_CC="gcc" CC="x86_64-cbl-linux-gnu-gcc" AR="x86_64-cbl-linux-gnu-ar" \
  RANLIB="x86_64-cbl-linux-gnu-ranlib" CFLAGS="-g -O2" \
  ${LB_SOURCE_DIR}/configure --prefix=/scaffolding \
  --host=x86_64-cbl-linux-gnu --build=aarch64-unknown-linux-gnu \
  --disable-profile --enable-add-ons --with-tls --with-__thread \
  --with-binutils=/home/lbl/work/crosstools/bin \
  --with-headers=/home/lbl/work/sysroot/scaffolding/include \
  --cache-file=config.cache

The glibc build process uses the makeinfo program to create the documentation, and the texinfo source file specifies a document encoding of UTF-8. When using some versions of perl, this leads to problems — it’s unclear why; perhaps this is because makeinfo only works right with UTF-8 documents when being run with UTF-8 localization settings, or maybe something else is going wrong.

Regardless of exactly what triggers the issue, there’s an easy way to work around it: just remove the @documentencoding directive from the libc manual source file.

Configuration commands:

sed -i -e '/^@documentencoding UTF-8$/d' \
  ${LB_SOURCE_DIR}/manual/libc.texinfo

Compilation commands:

make

Test commands:

(none)

Installation commands:

make install_root=/home/lbl/work/sysroot install

6.13. gcc (gnu-cross-toolchain phase)

For an overview of gcc, see gcc.

Environment	`CPPFLAGS_FOR_TARGET`: `--sysroot=/home/lbl/work/sysroot` `LDFLAGS_FOR_TARGET`: `--sysroot=/home/lbl/work/sysroot`
Build Directory	`../build-gcc-3`

Environment

CPPFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
LDFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot

Build Directory

../build-gcc-3

Dependencies: glibc.

Now that we have a C library installed, we can finally do a full GCC build. So now we’ll enable multi-threaded code and some of the runtime libraries we turned off previously.

Build Directory: ../build-gcc-3
Environment variable: CPPFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
Environment variable: LDFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot

Configuration commands:

${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
  --build=aarch64-unknown-linux-gnu --host=aarch64-unknown-linux-gnu --target=x86_64-cbl-linux-gnu \
  --with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
  --disable-multilib --disable-nls --enable-languages=c,c++ \
  --enable-__cxa_atexit --enable-shared --enable-c99 \
  --enable-long-long --enable-threads=posix \
  --with-native-system-header-dir=/scaffolding/include \
  --with-gmp=/usr --with-mpfr=/usr \
  --with-mpc=/usr --with-isl=/usr

Fairly early in the build process, a cross-compiler version of gcc is built and installed as xgcc, for use in later build steps. Then later on, when building libgcc.so and various other libraries that are part of GCC, xgcc runs the cross-binutils ld. Unfortunately, xgcc doesn’t know to tell ld to look in the /scaffolding directory to find the startup files (crti.o and so on). xgcc also insists on looking for libraries and startup files in the variant lib directories used by the multilib scheme, even though we configure GCC with --disable-multilib; it doesn’t appear that there’s any way to coerce the GCC build system not to use those multilib directories.

A workaround that sometimes helps is to add the correct directory using the LDFLAGS_FOR_TARGET and CFLAGS_FOR_TARGET options. That’s no good in this case, though: many of the libraries in gcc use the libtool script to do all their compilation and linking, and libtool ignores the LDFLAGS and CFLAGS we set.

So to get this build completed, we use a kludge: we create symlinks from /home/lbl/work/sysroot/scaffolding/lib to all the locations where xgcc might expect to find the libraries.

If the build crashes and gets restarted, the ln commands will fail. That doesn’t matter, so we temporarily tell the shell to proceed rather than terminating if an error occurs.

Compilation commands:

set +e
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib32
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib64
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/libx32
set -e
make AS_FOR_TARGET="x86_64-cbl-linux-gnu-as" LD_FOR_TARGET="x86_64-cbl-linux-gnu-ld"

Test commands:

(none)

Installation commands:

make install
rm -f /home/lbl/work/sysroot/lib*

6.14. Adjusting the GCC specs

6.14.1. Overview

As mentioned earlier, in gcc: The driver program, the gcc program we think of as a compiler really just runs other programs, and it uses a bunch of directives called "spec strings" to determine what programs to run and what options to give them. The format of specs strings is documented in the GCC documentation in section 3.15, "Specifying Subprocesses and the Switches to Pass to Them." Spec strings don’t have the most readable structure — I find it helpful to think of them as being written in a domain-specific language, because to me they look as much like line noise as complicated regular expressions do — but sometimes there’s no better way to figure out what is going on than to read the specs, and there is often no better way to adjust the behavior of gcc than to modify the specs it is using.

We have to do this — modify the specs — a few times throughout the CBL process, primarily because we need to control how gcc runs ld to link programs.

You can see the spec strings that gcc will use by running gcc with the -dumpspecs option. The default specs are built in to gcc, but you can provide your own specs to override the default behavior by using the -specs= command-line argument or by creating a specs file and putting it in a specific location in the filesystem.

We’re going to do the latter. The basic process is to first dump the specs file to the location where gcc will look for it:

gcc -dumpspecs > $(dirname $(gcc -print-libgcc-file-name))/specs

Then we can modify it however we need to, and gcc will use the modified version.

6.14.2. Adjusting the GCC specs (gnu-cross-toolchain phase)

Dependencies: gcc (gnu-cross-toolchain phase).

Caution

If you look at the structure of the toolchain directory structure, you’ll see that there are a couple of different ways you can refer to the programs in it. First, in the bin directory, there are a bunch of programs prefixed with the target-triple (e.g. x86_64-cbl-linux-gnu-gcc); second, in x86_64-cbl-linux-gnu/bin, the same set of programs exist without a prefix. You can use either one, but there’s a gotcha: if you put /home/lbl/work/crosstools/x86_64-cbl-linux-gnu/bin in your PATH, driver programs like gcc and g++ sometimes won’t be able to find some of the other programs they rely on, like cc1 and cc1plus. Those programs are in libexec/gcc/x86_64-cbl-linux-gnu/VERSION, and for some reason they can be found when the driver executable is in bin but sometimes not when it’s in x86_64-cbl-linux-gnu/bin.

That means that unless you put the path to the directory containing cc1 in your PATH as well, you might wind up getting mysterious error messages like gcc: error trying to exec 'cc1': execvp: No such file or directory when you try to compile programs. If that happens to you, just add the cc1 directory to your PATH, or use the full x86_64-cbl-linux-gnu-gcc program name to compile programs.

Not to belabor the point, but remember that the whole purpose of this cross-toolchain is to let us build the scaffolding that will then allow us to build the final target system entirely from source code.

While we’re using the scaffolding tools to build the final system, we want to do a native build of everything, including glibc, so we want the scaffolding to be independent of any filesystem locations that will still be present in the final system. In particular, this includes /lib and /usr/lib. That way, once we’re done doing the target system build, we can delete the scaffolding directory altogether and be confident that there are no lingering host-system artifacts or ephemera on it.

When the standard GNU toolchain builds an executable, it almost always links it against the dynamic link library (sometimes called the "dynamic loader" or "program interpreter"; this is something like ld-linux.so.2 or ld.so.1, and is conventionally found in a top-level directory called /lib or a multilib variant like /lib32 or /lib64).^[13] That’s normally fine, but we want the scaffolding to be entirely independent of /lib. So we need to adjust our cross-toolchain so that the programs it builds look in the /lib directory under the scaffolding location for their libraries, including the dynamic link library. This is done by modifying the GCC specs file.

Commands:

x86_64-cbl-linux-gnu-gcc -dumpspecs | \
  sed -e 's@/lib/ld@/scaffolding/lib/ld@g' \
  -e 's@/lib32/ld@/scaffolding/lib/ld@g' \
  -e 's@/libx32/ld@/scaffolding/lib/ld@g' \
  -e 's@/lib64/ld@/scaffolding/lib/ld@g' > \
  $(dirname $(x86_64-cbl-linux-gnu-gcc --print-libgcc-file-name))/specs

There’s another problem we need to work around, as well: gcc doesn’t provide any good way to tell it where to find some object files that need to be linked into every program: crti.o, crti1.o, and crtn.o. These are provided as a part of glibc, so like the rest of glibc they were installed into /home/lbl/work/sysroot/scaffolding, specifically into its /lib subdirectory. But, of course, that’s not a location where gcc normally expects to find them. So at this point if you were to try compiling a "Hello World" program with your cross-toolchain, it would complain that the cross-ld can’t find crt1.o or crti.o.

You can tell exactly what is going wrong by repeating the compile with -v, to get gcc to print out all the commands it’s running: cc1 to compile the code, then as to assemble it, then collect2 to link it. And apparently collect2 is running ld, which is producing the error message. (That’s weird, but you get used to weird stuff when you’re trying to figure out toolchain problems. There’s also no explicit execution of cpp; that’s because the C pre-processor is actually implemented in the libcpp library and is invoked as function calls to that library by the cc1 compiler program.)

Once you have the actual command line that’s failing, you can try adjusting it to see if there’s an easy way to get it to work. For example, the command line just names the startup files without any path components at all. After fussing around for a while looking for pleasant alternatives, the best solution I found was to specify the filenames with absolute paths. So that’s what we’re going to do in the specs file: replace every occurrence of the bare filename with absolute paths, for all the object files that appear in the scaffolding/lib directory.

Commands:

for FILE in crt1 crti crtn gcrt1 Mcrt1 Scrt1; do \
  sed -i -e "s@\\b$FILE.o\\b@/home/lbl/work/sysroot/scaffolding/lib/$FILE.o@g" \
  $(dirname $(x86_64-cbl-linux-gnu-gcc --print-libgcc-file-name))/specs; \
done

We can now verify that the cross-toolchain is able to build programs successfully, and is set up to link against the sysroot glibc in the /scaffolding directory, by compiling any simple program (like "Hello World") and then running readelf on it — there will be a line in the program headers section that says "Requesting program interpreter:" and should contain the path to the dynamic link library in the scaffolding location.

6.15. Verify that a toolchain works properly

6.15.1. Overview

After building a significant toolchain component, it’s a good idea to make sure that it works as intended. This is a simple smoke-test: it just compiles a "Hello, World" program and then inspects it to make sure it was built as expected and runs properly.

6.15.2. Verify that a toolchain works properly (gnu-cross-toolchain phase)

Environment variable: LD_LIBRARY_PATH: /home/lbl/work/crosstools/lib:$LD_LIBRARY_PATH
Environment variable: PATH: /home/lbl/work/crosstools/bin:$PATH
Dependencies: gcc.

This verifies that the cross-toolchain and emulator work properly: look at the machine type and dynamic linker location for a compiled program, and then make sure that it runs in the userspace emulator. This proves that the cross-toolchain and QEMU were properly built with a compatible target architecture and so on.

File /home/lbl/work/build/hello.c:

#include <stdio.h>
int main(void)
{
    printf("Hello, QEMU Emulated x86_64-cbl-linux-gnu World!\n");
    return 0;
}

This is compiled with:

Commands:

x86_64-cbl-linux-gnu-gcc /home/lbl/work/build/hello.c -o /home/lbl/work/build/hello

To verify that it’s linked properly, use readelf.

Commands:

x86_64-cbl-linux-gnu-readelf -a /home/lbl/work/build/hello | tee \
  /home/lbl/work/build/program_info
grep 'Machine:' /home/lbl/work/build/program_info | grep \
  'Advanced Micro Devices X86-64'
grep 'interpreter: /scaffolding/lib' /home/lbl/work/build/program_info

The Machine line should indicate the target architecture, rather than the host architecture, and the program interpreter requested by the program should be under the /scaffolding/lib directory (which is where the dynamic loader will be found once we’re booted into the target system). If either of those is not the case, the grep commands will fail, causing the CBL build process to abort.

One last thing we can usefully do at this point is verify that the user-mode QEMU emulator can actually run programs for the target architecture.

This is a bit tricky when running the dynamically-linked program we just built, because it is expecting to find the program interpreter at /scaffolding/lib but in fact it’s actually at that location under the sysroot directory. Luckily, the user-mode QEMU emulator can be told where to find library files, using the -L command line argument or the QEMU_LD_PREFIX environment variable.

Commands:

qemu-x86_64 -L /home/lbl/work/sysroot /home/lbl/work/build/hello | \
  grep 'Hello, QEMU Emulated x86_64-cbl-linux-gnu World'

This will produce a friendly greeting, or — if something goes wrong — will, again, cause the build process to abort.

(If you’d like, you can try running the hello program outside of QEMU as well; that should produce an error message like "cannot execute binary file." And, again, if it doesn’t, that means something is horribly wrong!)

6.15.3. Complete text of files

6.15.3.1. /home/lbl/work/build/hello.c

#include <stdio.h>
int main(void)
{
    printf("Hello, QEMU Emulated x86_64-cbl-linux-gnu World!\n");
    return 0;
}

7. Ensuring isolation from the host system

It’s possible for the build process of some of the scaffolding programs to find things on the host system and try to compile or link against them. This doesn’t work, of course, because the scaffolding programs are for the target machine architecture, and that’s incompatible with the host system architecture. Unfortunately, it still causes the build to fail. We can prevent that by setting up some programs that will isolate us from the host system.

This really should not be needed! Everything we’re building in the scaffolding is being cross-compiled, and packages should never try to compile or link against any host-system libraries when they’re being cross-compiled. But I ran into an issue with this at least once, and setting up a small guard against this kind of build-system bug is not hard to do.

7.1. pkgconf

Name	Improved pkg-config
Version	2.4.3
Project URL	https://github.com/pkgconf/pkgconf
SCM URL	(unknown)
Download URL	https://github.com/pkgconf/pkgconf/tags
Patches	`pkgconf-2.4.3-run-autogen-1.patch`

Name

Improved pkg-config

Version

2.4.3

Project URL

https://github.com/pkgconf/pkgconf

SCM URL

(unknown)

Download URL

https://github.com/pkgconf/pkgconf/tags

Patches

pkgconf-2.4.3-run-autogen-1.patch

7.1.1. Overview

pkg-config is a program that makes it easy to find things like installed libraries and header files. Many programs use pkg-config in their build processes to find out if the libraries they depend on are present on the build system, and to find out what linker and include directives should be used to compile and link against them.

The original pkg-config program can be found at pkg-config.freedesktop.org. At some point in its development, its developers decided to use functions from the glib library; unfortunately, glib uses pkg-config to find its own dependencies (really, just zlib — glib doesn’t have a lot of dependencies), which introduced a cyclic dependency. That doesn’t cause any really intractable problems — a couple of environment variables can be defined when building glib so that it doesn’t have to use pkg-config to find zlib — but cyclic dependencies are always kind of horrifying: not to sound like a broken record, but the idea with CBL is to start with a minimal set of binaries and a pile of source code and turn that into a whole system. (At some point after the glib dependency was introduced, pkg-config started requiring itself, so that it can find and link against the glib library, unless additional environment variables are provided.)

The situation has since been resolved, but it was resolved by bundling glib along with pkg-config. This isn’t a particularly elegant solution: the pkg-config distribution is about 11.5 megabytes of code, unpacked, and about 9.5 megabytes of that is glib.

A different approach was taken in a fork of pkg-config, "pkg-config-lite," which includes just the snippet of glib that is needed by pkg-config: that’s better, but still not ideal.

This brings us to pkgconf, a completely separate implementation of the pkg-config program. It has no external dependencies, doesn’t bundle any third-party code within its own distribution, and also has a design that I find preferable to the original pkg-config (it internally builds a directed acyclic graph of dependencies, rather than building an in-memory database of all known pkg-config files at runtime and then resolving dependencies from that database). So that’s what we use in LB Linux.

Patch:

pkgconf-2.4.3-run-autogen-1.patch

As distributed, the pkgconf program is missing a lot of files that are normally produced by the GNU autotools; the conventional way to build it is to start by running the autogen.sh script provided with the package distribution. However, one of the places the pkgconf program is built is as part of the host prerequisites; if that’s being done, it’s probably not a great idea to rely on the autotools being present at prerequisite build time. We can instead simply patch the source distribution to create the files that would normally be created via autogen.sh.

7.1.2. pkgconf (host-isolation phase)

This is built here because pkg-config is already present on the host system (as a prerequisite, if nothing else), and it probably knows about a lot of host system libraries. While building the scaffolding, we need to ensure that their build processes don’t find (and try to link against) any of those host system libraries. Since the host system libraries are for a different machine architecture, this would cause build failures. We can do that by putting a new version of pkg-config, configured only to look in the scaffolding directory, at the head of the PATH while building them.

We could probably get by with a simple shell script that always just says "I couldn’t find anything!" when asked for dependencies. On the other hand, it takes around ten seconds to build pkgconf and set it up, so why not just do that?

Configuration commands:

find . -exec touch -r README.md {} \;
./configure --prefix=/home/lbl/work/crosstools \
  --with-pkg-config-dir=/home/lbl/work/sysroot/scaffolding/lib/pkgconfig \
  --with-system-libdir=/home/lbl/work/sysroot/scaffolding/lib \
  --with-system-includedir=/home/lbl/work/sysroot/scaffolding/include

Compilation commands:

make

Test commands:

(none)

Installation commands:

make install
ln -sf pkgconf /home/lbl/work/crosstools/bin/pkg-config

In addition to the pkg-config symlink, we create a symbolic link x86_64-cbl-linux-gnu-pkg-config so that any build process that tries to find dependencies using a target-system pkg-config program will find ours.

Installation commands:

ln -sf pkgconf /home/lbl/work/crosstools/bin/x86_64-cbl-linux-gnu-pkg-config

8. Construction of a minimal bootable userspace

In this section, we’re going to use our shiny new cross-toolchain to build all of the programs and libraries we’ll need to get to a working target-architecture userspace. We call these components, collectively, the "scaffolding" because we’re going to use them as a kind of staging area and framework from which we can construct the final CBL system.

It’s useful to keep that purpose in mind! These programs will provide just enough of a userspace environment that, once we’ve got the target system booted, we’ll be able to build the final system components using these as a foundation. That means we’ll need, basically, all the stuff that we’ve already been using from the host system — the programs that let us build programs — and we also need programs that will let us work with partition tables, filesystems, and other low-level operating system concerns.

It’s also useful to keep in mind that everything here is ephemeral. As soon as we get the target system booted, we’ll use these scaffolding programs to build all of the stuff that make up the final CBL system, and then we’re going to throw them away.

Dependencies: Ensuring isolation from the host system.

8.1. About the Scaffolding

To maintain a hard line of separation between the scaffolding and the final system components, we’re building all of this stuff so that it installs into a directory called /scaffolding. When we set up the root filesystem for the target, it’s only going to have that one top-level directory! Then, as the scaffolding programs are used to construct the final system components, those final-system programs will be installed to normal system directories — /bin, /usr, and so on — and will be used in preference to the scaffolding programs that they replace.

In most cases, the scaffolding components use the GNU build system. That means they can be configured to expect that they will live in /scaffolding, but then be installed with a DESTDIR of the sysroot directory. That way, they actually get installed to /home/lbl/work/sysroot/scaffolding (which is exactly where they need to be on the host system), but think they’re installed to a top-level /scaffolding directory — which is where they actually will live once we boot into the target device.

Building everything with a very simple environment is still a good idea.

Environment variable: PATH: /home/lbl/work/crosstools/bin:$PATH
Environment variable: LC_ALL: POSIX

Some of the scaffolding pieces install libraries and headers. We want those to be visible to the rest of the scaffolding, so CFLAGS and LDFLAGS are not as empty as they have previously been.

Environment variable: CFLAGS: -I/home/lbl/work/sysroot/scaffolding/include
Environment variable: CXXFLAGS: -I/home/lbl/work/sysroot/scaffolding/include
Environment variable: LDFLAGS: -L/home/lbl/work/sysroot/scaffolding/lib

To use the cross-toolchain for these builds, we need to define a bunch of additional environment variables. Many of the scaffolding programs use the GNU build system, and therefore consult these environment variables to determine how to invoke toolchain programs.

Environment variable: CC: x86_64-cbl-linux-gnu-gcc
Environment variable: CXX: x86_64-cbl-linux-gnu-g++
Environment variable: AR: x86_64-cbl-linux-gnu-ar
Environment variable: AS: x86_64-cbl-linux-gnu-as
Environment variable: RANLIB: x86_64-cbl-linux-gnu-ranlib
Environment variable: LD: x86_64-cbl-linux-gnu-ld
Environment variable: STRIP: x86_64-cbl-linux-gnu-strip

We’re also going to use the cross-toolchain options build, host, and target for most of the components we’re building in this section. This time, --build is going to be the host system; --host and --target are going to refer to the target system.

8.2. Create Symbolic Links For Scaffolding Lib Directories

Some target architectures have a "multilib" feature, and the installation process for many packages insists on installing library files into a variety of different directories (like lib32 and lib64) to support this feature — even when multilib is disabled, as we try to do throughout the CBL process. The issue with this is, not all of the multilib directories are on the default library path known by the dynamic loader; this leads to errors in some package builds.

To ensure that all library files wind up in the lib directory per se rather than a multilib variant directory, we use a ugly but simple and effective kludge: we create symbolic links to ensure that all library files are placed directly into /scaffolding/lib. (In the target-side build, we’ll do something similar for the /lib and /usr/lib directories: this is done in Write the Scaffolding Init Script.)

The set of symbolic links needed here might expand as additional targets are added to the set that CBL can handle.

Commands:

mkdir -p /home/lbl/work/sysroot/scaffolding/lib
ln -s lib /home/lbl/work/sysroot/scaffolding/lib32
ln -s lib /home/lbl/work/sysroot/scaffolding/lib64
ln -s lib /home/lbl/work/sysroot/scaffolding/libx32

8.3. attr

Name	Filesystem Extended Attributes programs
Version	2.5.2
Project URL	http://savannah.nongnu.org/projects/attr
SCM URL	(unknown)
Download URL	http://download.savannah.nongnu.org/releases/attr/

Name

Filesystem Extended Attributes programs

Version

2.5.2

Project URL

http://savannah.nongnu.org/projects/attr

SCM URL

(unknown)

Download URL

http://download.savannah.nongnu.org/releases/attr/

8.3.1. Overview

Most Linux filesystems support "extended attributes" — arbitrary name/value pairs that can be associated with files or directories. These can be used for any purpose; for example, you might attach a "user.file_encoding" extended attribute to a text file, if it’s encoded unusually.

One of the primary uses of extended attributes is to implement access control lists or capabilities.

The attr package provides programs that allow extended attributes to be viewed and modified.

8.3.2. attr (host-scaffolding-components phase)

Since the configuration repository preserves extended attribute metadata as well as basic owner and mode, and since the package-users build script sets a user.package_owner attribute on all files that are installed by packages, we need the getfattr and setfattr commands early in the target system build.

Configuration commands:

./configure --prefix=/scaffolding \
  --build=aarch64-unknown-linux-gnu --host=x86_64-cbl-linux-gnu

Compilation commands:

make

Test commands:

(none)

Installation commands:

make DESTDIR=/home/lbl/work/sysroot install

8.4. ncurses

Name	GNU new curses library
Version	6.5
Project URL	https://invisible-island.net/ncurses/
SCM URL	(unknown)
Download URL	https://invisible-island.net/archives/ncurses/
Patches	`ncurses-6.5-branch-updates-20250419.patch`

Name

GNU new curses library

Version

6.5

Project URL

https://invisible-island.net/ncurses/

SCM URL

(unknown)

Download URL

https://invisible-island.net/archives/ncurses/

Patches

ncurses-6.5-branch-updates-20250419.patch

8.4.1. Overview

Ncurses ("new curses") is a library that provides terminal control features so programs can provide advanced text-based user interfaces. It’s a free-software version of a similar library (which was simply called "curses") that was developed at Berkeley.

There are lots of programs, even just in the scaffolding we’re setting up, that use ncurses. bash and vim are among them.

Patches are available in a version subdirectory of the overall package download location — for example, patches for ncurses 6.2 are in the "6.2" subdirectory of the ncurses package directory. If there’s a README file in that directory, follow the instructions in it. If not, just fetch all the patch files (and, optionally, GPG signature files) and apply those in order. When I update ncurses for CBL, I fetch all the patches available at that point, verify their GPG signatures, and compile them into a single branch-update patch, available in the file repository at files.freesa.org. If you don’t want to make sure you’re using the absolute latest version, you can just use that.

Patch:

ncurses-6.5-branch-updates-20250419.patch

8.4.2. ncurses (host-scaffolding-components phase)

When configuring the scaffolding ncurses, we specify --without-ada, since Ada isn’t built in the CBL process. We also specify --enable-overwrite so that the header files are installed into /home/lbl/work/sysroot/scaffolding/include rather than an ncurses subdirectory of it, because that makes it easier for other programs to find the headers, and --with-build-cc so that the native tools built as part of the ncurses build process use the native compiler instead of the cross-compiler. We also specify --disable-stripping because otherwise the install program might try to strip binaries using the host system strip rather than the cross-toolchain strip.

Configuration commands:

./configure --prefix=/scaffolding --with-shared \
  --build=aarch64-unknown-linux-gnu --host=x86_64-cbl-linux-gnu --enable-overwrite \
  --without-debug --without-ada --with-build-cc=gcc --disable-stripping \
  --with-versioned-syms --with-trace --disable-widec

Compilation commands:

make

Test commands:

(none)

Installation commands:

make DESTDIR=/home/lbl/work/sysroot install

8.5. bash

Name	GNU Bourne Again SHell
Version	5.2.37
P

Name

GNU Bourne Again SHell

Version

5.2.37

Cross-Building Linux: the Little Blue Linux build process

Introduction and Overview

1. About Little Blue Linux

1.1. S6-Based Init System

1.2. Package Users

1.3. Version Control For Configuration Files

1.4. Modern Versions of Everything

1.5. Cross-Architecture Build Process

1.6. About This Specific Build

1.7. How Long Is This Going To Take?

2. Configuration Parameters and Default Values

2.1. How configuration parameters work

2.2. Setting The Host And Target Architectures

2.3. Target System QEMU-related Parameters

2.4. Target Boot Configuration

2.5. Target system name and login details

2.6. Directories Used For the Build Process

3. Packages and How To Build Them

3.1. The GNU Build System

3.2. Executing the CBL Process Manually, Part One

4. Patches in Cross-Building Linux

4.1. Miscellaneous tweaks or fixes

4.2. Kernel configurations

4.3. Gnulib updates

4.4. Branch-update patches

4.5. Vendored-Dependency Patches

The Host-Side Build

5. Preparing For The Build

5.1. Required Host-System Programs

5.2. Final Preparations For The Build

6. Constructing a GNU Cross-Toolchain

6.1. Toolchain Basics

6.2. Cross-Toolchains

6.3. The CBL Sysroot Toolchain

6.4. A Word About The Dynamic Linker

6.4.1. The tl;dr

6.4.2. The grueling details

6.5. binutils

6.5.1. Overview

6.5.2. binutils (gnu-cross-toolchain phase)

6.6. gmp

6.6.1. Overview

6.6.2. gmp (gnu-cross-toolchain phase)

6.7. mpfr

6.7.1. Overview

6.7.2. mpfr (gnu-cross-toolchain phase)

6.8. mpc

6.8.1. Overview

6.8.2. mpc (gnu-cross-toolchain phase)

6.9. isl

6.9.1. Overview

6.9.2. isl (gnu-cross-toolchain phase)

6.10. linux

6.10.1. Overview

6.10.2. linux (gnu-cross-toolchain phase)

6.11. gcc

6.11.1. Overview

6.11.2. gcc: The driver program

6.11.3. How C and C++ Programming Languages Work in GNU/Linux

6.11.4. gcc (gnu-cross-toolchain-minimal phase)

6.12. glibc

6.12.1. Overview

6.12.2. glibc (gnu-cross-toolchain phase)

6.13. gcc (gnu-cross-toolchain phase)

6.14. Adjusting the GCC specs

6.14.1. Overview

6.14.2. Adjusting the GCC specs (gnu-cross-toolchain phase)

6.15. Verify that a toolchain works properly

6.15.1. Overview

6.15.2. Verify that a toolchain works properly (gnu-cross-toolchain phase)

6.15.3. Complete text of files

6.15.3.1. /home/lbl/work/build/hello.c

7. Ensuring isolation from the host system

7.1. pkgconf

7.1.1. Overview

7.1.2. pkgconf (host-isolation phase)

8. Construction of a minimal bootable userspace

8.1. About the Scaffolding

8.2. Create Symbolic Links For Scaffolding Lib Directories

8.3. attr

6.11.2. `gcc`: The driver program