99

Replacing x86 firmware with Linux and Go [LWN.net]

 6 years ago
source link: https://lwn.net/SubscriberLink/738649/81007748bf15c1e5/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Replacing x86 firmware with Linux and Go

Did you know...?

LWN.net is a subscriber-supported publication; we rely on subscribers to keep the entire operation going. Please help out by buying a subscription and keeping LWN on the net.

The Intel Management Engine (ME), which is a separate processor and operating system running outside of user control on most x86 systems, has long been of concern to users who are security and privacy conscious. Google and others have been working on ways to eliminate as much of that functionality as possible (while still being able to boot and run the system). Ronald Minnich from Google came to Prague to talk about those efforts at the 2017 Embedded Linux Conference Europe.

He began by noting that most times he is talking about firmware, it is with his coreboot hat on. But he removed said "very nice hat", since his talk was "not a coreboot talk". He listed a number of people who had worked on the project to "replace your exploit-ridden firmware with a Linux kernel", including several from partner companies (Two Sigma, Cisco, and Horizon Computing) as well as several other Google employees.

The results they achieved were to drop the boot time on an Open Compute Project (OCP) node from eight minutes to 20 seconds. To his way of thinking, that is "maybe the single least important part" of this work, he said. All of the user-space parts of the boot process are written in Go; that includes everything in initramfs, including init. This brings Linux performance, reliability, and security to the boot process and they were able to eliminate all of the ME and UEFI post-boot activity from the boot process.

Describing the mess

The problem, Minnich said, is that Linux has lost its control of the hardware. Back in the 1990s, when many of us started working with Linux, it controlled everything in the x86 platform. But today there are at least two and a half kernels between Linux and the hardware. Those kernels are proprietary and, not surprisingly, exploit friendly. They run at a higher privilege level than Linux and can manipulate both the hardware and the operating system in various ways. Worse yet, exploits can be written into the flash of the system so that they persist and are difficult or impossible to remove—shredding the motherboard is likely the only way out.

He used to give a talk with the title: "If you trust your computer, you're crazy", due to all of that proprietary code running on our systems. He hopes that this talk will give folks ways to deal with some of those problems, "so we can stop being crazy and maybe get a little sane".

[x86 operating systems]

He showed one of his slides [PDF] (above) that described the seen and unseen operating systems running on an x86 system. Ring 0 is Linux and, because "we ran out of ring numbers", hypervisors like Xen are ring -1, but below that are rings that are running code that you don't have access to, sometimes on processors you don't even know are part of the system. Ring -2 has a kernel and a half kernel; it consists of UEFI, which is the full kernel, and system management mode (SMM), which traps to 8086 16-bit mode, thus the "half" designation. Those control everything about the CPU and are invisible to the rings above. Every time you close the lid of your laptop, or do certain other things, SMM traps to classic 8086 mode; "that should make you happy", he said sarcastically.

Ring -3 is "the one that has people really worried". It runs MINIX 3 and is where the ME runs. It is the cause of the "year of MINIX 3 on the desktop", he joked, since there are more systems with the ME than any of Linux, macOS, or Windows.

There is no common code between the systems running in ring -2 and ring -3 as far as he knows, but they both have a wide range of capabilities. Both have IPv4 and IPv6 networking stacks, filesystems, drivers for various devices (disk, network, USB, ...), and web servers. The ME needs filesystems because it can be used to reimage the system; in fact, Minnich said, it can reimage the system even if the power is turned off as long as it is plugged into the wall and the network.

There is a whole raft of components that make up the ring -3 ME, many of which he does not understand. For example there are components named "full network manageability", "regular network manageability", and "manageability", as well as the "outbreak containment heuristic". He pointed to a Master's thesis [large PDF] from Vassilios Ververis about ten years ago that looked at many different flaws in the ME. It is rather depressing, Minnich said, since it showed that almost every part of the ME could be attacked; some of those bugs still have not been fixed.

[Ronald Minnich]

He referenced the headline of a Wired article about an ME exploit ("Intel Fixes a Critical Bug That Lingered for 7 Dang Years") that he thought was funny. Less funny was the bug itself that allowed a zero-length password to be sent to the web server to give administrator access to systems with the ME. Since the bug was present for seven years, that adds up to around a billion systems, he said, and he strongly doubts that all of those have been patched with a firmware update.

He moved on to the half OS in ring -2. SMM was originally meant to handle power management on DOS systems; it can take over the system out from under ring 0 when certain events (system management interrupts or SMIs) occur. There are a lot of SMI exploits and, once SMM is enabled, it cannot be turned off. It takes 8MB of memory away from the rest of the system for its purposes. SMM is "a good way to maintain vendor control over you", he said.

The other thing running in ring -2 is UEFI; both it and SMM run on the main CPU. UEFI is "an extremely complex kernel"; vendors are writing code for the kernel, but they don't understand all of the rules, so they make mistakes. The result is that "there are big, giant holes that people can drive exploits through". The UEFI security model, as far as he can see, is obscurity.

There are tons of exploits for UEFI, he said. Because UEFI is updated by handing off bits of UEFI code to the UEFI kernel, he is worried that exploits will persistently infect that process, such that it will claim to update itself, but not do so. That only leaves the shredder.

He summarized by reiterating what he had just described: 2.5 hidden OSes with network stacks, web servers, and other capabilities. These OSes have bugs that can persist across power cycles and reinstalls and those bugs have been exploited in the past. His old talk used to end here with a question: "Are you scared yet?"

Fixing the mess

"So how do we fix this mess?", he asked. Some people say to switch to AMD processors, but that is not really a solution now. Ryzen is touted to be open, but that is not truly the case, there are still closed parts. So the project is focusing on Intel x86 processors and has the goal of reducing the scope of the 2.5 OSes. The project is called "non-extensible reduced firmware" (NERF), partly because the team believes the "extensible" in UEFI is harmful. Apparently, there is no overall web page for NERF itself, though some of the components Minnich talks about do have web pages. [Update: As noted in the comments, there is a NERF web page.]

The idea behind NERF is to reduce the harm that the firmware is capable of. In addition, there is an effort to make what the firmware is doing more visible. It does this by removing almost all of the runtime components from the firmware; the "almost" refers to the ME, which is hard to kill completely, he said. If you completely remove the ME, your node probably will not boot, but NERF has taken away the ME's web server and IP stacks. The UEFI IP stack and other drivers have also been removed. Beyond that, the self-reflash capability for ME and UEFI has been removed, so Linux manages all flash updates.

The NERF components are a de-blobbed ME ROM and a UEFI ROM that has been reduced to its most basic parts; in addition, SMM has been disabled or vectored to Linux where that is needed. On top of that runs a Linux kernel with a Go-based user space (u-root). He noted that the project is particularly interested in any Go programmers who want to contribute to just that piece.

They would prefer to remove the ME entirely, but that simply is not an option. If you remove it, the system may not boot, power on, or, if it does power on, it may shut down again in 30 minutes. But there is some good news: the ME has multiple components and most of them can be removed.

He pointed to the me_cleaner project that will process an ME ROM to remove most of it. For example, on the MinnowMax, 5MB of the 8MB flash was used for the ME, but that was reduced to 300KB by me_cleaner. So you only need 300KB of the ME to boot Linux and that gets rid of all of the stuff you really don't want the ME to be doing anyway. The ME reduction is working for MinnowMax and a number of other boards, he said.

If you "get into the game early enough", and they believe their Linux kernel does, SMM can be completely disabled. As far as they can tell, there is no requirement to run SMM; it is mostly there for "value add" by the vendors, which is just a way to try to lock people into their platform. If it ever becomes an issue for some hardware, though, there are ways to vector the SMIs to the kernel. The theme is to keep Linux in control, he said.

UEFI is "huge and extremely complex", but there are a lot of mistakes made in the implementation of it. Some interrupts, including memory-error-detection interrupts, still need to be routed to UEFI, though. They want to remove the opportunities for UEFI drivers to put in exploits by making it non-extensible. He showed "an eye chart" of all of the different services that UEFI provides; he noted that it looks like a kernel, because it is, and said that "it is a sizable fraction of the size of Linux".

Next up, he showed the standard UEFI boot process; it starts with two phases (security or SEC and pre-EFI initialization or PEI) that are completely proprietary and will never be released by the vendors, he said. Beyond that, though, the next phase, which is called the driver execution environment (DXE), has a well-defined interface that multiple components (DXE core, drivers, boot manager, ...) conform to.

The boot manager is responsible for starting up the operating system. When you see the screen that allows choosing what to boot on a UEFI system, that is the boot manager. What they have done is to replace the boot manager with a Linux kernel that conforms to the DXE interface.

On the OCP node system that was being demonstrated elsewhere at the conference, booting the Linux kernel took 20 seconds from power-on; the Go-based user space does a DHCP query, a wget for the server kernel, and then a kexec into the new kernel, which takes an additional three seconds. There are plans to replace the DXE core component with something that is open source and knows more about how to boot Linux; that should reduce the boot time even further, Minnich said.

He does not believe that we will ever get access to that early boot code (SEC and PEI) for UEFI. Even for Chromebooks running coreboot, that piece is a binary blob. The best we can do, he said, is to replace the pieces at that well-defined interface, which is what has been done. In addition, the goal was to get rid of all the UEFI runtime services, which has been accomplished.

As part of his Heads project, Trammel Hudson has put together some Makefiles and the like to create a NERF image. That can be used with a custom kernel and initramfs to replace as much of UEFI as possible. They have had good results on a Dell server, the MinowMax, and the OCP nodes.

Using Linux makes the firmware easier to work with, Minnich said. Normally, there are lots of fiddly, hardware-specific pieces that need to be changed in the firmware, but using the DXE interface makes a lot of those problems go away. He expected that different kernels would be required for the different systems, but he has been using the same kernel on the MinnowMax, which is a small system, and the OCP node, which is a rather large system.

The user-space piece is all written in Go, which is generally more trusted than C within Google, he said. The 5.9MB initramfs contains all of the source code for the user space, all of the Go compiler and package sources, and a Go toolchain. The commands are built on the fly, as they are needed, which usually takes around 200ms per command; once they are built, it is "nearly instantaneous" (1ms) for them to run. From a security angle, that's good because all of the source is available to be examined.

For cases where there is not sufficient space for an initramfs of that size or enough CPU power to do even a fast compile step on the way to booting, there is another mode for the u-root Go commands. It is like BusyBox, in that there is one binary that is linked to a bunch of different command names; this mode uses the Go abstract syntax tree package to rewrite the commands as packages. That reduces the footprint to 2MB, which is useful on systems with less flash space.

There are some implications of the u-root work that has Minnich thinking about booting for desktop systems. With u-root, there are no scripts or unit files to deal with, there is simply a single program that boots the system, which leads to "things coming up really fast". It is more understandable for him and makes the boot process faster. There is a project at Google, called NiChrome, that can bring up a Chromebook all the way from power-up to X11 and a browser in five seconds.

Go is a compiled language, but it is often used for scripting. Minnich uses it that way "all the time"; he stopped writing Bash scripts years ago in favor of Go. It is "easier and more reliable" to write scripts in Go.

He concluded by saying that he is hoping to see companies ship hardware with NERF and u-root in 2018. Companies want to have firmware that they understand, he said; they also want it to boot quickly and be secure. In the Q&A, Minnich was asked about secure boot and TPMs. Neither is supported currently, though there is a non-working verified boot program in u-root at this point. For TPM support, he thinks the project will follow what Chrome OS has done, rather than take the secure boot path.

He was also asked about the relationship of this work to coreboot. Minnich said that coreboot should always be preferred, but it has not been available for server platforms for 12 years. So he would suggest that developers "always use coreboot if you can", but if not, look at NERF.

Those interested can view the YouTube video of Minnich's talk.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for supporting my travel to Prague for ELC Europe.]


(Log in to post comments)


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK