Kernels 101 – Let’s write a Kernel

Hello World,

Let us write a simple kernel which could be loaded with the GRUB bootloader on an x86 system. This kernel will display a message on the screen and then hang.

How does an x86 machine boot

Before we think about writing a kernel, let’s see how the machine boots up and transfers control to the kernel:

Most registers of the x86 CPU have well defined values after power-on. The Instruction Pointer (EIP) register holds the memory address for the instruction being executed by the processor. EIP is hardcoded to the value 0xFFFFFFF0. Thus, the x86 CPU is hardwired to begin execution at the physical address 0xFFFFFFF0. It is in fact, the last 16 bytes of the 32-bit address space. This memory address is called reset vector.

Now, the chipset’s memory map makes sure that 0xFFFFFFF0 is mapped to a certain part of the BIOS, not to the RAM. Meanwhile, the BIOS copies itself to the RAM for faster access. This is called shadowing. The address 0xFFFFFFF0 will contain just a jump instruction to the address in memory where BIOS has copied itself.

Thus, the BIOS code starts its execution. BIOS first searches for a bootable device in the configured boot device order. It checks for a certain magic number to determine if the device is bootable or not. (whether bytes 511 and 512 of first sector are 0xAA55)

Once the BIOS has found a bootable device, it copies the contents of the device’s first sector into RAM starting from physical address 0x7c00; and then jumps into the address and executes the code just loaded. This code is called the bootloader.

The bootloader then loads the kernel at the physical address 0x100000. The address 0x100000 is used as the start-address for all big kernels on x86 machines.

All x86 processors begin in a simplistic 16-bit mode called real mode. The GRUB bootloader makes the switch to 32-bit protected mode by setting the lowest bit of CR0 register to 1. Thus the kernel loads in 32-bit protected mode.

Do note that in case of linux kernel, GRUB detects linux boot protocol and loads linux kernel in real mode. Linux kernel itself makes the switch to protected mode.

What all do we need?

* An x86 computer (of course)
* Linux
* NASM assembler
* gcc
* ld (GNU Linker)
* grub

Source Code

Source code is available at my Github repository - mkernel

The entry point using assembly

We like to write everything in C, but we cannot avoid a little bit of assembly. We will write a small file in x86 assembly-language that serves as the starting point for our kernel. All our assembly file will do is invoke an external function which we will write in C, and then halt the program flow.

How do we make sure that this assembly code will serve as the starting point of the kernel?

We will use a linker script that links the object files to produce the final kernel executable. (more explained later) In this linker script, we will explicitly specify that we want our binary to be loaded at the address 0x100000. This address, as I have said earlier, is where the kernel is expected to be. Thus, the bootloader will take care of firing the kernel’s entry point.

Here’s the assembly code:

;;kernel.asm
bits 32			;nasm directive - 32 bit
section .text

global start
extern kmain	        ;kmain is defined in the c file

start:
  cli 			;block interrupts
  mov esp, stack_space	;set stack pointer
  call kmain
  hlt		 	;halt the CPU

section .bss
resb 8192		;8KB for stack
stack_space:

The first instruction bits 32 is not an x86 assembly instruction. It’s a directive to the NASM assembler that specifies it should generate code to run on a processor operating in 32 bit mode. It is not mandatorily required in our example, however is included here as it’s good practice to be explicit.

The second line begins the text section (aka code section). This is where we put all our code.

global is another NASM directive to set symbols from source code as global. By doing so, the linker knows where the symbol start is; which happens to be our entry point.

kmain is our function that will be defined in our kernel.c file. extern declares that the function is declared elsewhere.

Then, we have the start function, which calls the kmain function and halts the CPU using the hlt instruction. Interrupts can awake the CPU from an hlt instruction. So we disable interrupts beforehand using cli instruction. cli is short for clear-interrupts.

We should ideally set aside some memory for the stack and point the stack pointer (esp) to it. However, it seems like GRUB does this for us and the stack pointer is already set at this point. But, just to be sure, we will allocate some space in the BSS section and point the stack pointer to the beginning of the allocated memory. We use the resb instruction which reserves memory given in bytes. After it, a label is left which will point to the edge of the reserved piece of memory. Just before the kmain is called, the stack pointer (esp) is made to point to this space using the mov instruction.

The kernel in C

In kernel.asm, we made a call to the function kmain(). So our C code will start executing at kmain():

/*
*  kernel.c
*/
void kmain(void)
{
	const char *str = "my first kernel";
	char *vidptr = (char*)0xb8000; 	//video mem begins here.
	unsigned int i = 0;
	unsigned int j = 0;

	/* this loops clears the screen
	* there are 25 lines each of 80 columns; each element takes 2 bytes */
	while(j < 80 * 25 * 2) {
		/* blank character */
		vidptr[j] = ' ';
		/* attribute-byte - light grey on black screen */
		vidptr[j+1] = 0x07; 		
		j = j + 2;
	}

	j = 0;

	/* this loop writes the string to video memory */
	while(str[j] != '\0') {
		/* the character's ascii */
		vidptr[i] = str[j];
		/* attribute-byte: give character black bg and light grey fg */
		vidptr[i+1] = 0x07;
		++j;
		i = i + 2;
	}
	return;
}

All our kernel will do is clear the screen and write to it the string “my first kernel”.

First we make a pointer vidptr that points to the address 0xb8000. This address is the start of video memory in protected mode. The screen’s text memory is simply a chunk of memory in our address space. The memory mapped input/output for the screen starts at 0xb8000 and supports 25 lines, each line contain 80 ascii characters.

Each character element in this text memory is represented by 16 bits (2 bytes), rather than 8 bits (1 byte) which we are used to. The first byte should have the representation of the character as in ASCII. The second byte is the attribute-byte. This describes the formatting of the character including attributes such as color.

To print the character s in green color on black background, we will store the character s in the first byte of the video memory address and the value 0x02 in the second byte.
0 represents black background and 2 represents green foreground.

Have a look at table below for different colors:

0 - Black, 1 - Blue, 2 - Green, 3 - Cyan, 4 - Red, 5 - Magenta, 6 - Brown, 7 - Light Grey, 8 - Dark Grey, 9 - Light Blue, 10/a - Light Green, 11/b - Light Cyan, 12/c - Light Red, 13/d - Light Magenta, 14/e - Light Brown, 15/f – White.

In our kernel, we will use light grey character on a black background. So our attribute-byte must have the value 0x07.

In the first while loop, the program writes the blank character with 0x07 attribute all over the 80 columns of the 25 lines. This thus clears the screen.

In the second while loop, characters of the null terminated string “my first kernel” are written to the chunk of video memory with each character holding an attribute-byte of 0x07.

This should display the string on the screen.

The linking part

We will assemble kernel.asm with NASM to an object file; and then using GCC we will compile kernel.c to another object file. Now, our job is to get these objects linked to an executable bootable kernel.

For that, we use an explicit linker script, which can be passed as an argument to ld (our linker).

/*
*  link.ld
*/
OUTPUT_FORMAT(elf32-i386)
ENTRY(start)
SECTIONS
 {
   . = 0x100000;
   .text : { *(.text) }
   .data : { *(.data) }
   .bss  : { *(.bss)  }
 }

First, we set the output format of our output executable to be 32 bit Executable and Linkable Format (ELF). ELF is the standard binary file format for Unix-like systems on x86 architecture.

ENTRY takes one argument. It specifies the symbol name that should be the entry point of our executable.

SECTIONS is the most important part for us. Here, we define the layout of our executable. We could specify how the different sections are to be merged and at what location each of these is to be placed.

Within the braces that follow the SECTIONS statement, the period character (.) represents the location counter.
The location counter is always initialized to 0x0 at beginning of the SECTIONS block. It can be modified by assigning a new value to it.

Remember, earlier I told you that kernel’s code should start at the address 0x100000. So, we set the location counter to 0x100000.

Have look at the next line .text : { *(.text) }

The asterisk (*) is a wildcard character that matches any file name. The expression *(.text) thus means all .text input sections from all input files.

So, the linker merges all text sections of the object files to the executable’s text section, at the address stored in the location counter. Thus, the code section of our executable begins at 0x100000.

After the linker places the text output section, the value of the location counter will become
0x1000000 + the size of the text output section.

Similarly, the data and bss sections are merged and placed at the then values of location-counter.

Grub and Multiboot

Now, we have all our files ready to build the kernel. But, since we like to boot our kernel with the GRUB bootloader, there is one step left.

There is a standard for loading various x86 kernels using a boot loader; called as Multiboot specification.

GRUB will only load our kernel if it complies with the Multiboot spec.

According to the spec, the kernel must contain a header (known as Multiboot header) within its first 8 KiloBytes.

Further, This Multiboot header must contain 3 fields that are 4 byte aligned namely:

a magic field: containing the magic number 0x1BADB002, to identify the header.
a flags field: We will not care about this field. We will simply set it to zero.
a checksum field: the checksum field when added to the fields ‘magic’ and ‘flags’ must give zero.

So our kernel.asm will become:

;;kernel.asm

;nasm directive - 32 bit
bits 32
section .text
        ;multiboot spec
        align 4
        dd 0x1BADB002            ;magic
        dd 0x00                  ;flags
        dd - (0x1BADB002 + 0x00) ;checksum. m+f+c should be zero

global start
extern kmain	        ;kmain is defined in the c file

start:
  cli 			;block interrupts
  mov esp, stack_space	;set stack pointer
  call kmain
  hlt		 	;halt the CPU

section .bss
resb 8192		;8KB for stack
stack_space:

The dd defines a double word of size 4 bytes.

Building the kernel

We will now create object files from kernel.asm and kernel.c and then link it using our linker script.

nasm -f elf32 kernel.asm -o kasm.o

will run the assembler to create the object file kasm.o in ELF-32 bit format.

gcc -m32 -c kernel.c -o kc.o

The ’-c ’ option makes sure that after compiling, linking doesn’t implicitly happen.

ld -m elf_i386 -T link.ld -o kernel kasm.o kc.o

will run the linker with our linker script and generate the executable named kernel.

Configure your grub and run your kernel

GRUB requires your kernel to be of the name pattern kernel-<version>. So, rename the kernel. I renamed my kernel executable to kernel-701.

Now place it in the /boot directory. You will require superuser privileges to do so.

In your GRUB configuration file grub.cfg you should add an entry, something like:

title myKernel
	root (hd0,0)
	kernel /boot/kernel-701 ro

Don’t forget to remove the directive hiddenmenu if it exists.

Reboot your computer, and you’ll get a list selection with the name of your kernel listed.

Select it and you should see:

That’s your kernel!!

PS:

* It’s always advisable to get yourself a virtual machine for all kinds of kernel hacking. * To run this on grub2 which is the default bootloader for newer distros, your config should look like this:

menuentry 'kernel 701' {
	set root='hd0,msdos1'
	multiboot /boot/kernel-701 ro
}

* Also, if you want to run the kernel on the qemu emulator instead of booting with GRUB, you can do so by:

qemu-system-i386 -kernel kernel

Also, see the next article in the Kernel series:
Kernels 201 - Let’s write a Kernel with keyboard and screen support

References and Thanks

Kernel 101 – Let’s write a Kernel - Arjun Sreedharan

Kernels 101 – Let’s write a Kernel

Recommend

The best laptop ever made – Marco.org

How To Customize Firefox +57 User Interface – FOSS Post

GoalKicker.com – Free Programming Books

What’s the real deal with in-browser VPNs? | TechRadar

#195: Playwright with Arjun Attam

In conversation with Arjun Narayan, CEO, Materialize

Why Arjun Malhotra invested in HyperTrack’s location logistics cloud

Speakeasy JS - High Performance Data Compression in a Tiny Package (Arjun Barret...

Arjun Aleti (@arjunaleti) / FAUN

Ex-Google Safety Lead Arjun Narayan Discusses AI-Written News

About Joyk

Kernel 101 &ndash; Let&rsquo;s write a Kernel - Arjun Sreedharan

Kernels 101 – Let’s write a Kernel

Recommend

About Joyk

Kernel 101 – Let’s write a Kernel - Arjun Sreedharan