

zig.internals/internals.rst at master · mikdusan/zig.internals · GitHub
source link: https://github.com/mikdusan/zig.internals/blob/master/internals.rst

Zig Compiler Internals
note:
Due to limitations of this article format, we overload diff syntax highlighting to achieve a highlighting effect for code listings. Consequently, highlighted lines display an exclamation mark at the beginning of the line. This mark should be ignored.
1 Introduction
The Zig compiler is implemented mostly in C++ with some parts in Zig userland.
Long-term goals (in no particular order) for the compiler are as follows:
- become self-hosting
- add a fast backend (non-optimizing machine code generator)
- add fine-grained incremental builds
- continue to improve safe-mode code generation
2 Abstract
This article aims to document various internal aspects of the Zig Programming Language and bootstrap newcomers interested in debugging/contributing to the project.
3 Compiler Pipeline
The Zig compiler architecture pipeline is as follows:
- consume Zig source code
- generate tokens (LEX)
- generate abstract syntax tree (AST)
- generate Src internal representation (SIR)
- generate Gen internal representation (GIR)
- generate LLVM internal representation (LLVM-IR)
- emit machine code
3.1 Generate LEX
Source code is consumed and tokens are generated by tokenizer.cpp.
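As a rough illustration of what this stage does, here is a minimal C++ sketch of a hand-written tokenizer for a tiny Zig subset (keywords, identifiers, integer literals and punctuation). It is not a copy of tokenizer.cpp: the TokenId and Token names, the keyword set and the symbol handling are assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for a tiny Zig subset (not the compiler's real enum).
enum class TokenId { Keyword, Identifier, IntLiteral, Symbol, Eof };

struct Token {
    TokenId id;
    std::string text;
};

static bool is_keyword(const std::string &s) {
    return s == "var" || s == "const" || s == "fn" || s == "return" || s == "export";
}

// Split source text into tokens, skipping whitespace; always ends with Eof.
std::vector<Token> tokenize(const std::string &src) {
    std::vector<Token> out;
    size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: letters, digits and underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            out.push_back({is_keyword(word) ? TokenId::Keyword : TokenId::Identifier, word});
        } else if (std::isdigit(c)) {
            // Decimal integer literal.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back({TokenId::IntLiteral, src.substr(start, i - start)});
        } else if (c == '+' && i + 1 < src.size() && src[i + 1] == '=') {
            // Two-character operator "+=" must be matched before a lone "+".
            i += 2;
            out.push_back({TokenId::Symbol, "+="});
        } else {
            // Any other character becomes a one-character symbol token.
            ++i;
            out.push_back({TokenId::Symbol, std::string(1, src[start])});
        }
    }
    out.push_back({TokenId::Eof, ""});
    return out;
}
```

Tokenizing var i: u64 = 999; yields var, i, :, u64, =, 999, ; and a final end-of-file token. Note that u64 is just an identifier at this stage; deciding that it names a type happens later in the pipeline.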
3.2 Generate AST
Tokens are consumed and the AST is generated by parser.cpp.
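A minimal recursive-descent sketch shows how a flat sequence becomes a tree. To stay self-contained it reads characters directly instead of consuming tokens, and it only understands integer literals joined by + and -; the Node and Parser types are hypothetical and are not those of parser.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST node: an integer literal or a binary +/- operation.
struct Node {
    char op = 0;                 // '+' or '-' for binary nodes, 0 for literals
    std::uint64_t value = 0;     // literal value when op == 0
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // expr := int (('+' | '-') int)*   (left-associative)
    std::unique_ptr<Node> parse_expr() {
        auto lhs = parse_int();
        for (;;) {
            skip_ws();
            if (pos_ >= src_.size() || (src_[pos_] != '+' && src_[pos_] != '-')) break;
            auto bin = std::make_unique<Node>();
            bin->op = src_[pos_++];
            bin->lhs = std::move(lhs);   // everything parsed so far becomes the left child
            bin->rhs = parse_int();
            lhs = std::move(bin);
        }
        return lhs;
    }

private:
    void skip_ws() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    std::unique_ptr<Node> parse_int() {
        skip_ws();
        if (pos_ >= src_.size() || !std::isdigit(static_cast<unsigned char>(src_[pos_])))
            throw std::runtime_error("expected integer literal");
        auto lit = std::make_unique<Node>();
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            lit->value = lit->value * 10 + static_cast<std::uint64_t>(src_[pos_++] - '0');
        return lit;
    }

    std::string src_;
    size_t pos_ = 0;
};
```

Parsing 999 + 333 - 1 yields a '-' node whose left child is the '+' node, which is how left associativity shows up in the tree.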
3.3 Generate SIR
AST is consumed and SIR is generated by analyze.cpp and ir.cpp. Work handled along the way includes:
- execute comptime
- resolve comptime types
- apply result location semantics
Zig's internal representation has two parts, SIR and GIR. The "S" in Src-IR indicates that it comes from the source side of the pipeline, and the "G" in Gen-IR indicates that it is heading toward the machine-code generation side.
Both SIR and GIR are colloquially known as IR.
3.4 Generate GIR
SIR is consumed and GIR is generated by ir.cpp.
3.5 Generate LLVM-IR
GIR is consumed and LLVM-IR is generated by codegen.cpp.
4 Reading IR
This section briefly describes the textual representation of IR, using the example source reduction.zig:

export fn reduction() u64 {
    var i: u64 = 999;
    i += 333;
    return i;
}
4.1 SIR
SIR listing for reduction.zig:

fn reduction() { // (IR)
Entry_0:
  #1 | ResetResult | (unknown) | - | ResetResult(none)
  #2 | ResetResult | (unknown) | - | ResetResult(none)
  #3 | ResetResult | (unknown) | - | ResetResult(none)
  #4 | Const | type | 2 | u64
  #5 | EndExpr | (unknown) | - | EndExpr(result=none,value=u64)
  #6 | Const | bool | 2 | false
  #7 | AllocaSrc | (unknown) | 1 | Alloca(align=(null),name=i)
  #8 | ResetResult | (unknown) | - | ResetResult(var(#7))
  #9 | ResetResult | (unknown) | - | ResetResult(none)
  #10 | Const | comptime_int| 2 | 999
  #11 | EndExpr | (unknown) | - | EndExpr(result=none,value=999)
  #12 | ImplicitCast | (unknown) | 1 | @implicitCast(u64,999)
  #13 | EndExpr | (unknown) | - | EndExpr(result=var(#7),value=#12)
  #14 | DeclVarSrc | void | - | var i = #7 // comptime = false
  #15 | ResetResult | (unknown) | - | ResetResult(none)
  #16 | ResetResult | (unknown) | - | ResetResult(none)
  #17 | VarPtr | (unknown) | 2 | &i
  #18 | LoadPtr | (unknown) | 1 | #17.*
  #19 | ResetResult | (unknown) | - | ResetResult(none)
  #20 | Const | comptime_int| 2 | 333
  #21 | EndExpr | (unknown) | - | EndExpr(result=none,value=333)
  #22 | BinOp | (unknown) | 1 | #18 + 333
  #23 | StorePtr | void | - | *#17 = #22
  #24 | Const | void | 2 | {}
  #25 | EndExpr | (unknown) | - | EndExpr(result=none,value={})
  #26 | CheckStatementIsVoid | (unknown) | - | @checkStatementIsVoid({})
  #27 | ResetResult | (unknown) | - | ResetResult(none)
  #28 | ResetResult | (unknown) | - | ResetResult(return)
  #29 | VarPtr | (unknown) | 1 | &i
  #30 | LoadPtr | (unknown) | 4 | #29.*
  #31 | EndExpr | (unknown) | - | EndExpr(result=return,value=#30)
  #32 | AddImplicitReturnType | (unknown) | - | @addImplicitReturnType(#30)
  #35 | TestErrSrc | (unknown) | 2 | @testError(#30)
  #36 | TestComptime | (unknown) | 3 | @testComptime(#35)
  #37 | CondBr | noreturn | - | if (#35) $ErrRetErr_33 else $ErrRetOk_34 // comptime = #36
ErrRetErr_33:
  #39 | SaveErrRetAddr | (unknown) | - | @saveErrRetAddr()
  #40 | Br | noreturn | - | goto $RetStmt_38 // comptime = #36
ErrRetOk_34:
  #41 | Br | noreturn | - | goto $RetStmt_38 // comptime = #36
RetStmt_38:
  #42 | Return | noreturn | - | return #30
}
Each line represents one SIR instruction in tabular form, with the following columns:
- debug-id, which is unique within the function body
- trimmed C++ struct name representing the instruction type (e.g. BinOp for IrInstructionBinOp)
- Zig type of the instruction as an expression
- reference count for the instruction
- syntax (string representation) of the instruction
Basic-block labels, in the style <name>_<debug-id>: (e.g. Entry_0:), are intermixed between the instructions.
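When reading long listings it can help to split rows mechanically. The following C++ sketch parses one row into the five columns described above. IrRow and parse_ir_row are hypothetical helpers, not part of the compiler, and the sketch assumes the leading "!" highlight markers used in this article have already been removed.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One row of a --verbose-ir listing, split into its five columns.
struct IrRow {
    std::string debug_id;          // e.g. "#22", or ":12" for a hoisted AllocaGen
    std::string name;              // trimmed struct name, e.g. "BinOp"
    std::string type;              // Zig type, or "(unknown)" before analysis
    std::optional<int> ref_count;  // empty when the column shows "-"
    std::string syntax;            // e.g. "#18 + 333"
};

static std::string trim(const std::string &s) {
    const char *ws = " \t";
    size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Returns std::nullopt for lines that are not instruction rows (e.g. "Entry_0:").
std::optional<IrRow> parse_ir_row(const std::string &line) {
    std::vector<std::string> cols;
    size_t start = 0;
    // Split on the first four '|' only, in case the syntax column contains '|'.
    for (int i = 0; i < 4; ++i) {
        size_t bar = line.find('|', start);
        if (bar == std::string::npos) return std::nullopt;
        cols.push_back(trim(line.substr(start, bar - start)));
        start = bar + 1;
    }
    cols.push_back(trim(line.substr(start)));

    IrRow row{cols[0], cols[1], cols[2], std::nullopt, cols[4]};
    if (cols[3] != "-") row.ref_count = std::stoi(cols[3]);
    return row;
}
```

For example, the row #23 | StorePtr | void | - | *#17 = #22 parses to a StorePtr with no reference count, while a label line such as Entry_0: is rejected.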
4.2 GIR
GIR listing for reduction.zig:

fn reduction() { // (analyzed)
Entry_0:
  #16 | StorePtr | void | - | *#12 = 999
  :12 | AllocaGen | *u64 | 2 | Alloca(align=0,name=i)
  #17 | DeclVarGen | void | - | var i: u64 align(8) = #12 // comptime = false
  #20 | VarPtr | *u64 | 2 | &i
  #21 | LoadPtrGen | u64 | 1 | loadptr(#20)result=(null)
  #26 | BinOp | u64 | 1 | #21 + 333
  #27 | StorePtr | void | - | *#20 = #26
  #33 | VarPtr | *u64 | 1 | &i
  #34 | LoadPtrGen | u64 | 1 | loadptr(#33)result=(null)
  #39 | Return | noreturn | - | return #34
}
GIR looks very similar to SIR but has fewer instructions, because many SIR instructions have already been consumed by the pipeline. Bear in mind a few things:
- debug-ids in GIR have no correlation to those in SIR
- many SIR instructions are illegal in GIR
- all types are resolved
It is worth pausing to examine why one entry in column 1 looks different. Looking back from :12, we see that #16 uses #12, which is an AllocaGen. These are special: the :12 (rather than #12) indicates that the preceding instruction references it, but it is not code-generated at that position. Instead, all AllocaGen instructions are code-generated at the very beginning of the function, before anything else.
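To make the ordering rule concrete, here is a minimal C++ sketch that takes instructions in listing order and moves every AllocaGen to the front while keeping everything else in its original order. The GenInst type and emission_order function are hypothetical; they only model the ordering described above and are not how ir.cpp or codegen.cpp are written.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified GIR instruction: just enough to show ordering.
struct GenInst {
    int debug_id;
    std::string name;  // e.g. "AllocaGen", "StorePtr"
};

// Return the order in which instructions would be emitted: every AllocaGen
// moves to the front of the function. std::stable_partition preserves the
// relative order within both groups.
std::vector<GenInst> emission_order(std::vector<GenInst> insts) {
    std::stable_partition(insts.begin(), insts.end(),
                          [](const GenInst &i) { return i.name == "AllocaGen"; });
    return insts;
}
```

Applied to the first four entries of the listing above (#16, :12, #17, #20), it yields :12 first, followed by #16, #17 and #20 unchanged. One likely motivation for this placement in an LLVM-based compiler is that LLVM's mem2reg pass only promotes allocas located in a function's entry block.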
5 Common IR Instruction Set
5.1 general
5.1.1 BinOp
IrInstructionBinOp represents a binary operation.
syntax:
<BinOp> ::= <op1> <op_id> <op2>
op1
first operand
op_id
one of: BoolOr, BoolAnd, CmpEq, CmpNotEq, CmpLessThan, CmpGreaterThan, CmpLessOrEq, CmpGreaterOrEq, BinOr, BinXor, BinAnd, BitShiftLeftLossy, BitShiftLeftExact, BitShiftRightLossy, BitShiftRightExact, Add, AddWrap, Sub, SubWrap, Mult, MultWrap, DivUnspecified, DivExact, DivTrunc, DivFloor, RemUnspecified, RemRem, RemMod, ArrayCat, ArrayMult, MergeErrorSets
op2
second operand
source-reduction → GIR:

export fn reduction(one: u64, two: u64) void {
    var a: u64 = one + two;
}

fn reduction() { // (analyzed)
Entry_0:
  #10 | VarPtr | *const u64 | 1 | &one
! #11 | LoadPtrGen | u64 | 1 | loadptr(#10)result=(null)
  #14 | VarPtr | *const u64 | 1 | &two
! #15 | LoadPtrGen | u64 | 1 | loadptr(#14)result=(null)
! #17 | BinOp | u64 | 1 | #11 + #15
  #20 | StorePtr | void | - | *#19 = #17
  :19 | AllocaGen | *u64 | 2 | Alloca(align=0,name=a)
  #22 | DeclVarGen | void | - | var a: u64 align(8) = #19 // comptime = false
  #26 | Return | noreturn | - | return {}
}
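The difference between related op_id values is semantic rather than syntactic. In Zig, a + b on integers corresponds to Add and is checked for overflow, while a +% b corresponds to AddWrap and wraps around (likewise Sub and SubWrap for - and -%). The sketch below evaluates these four op_ids on u64 operands; the BinOpId enum and eval_u64 function are hypothetical, and overflow is reported as an empty std::optional instead of a compile error or runtime panic.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// A few of the op_id values listed above (the enum itself is hypothetical).
enum class BinOpId { Add, AddWrap, Sub, SubWrap };

// Evaluate a u64 BinOp: Add/Sub report overflow as std::nullopt,
// AddWrap/SubWrap wrap modulo 2^64 like Zig's +% and -%.
std::optional<std::uint64_t> eval_u64(BinOpId op, std::uint64_t a, std::uint64_t b) {
    constexpr std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
    switch (op) {
    case BinOpId::Add:
        if (a > kMax - b) return std::nullopt;  // would overflow u64
        return a + b;
    case BinOpId::AddWrap:
        return a + b;  // unsigned C++ arithmetic already wraps
    case BinOpId::Sub:
        if (a < b) return std::nullopt;         // would underflow u64
        return a - b;
    case BinOpId::SubWrap:
        return a - b;  // wraps, like -%
    }
    return std::nullopt;
}
```

With the values from the reduction.zig example, eval_u64(BinOpId::Add, 999, 333) gives 1332, whereas adding 1 to the largest u64 fails under Add but produces 0 under AddWrap.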
5.1.2 Const
IrInstructionConst is a compile-time instruction.
syntax:
<Const> ::= <value>
value
comptime value
source-reduction → SIR:

export fn reduction() void {
    _ = true;
}

fn reduction() { // (IR)
Entry_0:
  #1 | ResetResult | (unknown) | - | ResetResult(none)
  #2 | ResetResult | (unknown) | - | ResetResult(none)
  #3 | ResetResult | (unknown) | - | ResetResult(none)
  #4 | Const | *void | 1 | *_
  #5 | ResetResult | (unknown) | - | ResetResult(inst(*_))
  #6 | Const | bool | 1 | true
  #7 | EndExpr | (unknown) | - | EndExpr(result=inst(*_),value=true)
! #8 | Const | void | 2 | {}
  #9 | EndExpr | (unknown) | - | EndExpr(result=none,value={})
  #10 | CheckStatementIsVoid | (unknown) | - | @checkStatementIsVoid({})
  #11 | Const | void | 0 | {}
  #12 | Const | void | 3 | {}
  #13 | EndExpr | (unknown) | - | EndExpr(result=none,value={})
  #14 | AddImplicitReturnType | (unknown) | - | @addImplicitReturnType({})
  #15 | Return | noreturn | - | return {}
}
5.2 terminators
5.2.1 Br
IrInstructionBr unconditionally transfers control flow to another basic-block.
syntax:
<Br> ::= "goto" "$"<dest_block>
dest_block
basic-block to branch to
source-reduction → GIR:

export fn reduction(cond: bool) void {
    var a: u64 = 999;
    if (cond) {
        a += 333;
    }
}

fn reduction() { // (analyzed)
Entry_0:
  #16 | StorePtr | void | - | *#12 = 999
  :12 | AllocaGen | *u64 | 2 | Alloca(align=0,name=a)
  #17 | DeclVarGen | void | - | var a: u64 align(8) = #12 // comptime = false
  #20 | VarPtr | *const bool | 1 | &cond
  #21 | LoadPtrGen | bool | 1 | loadptr(#20)result=(null)
  #27 | CondBr | noreturn | - | if (#21) $Then_25 else $Else_26
Then_25:
  #30 | VarPtr | *u64 | 2 | &a
  #31 | LoadPtrGen | u64 | 1 | loadptr(#30)result=(null)
  #36 | BinOp | u64 | 1 | #31 + 333
  #37 | StorePtr | void | - | *#30 = #36
! #47 | Br | noreturn | - | goto $EndIf_43
Else_26:
! #50 | Br | noreturn | - | goto $EndIf_43
! EndIf_43:
  #57 | Return | noreturn | - | return {}
}
5.2.2 CondBr
IrInstructionCondBr conditionally transfers control flow to one of two basic-blocks.
syntax:
<CondBr> ::= "if" "(" <condition> ")" "$"<then_block> "else" "$"<else_block>
condition
is evaluated as a bool
then_block
branch taken if condition == true
else_block
branch taken if condition == false
source-reduction → GIR:

export fn reduction(cond: bool) void {
    var a: u64 = 999;
    if (cond) {
        a += 333;
    } else {
        a -= 333;
    }
}

fn reduction() { // (analyzed)
Entry_0:
  #16 | StorePtr | void | - | *#12 = 999
  :12 | AllocaGen | *u64 | 2 | Alloca(align=0,name=a)
  #17 | DeclVarGen | void | - | var a: u64 align(8) = #12 // comptime = false
  #20 | VarPtr | *const bool | 1 | &cond
  #21 | LoadPtrGen | bool | 1 | loadptr(#20)result=(null)
! #27 | CondBr | noreturn | - | if (#21) $Then_25 else $Else_26
! Then_25:
  #30 | VarPtr | *u64 | 2 | &a
  #31 | LoadPtrGen | u64 | 1 | loadptr(#30)result=(null)
  #36 | BinOp | u64 | 1 | #31 + 333
  #37 | StorePtr | void | - | *#30 = #36
  #60 | Br | noreturn | - | goto $EndIf_56
! Else_26:
  #44 | VarPtr | *u64 | 2 | &a
  #45 | LoadPtrGen | u64 | 1 | loadptr(#44)result=(null)
  #50 | BinOp | u64 | 1 | #45 - 333
  #51 | StorePtr | void | - | *#44 = #50
  #63 | Br | noreturn | - | goto $EndIf_56
EndIf_56:
  #70 | Return | noreturn | - | return {}
}
5.2.3 Return
IrInstructionReturn unconditionally transfers control flow back to the caller.
syntax:
<Return> ::= "return" <operand>
operand
value returned to the caller ({} for a void function)
source-reduction → GIR:

export fn reduction() void {}

fn reduction() { // (analyzed)
Entry_0:
! #5 | Return | noreturn | - | return {}
}
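A toy interpreter makes the role of the three terminators concrete: every basic-block in the listings above ends in exactly one of them, and only they decide which block runs next. The blocks below mirror the CondBr example listing (block names taken from it). The Frame, Terminator and BasicBlock types are hypothetical, block bodies are plain lambdas rather than IR instructions, and Return reports the local a (instead of {}) so the result can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical state of the toy function: one u64 local and one bool parameter.
struct Frame {
    std::uint64_t a = 0;
    bool cond = false;
};

// The three terminators described above.
struct Terminator {
    enum Kind { Br, CondBr, Return } kind;
    std::string dest;        // Br target
    std::string then_block;  // CondBr target when cond == true
    std::string else_block;  // CondBr target when cond == false
};

struct BasicBlock {
    std::function<void(Frame &)> body;  // stands in for the non-terminator instructions
    Terminator term;                    // exactly one terminator per block
};

// Run from "Entry_0" until a Return terminator is reached.
std::uint64_t run(const std::map<std::string, BasicBlock> &blocks, Frame f) {
    std::string current = "Entry_0";
    for (;;) {
        const BasicBlock &bb = blocks.at(current);
        bb.body(f);
        switch (bb.term.kind) {
        case Terminator::Br:     current = bb.term.dest; break;
        case Terminator::CondBr: current = f.cond ? bb.term.then_block : bb.term.else_block; break;
        case Terminator::Return: return f.a;  // toy: report the local instead of {}
        }
    }
}

// Blocks mirroring the CondBr example: a = 999; then a += 333 or a -= 333.
std::map<std::string, BasicBlock> make_reduction() {
    return {
        {"Entry_0", {[](Frame &f) { f.a = 999; }, {Terminator::CondBr, "", "Then_25", "Else_26"}}},
        {"Then_25", {[](Frame &f) { f.a += 333; }, {Terminator::Br, "EndIf_56", "", ""}}},
        {"Else_26", {[](Frame &f) { f.a -= 333; }, {Terminator::Br, "EndIf_56", "", ""}}},
        {"EndIf_56", {[](Frame &) {}, {Terminator::Return, "", "", ""}}},
    };
}
```

Running it with cond set to true follows Entry_0, Then_25 and EndIf_56 and returns 1332; with cond false it takes Else_26 instead and returns 666.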
6 Compiler Building
6.1 Overview
- cmake
- compile common C++ sources
- compile userland.o C++ sources
- link zig0 stage0 compiler
- compile libuserland.a Zig sources
- link zig stage1 compiler
userland.o
This is a shim implementation of libuserland.a and is implemented entirely in C++. All exported symbols must match those of libuserland.a. zig0 links against the shim but never calls into it; all shims are implemented as panics.
zig0
Also known as the stage0 compiler. It links against userland.o and is a functionally limited compiler, but it is robust enough to build libuserland.a.
zig0 can build Zig source code, run tests and produce executables. It can be debugged with a native debugger such as gdb or lldb. However, it cannot do things like zig0 build ... because part of that functionality is implemented in libuserland.a.
During Zig compiler development, it can be useful to develop against zig0 in an iterative fashion.
Here is an example of using stage0 to emit IR and LLVM-IR:
$ _build/zig0 --override-std-dir std --override-lib-dir . build-obj reduction.zig --verbose-ir --verbose-llvm-ir
and a corresponding example of launching the lldb debugger:
$ lldb _build/zig0 -- --override-std-dir std --override-lib-dir . build-obj reduction.zig
libuserland.a
This is a support library implemented in Zig userland. It replaces all shims from userland.o with real implementations. zig links against this library instead of userland.o.
zig
Also known as the stage1 compiler. It links against libuserland.a and is a fully functional compiler. It can be debugged with a native debugger such as gdb or lldb.
7 How-To: Common Tasks
7.1 iteratively build compiler
note: for stage1, replace zig0 with zig.
using make:
$ make -C _build zig0
$ _build/zig0 --override-std-dir std --override-lib-dir . version
using ninja:
$ ninja -C _build zig0
$ _build/zig0 --override-std-dir std --override-lib-dir . version
7.2 debug compiler
note: for stage1, replace zig0 with zig.
using gdb:
$ _build/zig0 --override-std-dir std --override-lib-dir . build-obj foobar.zig
segmentation fault
$ gdb --args _build/zig0 --override-std-dir std --override-lib-dir . build-obj foobar.zig
using lldb:
$ _build/zig0 --override-std-dir std --override-lib-dir . build-obj foobar.zig
segmentation fault
$ lldb _build/zig0 -- --override-std-dir std --override-lib-dir . build-obj foobar.zig
7.3 debug: print instruction source location
using lldb:
(lldb) frame variable instruction
(IrInstructionSliceSrc *) instruction = 0x0000000108156910
! (lldb) p instruction->base.source_node->src()
~/zig/work/bounds1.zig:3:23
7.4 print IR listing
note: for stage1, replace zig0 with zig.
$ _build/zig0 --override-std-dir std --override-lib-dir . build-obj reduction.zig --verbose-ir
pro-tip: to reduce IR noise, add this to reduction.zig:
// override panic handler to reduce IR noise
pub fn panic(msg: []const u8, error_return_trace: ?*@import("builtin").StackTrace) noreturn {
    while (true) {}
}
7.5 configure for ninja
$ cd ~/zig/work
$ mkdir _build
$ cmake -G Ninja -S . -B _build -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/opt/zig -DCMAKE_PREFIX_PATH=/opt/llvm-8.0.1
7.6 behavior tests
These are language-fundamental tests covering flow control, types, alignment, pointers, optionals, slices and arrays. It is crucial that the compiler still passes these tests after internal changes.
direct
The most fine-grained way to run tests is via the zig test ... command. Here we run the unit tests for while flow control:
_build/zig0 --override-std-dir std --override-lib-dir . test test/stage1/behavior/while.zig
1/20 test "while loop"...OK
2/20 test "static eval while"...OK
3/20 test "continue and break"...OK
4/20 test "return with implicit cast from while loop"...OK
5/20 test "while with continue expression"...OK
6/20 test "while with else"...OK
7/20 test "while with optional as condition"...OK
8/20 test "while with optional as condition with else"...OK
9/20 test "while with error union condition"...OK
10/20 test "while on optional with else result follow else prong"...OK
11/20 test "while on optional with else result follow break prong"...OK
12/20 test "while on error union with else result follow else prong"...OK
13/20 test "while on error union with else result follow break prong"...OK
14/20 test "while on bool with else result follow else prong"...OK
15/20 test "while on bool with else result follow break prong"...OK
16/20 test "break from outer while loop"...OK
17/20 test "continue outer while loop"...OK
18/20 test "while bool 2 break statements and an else"...OK
19/20 test "while optional 2 break statements and an else"...OK
20/20 test "while error 2 break statements and an else"...OK
It can be restricted even further with simple filtering:
_build/zig0 --override-std-dir std --override-lib-dir . test test/stage1/behavior/while.zig --test-filter bool
1/3 test "while on bool with else result follow else prong"...OK
2/3 test "while on bool with else result follow break prong"...OK
3/3 test "while bool 2 break statements and an else"...OK
All tests passed.
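The filter appears to select tests whose name contains the given text. Assuming a plain, case-sensitive substring match (an assumption consistent with the three tests selected above, not a description of the test runner's code), the selection can be sketched in C++ as follows; filter_tests is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Keep only the test names that contain `filter` as a substring (assumed semantics).
std::vector<std::string> filter_tests(const std::vector<std::string> &names,
                                      const std::string &filter) {
    std::vector<std::string> kept;
    for (const auto &n : names)
        if (n.find(filter) != std::string::npos) kept.push_back(n);
    return kept;
}
```

Run against the 20 test names listed earlier with the filter bool, it keeps exactly the three tests shown in the filtered output, in the same order.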
via build
Once the compiler is able to compile build.zig, larger test suites can be used. Here we run all the behavior tests with the following restrictions:
- skip repeating tests in --release-safe and --release-fast compiler modes
- skip repeating tests for non-native platforms (run for host only)
- tests still run for target permutations such as freestanding, libc, single-threaded and multi-threaded
- filter for tests with break in their name
_build/zig0 build --override-std-dir std --override-lib-dir . test-behavior -Dskip-release -Dskip-non-native -Dtest-filter=break
8 Best Practices
8.1 Always direct stage0 to workspace
It is recommended to override the std and lib dirs for zig0. Completing a compiler install is the job of zig build functionality, which zig0 lacks. Since zig0 development likely involves writing tests and making userland changes, those files cannot be installed until your development is able to progress to stage1.
$ _build/zig0 --override-std-dir std --override-lib-dir . build-obj reduction.zig
8.2 Reduce and Reduce and Reduce Again
Whether tracking down a bug or investigating compiler internals, it's a good idea to reduce exposure to unrelated things.
- Source-related issues should be reduced as much as possible. Any superfluous source can easily lead to an unnecessary loss of clarity and wasted time.
- When tracking compiler segfaults, also try to reduce the compiler environment:
  - if crashing during zig run, zig test or zig build, then try zig build-obj instead
  - check file/directory permissions, including zig-cache if active (remember, there are 2 caches)
  - make sure to identify where the segfault is coming from: userland or compiler?
  - sanity-check the compiler's dependencies against the official build instructions