Learn more about Israeli genocide in Gaza, funded by the USA, Germany, the UK and others.

Bootstrapping a C compiler

In “Reflections on Trusting Trust”, Ken Thompson shows that you can’t trust a program on your computer, even if you “compiled it from source” yourself, because your compiler is also untrusted. It’s not good enough to recompile your compiler, because for that you need another untrusted compiler. The gcc currently on your machine has a very long compilation ancestry, completely lost to the sands of time.

In response, people say “it’s turtles all the way down”. The saying implies the stack is infinite, but it’s not. A fix to the trust problem is to rebuild the stack of turtles yourself. Imagine a project called “BootstrapCC”. BootstrapCC begins with a handwritten ELF x86 program, b0. The program b0 has a tiny job: to transform ASCII hex into binary. b0 is our first compiler, and ensures that the rest of our source can be written in readable ASCII. b0 is small and trivial enough to be hand-verified.

BootstrapCC builds the binary b1 using b0, then builds b2 using both of those, and so on. Eventually, BootstrapCC produces bn, a compiler for the subset of C used by the Tiny C Compiler. From here, BootstrapCC can compile TCC, from which most other things in existence can be built. Along the way, BootstrapCC creates successively higher-level languages: an assembler, an IR, perhaps a stack-based language or a Lisp.

BootstrapCC does not rely on the lost 70-year lineage of modern programs. BootstrapCC does rely on some things, such as Intel hardware and the OS. But this is a much smaller footprint.

People are aware of the “trusting trust” paper, but they treat it as a curiosity. Some people are fanatical about compiling their programs themselves, but don’t take it to its logical conclusion. Lots of people love creating new programming languages, but all of them are written in some other language. There are even lots of people creating new compilers, but all of them need compiling with something else. BootstrapCC, sadly, is fictional, and I’m not aware of any project which performs this the bootstrapping process I described here.

Update!: BootstrapCC exists! It’s a wonderful-looking project called stage0, which describes itself thus:

This is a set of manually created hex programs in a Cthulhu Path to madness fashion. Which only have the goal of creating a bootstrapping path to a C compiler capable of compiling GCC, with only the explicit requirement of a single 1 KByte binary or less.

Additionally, all code must be able to be understood by 70% of the population of programmers. If the code can not be understood by that volume, it needs to be altered until it satisfies the above requirement.

Tagged .

Similar posts

More by Jim

Want to build a fantastic product using LLMs? I work at Granola where we're building the future IDE for knowledge work. Come and work with us! Read more or get in touch!

This page copyright James Fisher 2018. Content is not associated with my employer. Found an error? Edit this page.