This article explores compiler construction by building a simple compiler from scratch, covering all major phases of compilation.
Pipeline phases:
Lexing: Source → Tokens
Parsing: Tokens → AST
Semantic Analysis: Type checking
IR Generation: AST → Intermediate Representation
Optimization: Constant folding, dead code elimination
Code Generation: IR → Assembly
Understanding compilers helps write better code, debug more effectively, and design languages.
Building Compilers: From Source to Machine Code
Introduction: The Ultimate Translation
Every time you write print("Hello, World!") and hit compile, a minor miracle occurs. Your human-readable code transforms through multiple layers of abstraction, eventually becoming a sequence of 1s and 0s that your CPU can execute directly. This transformation is the work of a compiler—one of the most sophisticated programs ever written.
Understanding compilers is essential for serious programmers because:
Performance: Knowing what the compiler does helps you write faster code
Debugging: Compiler errors make more sense when you understand compilation phases
Language design: Creating DSLs or new languages requires compiler knowledge
Career growth: Compiler engineering is a specialized, high-value skill
This article will demystify compilers by building a simple one from scratch, explaining each phase with working code.
Optimizers transform IR to faster equivalent code:
classOptimizer{optimize(ir:TACInstruction[]):TACInstruction[]{letoptimized=ir;letchanged=true;while(changed){changed=false;constnewIR=this.constantFolding(optimized);if(newIR.length<optimized.length){changed=true;optimized=newIR;}optimized=this.deadCodeElimination(optimized);}returnoptimized;}// Constant folding: evaluate compile-time constantsprivateconstantFolding(ir:TACInstruction[]):TACInstruction[]{constconstants=newMap<string,number>();constoptimized:TACInstruction[]=[];for(constinstrofir){if(instr.op==="assign"&&typeofinstr.src==="number"){constants.set(instr.dest,instr.src);optimized.push(instr);}elseif(["add","sub","mul","div"].includes(instr.op)){constleftVal=constants.get(instr.left);constrightVal=constants.get(instr.right);if(leftVal!==undefined&&rightVal!==undefined){// Both operands are constants, fold them!letresult:number;switch(instr.op){case"add":result=leftVal+rightVal;break;case"sub":result=leftVal-rightVal;break;case"mul":result=leftVal*rightVal;break;case"div":result=leftVal/rightVal;break;}constants.set(instr.dest,result);optimized.push({op:"assign",dest:instr.dest,src:result});}else{optimized.push(instr);}}else{optimized.push(instr);}}returnoptimized;}// Dead code eliminationprivatedeadCodeElimination(ir:TACInstruction[]):TACInstruction[]{constused=newSet<string>();// Find all used variablesfor(constinstrofir){if(["add","sub","mul","div"].includes(instr.op)){used.add(instr.left);used.add(instr.right);}}// Remove assignments to unused tempsreturnir.filter(instr=>{if(instr.op==="assign"&&instr.dest.startsWith("t")){returnused.has(instr.dest);}returntrue;});}}
Compilers are the bridge between human thought and machine execution. Understanding them makes you a better programmer.
Key takeaways:
Compilation is a multi-phase pipeline
Each phase has a specific responsibility
IR enables target-independent optimization
Real compilers use sophisticated algorithms (graph coloring for register allocation, SSA for optimization)
Modern compilers like LLVM and GCC are marvels of engineering with millions of lines of code. But at their core, they follow the same principles we explored here.