Routing Back to C [Part 1]: Judgment Is the Last Expensive Thing
LLMs killed the cost of typing code. The bottleneck is now judgment, and judgment needs to know what's underneath. Part 1 of a six-part series where I rebuild the things every language hides from you, in C.
A LinkedIn post crossed my feed last week:
Every programmer should learn C. Implement a linked list, hash table, and binary tree. Then build a simple CLI program and a basic network server. Not because you’ll use it daily, but because it strips away every abstraction you’ve been hiding behind and shows you what’s really beneath whatever language you use daily.
I agreed instantly, then started writing this series.
Why Now
LLMs killed the cost of typing code. What didn’t change is judgment, knowing what the machine should do, and what a line of code actually costs.
writing code running code
BEFORE BEFORE
▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓
NOW NOW
▒ ▓▓▓▓▓ ← unchanged
Vibe-coding works if you already have judgment. Otherwise it ships bloat at LLM speed:
"build me a thing"
│
▼
┌──────────────────────────────────────┐
│ + 3 frameworks │
│ + accidental N+1 │
│ + dead branches │
│ + allocations you didn't ask for │
│ + 12 lines that should be 3 │
└──────────────────────────────────────┘
I make games. Every frame has a budget:
0ms 4ms 8ms 12ms 16ms
├─────────┼─────────┼─────────┼─────────┤
60fps: │ draw │ physics │ ai │ audio │
└─────────┴─────────┴─────────┴─────────┘
0ms 4ms 8ms
├─────────┼─────────┤
120fps: │ draw │ physics │ ← same work, half the budget
└─────────┴─────────┘
overshoot anywhere → stutter
Nowhere to hide a slow line. The LLM era should treat all code this way. Writing is free; running isn’t. Use the saved time to make what you ship actually fast.
C is the cheapest way to learn what a line really costs.
What C Strips Away
Every modern language hides three things:
- Where memory lives. Stack, heap, segments.
- Who frees it. A GC, a destructor, an ownership checker, somebody.
- What a pointer is. A variable that holds an address, not a value.
Memory layout of a running C program on mainstream hardware (x86, x86_64, ARM, ARM64):
high addresses
┌──────────────────────┐
│ stack │ ← grows down
│ (function frames, │
│ local variables) │
├──────────────────────┤
│ ↓ │
│ │
│ ↑ │
├──────────────────────┤
│ heap │ ← grows up
│ (malloc / free) │
├──────────────────────┤
│ bss + data │ ← globals, statics
├──────────────────────┤
│ text │ ← your compiled code
└──────────────────────┘
low addresses
Stack-down, heap-up is convention, not standard. Every modern target looks like this. Python, Java, Go, all sit on top of this picture. Prettier syntax doesn’t make it go away.
Stack vs Heap
The stack is automatic. Call pushes a frame. Return pops it. Fast, ordered, no thinking.
call main() call greet() greet returns
┌─────────┐ ┌─────────┐ ┌─────────┐
│ main │ │ greet │ │ main │
│ argc │ ├─────────┤ │ argc │
│ argv │ │ main │ │ argv │
└─────────┘ │ argc │ └─────────┘
│ argv │
└─────────┘
The heap is manual. Ask, get a pointer, keep it until you free. The frame that asked can disappear; the chunk stays until someone releases it.
int *p = malloc(sizeof(int)); // 4 bytes on the heap
*p = 42; // write 42 there
free(p); // give it back
Three lines. Every GC language does this for you a billion times a day.
Pointers in One Picture
A pointer is a variable that holds an address instead of a value.
int a = 42; // a holds a value
int *p = &a; // p holds the address of a
| Written | Means | Example |
|---|---|---|
a | value stored in a | 42 |
&a | address where a lives | 0x40 |
p | value stored in p (which is an address) | 0x40 |
*p | follow p, give me what’s there | 42 |
& = “address of”. * = “follow this address”. Opposites: *(&a) is a.
name address value
┌───┐ ┌──────┐ ┌──────┐
│ a │ → │ 0x40 │ → │ 42 │ stack variable
└───┘ └──────┘ └──────┘
┌───┐ ┌──────┐ ┌──────┐
│ p │ → │ 0x80 │ → │ 0x40 │ p's value is a's address
└───┘ └──────┘ └──────┘
Writing *p walks the arrow from p to a and lands on 42.
How Big Is an Address?
8 bytes on 64-bit, 4 on 32-bit. The pointed-at type doesn’t change it.
sizeof(char) = 1
sizeof(int) = 4 ← typical, not guaranteed
sizeof(int *) = 8 ┐
sizeof(char *) = 8 │ every data pointer is 8 bytes on 64-bit
sizeof(void *) = 8 ┘
Standard footnotes: sizeof(char) is exactly 1 by definition. sizeof(int) is implementation-defined (the standard guarantees ≥2), but gcc/clang/MSVC all pick 4. I’ll drop the disclaimers from here on.
On 64-bit, a pointer to a 4-byte int is bigger than the int itself. A linked list of 32-bit values spends more memory on next pointers than on the values:
┌──────┬──────────┐ ┌──────┬──────────┐
│ val │ next │ → │ val │ next │ → ...
│ 4 B │ 8 B │ │ 4 B │ 8 B │
└──────┴──────────┘ └──────┴──────────┘
▲
twice the value
First taste of why dynamic structures feel heavy.
One Byte, One Address
Memory is a flat array of bytes. Every byte has one address. A variable takes as many consecutive bytes as its type says, and is referred to by its first one.
address: 0x40 0x41 0x42 0x43 0x44
┌────┬────┬────┬────┬────┐
bytes: │ 2A │ 00 │ 00 │ 00 │ 07 │
└────┴────┴────┴────┴────┘
└──── int a = 42 ────┘ └─ char c = 7
2A 00 00 00 is little-endian: least significant byte at the lowest address. x86, x86_64, ARM64 (usual mode), all little-endian. Big-endian machines would store the same int as 00 00 00 2A. Same value, different byte order.
Addresses are fixed-width. The thing behind them is not.
How Does the Pointer Know How Many Bytes?
The address doesn’t, 0x40 is just a number. The type of the pointer tells the compiler how wide to read. Same byte, three windows:
char *cp = 0x40 → 1 byte ┌────┐
│ 2A │
└────┘
int *ip = 0x40 → 4 bytes ┌────┬────┬────┬────┐
│ 2A │ 00 │ 00 │ 00 │
└────┴────┴────┴────┘
long *lp = 0x40 → 8 bytes ┌────┬────┬────┬────┬────┬────┬────┬────┐
│ 2A │ 00 │ 00 │ 00 │ 07 │ ?? │ ?? │ ?? │
└────┴────┴────┴────┴────┴────┴────┴────┘
Casting changes the lens. That’s how buffer overruns happen, memory didn’t change, the view did. The type also sets stride:
cp+1 = 0x41 (+1)
ip+1 = 0x44 (+4)
lp+1 = 0x48 (+8)
Stack vs Heap, What It Actually Costs
Pointers carry no size info. Stack and heap track it in opposite ways.
Stack: variable is packed into the frame, back-to-back with neighbors, no metadata. Allocate = bump a pointer. Free = pop on return.
Where’s the tracking? In the compiler, at compile time. When your function compiles, the compiler picks fixed offsets and bakes them into the generated code. Nothing is looked up at runtime.
void f() {
int a = 42;
long b = 99;
a = a + 1;
}
┌─────────────┐
│ a (4 B) │ ← fixed spot #1
├─────────────┤
│ b (8 B) │ ← fixed spot #2
└─────────────┘
Every read or write to a goes to fixed spot #1, every time. The location never moves, only the bytes at it do. a as a “thing” doesn’t exist at runtime, it’s just 4 bytes, known to the compiler and then forgotten. This is why C has no reflection: sizes, types, and names got thrown away on the way down.
Heap: sizes aren’t known until runtime, so malloc writes a hidden header just before the pointer it hands you. free(p) walks backwards by the header size, reads the chunk size, and releases it.
STACK (frame) HEAP (malloc(100))
┌──────┐ ┌──────────┬─────────────────┐
│ a │ ← 4 B, no header │ header │ your 100 B │
├──────┤ │ size=100 │ │
│ b │ ← 8 B, no header └──────────┴─────────────────┘
├──────┤ ▲ ▲
│ c │ │ │
└──────┘ actual start p (what you got)
popped on return free(p) walks back
Header layout depends on the allocator (glibc’s ptmalloc, jemalloc, tcmalloc, mimalloc, HeapAlloc). I’ll use glibc, it’s the Linux default.
Minimum chunk size plus alignment means malloc(1) is never really 1 byte. On 64-bit glibc, every chunk is at least 32 bytes, aligned to 16:
┌──────────┬───┬──────────────────┐
│ header │ 1 │ padding │
│ (8 B) │ B │ (23 B) │
└──────────┴───┴──────────────────┘
32 bytes reserved for 1 byte of user space
(Technically glibc uses two 8-byte size fields, but the first overlaps the previous chunk while it’s in use, so only 8 bytes of “dead” space sit before p.)
Why the Padding, for 1 Byte?
Four reasons, none about your byte:
1. Alignment contract. malloc must return a pointer any type can use safely. On 64-bit that means 16-byte alignment (for double, SSE vectors, long double). An odd address would trap or tank perf on the first double you wrote.
valid malloc returns: 0x40 0x50 0x60 0x70 ...
└── every address divisible by 16
2. Free-list reuse. When you free a chunk, the allocator doesn’t zero it. It parks two pointers (prev, next in a free list) inside your old user region. That body must fit 2 × 8 B. So the minimum user space is 16 B, even if you asked for 1.
while in use: after free (same bytes reused):
┌──────┬────────────┐ ┌──────┬──────┬──────┬─────┐
│header│ your 1 B │ → │header│ prev │ next │ ... │
└──────┴────────────┘ └──────┴──────┴──────┴─────┘
└── allocator bookkeeping
3. Next chunk alignment. The header of the next chunk has to land on a 16-B boundary too. So every chunk is rounded up to a multiple of 16 anyway.
4. Metadata cost. Tracking millions of tiny chunks means millions of headers and free-list entries. A fixed minimum keeps the count sane and makes size math a shift instead of a divide.
Short version: the allocator isn’t padding for your 1 byte. It’s padding so the chunk can satisfy alignment and hold free-list pointers after you let it go. Your byte is a passenger.
The Cost
32x overhead for one byte. Other allocators differ on numbers; the shape is the same. A fixed minimum plus alignment crushes tiny requests:
malloc × 1,000,000 one big buffer, carve up
┌────┐ ┌────┐ ┌────┐ ┌────┐ ┌────┐ ... ┌──────────────────────┐
│ 32 │ │ 32 │ │ 32 │ │ 32 │ │ 32 │ │ a b c d e f g h i... │
└────┘ └────┘ └────┘ └────┘ └────┘ └──────────────────────┘
↑ 32 B each for 1 B of data ↑ 0 overhead per item
slow + fat fast + tight
This is why game engines and high-performance servers ship their own allocators (pools, arenas, slabs) instead of calling malloc directly.
Two hazards from the same picture:
- Heap corruption. Write one byte past your allocation, clobber a neighboring header. The next unrelated
malloc/freecrashes nowhere near your bug. free(p + 1)is undefined. No valid header where the allocator expects one. It reads random bytes as a size and destroys itself.
Bonus: C strings carry no length either. strlen walks byte-by-byte until \0. Every serious language wraps strings in a struct with a length to avoid this.
An address says where. A type says how much. A C pointer staples them together, lose either one and you’re reading garbage.
Back to the Heap
Swap the stack allocation for malloc and the picture barely changes:
int *p = malloc(sizeof(int));
*p = 42;
p now holds a heap address instead of a stack one. Same arrows, different neighborhood. Linked lists, trees, graphs, every dynamic language’s objects, all built from this one idea.
What Goes Wrong Without a GC
Four classic failures:
alloc (forever)
│ │
▼ ▼
┌───┐ ┌───┐
│ X │ ... │ X │ ← never released
└───┘ └───┘
Dangling pointer, free the chunk, keep the pointer.
p ──┐
▼
┌───────┐ ← freed; belongs to
│ ????? │ something else now
└───────┘
Use-after-free, dereference the dangling pointer.
read *p → garbage / another object / zeros
write *p → silently corrupts whoever moved in
Double free, free(p) twice.
free(p) ✓ free(p) ✗
│ │
▼ ▼
[released] [bookkeeping breaks]
│
▼
next malloc → already in use
Every GC and every ownership checker exists to make these four bugs impossible.
How a Garbage Collector Earns Its Keep
Two big families in the wild:
┌──────────────────────────┐ ┌──────────────────────────┐
│ REFERENCE COUNTING │ │ TRACING GC │
├──────────────────────────┤ ├──────────────────────────┤
│ count hits 0, free now │ │ walk graph from roots, │
│ │ │ free unreachable │
│ │ │ │
│ CPython, Swift, Obj-C │ │ Go, Java, JS, C# │
│ │ │ │
│ weakness: cycles │ │ weakness: GC pauses │
└──────────────────────────┘ └──────────────────────────┘
Cycles (A → B → A, nothing else reaches either) are a blind spot for refcounting, so those runtimes usually bolt on a cycle detector.
Canonical tracing algorithm: mark and sweep.
roots: [r1] [r2]
│ │
▼ ▼
┌─────┐ ┌─────┐ ┌─────┐
│ A │ │ B │ │ C │ ← C is unreachable
└──┬──┘ └─────┘ └─────┘
│
▼
┌─────┐
│ D │
└─────┘
Mark from the roots (globals, stack, registers). Paint everything reachable. Sweep: anything not painted is garbage.
after sweep:
┌─────┐ ┌─────┐
│ A │ │ B │
└──┬──┘ └─────┘
│
▼
┌─────┐
│ D │
└─────┘
Real runtimes build on this. Most split the heap into young and old generations. Young gen uses copying (Cheney-style), survivors get copied into a fresh space, everything left behind freed in one shot. Old gen often uses mark-compact to avoid fragmentation. Go, HotSpot Java, V8 all mix these.
Same idea everywhere: start at roots, find reachable, reclaim the rest. Variants differ in how, not what.
Three Memory Models
Modern languages pick one of three answers to “who frees this”:
manual automatic compile-time
──────── ───────────── ─────────────
C Java Go JS C# Rust
Python Swift
fast ←─────────────────→ safe
unsafe (pauses, overhead) hard
| Model | Languages | Cost |
|---|---|---|
| Manual | C | Bugs on you. Zero runtime overhead. |
| Tracing GC | Java, Go, JS, C# | GC pauses, extra memory. |
| Reference counting (+GC) | CPython, Swift, ARC | Per-op counter bumps. Cycles need backup. |
| Ownership | Rust | Compile-time checker. Steeper to learn. |
Everything except C and Rust pays runtime cost (pauses, refcount ops, extra memory) so you don’t think about lifetimes.
You don’t have to like C to benefit from knowing C. Once you’ve held the manual end of the rope, you see exactly what the GC and the ownership checker are doing for you, and what they cost.
The Series
Five builds, one post each. Each one is something your everyday language gives you for free. Building it in C makes the bill itemized.
1. linked list : simplest structure that needs the heap
2. hash table : the dict, the map, the object
3. BST : pointers + recursion in one shape
4. CLI program : argv, file I/O, errors
5. network server : sockets, accept loops, syscalls
Each post ends with what the build cost in C and what your runtime does on your behalf to hide it.
Closing
Vibe-coding isn’t the problem. Vibe-coding without judgment is. The fastest way to build judgment is a weekend writing code that doesn’t hide the cost. That’s this series.
Next: a linked list, from struct node to free_list.
Glossary
- Stack: memory for function call frames. Automatically managed.
- Heap: memory for dynamically allocated objects. Manual in C, automatic in GC languages.
- Pointer: a variable that holds a memory address.
- malloc / free: C standard library functions for asking for and returning heap memory.
- Garbage collector (GC): runtime system that frees memory no longer reachable.
- Tracing GC: walks reachable objects from roots (mark), frees the rest (sweep).
- Reference counting: each object tracks how many pointers reference it; freed when count hits zero.
- Ownership: Rust’s compile-time approach, every value has one owner, freed when the owner goes out of scope.
- Use-after-free: reading or writing a pointer to memory already freed.
- Memory leak: heap memory allocated but never freed and no longer reachable.
Game Programmer & Co-Founder of PixelPunch LLP. I ship games, build tools, and make the web work harder.
Know more →