672 bytes
March 2026
Castling costs 40 bytes. En passant, another 30. Check detection, properly done, knowing whether the king is under attack and filtering moves to only those that resolve it, at least 80. The entire program is 672 bytes of Z80 machine code. At that budget you don't decide what to build. You decide what to cut.
This is a chess engine for the Sinclair ZX81. 1K of RAM. A Z80A processor at 3.25 MHz. The whole program is two lines of BASIC:
1 REM ... (672 bytes of machine code)
2 RAND USR 16514
RAND USR 16514 jumps into the REM statement and executes it as Z80 machine code. The chess board, the move logic, the display routine, the computer's thinking engine, all of it packed into what the BASIC interpreter treats as a comment. The board itself is 64 bytes inside that REM, doing double duty as data and executable address space. In 1K, nothing serves a single purpose.
I wrote this in Durban, South Africa, between 1982 and 1983. Twelve when I started. Fourteen when it worked. The original is lost: a C15 cassette labelled "CHESS" in biro, tape degraded beyond loading, the exercise book with the hex dump gone somewhere between Durban and Johannesburg. What's on GitHub now is a reconstruction by an adult, in the spirit of the teenager. Some tricks are ones I genuinely used at fourteen. Others benefit from four decades of hindsight. It would be dishonest to pretend otherwise. But the constraints haven't changed. 1K of RAM. The Z80A instruction set. 672 bytes.
No assembler. No compiler. Nothing that would run in 1K, and nothing I could have afforded anyway. The process was: write Z80 mnemonics on paper, look up each opcode in the back of Rodnay Zaks' Programming the Z80 (624 pages, third revised edition, a Christmas present from my dad alongside the ZX81 itself), write down the hex bytes, calculate jump offsets by hand, then POKE each byte into the machine from the keyboard. One wrong byte and the screen filled with garbage. Back to the K cursor. Anything not saved to tape, gone.
No school computer club. No friends with ZX81s. Durban in 1982, not exactly a hub. Just the machine, the Zaks book, and whatever copies of Sinclair User made it to South Africa by sea mail.
I filled an exercise book with the design: board representations, bit patterns for pieces (bits 0-2 for type, bit 3 for colour), direction offset tables. Memory budgets scribbled in the margins, bytes spent, bytes remaining, what gets cut if the AI engine needs another ten. I didn't have a name for what I was doing. I thought it was just what you did when there wasn't enough room.
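The encoding from the exercise book can be sketched in a few lines of Python. The layout (bits 0-2 for type, bit 3 for colour) is from the text; the specific constant values are illustrative, since the original byte assignments are lost with the tape.

```python
# Piece encoding sketch: bits 0-2 hold the piece type, bit 3 the colour.
# Constant values are illustrative, not the original program's.
EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING = range(7)
BLACK_FLAG = 0x08  # bit 3 set = Black, clear = White

def encode(piece_type, is_black):
    return piece_type | (BLACK_FLAG if is_black else 0)

def decode_type(square):
    return square & 0x07        # mask bits 0-2

def decode_is_black(square):
    return bool(square & BLACK_FLAG)

black_rook = encode(ROOK, True)     # 0x0C: type 4, colour bit set
```

One byte per square, 64 bytes per board, and colour testing is a single AND: exactly the kind of arithmetic the Z80 does cheaply.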
What 672 bytes can't afford
Castling is 40 bytes of special-case move logic: check that neither the king nor the rook has moved, check that the squares between them are empty, check that the king doesn't pass through check. Forty bytes is more than the entire game loop. En passant needs a flag tracking whether the last move was a two-square pawn advance, plus the capture logic itself. Thirty bytes, nearly the weight of the combined lookup tables for piece characters, material values, and direction vectors. These weren't features I couldn't implement. They were features the budget couldn't carry.
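For a sense of why castling costs so much, here is a sketch in Python of what those 40 bytes would have to buy, just for White's kingside. The square indices and helper names are illustrative, not from the original program.

```python
# Kingside castling legality for White on a flat 64-square board
# (index 0 = a8, 63 = h1). All names here are illustrative.
def can_castle_kingside(board, king_moved, rook_moved, attacked):
    # `attacked(sq)` would itself require opponent move generation --
    # exactly the machinery the 672-byte budget couldn't carry.
    if king_moved or rook_moved:
        return False
    if board[61] != 0 or board[62] != 0:    # f1 and g1 must be empty
        return False
    # The king may not castle out of, through, or into check.
    return not any(attacked(sq) for sq in (60, 61, 62))
```

Three conditions, one of which drags in the whole of check detection. Cutting the feature cuts the dependency too.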
Check detection is the interesting cut. Full implementation means generating all opponent moves to see if any attack the king, then filtering the current player's legal moves to only those that resolve the threat. At least 80 bytes. Probably more. That's 12% of the program for something that doesn't improve the quality of most moves played.
The alternative: don't detect check at all. Let the king be captured like any other piece. Move into check and the computer takes your king next turn. The computer moves into check and you take its king. The game doesn't distinguish between checkmate and a blunder.
It doesn't need to. The king is gone either way.
The system can't afford to prevent illegal states, so instead it absorbs the consequences of those states cheaply. A captured king ends the game, same as checkmate. The failure mode exists. The failure is contained. Eighty bytes freed up for things that affect every move, not just the positions where check matters.
Play it and you notice. Nothing stops you moving your king next to the opponent's rook. The engine won't object. Next move, it takes your king, and that's the game. You learn not to do that. The discipline shifts from the system to the player, which costs the programmer nothing.
One loop, three pieces
Bishop, Rook and Queen all move in straight lines, just different ones. Bishops go diagonally. Rooks go orthogonally. Queens go both. Separate move-generation routines for each piece would need maybe 90 bytes. One shared loop with a bitmask controlling which directions are active needs 60.
Bishop: $A5, diagonals only. Rook: $5A, orthogonal only. Queen: $FF, all eight directions. One loop. Three piece types. Thirty bytes saved, 4.5% of the total program.
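The trick translates directly to Python. The mask values ($A5, $5A, $FF) are from the text; the ordering of the eight directions is an assumption chosen so that the diagonal bits land on $A5 and the orthogonal bits on $5A.

```python
# One sliding-move loop for bishop, rook and queen; a bitmask selects
# the active rays. Direction ordering is an assumption: with this
# order, diagonals occupy bits 0,2,5,7 ($A5) and orthogonals 1,3,4,6 ($5A).
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1),  (1, -1), (1, 0),  (1, 1)]
BISHOP_MASK, ROOK_MASK, QUEEN_MASK = 0xA5, 0x5A, 0xFF

def sliding_moves(board, r, c, mask):
    """board: 8x8 grid, 0 = empty. Returns reachable empty squares."""
    moves = []
    for bit, (dr, dc) in enumerate(DIRS):
        if not (mask >> bit) & 1:
            continue                        # this ray is masked off
        nr, nc = r + dr, c + dc
        while 0 <= nr < 8 and 0 <= nc < 8 and board[nr][nc] == 0:
            moves.append((nr, nc))
            nr, nc = nr + dr, nc + dc       # slide until blocked or off-board
    return moves
```

Three piece types, one routine, and the per-piece difference is a single byte of data rather than a separate block of code.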
The trick came to me on the bus home from school. I'd been stuck for days on how the AI engine could generate moves for three different sliding pieces within its 250-byte budget. Separate routines wouldn't fit. On the bus, staring out the window somewhere between Durban North and Glenwood, the bitmask idea clicked. Each bit in a byte maps to a direction. Set the bits you want, mask out the ones you don't. One loop handles everything. I remember the moment because I nearly shouted on the bus.
Pawn promotion is twelve bytes. Pawn reaches the far rank, becomes a Queen. No choice of piece. In grandmaster play, underpromotion to a knight occasionally matters. In a 1-ply engine with no check detection and no look-ahead, it doesn't. Twelve bytes for the thing that almost always happens. Zero for the thing that almost never does.
The AI scans all 64 squares for Black pieces, generates moves from direction tables, scores captures by the value of the piece taken, and plays the highest-scoring move. No look-ahead. A slight queenside bias, because it scans files a to h and keeps the first of any equally scored moves it finds. It will take your queen before your pawn, but it can't see your reply.
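The evaluation reduces to a few lines. This Python sketch abstracts move generation away and uses illustrative piece values; the key property, kept from the text, is that ties go to the first move found in scan order, which is where the queenside bias comes from.

```python
# One-ply greedy chooser. `moves` arrives in a-file-first scan order;
# strictly-greater comparison means ties keep the FIRST candidate,
# producing the queenside bias. Values are illustrative.
VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 99, '.': 0}

def choose_move(board, moves):
    """board: dict square -> piece char; moves: [(frm, to), ...]."""
    best, best_score = None, -1
    for frm, to in moves:
        score = VALUE[board.get(to, '.')]   # value of the captured piece, if any
        if score > best_score:              # '>' not '>=': first wins ties
            best, best_score = (frm, to), score
    return best
```

Note the king is just another value in the table, the biggest one. With no check detection, "take the king" is simply the highest-scoring capture, and that single fact is what closes the game.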
Board display came first. Weeks of work. Had the ranks upside down for an embarrassing period. Player input next. The thinking routine was the mountain: 250 bytes of move generation and evaluation, debugged one POKE at a time. The AI would move a piece to a square that didn't exist, or lock up mid-search, or generate a move and forget to execute it. Each bug was a wrong byte somewhere in the 250, and finding it meant rereading the hex dump line by line until the numbers stopped matching the mnemonics.
When it finally worked I saved to tape twice, on different cassettes, before running it. Board appeared. I played e2-e4. Two seconds of thought. d7-d5. The Scandinavian Defence, which lets the queen recapture if the pawn is taken. Not sophisticated. Not random either.
My dad played it that evening. He didn't really understand what he was looking at: inverse video characters on a wobbly TV, channel 36. When the computer took his bishop, he said "It's actually quite good, isn't it?" He'd taught me chess. Now a program I'd written, running on a machine he'd given me, was taking his pieces. I was fourteen. I didn't know what to do with that moment so I just watched the next move.
The constraint that scales
The chess engine can't detect check, so it absorbs that failure cheaply. The king gets captured. The game ends. The budget that would have gone to prevention goes to features that affect every position instead. Pragmatism, not architecture. I had 672 bytes, not principles.
I've worked at a different scale since, on systems where mistakes are denominated in money and regulators want to know why. When a portfolio calculation produces the wrong number, nobody asks whether the logic was hand-coded or AI-generated. They ask how fast you noticed, how fast you fixed it, and whether the client was affected. The byte budget became a risk budget. The arithmetic didn't change. You can't verify everything. You can't prevent every bad state. You decide what the system can afford to enforce and design everything else for cheap containment.
The AI deployment conversations I sit in now circle the same question the chess engine answered. Not "how capable is the model?" Capability arrived faster than anyone expected. The question is what happens when the model is wrong, and whether the system around it absorbs that cheaply enough to be worth the risk.
The instinct is to reach for accuracy. Better prompting. Better fine-tuning. Better evaluation benchmarks. These matter, the way castling and en passant matter in chess: they improve the quality of play in specific positions. But they're features at a cost. Castling was 40 bytes I couldn't spend. Perfect accuracy is an engineering budget most organisations can't carry either. The teams deploying AI into production, rather than running pilots that never graduate, are the ones that invested in containment instead. Their systems catch bad outputs the way the chess engine catches an exposed king. Not by preventing the state. By making the consequences cheap.
672 bytes. Every one accounted for. You can read the source or play it in a browser. It won't beat a competent player. It was never going to. The constraints that shaped it are the same ones I think about now, in different units, at different stakes. What can we afford to check? What do we let through and catch downstream? Where does the budget go?