army

army is a small GPT-style transformer training stack written from scratch in C++17. It is meant to keep the whole system understandable: tokenizer, forward pass, backward pass, optimizer, sampling, gradient checks, and CPU build paths all live in this repository.

Quick start

make check
make

./army train
./army train small 500 data/curated_train.txt
./army chat

make check builds a double-precision checker and compares the hand-written backward pass against central finite differences. make builds the training and sampling binary for the host platform. Training writes army.bin; ./army chat loads that checkpoint and opens a prompt-continuation REPL.
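The comparison that make check performs can be sketched on a toy loss. This is a minimal illustration, not the repository's checker: central_diff_grad, max_rel_error, toy_loss, and toy_grad are hypothetical names, and the step size and error metric are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: numerical gradient by central finite differences,
// (f(p + h) - f(p - h)) / (2h), perturbing one parameter at a time.
std::vector<double> central_diff_grad(
    const std::function<double(const std::vector<double>&)>& loss,
    std::vector<double> params, double h = 1e-5) {
  std::vector<double> grad(params.size());
  for (std::size_t i = 0; i < params.size(); ++i) {
    const double saved = params[i];
    params[i] = saved + h;
    const double up = loss(params);
    params[i] = saved - h;
    const double down = loss(params);
    params[i] = saved;  // restore before moving to the next parameter
    grad[i] = (up - down) / (2.0 * h);
  }
  return grad;
}

// Largest relative error between an analytic and a numerical gradient.
double max_rel_error(const std::vector<double>& analytic,
                     const std::vector<double>& numeric) {
  double worst = 0.0;
  for (std::size_t i = 0; i < analytic.size(); ++i) {
    const double denom =
        std::max(1e-12, std::fabs(analytic[i]) + std::fabs(numeric[i]));
    worst = std::max(worst, std::fabs(analytic[i] - numeric[i]) / denom);
  }
  return worst;
}

// Toy stand-in for the model loss: L(p) = sum_i p_i^3.
double toy_loss(const std::vector<double>& p) {
  double sum = 0.0;
  for (double v : p) sum += v * v * v;
  return sum;
}

// Hand-written gradient of the toy loss: dL/dp_i = 3 * p_i^2.
std::vector<double> toy_grad(const std::vector<double>& p) {
  std::vector<double> g(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) g[i] = 3.0 * p[i] * p[i];
  return g;
}
```

Double precision matters here: with a step near 1e-5, single-precision rounding noise would be comparable to the quantity being measured.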

On macOS, the default targets use Apple Accelerate for matrix multiplies. The explicit Apple Silicon aliases are also available:

make m1-check
make m1
./army_m1 train
./army_m1 chat

What is implemented

Decoder-only transformer training on CPU.
Hand-written forward and backward passes, with no autograd framework.
Byte-level BPE tokenizer trained from the input corpus.
AdamW optimizer with warmup plus cosine decay.
Sampling and prompt continuation from a local checkpoint.
Host builds for Linux/OpenMP and macOS/Accelerate.
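As a rough sketch of the optimizer item above, the following shows one common form of linear warmup followed by cosine decay, plus a scalar AdamW update with decoupled weight decay. The names lr_at, AdamWState, and adamw_step are hypothetical, and the hyperparameter defaults are generic values, not ones taken from this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical schedule: linear warmup to peak_lr over warmup_steps,
// then cosine decay from peak_lr down to min_lr at total_steps.
double lr_at(int step, int warmup_steps, int total_steps, double peak_lr,
             double min_lr) {
  if (step < warmup_steps) {
    return peak_lr * (step + 1) / static_cast<double>(warmup_steps);
  }
  const int decay_steps = std::max(1, total_steps - warmup_steps);
  const double progress =
      std::min(1.0, (step - warmup_steps) / static_cast<double>(decay_steps));
  const double pi = std::acos(-1.0);
  const double cosine = 0.5 * (1.0 + std::cos(pi * progress));
  return min_lr + (peak_lr - min_lr) * cosine;
}

// Per-parameter optimizer state: first and second moment estimates.
struct AdamWState {
  double m = 0.0;
  double v = 0.0;
};

// One AdamW update for a single scalar parameter. `t` is the 1-based step
// count used for bias correction. Weight decay is applied directly to the
// parameter (decoupled), not folded into the gradient.
void adamw_step(double& p, double g, AdamWState& s, int t, double lr,
                double beta1 = 0.9, double beta2 = 0.999, double eps = 1e-8,
                double weight_decay = 0.01) {
  s.m = beta1 * s.m + (1.0 - beta1) * g;
  s.v = beta2 * s.v + (1.0 - beta2) * g * g;
  const double m_hat = s.m / (1.0 - std::pow(beta1, t));
  const double v_hat = s.v / (1.0 - std::pow(beta2, t));
  p -= lr * (m_hat / (std::sqrt(v_hat) + eps) + weight_decay * p);
}
```

In a training loop the schedule would be evaluated once per step and its result passed as lr to every parameter update.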

This is a learning and systems project, not a production language model. The small models can learn local style from a corpus, but they do not have reliable world knowledge.

Architecture

Each block is:

x = x + attention(rmsnorm(x))
x = x + swiglu(rmsnorm(x))
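The pre-norm residual pattern in these two lines can be sketched for a single token vector. This is an illustration under assumptions: rmsnorm, prenorm_residual, and Sublayer are hypothetical names, the epsilon value is a guess, and the sublayer is passed in as a callback so that attention or SwiGLU could be substituted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// RMSNorm over one vector: y_i = gain_i * x_i / sqrt(mean(x^2) + eps).
// Unlike LayerNorm, no mean is subtracted and there is no bias term.
std::vector<float> rmsnorm(const std::vector<float>& x,
                           const std::vector<float>& gain,
                           float eps = 1e-5f) {
  float sum_sq = 0.0f;
  for (float v : x) sum_sq += v * v;
  const float inv_rms =
      1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] = gain[i] * x[i] * inv_rms;
  return y;
}

// Any sublayer that maps a vector to a vector of the same size.
using Sublayer = std::function<std::vector<float>(const std::vector<float>&)>;

// Pre-norm residual step: x + f(rmsnorm(x)). The residual path carries the
// unnormalized x; only the sublayer input is normalized.
std::vector<float> prenorm_residual(const std::vector<float>& x,
                                    const std::vector<float>& gain,
                                    const Sublayer& f) {
  std::vector<float> out = f(rmsnorm(x, gain));
  for (std::size_t i = 0; i < x.size(); ++i) out[i] += x[i];
  return out;
}
```

A full block would call prenorm_residual twice, once with an attention sublayer and once with a SwiGLU sublayer, each with its own gain vector.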

The model includes:

RMSNorm pre-normalization.
Rotary position embeddings applied to queries and keys.
Grouped-query attention: more query heads than key/value heads.
SwiGLU feed-forward layers.
Multi-token prediction heads; generation uses head 0.
Bias-free linear layers.
One flat parameter buffer with typed views for model code, optimizer code, and gradient checking.
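Two of the items above, rotary position embeddings and grouped-query attention, reduce to small index and rotation rules. The sketch below assumes the interleaved-pair RoPE convention with base 10000 and an even split of query heads across key/value heads; the repository may use a different pairing or base, and rope_inplace and kv_head_for are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rotate consecutive pairs (v[i], v[i+1]) of one head vector by an angle
// that depends on the token position and the pair index. The vector length
// is assumed to be even.
void rope_inplace(std::vector<float>& v, int pos, float base = 10000.0f) {
  const std::size_t d = v.size();
  for (std::size_t i = 0; i + 1 < d; i += 2) {
    const float freq =
        std::pow(base, -static_cast<float>(i) / static_cast<float>(d));
    const float angle = static_cast<float>(pos) * freq;
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    const float a = v[i];
    const float b = v[i + 1];
    v[i] = a * c - b * s;
    v[i + 1] = a * s + b * c;
  }
}

// Grouped-query attention: each group of consecutive query heads shares one
// key/value head. Assumes n_q_heads is a multiple of n_kv_heads.
int kv_head_for(int q_head, int n_q_heads, int n_kv_heads) {
  const int group_size = n_q_heads / n_kv_heads;
  return q_head / group_size;
}
```

With the small preset (4 query heads, 2 key/value heads), query heads 0 and 1 would read key/value head 0 and query heads 2 and 3 would read key/value head 1. Because the rotation is applied to both queries and keys, their dot product depends only on the difference between the two positions.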

Current presets:

preset  dim  query heads  kv heads  layers  context  mtp heads  batch  default steps
small   128  4            2         4       128      4          32     3000
big     384  6            2         6       256      4          16     5000

The tokenizer starts from the 256 byte values and learns up to 256 BPE merges from the selected corpus, so the default vocabulary has at most 256 + 256 = 512 tokens.
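A byte-level BPE training loop of this kind can be sketched as follows. This is not the repository's tokenizer: to_bytes and train_bpe are hypothetical names, and the tie-breaking order and the rule of stopping once no pair occurs at least twice are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Start from raw bytes: every token id is in [0, 255].
std::vector<int> to_bytes(const std::string& text) {
  std::vector<int> tokens;
  for (unsigned char c : text) tokens.push_back(static_cast<int>(c));
  return tokens;
}

// Repeatedly replace the most frequent adjacent pair with a new token id.
// Merge number m creates id 256 + m, so max_merges = 256 caps the
// vocabulary at 512. `tokens` is rewritten in place; the learned merges
// are returned in order.
std::vector<std::pair<int, int>> train_bpe(std::vector<int>& tokens,
                                           int max_merges) {
  std::vector<std::pair<int, int>> merges;
  for (int m = 0; m < max_merges; ++m) {
    std::map<std::pair<int, int>, int> counts;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
      ++counts[{tokens[i], tokens[i + 1]}];
    }
    std::pair<int, int> best{0, 0};
    int best_count = 1;  // a pair must occur at least twice to be merged
    for (const auto& entry : counts) {
      if (entry.second > best_count) {
        best = entry.first;
        best_count = entry.second;
      }
    }
    if (best_count < 2) break;  // nothing left worth merging
    const int new_id = 256 + m;
    std::vector<int> merged;
    for (std::size_t i = 0; i < tokens.size();) {
      if (i + 1 < tokens.size() && tokens[i] == best.first &&
          tokens[i + 1] == best.second) {
        merged.push_back(new_id);
        i += 2;
      } else {
        merged.push_back(tokens[i]);
        ++i;
      }
    }
    tokens.swap(merged);
    merges.push_back(best);
  }
  return merges;
}
```

The vocabulary size after training is 256 plus the number of merges actually learned, which is why a small or repetitive corpus can end up with fewer than 512 tokens.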

Commands

./army gradcheck
./army train [small|big] [steps] [corpus]
./army chat

Examples:

./army train small 500 shakespeare.txt
./army train small 500 data/curated_train.txt
./army train big 1000 data/pretrain.txt

The default training command is equivalent to:

./army train small 3000 shakespeare.txt

Data

The default corpus is shakespeare.txt. The data/ directory also includes small local corpora and scripts for larger experiments.

Build the curated local corpus:

sh data/make_curated.sh
./army train small 500 data/curated_train.txt

Included local source files:

data/curated_general.txt - compact prose about algorithms, debugging, numerical checks, data cleaning, and systems habits.
data/textbook_transformer.txt - explanations of the model pieces used here.
data/nanoeuler_tasks.txt - project-specific instruction examples, commands, troubleshooting notes, review prompts, and continuation seeds.

Larger generated corpora are optional and ignored by git:

sh data/get_gutenberg.sh
sh data/get_web.sh
sh data/get_alpaca.sh
cat data/gutenberg.txt data/web.txt > data/pretrain.txt

data/get_web.sh expects the DuckDB CLI so it can read a FineWeb-Edu parquet slice without adding a Python dependency.

Build notes

Linux builds use g++ with OpenMP:

make
make check

macOS builds use clang++ with Accelerate.framework:

make
make check
make m1
make m1-check

make lint runs cpplint over src/ and the root compatibility shim.

Project layout

src/army.cpp              single translation-unit entry for the CPU build
src/app/                  CLI modes: train, chat, sampling, gradcheck
src/core/                 common types, runtime helpers, RNG
src/data/                 byte-level BPE and corpus batching
src/kernels/              attention, RoPE, normalization, linear, loss kernels
src/model/                config, parameter layout, activations, forward/backward
army.cpp                  compatibility shim that includes src/army.cpp
Makefile                  host, Apple Silicon, check, lint, and clean targets
data/                     local corpora and corpus-generation scripts
shakespeare.txt           default tiny training corpus

Current scope

army is intentionally CPU-first and compact. It is useful for reading, modifying, and verifying the mechanics of a transformer training loop end to end. Future work could add a native GPU backend while keeping the current simple CPU path easy to inspect.
