I built a tiny LLM to demystify how language models work

Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food. Fork it and swap the personality for your own character.


A ~9M parameter LLM that talks like a small fish.

This project exists to show that training your own language model is not magic.

No PhD required.

No massive GPU cluster.

One Colab notebook, 5 minutes, and you have a working LLM that you built from scratch — data generation, tokenizer, model architecture, training loop, and inference.
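To make the tokenizer stage concrete, here is a minimal word-level tokenizer sketch. The class and method names are illustrative; the actual project may tokenize differently (e.g. characters or BPE).

```python
class TinyTokenizer:
    """Minimal word-level tokenizer: maps whitespace-separated words to ids."""

    def __init__(self, corpus):
        # Build a sorted vocabulary from every word seen in the corpus.
        words = sorted({w for line in corpus for w in line.split()})
        self.stoi = {w: i for i, w in enumerate(words)}  # string -> id
        self.itos = {i: w for w, i in self.stoi.items()}  # id -> string

    def encode(self, text):
        return [self.stoi[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.itos[i] for i in ids)


corpus = ["blub blub hello", "water is nice"]
tok = TinyTokenizer(corpus)
ids = tok.encode("hello water")
print(ids, "->", tok.decode(ids))
```

Encoding then decoding round-trips any text built from vocabulary words, which is the only property the training loop needs.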

If you can run a notebook, you can train a language model.

It won't produce a billion-parameter model that writes essays.

But it will show you exactly how every piece works — from raw text to trained weights to generated output — so the big models stop feeling like black boxes.

GuppyLM is a tiny language model that pretends to be a fish named Guppy.

It speaks in short, lowercase sentences about water, food, light, and tank life.

It doesn't understand human abstractions like money, phones, or politics — and it's not trying to.

It's trained from scratch on 60K synthetic conversations across 60 topics, runs on a single GPU in ~5 minutes, and produces a model small enough to run in a browser.

Vanilla transformer.

No GQA, no RoPE, no SwiGLU, no early exit.

As simple as it gets.

60 topics: greetings, feelings, temperature, food, light, water, tank, noise, night, loneliness, bubbles, glass, reflection, breathing, swimming, colors, taste, plants, filter, algae, snails, scared, excited, bored, curious, happy, tired, outside, cats, rain, seasons, music, visitors, children, meaning of life, time, memory, dreams, size, future, past, name, weather, sleep, friends, jokes, fear, love, age, intelligence, health, singing, TV, and more.

Chat with Guppy (no training needed)
Downloads the pre-trained model from HuggingFace and lets you chat.

Just run all cells.

The pre-trained weights are published as arman-bd/guppylm-60k-generic on HuggingFace.

Why single-turn only?

Multi-turn degraded at turn 3-4 due to the 128-token context window.

A fish that forgets is on-brand, but garbled output isn't.

Single-turn is reliable.

Why vanilla transformer?

GQA, SwiGLU, RoPE, and early exit add complexity that doesn't help at 9M params.

Standard attention + ReLU FFN + LayerNorm produces the same quality with simpler code.
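That combination fits in one small module. Below is a sketch of a single block in that style, standard multi-head attention plus a ReLU feed-forward network plus LayerNorm; the dimensions and layer layout are illustrative assumptions, not the repo's exact config.

```python
import torch
import torch.nn as nn


class VanillaBlock(nn.Module):
    """One transformer block: standard attention + ReLU FFN + LayerNorm.
    Hyperparameters here are illustrative, not the project's actual values."""

    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),                       # plain ReLU, no SwiGLU
            nn.Linear(4 * d_model, d_model),
        )
        self.ln1 = nn.LayerNorm(d_model)     # plain LayerNorm
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a                            # residual around attention
        x = x + self.ffn(self.ln2(x))        # residual around the FFN
        return x


x = torch.randn(2, 16, 128)  # (batch, seq_len, d_model)
y = VanillaBlock()(x)
print(y.shape)               # output keeps the input shape
```

Stack a few of these between an embedding layer and an output projection and you have the whole architecture.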

Why synthetic data?

A fish character with consistent personality needs consistent training data.

Template composition with randomized components (30 tank objects, 17 food types, 25 activities) generates ~16K unique outputs from ~60 templates.
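The composition idea can be sketched in a few lines. The templates and component lists below are made up for illustration; the real project uses ~60 templates over 30 tank objects, 17 food types, and 25 activities, which is where the ~16K figure comes from.

```python
import random

# Hypothetical components -- stand-ins for the project's real lists.
objects = ["rock", "plant", "castle"]
foods = ["flake", "pellet", "worm"]
activities = ["swimming", "hiding", "bubbling"]

templates = [
    "i saw a {obj} today. it made me happy",
    "i love {food}. {food} is the best",
    "i spent all day {act} near the {obj}",
]


def generate(n, seed=0):
    """Sample n outputs by filling random components into random templates."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.choice(templates)
        out.append(t.format(obj=rng.choice(objects),
                            food=rng.choice(foods),
                            act=rng.choice(activities)))
    return out


samples = generate(1000)
print(len(samples), "samples,", len(set(samples)), "unique")
```

With these toy lists there are at most 15 distinct outputs; growing the template and component lists multiplies the number of unique combinations, so consistency comes from the templates while variety comes from the fills.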

Source: originally published on Hacker News.
