A ~9M parameter LLM that talks like a small fish.
This project exists to show that training your own language model is not magic.
No PhD required.
No massive GPU cluster.
One Colab notebook, 5 minutes, and you have a working LLM that you built from scratch — data generation, tokenizer, model architecture, training loop, and inference.
If you can run a notebook, you can train a language model.
It won't produce a billion-parameter model that writes essays.
But it will show you exactly how every piece works — from raw text to trained weights to generated output — so the big models stop feeling like black boxes.
GuppyLM is a tiny language model that pretends to be a fish named Guppy.
It speaks in short, lowercase sentences about water, food, light, and tank life.
It doesn't understand human abstractions like money, phones, or politics — and it's not trying to.
It's trained from scratch on 60K synthetic conversations across 60 topics, runs on a single GPU in ~5 minutes, and produces a model small enough to run in a browser.
Vanilla transformer.
No GQA, no RoPE, no SwiGLU, no early exit.
As simple as it gets.
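To make the "~9M parameters" figure concrete, here is a back-of-the-envelope parameter count for a vanilla transformer. The dimensions below (vocab size, width, depth) are hypothetical stand-ins, not GuppyLM's actual config, but they show which terms dominate at this scale.

```python
# Hypothetical dimensions -- the real GuppyLM hyperparameters may differ.
vocab_size = 8192
d_model = 256
n_layers = 8
d_ff = 4 * d_model  # standard 4x FFN expansion

# Token embedding (often weight-tied with the output head).
embed = vocab_size * d_model

# Per layer: Q, K, V, O attention projections plus two FFN matrices.
attn = 4 * d_model * d_model
ffn = d_model * d_ff + d_ff * d_model
per_layer = attn + ffn

total = embed + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")  # -> 8.4M
```

With LayerNorm gains and biases on top, a config like this lands right around the ~9M mark, with the embedding table and the FFN matrices accounting for most of it.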
60 topics: greetings, feelings, temperature, food, light, water, tank, noise, night, loneliness, bubbles, glass, reflection, breathing, swimming, colors, taste, plants, filter, algae, snails, scared, excited, bored, curious, happy, tired, outside, cats, rain, seasons, music, visitors, children, meaning of life, time, memory, dreams, size, future, past, name, weather, sleep, friends, jokes, fear, love, age, intelligence, health, singing, TV, and more.
Chat with Guppy (no training needed)
The notebook downloads the pre-trained model from HuggingFace and lets you chat.
Just run all cells.
The pre-trained weights live at arman-bd/guppylm-60k-generic on HuggingFace.

Why single-turn only?
Multi-turn conversation degraded at turn 3-4 because the accumulated history overflows the 128-token context window.
A fish that forgets is on-brand, but garbled output isn't.
Single-turn is reliable.
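A rough budget calculation shows why multi-turn breaks down so quickly. The per-turn token count and reply budget below are illustrative assumptions, not measured values from GuppyLM's tokenizer:

```python
CONTEXT_LEN = 128      # the model's context window
AVG_TURN_TOKENS = 18   # assumed: one message plus role/separator tokens
REPLY_BUDGET = 48      # assumed: tokens reserved for generating the reply

# How many past messages fit before the oldest ones fall off the window.
budget = CONTEXT_LEN - REPLY_BUDGET
turns_that_fit = budget // AVG_TURN_TOKENS
print(turns_that_fit)  # -> 4
```

Under these assumptions, only about four messages of history fit alongside room for a reply, which lines up with degradation appearing around turn 3-4. Single-turn spends the whole window on one prompt and one reply.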
Why vanilla transformer?
GQA, SwiGLU, RoPE, and early exit add complexity that doesn't help at 9M params.
Standard attention + ReLU FFN + LayerNorm produces the same quality with simpler code.
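For reference, the "standard attention + ReLU FFN" combination amounts to very little math. This is a toy sketch in plain Python with tiny made-up dimensions (LayerNorm omitted for brevity); it shares no weights or shapes with the actual model:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    # q, k, v: lists of d-dim token vectors; scaled dot-product attention.
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v))
                    for t in range(len(v[0]))])
    return out

def relu_ffn(x, w1, w2):
    # Each weight matrix is a list of columns; ReLU between the two layers.
    h = [max(0.0, sum(a * b for a, b in zip(x, col))) for col in w1]
    return [sum(a * b for a, b in zip(h, col)) for col in w2]

# Two 2-d token vectors attending to each other (self-attention).
x = [[1.0, 0.0], [0.0, 1.0]]
y = attention(x, x, x)

# Feed the first attended vector through a tiny 2-unit ReLU FFN.
w1 = [[1.0, -1.0], [0.5, 0.5]]
w2 = [[1.0, 1.0]]
print(relu_ffn(y[0], w1, w2))
```

Since the attention weights are a softmax, each output row is a convex combination of the value vectors; everything else is matrix multiplies.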
Why synthetic data?
A fish character with consistent personality needs consistent training data.
Template composition with randomized components (30 tank objects, 17 food types, 25 activities) generates ~16K unique outputs from ~60 templates.
Source: This article was originally published by Hacker News