The Birth
How LLMs Come to Life
Creating a frontier LLM is among the most resource-intensive engineering efforts of our time. It demands more raw computation than the entire Apollo program had at its disposal, training data spanning a large share of the text humanity has published, and teams of hundreds of researchers and engineers.
Here's how these remarkable systems come to exist.
The Journey from Data to AI
- Phase 1: Data Collection. Gather trillions of tokens from books, websites, code, scientific papers, and more. The quality and diversity of this data shape everything that follows.
- Phase 2: Pre-training. Train the base model to predict the next token. This takes months on thousands of GPUs and costs tens of millions of dollars.
- Phase 3: Fine-tuning. Train on high-quality examples of helpful, harmless conversations, turning the raw prediction engine into a useful assistant.
- Phase 4: RLHF. Human raters compare outputs, and the model learns from their preferences. This is what makes AI assistants genuinely helpful and safe.
Phase 1: The Data
Everything begins with training data. Modern LLMs are trained on a substantial fraction of the text publicly available on the internet, plus digitized books, academic papers, and code repositories.
Scale of Training Data
- ~15 trillion tokens (GPT-4 estimate; see the counting sketch below)
- ~300 billion words equivalent
- ~1.5 million books' worth of text
- 10+ years to read at human speed
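Those counts are in tokens, the sub-word units the model actually reads; an English word averages roughly 1.3 tokens. A minimal sketch of counting them, assuming the open-source tiktoken library (each model family uses its own tokenizer):

```python
# Count tokens in a text sample using tiktoken (an open-source tokenizer
# library); different models use different encodings, cl100k_base is one example.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Creating an LLM begins with trillions of tokens of training text."
tokens = encoding.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
```

Run over an entire corpus, this same kind of count is where figures like "trillions of tokens" come from.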
The composition matters as much as the size:
- Web crawls (filtered for quality; a filtering sketch follows this list)
- Digitized books and publications
- Code repositories (GitHub, etc.)
- Scientific papers and databases
- Forums, discussions, Q&A sites
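The "filtered for quality" step is substantial work in its own right: raw web crawls are full of boilerplate, spam, and duplicates. A minimal sketch of the kind of heuristic filter and exact-duplicate check a data pipeline might apply; the thresholds here are illustrative assumptions, not any lab's actual rules:

```python
# Illustrative quality filter for web-crawl documents (hypothetical thresholds).
# Real pipelines add language identification, toxicity filters, fuzzy
# deduplication across billions of pages, and much more.
import hashlib

seen_hashes = set()

def keep_document(text: str) -> bool:
    words = text.split()
    if len(words) < 50:                      # drop near-empty pages
        return False
    if len(set(words)) / len(words) < 0.3:   # drop highly repetitive pages
        return False
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in text) / len(text)
    if alpha_ratio < 0.8:                    # drop pages that are mostly markup/symbols
        return False
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if digest in seen_hashes:                # drop exact duplicates
        return False
    seen_hashes.add(digest)
    return True
```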
Phase 2: Pre-training
Pre-training is where the model learns to predict the next token. The process is conceptually simple: show the model text, have it predict what comes next, and adjust its parameters to be slightly better at that prediction.
Repeat this trillions of times.
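A minimal sketch of that loop, assuming PyTorch. The tiny recurrent model and random token batches stand in for a billion-parameter transformer and real text, but the objective, cross-entropy loss on the next token, is the same one production runs optimize:

```python
# Toy next-token prediction loop (the core of pre-training), assuming PyTorch.
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 1000, 64, 32

class TinyLM(nn.Module):
    """Stand-in for a transformer: embedding -> recurrent layer -> vocab logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        x, _ = self.rnn(x)
        return self.head(x)                  # logits for every position

model = TinyLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Random token IDs stand in for a batch of real training text.
    batch = torch.randint(0, vocab_size, (8, context_len + 1))
    inputs, targets = batch[:, :-1], batch[:, 1:]            # predict the next token

    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()         # nudge parameters toward slightly better predictions
```

Real runs differ only in scale: a transformer with billions of parameters, trillions of real tokens, and thousands of GPUs running this loop for months.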
Pre-training Requirements: a 3-6 month training run on thousands of GPUs, roughly 10 GWh of electricity, and a compute bill of $50-100M or more.
After pre-training, you have a "base model": something that can complete text fluently but isn't yet useful as an assistant. Ask it a question and it may simply continue in the same style, producing more questions or a rambling essay rather than engaging helpfully in conversation.
Phase 3: Fine-tuning
Fine-tuning teaches the base model how to be a helpful assistant. This involves training on carefully curated examples of good conversations.
Example Training Pair
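Here is an illustrative pair in the chat-message format commonly used for supervised fine-tuning. The wording is hypothetical; what matters is the structure, a prompt plus the response the model should learn to imitate:

```python
# Hypothetical supervised fine-tuning example (illustrative content only).
training_pair = {
    "messages": [
        {
            "role": "user",
            "content": "My code throws 'IndexError: list index out of range'. What does that mean?",
        },
        {
            "role": "assistant",
            "content": (
                "It means your code asked for a position that doesn't exist in the list, "
                "for example item 5 of a 3-item list. Check the index against len(your_list) "
                "and remember that indexing starts at 0. If you share the failing line, "
                "I can point out the exact fix."
            ),
        },
    ]
}
# During fine-tuning, the model is trained to reproduce the assistant message,
# token by token, given everything that precedes it.
```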
These examples demonstrate the desired behavior: being helpful, accurate, clear, and appropriately cautious. The model learns to mimic these patterns.
Phase 4: RLHF
Reinforcement Learning from Human Feedback is often the secret sauce that separates impressive demos from truly useful AI assistants.
How RLHF Works
- Generate: Model produces several different answers to the same prompt
- Compare: Trained raters rank responses from best to worst
- Learn: A separate reward model learns to predict human preferences (sketched after this list)
- Optimize: Main model is trained to produce responses the reward model rates highly
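The "Learn" step is the easiest to see in code. A minimal sketch of reward-model training on preference pairs, assuming PyTorch; for brevity each response is a pre-computed feature vector, whereas a real reward model runs a full transformer over the prompt and response tokens:

```python
# Toy reward-model training on human preference pairs, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_dim = 128
reward_model = nn.Linear(feature_dim, 1)    # maps a response to a scalar score
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

for step in range(200):
    # Stand-ins for the response raters preferred ("chosen") and the one
    # they ranked lower ("rejected").
    chosen = torch.randn(16, feature_dim)
    rejected = torch.randn(16, feature_dim)

    r_chosen = reward_model(chosen).squeeze(-1)
    r_rejected = reward_model(rejected).squeeze(-1)

    # Bradley-Terry preference loss: push the chosen response's score
    # above the rejected one's.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The "Optimize" step then trains the main model (typically with PPO or a
# related method) to produce responses this reward model scores highly,
# with a penalty for drifting too far from the fine-tuned model.
```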
The Staggering Scale
Creating a frontier LLM is among the most expensive and resource-intensive undertakings in the history of computing:
Financial Cost
- Pre-training compute: $50-100M+
- Research and iteration: a comparable amount again
- Infrastructure: billions of dollars in GPUs
Energy
- Training run: ~10 GWh
- Roughly the annual electricity use of ~1,000 US homes
- A major environmental consideration
Human Effort
- Hundreds of researchers
- Thousands of data labelers
- Years of accumulated work
Time
- Research: 1-2 years
- Data preparation: ongoing
- Training run: 3-6 months
Key Takeaways
- LLM creation has four main phases: data collection, pre-training, fine-tuning, and RLHF
- Training data quality and diversity fundamentally shape model capabilities
- Pre-training teaches language patterns; fine-tuning and RLHF shape behavior
- The scale is staggering: billions of dollars, massive energy use, years of work
- Only a few organizations can currently create frontier models