Welcome to a new series of seven posts, where we’re going to dive deep into one of the most exciting videos from Andrej Karpathy: Let’s build GPT: from scratch, in code, spelled out.
- GPT From Scratch #1: Intro. You probably use AI, but do you understand it? Get ready to dive into the internals of what started the (gen) AI revolution: GPT.
- GPT From Scratch #2: The Training Set. Don’t underestimate the importance of building a proper training set. It is a critical part of the process, and in GPT’s case, a beautifully clever one as well.
- GPT From Scratch #3: The Bigram Model. The simplest model we can use to predict the next character is a bigram model. But implemented as a neural net, its building blocks stay the same all the way up to GPT.
- GPT From Scratch #4: The Mathematical Trick Behind Self-Attention. One simple mathematical trick. The cleverest matrix multiplication of the gen AI revolution. The trick that enabled ultra-fast self-attention.
- GPT From Scratch #5: Positional Encodings. In this post, we’ll show how to give the neural net a notion of each token’s position. Simple but powerful.
- GPT From Scratch #6: Coding Self-Attention. This is where we get to understand the ~20 most important and impactful lines of code that started the gen AI revolution.
- GPT From Scratch #7: Building a GPT. Self-attention is the heart of the Transformer, the T in GPT. But there are a few additional critical parts of the Transformer architecture that actually make it shine.
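As a small taste of what’s ahead, the “mathematical trick” teased in post #4 boils down to this: averaging each token over all of its predecessors can be done in a single matrix multiplication with a row-normalized lower-triangular matrix. Here’s a minimal NumPy sketch of that idea (the toy shapes and variable names are our own, not Karpathy’s exact code):

```python
import numpy as np

T, C = 4, 2  # sequence length, channels
x = np.arange(T * C, dtype=float).reshape(T, C)  # toy token embeddings

# Lower-triangular matrix with rows normalized to sum to 1:
# row t holds equal weights over tokens 0..t, so each token
# only "sees" the past -- never the future.
wei = np.tril(np.ones((T, T)))
wei = wei / wei.sum(axis=1, keepdims=True)

out = wei @ x  # out[t] is the mean of x[0..t]

# Sanity check against an explicit (slow) loop.
expected = np.stack([x[: t + 1].mean(axis=0) for t in range(T)])
assert np.allclose(out, expected)
```

The same structure, with learned weights run through a masked softmax instead of uniform averaging, is what self-attention computes. More on that in post #4.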
If you don’t want to miss other posts and series, just subscribe below.