{"id":2000,"date":"2025-10-24T11:13:11","date_gmt":"2025-10-24T11:13:11","guid":{"rendered":"https:\/\/philippeadjiman.com\/blog\/?p=2000"},"modified":"2025-11-26T12:05:19","modified_gmt":"2025-11-26T12:05:19","slug":"gpt-from-scratch-1-intro","status":"publish","type":"post","link":"https:\/\/philippeadjiman.com\/blog\/2025\/10\/24\/gpt-from-scratch-1-intro\/","title":{"rendered":"GPT From Scratch #1: Intro"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since the (generative) AI revolution started, it seems like we\u2019re seeing a new breakthrough every two weeks on average, which can feel overwhelming. Rather than shallowly chasing every breakthrough, I believe it is critical to start by getting a deep understanding of what started it all: GPT.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Welcome to a new <a href=\"https:\/\/philippeadjiman.com\/blog\/gpt-from-scratch-series\/\" data-type=\"link\" data-id=\"https:\/\/philippeadjiman.com\/blog\/gpt-from-scratch-series\/\">series of 7 posts<\/a>, where we\u2019re going to take a deep dive into one of the most exciting videos from Andrej Karpathy on the topic: <a href=\"https:\/\/www.youtube.com\/watch?v=kCc8FmEb1nY\">Let&#8217;s build GPT: from scratch, in code, spelled out.<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">A Note on Motivation and Inspiration<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">Before we begin this 7-part journey, I want to set the stage. 
This entire series is a deep dive directly inspired by and based on Andrej Karpathy&#8217;s phenomenal video mentioned above, which is, in my opinion, by far the best resource on the internet for understanding GPT.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As a fellow PhD in AI, my own learning process has always been to &#8216;teach what I learn.&#8217; I created this series as a way to meticulously deconstruct and document every step from that video, solidifying my own understanding in the process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While heavily guided by the video, I&#8217;ve invested significant time in structuring the content into clear topics, creating custom illustrations, and adding in-depth explanations to unpack the &#8216;why&#8217; behind the &#8216;how.&#8217; My hope is that this series serves as a valuable resource for those who learn best by reading, or who need a quick, searchable reference to complement the video format.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A final note: true understanding comes from <em>doing<\/em>. So I encourage you to use this series, but then to challenge yourself, for example by reproducing Karpathy\u2019s code entirely on your own.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">What is GPT?<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">GPT stands for Generative Pretrained Transformer.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Generative<\/strong>: because it is a model that generates new content, most commonly words or, more generally, tokens. 
You give it some initial input (a prompt), and it generates a plausible continuation, one token at a time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Pretrained<\/strong>: because the model first goes through a foundational training phase, on a large corpus of text, before it is used.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Transformer<\/strong>: because it is based on the Transformer neural net architecture, introduced in the now famous <a href=\"https:\/\/arxiv.org\/abs\/1706.03762\">Attention Is All You Need<\/a> paper.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Once you have such a model, building a chatbot like Gemini or ChatGPT requires a few more important steps, but GPT is the foundational part that enables it, and it is our primary focus in this series.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Starting from the end<\/h1>\n\n\n\n<p class=\"wp-block-paragraph\">Some suspenseful TV shows and movies open with the final scene, shown without context. It looks great, but we have no clue what is going on. Then the story starts over from the beginning, walking us slowly but surely back to that final scene, and this time we understand it all.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is what we\u2019ll do in this series. We\u2019ll start by looking at the end result: the heart of Karpathy\u2019s beautiful, minimalist implementation of GPT. The core components look like this (don\u2019t freak out just yet: it is expected that you have no clue what this code is doing at this point):<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The init and forward pass of the model. 
The key magical component in that code is the \u201c<strong>block<\/strong>\u201d variable, which contains the implementation of the Transformer (and of self-attention).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"694\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?resize=1024%2C694&#038;ssl=1\" alt=\"\" class=\"wp-image-2017\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?resize=1024%2C694&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?resize=300%2C203&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?resize=768%2C521&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?resize=1536%2C1042&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-8.png?w=1600&amp;ssl=1 1600w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">A self-attention <strong>head<\/strong> (the core component of a <strong>Transformer<\/strong>, which we\u2019ll discuss later):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"624\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png?resize=1024%2C624&#038;ssl=1\" alt=\"\" class=\"wp-image-2015\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png?resize=1024%2C624&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png?resize=300%2C183&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png?resize=768%2C468&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-6.png?w=1286&amp;ssl=1 1286w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Then, by adding a few other key ingredients of the Transformer architecture, we form a <strong>Block<\/strong> (which is initialized in the first snippet):<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"881\" height=\"1024\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png?resize=881%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-2016\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png?resize=881%2C1024&amp;ssl=1 881w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png?resize=258%2C300&amp;ssl=1 258w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png?resize=768%2C892&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-7.png?w=1198&amp;ssl=1 1198w\" sizes=\"auto, (max-width: 881px) 100vw, 881px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">From there, to generate text, you just do:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"496\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png?resize=1024%2C496&#038;ssl=1\" alt=\"\" class=\"wp-image-2014\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png?resize=1024%2C496&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png?resize=300%2C145&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png?resize=768%2C372&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/09\/image-5.png?w=1180&amp;ssl=1 1180w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">That\u2019s it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Not even 100 lines of code.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Knowing that this is what enabled the generative AI revolution, you\u2019re probably staring at these snippets like this:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"236\" height=\"200\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/10\/200.gif?resize=236%2C200&#038;ssl=1\" alt=\"Dog Looking GIFs - Find &amp; Share on GIPHY\" class=\"wp-image-2021\" style=\"width:375px;height:auto\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">But no worries: by the end of this series, this code will hopefully look much clearer and more intuitive to you.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The journey to understand it all<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To follow along, you\u2019ll need a basic understanding of Python and some basics of tensors and PyTorch neural networks. From time to time, it will also help to review one of the posts from my <a href=\"http:\/\/www.philippeadjiman.com\/blog\/deep-learning-gymnastic\/\">Deep Learning Gymnastic<\/a> series (I\u2019ll point them out whenever they are relevant).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here is the plan for the posts we\u2019ll study together in this series:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\">Part 1: Intro (this post)<\/li>\n\n\n\n<li class=\"\"><a 
href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/06\/gpt-from-scratch-2-the-training-set\/\" data-type=\"link\" data-id=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/06\/gpt-from-scratch-2-the-training-set\/\">Part 2: The Training Set<\/a><\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/10\/gpt-from-scratch-3-the-bigram-model\/\" data-type=\"post\" data-id=\"2063\">Part 3: The Bigram Model<\/a> <\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/14\/gpt-from-scratch-4-the-mathematical-trick-behind-self-attention\/\">Part 4: The Mathematical Trick behind Self-Attention<\/a> <\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/15\/gpt-from-scratch-5-positional-encodings\/\">Part 5: Positional Encodings<\/a><\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/19\/gpt-from-scratch-6-coding-self-attention\/\">Part 6: Coding Self-Attention<\/a><\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/22\/gpt-from-scratch-7-building-a-gpt\/\" data-type=\"link\" data-id=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/22\/gpt-from-scratch-7-building-a-gpt\/\">Part 7: Building a GPT<\/a> <\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">From here, let\u2019s continue to the next part: the <a href=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/06\/gpt-from-scratch-2-the-training-set\/\" data-type=\"link\" data-id=\"https:\/\/philippeadjiman.com\/blog\/2025\/11\/06\/gpt-from-scratch-2-the-training-set\/\">training set<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>You probably use AI, but do you understand it? Get ready to dive into the internals of what started the (gen) AI revolution: GPT. 
<\/p>\n","protected":false},"author":1,"featured_media":2029,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[18,46,14],"tags":[48,47,49],"class_list":["post-2000","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deep-learning","category-gpt","category-pytorch","tag-deep-learning","tag-gpt","tag-pytorch"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2025\/10\/gpt-1.png?fit=510%2C285&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/2000","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/comments?post=2000"}],"version-history":[{"count":18,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/2000\/revisions"}],"predecessor-version":[{"id":2257,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/2000\/revisions\/2257"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media\/2029"}],"wp:attachment":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media?parent=2000"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/categories?post=2000"},{"taxonomy":"post_tag","embeddable":true,"href"
:"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/tags?post=2000"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}