Philippe Adjiman's blog


  • November 22, 2025

    GPT From Scratch #7: Building a GPT

    Self Attention is the heart of Transformers, the T of GPT. But there are a few additional critical parts of the transformer architecture that actually make it shine.



  • November 19, 2025

    GPT From Scratch #6: Coding Self Attention

    This is where we get to understand the ~20 most important and impactful lines of code that started the gen AI revolution.



  • November 15, 2025

    GPT From Scratch #5: Positional Encodings

    In this post, we’ll show how to give the neural net a notion of token position. Simple but powerful.



  • November 14, 2025

    GPT From Scratch #4: The Mathematical Trick Behind Self Attention

    One simple mathematical trick, the cleverest matrix multiplication of the gen AI revolution: what enabled ultra-fast self attention.


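As a taste of the trick this post covers (often illustrated with a normalized lower-triangular matrix), here is a minimal NumPy sketch; the toy data and variable names are illustrative, not the post's actual code:

```python
import numpy as np

T = 4
# Toy per-token "features": one scalar per position -> [[1], [2], [3], [4]]
x = np.arange(T, dtype=np.float64).reshape(T, 1) + 1.0

# Lower-triangular matrix of ones, with each row normalized to sum to 1.
tril = np.tril(np.ones((T, T)))
weights = tril / tril.sum(axis=1, keepdims=True)

# A single matrix multiplication computes, for every position t,
# the average of x[0..t]: a causal aggregation with uniform weights,
# the same pattern self attention uses with learned weights.
causal_avg = weights @ x
```

Replacing the uniform rows of `weights` with learned, softmax-normalized scores is exactly the step that turns this averaging into self attention.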

  • November 10, 2025

    GPT From Scratch #3: The Bigram Model

    The simplest model we can use to predict the next character is a Bigram Model. But once implemented as a neural net, its building blocks stay the same all the way up to GPT.



  • November 6, 2025

    GPT From Scratch #2: The Training Set

    Don’t underestimate the importance of building a proper training set. It is a critical part of the process, and in GPT’s case, a beautifully clever one as well.



  • October 24, 2025

    GPT From Scratch #1: Intro

    You probably use AI, but do you understand it? Get ready to dive into the internals of what started the (gen) AI revolution: GPT.



  • November 28, 2024

    Decoding Transformers: The Neural Nets Behind LLMs and More

    Dive into the magic of self-attention and learn why Transformers became the backbone of every cutting-edge genAI model.



  • March 9, 2024

    Deep Learning Gymnastics #4: Master Your (LLM) Cross Entropy

    Use all the gymnastics tricks we’ve learned in order to master (LLM) cross-entropy in PyTorch and TensorFlow.


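As a flavor of what the post works through, here is a minimal NumPy sketch of cross entropy computed directly from raw logits; the helper name and toy data are illustrative, not the post's actual code:

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target class, from raw logits."""
    # Subtract the per-row max before exponentiating, for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    # log-softmax: log(exp(l_i) / sum_j exp(l_j))
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pick each row's target-class log-probability and average the negatives.
    return -log_probs[np.arange(len(targets)), targets].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  0.3]])
targets = np.array([0, 2])
loss = cross_entropy(logits, targets)
```

Library calls like PyTorch's `F.cross_entropy` fuse these steps (and expect raw logits, not probabilities), which is a frequent source of bugs the post digs into.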

  • February 3, 2024

    Deep Learning Gymnastics #3: Tensor (re)Shaping

    Your tensors aren’t the right shape? Learn how to reshape, squeeze, and stack them like a deep learning gymnast.


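The shape-juggling moves this post covers can be previewed in a few lines of NumPy (the arrays here are illustrative, not the post's actual examples):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)       # shape (2, 3)

flat = x.reshape(-1)                 # (6,)  -- -1 lets NumPy infer the size
col = x.reshape(-1, 1)               # (6, 1)
squeezed = col.squeeze(axis=1)       # drop the size-1 axis: back to (6,)
stacked = np.stack([x, x], axis=0)   # new leading axis: (2, 2, 3)
```

The same vocabulary (`reshape`/`view`, `squeeze`, `stack`) carries over almost verbatim to PyTorch and TensorFlow tensors.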

  • December 23, 2023

    Deep Learning Gymnastics #2: Tensor Indexing

    Learn how smart indexing lets you build batches, embeddings, and masked ops efficiently in modern DL frameworks.


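One classic example of the smart indexing this post teaches is an embedding lookup done purely with fancy indexing; a minimal NumPy sketch (the sizes and ids are illustrative):

```python
import numpy as np

vocab_size, emb_dim = 5, 3
# A toy embedding table: row i holds the embedding of token id i.
emb = np.arange(vocab_size * emb_dim, dtype=np.float64).reshape(vocab_size, emb_dim)

# A (2, 2) batch of token ids; indexing with an integer array
# gathers one embedding row per id, preserving the batch shape.
ids = np.array([[1, 4],
                [0, 2]])
batch_emb = emb[ids]   # shape (2, 2, 3)
```

This is exactly what an embedding layer does under the hood: no loops, just one gather.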

  • July 16, 2023

    Deep Learning Gymnastics #1: Tensor Broadcasting

    Master broadcasting like a pro and learn how a single trick can make your deep learning code faster, cleaner, and more elegant.


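As a taste of the broadcasting trick this post masters, here is a minimal NumPy sketch that normalizes every row of a matrix without a single loop (the data is illustrative):

```python
import numpy as np

counts = np.array([[2.0, 6.0],
                   [1.0, 3.0]])

# keepdims=True preserves a (2, 1) column vector instead of a flat (2,) array.
row_sums = counts.sum(axis=1, keepdims=True)

# Broadcasting stretches the (2, 1) column across the (2, 2) matrix,
# dividing each row by its own sum.
probs = counts / row_sums
```

Dropping `keepdims=True` here is a classic broadcasting bug: a (2,) divisor would broadcast across columns instead of rows.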

  • November 3, 2018

    Visualising SGD with Momentum, Adam and Learning Rate Annealing

    Watch optimizers battle it out in a visual showdown—Momentum vs Adam vs LR schedules, explained with intuition and flair.



  • April 3, 2018

    Deep Dive Into Logistic Regression: Part 3

    In this third and final post of the series, we put a very effective and powerful library to work for building logistic regression models (among others) in practice: Vowpal Wabbit.



  • February 26, 2018

    Deep Dive Into Logistic Regression: Part 2

    Want to implement Stochastic Gradient Descent for logistic regression that can learn millions of parameters, using the hashing trick and a per-coordinate adaptive learning rate with a tiny memory footprint? This post is for you.



  • December 9, 2017

    Deep Dive Into Logistic Regression: Part 1

    Learn the fundamental theory behind logistic regression.



  • September 12, 2013

    A Data Science Exploration From the Titanic in R

    Step aboard the Titanic dataset: Explore, feature-engineer, and model your way to survival predictions with style.



  • December 30, 2010

    How To Easily Build And Observe TF-IDF Weight Vectors With Lucene And Mahout

    Want to peek inside TF-IDF weights? Here’s a quick way to build and analyze them without the headache.



  • February 6, 2010

    What Are The 10 Most Cited Websites On Twitter When Tweeting About Hot Trends?

    Scrapes and analyzes tweets around Google Hot Trends to see which domains dominate the conversation.



  • January 14, 2010

    Hadoop Tutorial Series, Issue #4: To Use Or Not To Use A Combiner

    Explains when Hadoop Combiners help (or hurt) performance and correctness, with code‑level guidance.



  • January 7, 2010

    Hadoop Tutorial Series, Issue #3: Counters In Action

    Shows how to instrument MapReduce jobs with Hadoop Counters to track custom metrics during large‑scale processing.



  • January 6, 2010

    How To Build A Relevant Real Time Search Engine Prototype In Few Hundreds Lines Of Code

    A hands‑on blueprint for a lightweight, low‑latency (toy) search engine that ingests and surfaces fresh content fast.



  • December 20, 2009

    Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning

    Teaches key partitioning patterns (e.g., partial sorts to specific reducers) to control data flow in MapReduce jobs.



  • December 7, 2009

    Hadoop Tutorial Series, Issue #1: Setting Up Your MapReduce Learning Playground

    Step‑by‑step setup of a Cloudera VM + Maven project so you can quickly experiment with Hadoop wordcount and beyond.



  • November 11, 2009

    Flexible Collaborative Filtering In JAVA With Mahout Taste

    Rapid prototyping approach to a recommendation engine using Mahout Taste’s pluggable similarity and scoring components.



  • November 2, 2009

    Writing A Token N-Grams Analyzer In Few Lines Of Code Using Lucene

    Leverages Lucene analyzers to emit token n‑grams for downstream text mining or search tasks with minimal Java.


