{"id":1522,"date":"2023-12-23T18:00:10","date_gmt":"2023-12-23T18:00:10","guid":{"rendered":"https:\/\/www.philippeadjiman.com\/blog\/?p=1522"},"modified":"2025-07-18T13:13:44","modified_gmt":"2025-07-18T13:13:44","slug":"deep-learning-gymnastics-tensor-indexing","status":"publish","type":"post","link":"https:\/\/philippeadjiman.com\/blog\/2023\/12\/23\/deep-learning-gymnastics-tensor-indexing\/","title":{"rendered":"Deep Learning Gymnastics #2: Tensor Indexing"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Welcome to the second episode of the <a href=\"https:\/\/www.philippeadjiman.com\/blog\/deep-learning-gymnastic\/\">Deep Learning Gymnastics<\/a> series. Hope you&#8217;re in good shape. Get warmed up. We start.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today, we&#8217;ll talk about a simple yet important and powerful aspect of tensor manipulations: tensor indexing.  <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Batches and embeddings motivating example<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">At the heart of any modern deep learning model, you&#8217;ll most often deal with batches and embeddings. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Batches?<\/strong> Below is a toy example of what a batch from a training set could look like:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"467\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5-1024x467.png?resize=1024%2C467&#038;ssl=1\" alt=\"\" class=\"wp-image-1542\" style=\"width:530px;height:242px\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5.png?resize=1024%2C467&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5.png?resize=300%2C137&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5.png?resize=768%2C350&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-5.png?w=1088&amp;ssl=1 1088w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Each number is an index into a vocabulary of size N, whose entries can represent any kind of entity: letters or words in a language model, movies in a recommender system, segments on a map in an ETA model, or ads in an ad network. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For this example, let&#8217;s assume they are letters (indexed from 0 to 26: the 26 letters plus a special end character), as in Andrej Karpathy&#8217;s great &#8220;<a href=\"https:\/\/www.youtube.com\/watch?v=TCH_1BHY58I\">makemore<\/a>&#8221; series. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Embeddings?<\/strong> For each element of that vocabulary, the model learns a representation of its (latent) characteristics, in the form of a vector of size k. 
This vector is often called an embedding. Continuing with our example above, let&#8217;s consider an embedding of size 4 for each element of the vocabulary (in our case, English letters), i.e. a tensor of dimension (27, 4):<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"411\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6-1024x411.png?resize=1024%2C411&#038;ssl=1\" alt=\"\" class=\"wp-image-1543\" style=\"width:543px;height:218px\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6.png?resize=1024%2C411&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6.png?resize=300%2C120&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6.png?resize=768%2C308&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-6.png?w=1400&amp;ssl=1 1400w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Here is the gymnastic exercise<\/strong>: you have a toy batch containing 8 examples of size 3, where each number is an index taken from a vocabulary of size 27. You also have an embedding matrix of dimension (27,4), where each row is an embedding vector of size 4, one for each of the 27 elements of the vocabulary. For each element of the batch, you need to fetch its embedding vector, to end up with a batch that is a tensor of dimension (8,3,4). 
This is illustrated below:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"534\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7-1024x534.png?resize=1024%2C534&#038;ssl=1\" alt=\"\" class=\"wp-image-1544\" style=\"width:538px;height:281px\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7.png?resize=1024%2C534&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7.png?resize=300%2C156&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7.png?resize=768%2C401&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-7.png?w=1250&amp;ssl=1 1250w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Tensor indexing, the PyTorch way<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s first generate the two input tensors (the same as the two inputs on the left of the picture above):<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import torch\ntorch.manual_seed(18)\n# Create a random batch of shape (8,3).\n# Note: high is exclusive, so the indexes drawn lie between 0 and 25;\n# use high=27 to cover all 27 vocabulary entries.\nrandom_tensor = torch.randint(low=0, high=26, size=(8,3))\n# Create a random embedding matrix of shape (27,4):\n# one vector for each of the 27 vocabulary elements\nembeddings = torch.randn(size=(27, 4))<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">And now, let&#8217;s solve the gymnastic exercise. 
Take a deep breath, prepare the movement, and here you go:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>embedded_batch = embeddings[random_tensor]<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Yes, that&#8217;s right. PyTorch lets you pass a full tensor as the index. And it works like magic.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can check the shape of the result and observe that it is indeed (8,3,4), as expected (see the picture above): (8,3) is the shape of the initial batch, and each of its elements is replaced by its embedding vector of shape (4,). <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s validate that the first element of the result (embedded_batch[0,0]) is the embedding vector of the index stored in the first element of the batch, i.e. embeddings[random_tensor[0,0]]. This corresponds to this part of the picture:<\/p>\n\n\n\n<p class=\"has-text-align-center wp-block-paragraph\"><img decoding=\"async\" loading=\"lazy\" width=\"597px;\" height=\"201px;\" src=\"https:\/\/lh7-us.googleusercontent.com\/C5Q2sSLWbfj_Grh6UcEBHblJIwsC1LuqXTVnu_Ba_hWH2QThs4jUBbJdkMArUiIwn07RcK1i65F5bCZHte_ZWP6rOQ8T5_XmTC_cC9SRa5Ox1NVJis5D1dJ6BSKCyl91dyHZ8N20G70KpvsXQgGM8UENtg=s2048\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And sure enough, it worked \ud83c\udf89:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"786\" height=\"146\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=786%2C146&#038;ssl=1\" alt=\"\" class=\"wp-image-1556\" style=\"width:392px;height:73px\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?w=786&amp;ssl=1 786w, 
https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=300%2C56&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/11\/image-9.png?resize=768%2C143&amp;ssl=1 768w\" sizes=\"auto, (max-width: 786px) 100vw, 786px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What about TensorFlow?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In TensorFlow, it is of course possible to achieve the same result, but it is done a bit differently.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The tf.gather function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Instead of passing the batch directly as a (tensor) index into the embedding matrix, in TensorFlow we use a very powerful function: <a href=\"https:\/\/www.tensorflow.org\/api_docs\/python\/tf\/gather\">tf.gather<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You can read the details in the documentation, but essentially, the equivalent of the following PyTorch indexing: <\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism off-numbers lang-python\" data-lang=\"Python\"><code>embedded_batch = embeddings[random_tensor] <\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">in TensorFlow would be: <\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism off-numbers lang-python\" data-lang=\"Python\"><code>embedded_batch = tf.gather(embeddings,random_tensor)<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">And that&#8217;s all.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The full equivalent TensorFlow code is below:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import tensorflow as tf\ntf.random.set_seed(18)\n# Create a random batch of shape (8,3); maxval is exclusive, so indexes lie between 0 and 25 (use maxval=27 to cover all 27)\nrandom_tensor = tf.random.uniform(shape=(8,3), minval=0, maxval=26, dtype=tf.int32)\n# Create a random embedding matrix of shape (27,4): one vector for each of the 27 vocabulary 
elements\nembeddings = tf.random.uniform((27,4), dtype=tf.float32)\n# Solving the gymnastic exercise: creating an embedded batch with the tf.gather function\nembedded_batch = tf.gather(embeddings,random_tensor)\n# Validating the results\nprint(random_tensor)\nprint(embeddings)\nprint(embedded_batch.shape) # (8,3,4) which is the expected dimension\nprint(embedded_batch[0,0])<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Hope you enjoyed the gymnastic lesson. Take some rest. Until the next one \ud83e\udd38&nbsp;. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"\"><a href=\"https:\/\/static.aminer.cn\/upload\/pdf\/1749\/1416\/1187\/53e9a636b7602d9702f66092_0.pdf\">Matrix Factorization Techniques For Recommender Systems<\/a> by Yehuda Koren et al. Old, yet still a highly recommended read for an introduction to the notion of latent factors (a.k.a. embeddings).<\/li>\n\n\n\n<li class=\"\"><a href=\"https:\/\/www.youtube.com\/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ\">Andrej Karpathy&#8217;s YouTube channel<\/a>. 
It inspired the example above, as well as many of the gymnastic exercises presented in this series.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how smart indexing lets you build batches, embeddings, and masked ops efficiently in modern DL frameworks.<\/p>\n","protected":false},"author":1,"featured_media":1759,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[18,41,14,16],"tags":[],"class_list":["post-1522","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deep-learning","category-deep-learning-gymnastics","category-pytorch","category-tensorflow"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2023\/12\/image-7.png?fit=1250%2C652&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1522","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/comments?post=1522"}],"version-history":[{"count":4,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1522\/revisions"}],"predecessor-version":[{"id":1955,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1522\/revisions\/1955"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media\/1759"}],"wp:attachment":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media?parent=1522"}],"wp:term":[{"taxonomy":"category","embedd
able":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/categories?post=1522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/tags?post=1522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}