{"id":1622,"date":"2024-02-03T16:47:39","date_gmt":"2024-02-03T16:47:39","guid":{"rendered":"https:\/\/www.philippeadjiman.com\/blog\/?p=1622"},"modified":"2025-07-18T07:50:19","modified_gmt":"2025-07-18T07:50:19","slug":"deep-learning-gymnastics-tensor-reshaping","status":"publish","type":"post","link":"https:\/\/philippeadjiman.com\/blog\/2024\/02\/03\/deep-learning-gymnastics-tensor-reshaping\/","title":{"rendered":"Deep Learning Gymnastics #3: Tensor (re)Shaping"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Welcome to the 3rd episode of the <a href=\"https:\/\/www.philippeadjiman.com\/blog\/deep-learning-gymnastic\/\">Deep Learning Gymnastics<\/a> series. By now you should already be getting in shape. That&#8217;s good, because today we&#8217;ll talk about how to shape (or more precisely reshape) tensors, a basic yet critical operation needed in any sufficiently advanced deep learning model implementation. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To best understand this post, it is highly recommended to read the <a href=\"https:\/\/www.philippeadjiman.com\/blog\/2023\/12\/23\/deep-learning-gymnastics-tensor-indexing\/\">previous gymnastic exercise around tensor indexing<\/a>, as we&#8217;ll build on top of it. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">MLP Motivating example<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To illustrate the power of tensor (re-)shaping, we&#8217;ll continue to draw inspiration from Andrej Karpathy&#8217;s makemore series, where he <a href=\"https:\/\/www.youtube.com\/watch?v=TCH_1BHY58I&amp;t\">implements<\/a> from scratch the famous paper &#8220;<a href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.5555\/944919.944966\">A neural probabilistic language model<\/a>&#8221;. As Andrej says, it is not the first paper that proposed a neural network approach to predict the next token in a sequence, but it is a very often cited one and a really nice write-up. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The gymnastic exercise will consist of implementing the bottom part of the figure below, which describes the architecture of the neural network (or Multi-Layer Perceptron, MLP for short) defined in the paper. First, let&#8217;s briefly explain the diagram so that the goal of the exercise is crystal clear.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1-1024x766.png?resize=587%2C439&#038;ssl=1\" alt=\"\" class=\"wp-image-1626\" width=\"587\" height=\"439\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?resize=1024%2C766&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?resize=300%2C224&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?resize=768%2C574&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?resize=1536%2C1149&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-1.png?resize=2048%2C1531&amp;ssl=1 2048w\" sizes=\"auto, (max-width: 587px) 100vw, 587px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s assume that the 3 green dots at the bottom are the last three characters of a word and that we&#8217;re trying to predict (or generate) the next character. 
The first layer (this one: <img data-recalc-dims=\"1\" decoding=\"async\" width=\"150\" height=\"20\" loading=\"lazy\" class=\"wp-image-1628\" style=\"width: 150px;\" src=\"https:\/\/i0.wp.com\/ner.jul.mybluehost.me\/wp-content\/uploads\/2024\/01\/image-3.png?resize=150%2C20\" alt=\"\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-3.png?w=706&amp;ssl=1 706w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-3.png?resize=300%2C39&amp;ssl=1 300w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/>) is nothing more than the embeddings of the three characters. It turns out that this is exactly the output of the example we introduced in our <a href=\"https:\/\/www.philippeadjiman.com\/blog\/2023\/12\/23\/deep-learning-gymnastics-tensor-indexing\/\">previous gymnastic exercise around tensor indexing<\/a>. We ended up with a tensor of shape (8,3,4), the one on the right in the figure below. As a reminder, an embedding here is simply a one-dimensional tensor (of size 4 in our case).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So in our example, the first layer of the neural net, the <img data-recalc-dims=\"1\" decoding=\"async\" width=\"150\" height=\"20\" loading=\"lazy\" class=\"wp-image-1628\" style=\"width: 150px;\" src=\"https:\/\/i0.wp.com\/ner.jul.mybluehost.me\/wp-content\/uploads\/2024\/01\/image-3.png?resize=150%2C20\" alt=\"\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-3.png?w=706&amp;ssl=1 706w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-3.png?resize=300%2C39&amp;ssl=1 300w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/>, is nothing more than the 3 embeddings (one per character) of each example of the batch, as seen below: <\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><a 
href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17-1024x618.png?resize=619%2C373&#038;ssl=1\" alt=\"\" class=\"wp-image-1649\" width=\"619\" height=\"373\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?resize=1024%2C618&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?resize=300%2C181&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?resize=768%2C463&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?resize=1536%2C927&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-17.png?resize=2048%2C1236&amp;ssl=1 2048w\" sizes=\"auto, (max-width: 619px) 100vw, 619px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">So the first example of the batch is associated with those three embeddings:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5-1024x33.png?resize=883%2C28&#038;ssl=1\" alt=\"\" class=\"wp-image-1630\" width=\"883\" height=\"28\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?resize=1024%2C33&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?resize=300%2C10&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?resize=768%2C25&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?resize=1536%2C50&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-5.png?w=2026&amp;ssl=1 2026w\" sizes=\"auto, (max-width: 883px) 100vw, 883px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Now, in order to pass this to the next layer (this one: <img data-recalc-dims=\"1\" decoding=\"async\" width=\"150\" height=\"21\" loading=\"lazy\" class=\"wp-image-1632\" style=\"width: 150px;\" src=\"https:\/\/i0.wp.com\/ner.jul.mybluehost.me\/wp-content\/uploads\/2024\/01\/image-6.png?resize=150%2C21\" alt=\"\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-6.png?w=582&amp;ssl=1 582w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-6.png?resize=300%2C42&amp;ssl=1 300w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/>), we need to concatenate these three embeddings of size 4 into a single one of size 12.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So here is the gymnastic exercise: take our (8,3,4) tensor, and for each of the 8 rows of the batch, transform the 3 embeddings of size 4 into a single one of size 12 (which is just the concatenation of the 3). We should thus end up with a tensor of shape (8,12).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The basics of PyTorch Views<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s introduce the concept that will let us breeze through the gymnastic exercise: PyTorch views. 
The easiest way to understand PyTorch views is through a simple example.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s create a one-dimensional tensor containing the elements from 0 to 17.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"99\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11-1024x99.png?resize=1024%2C99&#038;ssl=1\" alt=\"\" class=\"wp-image-1638\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?resize=1024%2C99&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?resize=300%2C29&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?resize=768%2C74&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?resize=1536%2C149&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-11.png?w=2000&amp;ssl=1 2000w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The exact same underlying storage can be <strong>view<\/strong>ed as a (2,9) tensor:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"91\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10-1024x91.png?resize=1024%2C91&#038;ssl=1\" alt=\"\" class=\"wp-image-1637\" 
srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?resize=1024%2C91&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?resize=300%2C27&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?resize=768%2C68&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?resize=1536%2C137&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-10.png?w=1998&amp;ssl=1 1998w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Or as a (9,2) one:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"208\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12-1024x208.png?resize=1024%2C208&#038;ssl=1\" alt=\"\" class=\"wp-image-1639\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?resize=1024%2C208&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?resize=300%2C61&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?resize=768%2C156&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?resize=1536%2C312&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-12.png?w=1996&amp;ssl=1 1996w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Or as a (3,2,3) one:<\/p>\n\n\n\n<figure class=\"wp-block-image 
size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"196\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13-1024x196.png?resize=1024%2C196&#038;ssl=1\" alt=\"\" class=\"wp-image-1641\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?resize=1024%2C196&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?resize=300%2C57&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?resize=768%2C147&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?resize=1536%2C294&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-13.png?w=1996&amp;ssl=1 1996w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As you can see, for a contiguous tensor like this one, we can view (or reshape) it with any shape whose dimensions multiply to the number of elements in the underlying storage (18 in our case).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Beyond being very convenient, the big advantage is that it is blazing fast: no data is copied, the underlying storage stays the same, and only some metadata about the tensor (such as its shape and strides) is modified.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Bonus<\/strong>: we can also pass -1 for one of the dimensions and let PyTorch infer it automatically. 
For example, if the underlying storage holds 18 numbers, calling the <strong>view<\/strong> function with shape (-1,9) will deduce that the first dimension has to be 2:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"84\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14-1024x84.png?resize=1024%2C84&#038;ssl=1\" alt=\"\" class=\"wp-image-1643\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?resize=1024%2C84&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?resize=300%2C25&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?resize=768%2C63&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?resize=1536%2C126&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-14.png?w=1994&amp;ssl=1 1994w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Solving our gymnastic exercise with views<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we understand views, let&#8217;s get back to our gymnastic exercise: we have a tensor of shape (8,3,4) and we need to transform it into a tensor of shape (8,12). 
First, let&#8217;s reproduce the embedded batch of shape (8,3,4) (see <a href=\"https:\/\/www.philippeadjiman.com\/blog\/category\/machine-learning\/deep-learning\/\">our previous gymnastic exercise<\/a> to understand the code below):<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import torch\ntorch.manual_seed(18)\n\n# Create a random batch of shape (8,3)\n# with indexes between 0 and 25 (high is exclusive)\nrandom_tensor = torch.randint(low=0, high=26, size=(8,3))\n\n# Create a random embedding matrix of shape (27,4):\n# one embedding for each of the 27 possible indexes\nembeddings = torch.randn(size=(27, 4))\n\n# Create the embedded batch of shape (8,3,4)\nembedded_batch = embeddings[random_tensor]<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Get ready, and let&#8217;s solve our exercise. As in the last post, it will be a short yet sharp (tensor) movement: <\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>input_layer = embedded_batch.view(8,12)  # (8,3,4) =&gt; (8,12), no data copied<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Yes, that&#8217;s it, just one line. 
By doing this, each of the 8 rows of the batch gets its 3 embeddings of size 4 laid out back to back as a single vector of size 12, without copying any data, so we end up with a tensor of shape (8,12).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s actually validate it on the first example of the batch:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?ssl=1\"><img data-recalc-dims=\"1\" decoding=\"async\" width=\"1024\" height=\"85\" loading=\"lazy\" src=\"https:\/\/i0.wp.com\/www.philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16-1024x85.png?resize=1024%2C85&#038;ssl=1\" alt=\"\" class=\"wp-image-1647\" srcset=\"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?resize=1024%2C85&amp;ssl=1 1024w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?resize=300%2C25&amp;ssl=1 300w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?resize=768%2C64&amp;ssl=1 768w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?resize=1536%2C128&amp;ssl=1 1536w, https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/01\/image-16.png?w=1996&amp;ssl=1 1996w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We obtain an embedding of size 12 as expected, which is nothing more than the concatenation of the 3 embeddings of size 4 that we showed at the end of our motivating example above. Baam.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Oh, let&#8217;s not forget that we created this to pass it as input to a layer of a neural net. 
So let&#8217;s do it: we create the initial random weights and biases of the layer, pass our (reshaped) batch through it, and apply tanh on top. In other words:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>W1 = torch.randn((12, 100)) # weights: 12 inputs, 100 hidden units\nb1 = torch.randn(100) # biases\nh = torch.tanh(embedded_batch.view(-1, 12) @ W1 + b1) # (8,12) @ (12,100) =&gt; (8,100)<\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">PyTorch view vs. reshape?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">There is another function in PyTorch called <a href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.reshape.html\">reshape<\/a> that seems to achieve the exact same goal as view. So what&#8217;s the difference?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.Tensor.view.html\">view<\/a> is extremely efficient, as it never moves any underlying data and only modifies the tensor&#8217;s metadata (shape and strides). But it comes with a constraint: the new shape has to be compatible with the tensor&#8217;s existing memory layout, which is always the case when the tensor is contiguous; otherwise, calling view raises an error (see example below). <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you&#8217;re not sure whether your tensor is contiguous, you can either call the <a href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.Tensor.contiguous.html#torch.Tensor.contiguous\">contiguous<\/a> function before calling view (it returns a contiguous tensor, copying the data only if needed), or simply use <a href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.reshape.html\">reshape<\/a>, which returns a view when possible and a copy otherwise.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You might ask why anyone would use view over reshape. I asked myself the same question, and my guess is that since view is guaranteed never to copy data, seeing it in the code tells any reader that there is no hidden copy to optimize there. 
As for the person writing the code, if some case would require an inefficient copy, view will at least fail explicitly and make you aware of the potential efficiency bottleneck. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below is an example illustrating a case where view doesn&#8217;t work:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import torch\n\n# Create a non-contiguous tensor\ntensor = torch.tensor([[1, 2, 3], [4, 5, 6]]).t()  # Transpose to make it non-contiguous\n\n# Reshape works: it silently copies the data into a new contiguous tensor\nreshaped_tensor = tensor.reshape(6)\nprint(reshaped_tensor)  # Output: tensor([1, 4, 2, 5, 3, 6])\n\n# View fails with an error, since it is never allowed to copy\ntry:\n    viewed_tensor = tensor.view(6)\nexcept RuntimeError as e:\n    print(e)  # Output starts with: view size is not compatible with input tensor&#39;s size and stride<\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">TensorFlow reshape <\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">TensorFlow also supports the same powerful reshape operation. TensorFlow has no explicit view function, but tf.reshape can be applied to any tensor without worrying about its memory layout, similar to PyTorch&#8217;s reshape. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below is the full TensorFlow equivalent of what we illustrated above in PyTorch.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import tensorflow as tf\ntf.random.set_seed(18)\n\n# Create a random batch of shape (8,3) with indexes between 0 and 25 (maxval is exclusive)\nrandom_tensor = tf.random.uniform(shape=(8,3), minval=0, maxval=26, dtype=tf.int32)\n\n# Create a random embedding matrix of shape (27,4): one embedding for each of the 27 possible indexes\nembeddings = tf.random.normal([27, 4])\n\n# Create the embedded batch of shape (8,3,4) with tf.gather (previous gymnastic exercise)\nembedded_batch = tf.gather(embeddings, random_tensor)\n\n# Validate the results\nprint(random_tensor)\nprint(embeddings)\nprint(embedded_batch.shape) # (8, 3, 4), the expected shape\nprint(embedded_batch[0,0])\n\n# Solve this exercise: reshape (8,3,4) into (8,12), then apply the hidden layer\nW1 = tf.random.normal([12, 100])\nb1 = tf.random.normal([100])\nh = tf.math.tanh(tf.linalg.matmul(tf.reshape(embedded_batch, [-1, 12]), W1) + b1) # (8,12) @ (12,100) =&gt; (8,100)<\/code><\/pre><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Another example of usage: CNNs<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Reshaping is useful in many situations in deep learning. 
Another frequent use case is image processing with convolutional neural networks (CNNs), where you typically need to flatten the output of a convolutional layer before connecting it to a fully connected layer:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import torch\n\n# An output from a convolutional layer\nconv_output = torch.randn(10, 8, 5, 5)  # (batch size, channels, height, width)\n\n# Flatten for a fully connected layer\nflattened = conv_output.view(-1, 8 * 5 * 5)  # (batch size, flattened features)\n\nprint(flattened.shape)  # Output: torch.Size([10, 200])\n<\/code><\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Alright, that&#8217;s it for today. Hopefully you&#8217;re now in better shape, and see you next time for other gymnastic exercises \ud83e\udd38.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=TCH_1BHY58I\">Part 2<\/a> of the amazing makemore series by Andrej Karpathy (which inspired this post).<\/li>\n\n\n\n<li>A great <a href=\"http:\/\/blog.ezyang.com\/2019\/05\/pytorch-internals\/\">blog post<\/a> by Edward Z. Yang on the internal representation of tensors, and his very cool <a href=\"https:\/\/ezyang.github.io\/stride-visualizer\/index.html\">stride visualizer<\/a> (he is a PyTorch research engineer, so it is about PyTorch \ud83d\ude42, but the concepts are generally useful).<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Your tensors aren\u2019t the right shape? 
Learn how to reshape, squeeze, and stack them like a deep learning gymnast.<\/p>\n","protected":false},"author":1,"featured_media":1755,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[18,41,14,16],"tags":[],"class_list":["post-1622","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-deep-learning","category-deep-learning-gymnastics","category-pytorch","category-tensorflow"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2024\/02\/image-1-1536x1149-1.png?fit=1536%2C1149&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1622","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/comments?post=1622"}],"version-history":[{"count":2,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1622\/revisions"}],"predecessor-version":[{"id":1952,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1622\/revisions\/1952"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media\/1755"}],"wp:attachment":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media?parent=1622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-
json\/wp\/v2\/categories?post=1622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/tags?post=1622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}