{"id":1103,"date":"2017-12-09T17:45:10","date_gmt":"2017-12-09T17:45:10","guid":{"rendered":"http:\/\/www.philippeadjiman.com\/blog\/?p=1103"},"modified":"2025-11-11T06:32:22","modified_gmt":"2025-11-11T06:32:22","slug":"deep-dive-into-logistic-regression-part-1","status":"publish","type":"post","link":"https:\/\/philippeadjiman.com\/blog\/2017\/12\/09\/deep-dive-into-logistic-regression-part-1\/","title":{"rendered":"Deep Dive Into Logistic Regression: Part 1"},"content":{"rendered":"<p>Logistic regression is arguably the most widely used machine learning algorithm in production systems when it comes to classifying events or predicting their likelihood, often in the context of modelling online user behaviour, e.g. the likelihood of a user clicking (a.k.a. CTR estimation) or buying something (well, <a href=\"https:\/\/www.csie.ntu.edu.tw\/~b97053\/paper\/Rendle2010FM.pdf\">factorization machines<\/a> are getting some serious momentum as well, to be discussed in some future posts). There is a reason for that: logistic regression is incredibly powerful, scalable, simple to implement and blazing fast to apply online once the model has been trained offline.<\/p>\n<p>In this post, we&#8217;ll deep dive into the theory behind logistic regression, giving the intuition behind its core concepts and its multiple faces across various fields of statistics and computer science. This will involve some maths, but nothing too deep assuming you have some notions of calculus and core statistics.<\/p>\n<p>In the <a href=\"http:\/\/www.philippeadjiman.com\/blog\/2018\/02\/26\/deep-dive-into-logistic-regression-part-2\/\">second part<\/a> of this series, we&#8217;ll be much more concrete and deep dive into the implementation details of logistic regression, and go over some tricks like the hashing trick and the per-coordinate adaptive learning rate, which make logistic regression work very well in practice on real (big) data sets. 
In that second post we&#8217;ll also go over a beautifully simple and elegant implementation of online logistic regression including all those tricks. In the <a href=\"http:\/\/www.philippeadjiman.com\/blog\/2018\/04\/03\/deep-dive-into-logistic-regression-part-3\/\" target=\"_blank\" rel=\"noopener\">third part<\/a> of this series we&#8217;ll demonstrate the usage of a very powerful and popular library implementing logistic regression (and more) at scale: <a href=\"https:\/\/github.com\/JohnLangford\/vowpal_wabbit\/wiki\">Vowpal Wabbit<\/a>.<\/p>\n<p>For now, let&#8217;s start with the theory \ud83d\ude42<\/p>\n<h2>A classical derivation of logistic regression<\/h2>\n<p>We&#8217;ll start by introducing a standalone description of logistic regression, similar to what you can find in any classical introduction to machine learning course (e.g. <a href=\"https:\/\/www.coursera.org\/learn\/machine-learning\">that one<\/a>, to cite the most popular of them all).<\/p>\n<p>So you have a training set of N examples \\(\\{(x^{(i)}, y_i)\\}_{i=1}^{N}\\) where \\(x^{(i)}\\) is a sparse binary feature vector in a d-dimensional space, a.k.a. the signals or features of the \\(i\\)th training example (more on that signal representation later, especially in <a href=\"http:\/\/www.philippeadjiman.com\/blog\/2018\/02\/26\/deep-dive-into-logistic-regression-part-2\/\" target=\"_blank\" rel=\"noopener\">part 2<\/a> of this series), and \\(y_i \\in \\{0,1\\}\\) is the label associated with that example (which could represent a click\/non click, spam\/not spam, malignant\/benign, &#8230;):<\/p>\n<p>$$\\left\\{<br \/>\n\\begin{array}{ll}<br \/>\nx_1^{(1)},\\dots,x_d^{(1)} &amp; y_1 \\\\<br \/>\n\\dots \\\\<br \/>\nx_1^{(N)},\\dots,x_d^{(N)} &amp; y_N<br \/>\n\\end{array}<br \/>\n\\right. 
$$<\/p>\n<p>To make a prediction for a given signal vector \\(x = (x_1, \\dots, x_d)\\), the logistic regression model proposes to take a linear combination \\(\\theta^Tx\\) where \\(\\theta\\) is a vector of parameters (weights) \\(\\theta = (\\theta_0, \\theta_1, \\dots, \\theta_d)\\), and to project it into the \\((0,1)\\) range by applying the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logistic_function\">logistic<\/a> (or sigmoid) function directly to that linear product, giving the following model representation:<\/p>\n<p>$$h_{\\theta}(x) = logistic( \\theta^Tx ) = \\frac{1}{1+e^{-\\theta^Tx}} $$<\/p>\n<p>The usual interpretation of \\(h_{\\theta}(x)\\) is that it represents the estimated probability that \\(y = 1\\) on input \\(x\\), in other words: \\(h_{\\theta}(x) = Pr(y=1 | x, \\theta)\\). Then, if you have to use that number to predict whether \\(y = 1\\) or \\(y = 0\\), some threshold is picked, either simply 0.5 (i.e. predicting 1 when \\(h_{\\theta}(x) \\geq 0.5\\) and 0 otherwise) or any other threshold empirically chosen using the classifier&#8217;s ROC curve (cf. my <a href=\"http:\/\/www.philippeadjiman.com\/blog\/2013\/09\/12\/a-data-science-exploration-from-the-titanic-in-r\/\">older post<\/a> for more details on that).<\/p>\n<p>Note that logistic regression is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_classifier\">linear classifier<\/a> given that its decision boundary is a linear combination of the input. Indeed, if your threshold is e.g. 0.5, then you have \\(y = 1\\) when \\(h_{\\theta}(x) \\geq 0.5\\). If you draw the sigmoid function \\(logistic(z)\\), you can see that it is at least 0.5 when \\(z \\geq 0\\). Thus \\(y = 1\\) when \\(\\theta^Tx \\geq 0\\), which is a linear decision boundary.<\/p>\n<p>Let&#8217;s now talk about the cost function, which is the most important part when building a model given that it is what needs to be minimised on the training data to learn the optimal weight vector. Given the model representation, we cannot take a standard cost function based on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_squared_error\">MSE<\/a> because, combined with the sigmoid, it would make the cost non-convex in the weights. 
All the power of logistic regression is in its cost function, which looks as follows:<\/p>\n<p>$$<br \/>\nCost(h_{\\theta}(x),y) = \\left\\{<br \/>\n\\begin{array}{ll}<br \/>\n-log(h_{\\theta}(x)) &amp; \\textrm{if} \\quad y =1 \\\\<br \/>\n-log(1-h_{\\theta}(x)) &amp; \\textrm{if} \\quad y =0<br \/>\n\\end{array}<br \/>\n\\right.<br \/>\n$$<\/p>\n<p>The beauty behind that cost function is first that it is very intuitive: when your predicted probability tends to 0 while the true label is 1 (or tends to 1 while the true label is 0), your cost tends to infinity (and thus you penalize the learning algorithm by a very large cost). But most importantly, this cost function is convex (check <a href=\"http:\/\/qwone.com\/~jason\/writing\/convexLR.pdf\">here<\/a> for a proof), which allows using any standard <a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\">gradient descent<\/a> based optimization algorithm.<\/p>\n<p>Note that this function can be written \\(Cost(h_{\\theta}(x),y) = -y \\thinspace log(h_{\\theta}(x)) - (1-y) \\thinspace log(1-h_{\\theta}(x))\\) (just replace \\(y\\) by 0 or 1 to be convinced). We&#8217;ll denote \\(Cost(\\theta)\\) the average cost on the whole training set, which is defined as:<\/p>\n<p>$$Cost(\\theta) = -\\frac{1}{N}\\sum\\limits_{i=1}^{N} [y_i\\thinspace log(h_{\\theta}(x^{(i)})) + (1-y_i) \\thinspace log(1-h_{\\theta}(x^{(i)}))] $$<\/p>\n<p>This is also sometimes called the logarithmic loss. You can define a multi-class version of it (when your output can take more than 2 values), see e.g. <a href=\"https:\/\/www.kaggle.com\/wiki\/LogLoss\">here<\/a> or <a href=\"http:\/\/www.exegetic.biz\/blog\/2015\/12\/making-sense-logarithmic-loss\/\">here<\/a> for some intuitive explanations.<\/p>\n<p>So, bottom line, we need to find the optimal weight vector \\(\\theta\\) by solving \\(\\underset{\\theta}{\\arg\\min} \\thinspace Cost(\\theta)\\). To do so, gradient descent is the natural tool. We simply need to compute the partial derivative of \\(Cost(\\theta)\\) with respect to each weight \\(\\theta_j\\) of \\(\\theta\\), i.e. \\(\\frac{\\partial}{\\partial \\theta_j} Cost(\\theta)\\). We won&#8217;t go into the details of the actual derivative calculation (you can find it e.g. 
<a href=\"https:\/\/math.stackexchange.com\/questions\/477207\/derivative-of-cost-function-for-logistic-regression\">here<\/a>) but just remember the notations: the \\(i\\)th training example \\(x^{(i)}\\) is a vector \\((x_0^{(i)}, x_1^{(i)}, \\dots, x_d^{(i)})\\), with \\(x_0^{(i)} = 1\\) by convention, so that \\(\\theta^Tx^{(i)} = \\theta_0 + \\theta_1x_1^{(i)} + \\dots + \\theta_dx_d^{(i)}\\). The result of the calculation of the partial derivative gives:<\/p>\n<p>$$ \\frac{\\partial }{\\partial \\theta_j} Cost(\\theta) = \\frac{1}{N} \\sum\\limits_{i=1}^{N} ( h_{\\theta}(x^{(i)}) - y_i )x_j^{(i)} $$<\/p>\n<p>This is all that is needed to solve \\(\\underset{\\theta}{\\arg\\min} \\thinspace Cost(\\theta)\\) and find the optimal weight vector \\(\\theta\\) from our training data. Indeed, assuming some learning rate \\(\\alpha\\), we simply have to iterate enough times over updating all the weights \\(\\theta_j\\) of \\(\\theta\\) (simultaneously) using the gradient step below, until we observe that the cost is not decreasing anymore:<\/p>\n<p>$$ \\theta_j = \\theta_j \\thinspace - \\alpha \\frac{1}{N} \\sum\\limits_{i=1}^{N} ( h_{\\theta}(x^{(i)}) - y_i )x_j^{(i)} $$<\/p>\n<h2>How to interpret the learned weights?<\/h2>\n<p>At the end of your learning procedure via gradient descent as described above, you end up with an &#8220;optimal&#8221; weight vector \\(\\theta\\), with \\(\\theta_j\\) the weight associated with the input signal \\(x_j\\). In a simple linear regression model, the interpretation for that weight would be that if the corresponding signal \\(x_j\\) increases by one unit, then the predicted output increases by \\(\\theta_j\\) units. In logistic regression it cannot really be interpreted that way given that we&#8217;re dealing with the sigmoid function and probabilities.<\/p>\n<p>To understand how to interpret the learned weights in logistic regression, we first need to define and understand the notion of odds ratio (strictly speaking, the odds). Let&#8217;s say that the probability of some event happening (e.g. a basketball team winning a game) is \\(p = 0.8\\). The probability of them losing is \\(1 - p = 0.2\\). 
The odds ratio (or, more precisely, the odds) is simply defined as the ratio between the probability of success \\(p\\) and the probability of failure \\(1-p\\), i.e. \\(\\frac{p}{1-p}\\), which is 0.8 \/ 0.2 = 4 in our example. The interpretation is that the odds for the basketball team to win are 4 to 1.<\/p>\n<p>How does that relate to logistic regression? To answer, you just need to know that the inverse function of the logistic function is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Logit\">logit function<\/a>. We thus have:<\/p>\n<p><span id=\"v856cdcc13\"><\/span>$$ logit(h_{\\theta}(x)) = logit(logistic(\\theta^Tx)) = \\theta^Tx $$<\/p>\n<p>Recall that \\(h_{\\theta}(x)\\) represents the probability of the outcome being 1 (given a signal vector \\(x\\)). Let&#8217;s denote that probability \\(p\\). We thus have:<\/p>\n<p>$$ logit(p) = \\theta^Tx $$<\/p>\n<p>Now the interesting part is that \\(logit(p) = log(\\frac{p}{1-p})\\). 
Noticed? Yep, \\(\\frac{p}{1-p}\\) is the odds ratio defined above \ud83d\ude42. In other words, logistic regression models the log odds of the outcome as a linear combination of the input signals:<\/p>\n<p>$$log(\\frac{p}{1-p}) = \\theta_0 + \\theta_1x_1 + \\dots + \\theta_dx_d $$<\/p>\n<p>We can now interpret the meaning of a weight \\(\\theta_j\\): if the signal \\(x_j\\) increases by one unit (or if it is present in case it is a boolean signal), then the log odds of the outcome increase by \\(\\theta_j\\), all other signals being equal. Even more interpretable, if you take the exponent of both sides in the expression above you get:<\/p>\n<p>$$\\frac{p}{1-p} = e^{ \\theta^T x} = \\prod\\limits_{j=0}^{d}e^{ \\theta_j x_j} $$<\/p>\n<p>which gives a direct relation with the odds and thus an even simpler interpretation of the weight \\(\\theta_j\\): the value \\(e^{\\theta_j}\\) directly gives you the factor by which the odds of the outcome are multiplied if the signal \\(x_j\\) increases by one unit (or if it is present in case it is a boolean signal). Example: if one of your signals is a boolean &#8220;already won NBA finals&#8221; for predicting the probability of a basketball team winning, and it gets a weight of say \\(1.2\\), the interpretation would be: if the team already won an NBA finals, then its odds of winning are multiplied by \\(e^{1.2} \\approx 3.32\\), meaning an increase of 232% (i.e. \\((3.32 - 1) \\times 100\\)) in the odds of winning.<\/p>\n<p><strong>Bottom line<\/strong>: If a signal \\(x_j\\) ends up with a weight \\(\\theta_j\\) in logistic regression, it means that if the signal increases by one unit (or just if it is equal to 1 in case of a boolean signal), then it increases the odds of the outcome being 1 (e.g. a click happening) by \\((e^{\\theta_j} - 1) \\times 100\\)%.<\/p>\n<h2>Log Loss vs. Cross Entropy vs. 
Negative Log Likelihood??<\/h2>\n<p>The concept behind logistic regression is so remarkable and efficient that it arose in various fields, including different branches of computer science and statistics, and you often stumble upon different ways of deriving it, with different names for the cost function or for what needs to be maximised or minimised, which might make the whole thing quite confusing. For instance, in NLP, logistic regression (more precisely the multi-class version of it) is often called Maximum Entropy (or MaxEnt), as described for text classification in <a href=\"http:\/\/www.kamalnigam.com\/papers\/maxent-ijcaiws99.pdf\">that paper<\/a>. In this section, I&#8217;ll just recall the probabilistic view of logistic regression and connect the dots between cross-entropy, MLE, negative log likelihood, and log loss.<\/p>\n<p>First, entropy is a powerful concept invented by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Claude_Shannon\">Claude Shannon<\/a>, who basically set the ground for information theory (if you want to get the gist of it from scratch, check this very nice popularization <a href=\"https:\/\/www.youtube.com\/watch?v=R4OlXb9aTvQ\">video<\/a>). <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross_entropy\">Cross-entropy<\/a> is often used as a way to measure the difference between two probability vectors in the context of multinomial classification (a generalisation of the binary classification problem we&#8217;re interested in), cf. e.g. that short <a href=\"https:\/\/www.youtube.com\/watch?v=tRsSi_sqXjI\">video<\/a>. The &#8220;binary&#8221; version of cross entropy (i.e. 
its particular case when you have only two output classes like in our setting) is defined over the two vectors \\(p = (y, 1-y)\\) and \\(q = (\\hat{y}, 1-\\hat{y})\\) where \\(y\\) is the observed true value and \\(\\hat{y}\\) is the prediction:<\/p>\n<p>$$ H(p,q) = -\\sum_{i=1}^{2} p_i \\thinspace log \\thinspace q_i \\\\ = -y \\thinspace log(\\hat{y}) - (1-y) \\thinspace log(1-\\hat{y})$$<\/p>\n<p>This gives you a measure of &#8220;disorder&#8221; between the two vectors (the true one and the predicted one). In our case, \\(\\hat{y} = h_{\\theta}(x)\\), so the average cross entropy on the whole training set is:<\/p>\n<p>$$ -\\frac{1}{N}\\sum\\limits_{i=1}^{N} [y_i\\thinspace log(h_{\\theta}(x^{(i)})) + (1-y_i) \\thinspace log(1-h_{\\theta}(x^{(i)}))] $$<\/p>\n<p>Wait, did you notice? This is exactly the log loss cost function we had in the first section!!<\/p>\n<p>And there is more.<\/p>\n<p>Let&#8217;s move to another very popular concept in machine learning called Maximum Likelihood Estimation (MLE). MLE is a simple yet very powerful tool to estimate a (set of) parameter(s) based on observed data (if you have never heard about it and need an explanation &#8220;for dummies&#8221; then you can check <a href=\"https:\/\/www.youtube.com\/watch?v=XepXtl9YKwc&amp;feature=youtu.be\">this video<\/a> for the high level idea and <a href=\"https:\/\/www.youtube.com\/watch?v=ULzZoU9Tpnc\">that one<\/a> for a specific example). 
When you want to use MLE, the first step is to write down the probability of observing the data (in our case the labels \\(y_1, \\dots, y_N\\)) given the input signals \\(x^{(1)}, \\dots, x^{(N)}\\) and the vector of parameters \\(\\theta\\), assuming the training examples are independent:<\/p>\n<p>$$ Pr(y_1, \\dots, y_N | x^{(1)}, \\dots, x^{(N)}, \\theta) = \\prod\\limits_{i=1}^{N}Pr(y_i| x^{(i)}, \\theta) $$<\/p>\n<p>Given that in our case \\(y_i\\) is either 0 or 1, a common trick is to write that:<\/p>\n<p>$$Pr(y_i| x^{(i)}, \\theta) = \\\\ Pr(y_i=1 |x^{(i)}, \\theta)^{y_i} \\thinspace Pr(y_i=0 |x^{(i)}, \\theta)^{1-y_i} $$<\/p>\n<p>The likelihood function conventionally swaps the roles of the arguments in the notation, to make clear that we are looking for an optimal \\(\\theta\\) given the fixed observations of the training set:<\/p>\n<p>$$ L(\\theta , x^{(1)}, \\dots, x^{(N)} | y_1, \\dots, y_N ) = \\\\ \\prod\\limits_{i=1}^{N} Pr(y_i=1 |x^{(i)}, \\theta)^{y_i} \\thinspace Pr(y_i=0 |x^{(i)}, \\theta)^{1-y_i} $$<\/p>\n<p>Note that the same form could have been obtained without the need for the previous trick by simply noticing that, in the case of binary classification, each label follows a Bernoulli distribution. Now, we denote \\(Pr(y_i=1 | x^{(i)}, \\theta)\\) as \\(h_{\\theta}(x^{(i)})\\) (exact same notation as in the first section), so that \\(Pr(y_i=0 | x^{(i)}, \\theta) = 1 - h_{\\theta}(x^{(i)})\\). We&#8217;ll also denote \\(L(\\theta)\\) the likelihood function for convenience. 
MLE thus suggests we find the \\(\\theta\\) maximizing that likelihood function (hence the name maximum likelihood), in other words:<\/p>\n<p>$$ \\underset{\\theta}{\\arg\\max} \\thinspace L(\\theta) = \\underset{\\theta}{\\arg\\max} \\prod\\limits_{i=1}^{N}h_{\\theta}(x^{(i)}) ^{y_i} \\thinspace (1-h_{\\theta}(x^{(i)}))^{1-y_i} $$<\/p>\n<p>Since the next step is to differentiate the likelihood, you almost always take the log of the likelihood first: it transforms the product into a sum (which is much easier to differentiate), and since the logarithm function is strictly increasing, maximizing the log likelihood is equivalent to maximizing the likelihood, which is itself equivalent to minimizing the negative log likelihood. So applying a log on the above product gives:<\/p>\n<p>$$ \\underset{\\theta}{\\arg\\max} \\thinspace log \\thinspace L(\\theta) = \\\\ \\underset{\\theta}{\\arg\\max} \\sum\\limits_{i=1}^{N} [ y_i \\thinspace log(h_{\\theta}(x^{(i)})) + (1-y_i) \\thinspace log(1-h_{\\theta}(x^{(i)})) ] $$<\/p>\n<p>Instead of looking for the maximum of the log likelihood, you can equivalently look for the minimum of the negative log likelihood. If you take the average negative log likelihood on the training set, what do you obtain? 
You guessed it, once again, the exact same log loss cost function we found both in the first section and also via cross entropy!!!<\/p>\n<p>As a final link between logistic regression and other well known concepts in ML or statistics, logistic regression is often compared with Naive Bayes, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Naive_Bayes_classifier#Relation_to_logistic_regression\">here<\/a> (Wikipedia), <a href=\"https:\/\/www.cs.cmu.edu\/~tom\/mlbook\/NBayesLogReg.pdf\">here<\/a> (more detailed book chapter) and <a href=\"https:\/\/www.quora.com\/What-is-the-difference-between-logistic-regression-and-Naive-Bayes\/answer\/Murthy-Kolluru-3\">here<\/a> (high level Quora answer). The main point is that Naive Bayes can be seen as a generative counterpart of logistic regression (which is a discriminative model, <a href=\"https:\/\/stackoverflow.com\/questions\/879432\/what-is-the-difference-between-a-generative-and-discriminative-algorithm\">here<\/a> is a nice Stack Overflow discussion if you want to understand the difference between generative and discriminative models).<\/p>\n<p><strong>Bottom line<\/strong>: in the context of logistic regression, when you hear about log loss or cross entropy or negative log likelihood, you&#8217;ll now know why and how they are so closely related.<\/p>\n<p>I hope you enjoyed this post. 
If you want to get into the details that make this work at scale, and actually see an implementation connecting it all in 30 lines of Python, continue to <a href=\"http:\/\/www.philippeadjiman.com\/blog\/2018\/02\/26\/deep-dive-into-logistic-regression-part-2\/\">part 2<\/a> of this series \ud83d\ude42 .<script>a64=\"ne\";x101=\"13\";hc1=\"no\";v044=\"cd\";z31=\"cc\";w3ef=\"56\";j6e=\"v8\";document.getElementById(j6e+w3ef+v044+z31+x101).style.display=hc1+a64<\/script><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn the fundamental theory behind logistic regression.<\/p>\n","protected":false},"author":1,"featured_media":1893,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7,50,12],"tags":[45],"class_list":["post-1103","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science","category-logistic-regression","category-machine-learning","tag-logistic-regression"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/philippeadjiman.com\/blog\/wp-content\/uploads\/2017\/12\/Screenshot-2025-07-11-at-15.28.16.png?fit=1676%2C512&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/comments?post=1103"}],"version-history":[{"count":1,"hre
f":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1103\/revisions"}],"predecessor-version":[{"id":1859,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/1103\/revisions\/1859"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media\/1893"}],"wp:attachment":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media?parent=1103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/categories?post=1103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/tags?post=1103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}