{"id":81,"date":"2009-09-07T05:13:13","date_gmt":"2009-09-07T10:13:13","guid":{"rendered":"http:\/\/philippeadjiman.com\/blog\/?p=81"},"modified":"2009-09-07T05:13:13","modified_gmt":"2009-09-07T10:13:13","slug":"the-trick-to-write-a-fast-universal-java-url-expander","status":"publish","type":"post","link":"https:\/\/philippeadjiman.com\/blog\/2009\/09\/07\/the-trick-to-write-a-fast-universal-java-url-expander\/","title":{"rendered":"The Trick To Write A Fast (Universal) Java URL Expander"},"content":{"rendered":"<p style=\"text-align: left;\">140 characters. Does that mean something to you?<\/p>\n<p style=\"text-align: left;\">This is how Twitter (and micro-blogging) <a href=\"http:\/\/www.140characters.com\/2009\/01\/30\/how-twitter-was-born\/\" target=\"_blank\">was born<\/a>. Even if some profane Firefox extensions try to <a href=\"http:\/\/shorttext.com\/twitzer.aspx\" target=\"_blank\">work around this<\/a>, sticking to the rule gets hard as soon as you need to insert (long) URLs.<\/p>\n<p style=\"text-align: left;\">And here come URL shortening services.<\/p>\n<p style=\"text-align: left;\">Pretty simple: The long URL <a href=\"http:\/\/philippeadjiman.com\/blog\/2009\/09\/01\/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends\/\" target=\"_blank\">http:\/\/philippeadjiman.com\/blog\/2009\/09\/01\/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends\/<\/a> becomes <a href=\"http:\/\/bit.ly\/miUkz\" target=\"_blank\">http:\/\/bit.ly\/miUkz<\/a>, which will fit nicely in your next tweet.<\/p>\n<p style=\"text-align: left;\">Now everyone wants to shorten URLs. Here is a list of <a href=\"http:\/\/mashable.com\/2008\/01\/08\/url-shortening-services\/\" target=\"_blank\">90+ URL shortening services<\/a> (!!) 
without counting the ones that you can <a href=\"http:\/\/lifehacker.com\/5335216\/make-your-own-url-shortening-service\" target=\"_blank\">build yourself<\/a>.<\/p>\n<p style=\"text-align: left;\">How can we (developers) survive in this jungle if we want to retrieve the real, expanded version of those tons of URLs?<\/p>\n<p style=\"text-align: left;\">Well, a naive Java version would be:<\/p>\n<pre lang=\"java\">import java.io.IOException;\nimport java.io.InputStream;\nimport java.net.URL;\nimport java.net.URLConnection;\n\npublic String naiveURLExpander(String address) throws IOException {\n    URLConnection conn = new URL(address).openConnection();\n    \/\/ Opening the stream follows every redirect and requests the target page.\n    try (InputStream in = conn.getInputStream()) {\n        \/\/ After the redirects, getURL() returns the final (expanded) address.\n        return conn.getURL().toString();\n    }\n}<\/pre>\n<p>Nice. It works. But it is terribly slow.<br \/>\nWhy? Because when you look at what happens behind the scenes, the HTTP response for the newly created short URL starts with the line<\/p>\n<pre lang=\"bash\">HTTP\/1.1 301 Moved<\/pre>\n<p>If you check the <a href=\"http:\/\/www.w3.org\/Protocols\/rfc2616\/rfc2616-sec10.html\" target=\"_blank\">status code definition<\/a> of the HTTP protocol, you will see that this means the URL has moved permanently and that the new one is given in the <strong>Location<\/strong> field of the HTTP header. In other words, the Java code above behaves exactly like your browser: it follows the redirection and requests the target page as well, which is terribly slow.<\/p>\n<p>So here is the trick:<\/p>\n<ol>\n<li>Use an <strong>HttpURLConnection<\/strong> object so that you can tell it, via the <strong>setInstanceFollowRedirects<\/strong> method, <span style=\"text-decoration: underline;\">not<\/span> to follow redirects automatically (as a browser would) while connecting.<\/li>\n<li>Extract the <strong>Location<\/strong> value from the HTTP header.<\/li>\n<\/ol>\n<p>Here you go:<\/p>\n<pre lang=\"java\">import java.io.IOException;\nimport java.net.HttpURLConnection;\nimport java.net.Proxy;\nimport java.net.URL;\n\npublic String expandShortURL(String address) throws IOException {\n    URL url = new URL(address);\n    \/\/ Bypass any proxy: using one may increase latency.\n    HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);\n    \/\/ Do not follow the redirect; we only want to know where it points.\n    connection.setInstanceFollowRedirects(false);\n    connection.connect();\n    \/\/ Null if the response carries no Location header (for example, when it is not a redirect).\n    String expandedURL = connection.getHeaderField(\"Location\");\n    \/\/ Close the unread response body.\n    connection.getInputStream().close();\n    return expandedURL;\n}<\/pre>\n<p>If you are more of a PHP guy, I saw a similar post that explains <a href=\"http:\/\/hasin.wordpress.com\/2009\/05\/05\/expanding-short-urls-to-original-urls-using-php-and-curl\/\" target=\"_blank\">how to do it using PHP and curl<\/a>.<\/p>\n<p>Note that, for the sake of conciseness, I do not handle errors in the code. Also, since I cannot guarantee that all the URL shortening services in the world use this exact approach (but I think most of them do), to make the code really universal you just have to handle the case where the Location field is null. 
An even better way would be to find some heuristics to detect whether the input URL is a real one (I mean not a short one), which would avoid calling the openConnection() bottleneck method needlessly.<\/p>\n<p>Finally, if some URL shortening services are not robust enough to check their own URLs, you may also have to deal with the corner case of &#8220;transitive shortening&#8221; (I&#8217;m sure there will always be some curious people who try to shorten an already shortened URL&#8230;). <strong>Update<\/strong>: check this example: <a href=\"http:\/\/bit.ly\/4XzVxm\" target=\"_blank\">http:\/\/bit.ly\/4XzVxm<\/a> points to <a href=\"http:\/\/tcrn.ch\/6c8AU4\" target=\"_blank\">http:\/\/tcrn.ch\/6c8AU4<\/a>, which is itself another short URL!<\/p>\n<p>To achieve real performance, such code should also be multithreaded. If you have to expand millions of URLs, you will probably need many machines. A time limit should also be added to avoid overly long connections, with a mechanism similar to a <a href=\"http:\/\/java.sun.com\/j2se\/1.4.2\/docs\/api\/java\/util\/TimerTask.html\" target=\"_blank\">TimerTask<\/a>.<\/p>\n<p>Note that this trick makes the code <strong>5 to 6 times faster<\/strong>. When it comes to dealing with millions of short URLs, it makes a difference.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>140 characters. Does that mean something to you? This is how Twitter (and micro-blogging) was born. Even if some profane Firefox extensions try to work around this, sticking to the rule gets hard as soon as you need to insert (long) URLs. And here come URL shortening services. 
Pretty simple: The long URL http:\/\/philippeadjiman.com\/blog\/2009\/09\/01\/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends\/ [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[11],"tags":[27,38],"class_list":["post-81","post","type-post","status-publish","format-standard","hentry","category-java","tag-java","tag-twitter"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/81","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/comments?post=81"}],"version-history":[{"count":0,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/posts\/81\/revisions"}],"wp:attachment":[{"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/media?parent=81"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/categories?post=81"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/philippeadjiman.com\/blog\/wp-json\/wp\/v2\/tags?post=81"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}