Archive for the ‘java’ Category

BeanShell Tutorial: Quick Start On Invoking Your Own Or External Java Code From The Shell

Saturday, October 17th, 2009

BeanShell is a lightweight scripting language that’s compatible with the Java language.
It provides a dynamic environment for executing Java code in its standard syntax, but it also allows common scripting conveniences such as loose types, commands, and method closures like those in Perl and JavaScript. It is considered so useful that it should become part of J2SE at some time in the future (the BeanShell Scripting Language JSR, JSR-274, has passed the voting process with flying colors).

Here I simply describe how to call your own code, or any existing external code, directly from the BeanShell. You first have to download the latest BeanShell jar release. Let’s suppose that you put it in the directory “C:\libs” along with the famous Apache Commons Lang library, so that “C:\libs” contains two jars called bsh-2.0b4.jar and commons-lang-2.3.jar.

Open a command prompt and type:

java -cp C:\libs\bsh-2.0b4.jar;C:\libs\commons-lang-2.3.jar bsh.Interpreter

You should see a “bsh %” prompt indicating that the BeanShell session has started. Here is an example session using the getLevenshteinDistance method from the StringUtils utility class of the Apache Commons Lang package:

bsh % import org.apache.commons.lang.StringUtils;
bsh % d = StringUtils.getLevenshteinDistance("Louisville Slugger", "Lousiville Slugger");
bsh % print(d);
2
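
If you wonder why the answer is 2, here is a minimal, self-contained sketch of the classic dynamic-programming edit distance. It is not the Commons Lang implementation (the class and method names below are my own, chosen for illustration); it only shows the idea: swapping the "i" and the "s" of "Louisville" costs two single-character edits.

```java
// Illustrative sketch only: NOT the Commons Lang implementation.
final class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b costs j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" costs i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Prints 2
        System.out.println(levenshtein("Louisville Slugger", "Lousiville Slugger"));
    }
}
```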

Note that instead of typing each precise import, you can type:

bsh % import *;

This triggers a set of “mappings” between the shell and the external jars that you specified in your classpath. Just remember that by doing this you are importing every class accessible from the classpath, so you may be forced to type the fully qualified name of a class when two classes with the same name exist in different packages (it happens more often than one may think).

A good intermediate solution is to define a file called .bshrc and to put in it all the specific imports that you usually use. Then, when invoking the interpreter, just set the Java system property user.home to the directory containing the .bshrc file. For example, if it is located in “C:\app\bshconfig”, you just have to type:

java -Duser.home=C:\app\bshconfig -cp C:\libs\bsh-2.0b4.jar;C:\libs\commons-lang-2.3.jar bsh.Interpreter

Note that you can add to the java command any options that you need (for instance you can use -Xmx if you need to).

For complete documentation of the BeanShell commands, consult the BeanShell documentation page.

For an Eclipse plugin that lets you auto-complete from the BeanShell, along with other nice features, take a look at EclipseShell (I haven’t tested it yet, but the site contains nice screencasts and documentation).

5 Video Tutorials Of Small To Killer Eclipse Shortcuts

Sunday, October 11th, 2009

I believe that when you spend a significant percentage of your time in a specific piece of software, it is an obligation to become “mouse-less” when using it. A few years ago, when I started to use the powerful Eclipse shortcuts, I observed that my productivity improved dramatically. You’ll be able to find a lot of posts promoting lists of “Top 10 Eclipse shortcuts” (I like this one). I believe that small video tutorials can show the power that some shortcuts unleash more easily than a bunch of screenshots.

So here are 5 small video tutorials of shortcuts ranging from small ones to killer ones, all of them together making my day in Eclipse much easier and more productive. The first two are small ones but still nice and useful. The remaining ones are more advanced and really have an impact, since you can potentially use them every couple of lines of code.

  1. Ctrl + Alt + Arrow (up or down): duplicating lines.
     Impact on productivity: low to medium

  2. Alt + Arrow (up or down): moving lines.
     Impact on productivity: low to medium

  3. Ctrl + 1: How To Directly or Indirectly Use The Power Of Quick Fixes.
     Impact on productivity: huge

  4. Alt + Shift + L: Extract Local Variables.
     Impact on productivity: medium

  5. Ctrl + Space: Beyond Auto Completion, The Template Assistant (+ customization).
     Impact on productivity: high if heavily customized

Besides those, I highly recommend making heavy use of these five (for which I think a video is less useful):

  • Ctrl + Shift + R (open resources).
  • Ctrl + O (quick outline). Pressing Ctrl + O again will show inherited members.
  • Ctrl + E (quick switch editor). Very handy to navigate between files.
  • Alt + Shift + R (rename variable). A very powerful one since it resolves all the possible dependencies on the renamed variable (works also on filenames).
  • Ctrl + T (quick type hierarchy).

Become as mouse-less as possible in Eclipse. Don’t try to start using them all in one day; try to integrate one per day, or even one per week. You’ll end up much more productive anyway.

Open Calais From Java: Get Ready To Extract Entities, Facts And Events In 4 Minutes!

Wednesday, September 16th, 2009

I’m a big fan of Open Calais, the well-known web service that allows you to perform Named Entity, Fact and Event Extraction on free English text (and now also on French text since version 4.0).

In the video tutorial below, I show you how in only 4 minutes you can build the material that allows you to make a call to the Open Calais web service from a Java program, and to perform Entity, Fact and Event Extraction on a news article taken from CNN.

The tutorial assumes that you already have Java and Eclipse for Java EE developers installed, along with an Open Calais API developer key (otherwise go get one here; obtaining the key is a very light process).

Note that you can watch the tutorial in HD.

Also, check the remarks below to reproduce the steps more easily and to get more detailed explanations of what you’ll see in the tutorial.

To see the video in its best quality, just click here.

Remarks/Complementary information:

  • The Open Calais web service WSDL shown in the demo is: http://api.opencalais.com/enlighten/?wsdl
  • The enlighten method, which lets you call the Open Calais web service via SOAP, has three parameters:
    • licenseId. This is your API key that you can get here.
    • paramsXML. Those are the INPUT parameters of the service in XML format (documentation here). In the tutorial, for the sake of simplicity, I pass the parameters as a raw String; of course it is better to read them from a file. Here are the parameters that I used: calaisParams.xml.
    • content. This is the content on which the extraction will be performed. Again, for the sake of simplicity I pass it as a raw String, and again, it is of course better to read it from a file (put whatever free text you want there). Here is the content I used (from CNN).
  • Pasting a long text copied from the web into Java source code can be a nightmare because of the escape characters. The workaround I used in the demo is this general converter that knows (among other things) how to add the ‘\’ automatically in the right places.
  • Here is the output of the tutorial.
  • Here is the list of Open Calais possible outputs.
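
As noted in the remarks above, reading paramsXML and content from files is better than pasting raw Strings (and it also avoids the escaping problem). Here is a minimal sketch of that file reading, under some assumptions: the class name, the helper name readText and the file name article.txt are mine and purely illustrative, and the actual call to enlighten is left as a comment because it depends on the stub you generated from the WSDL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: loads the two text inputs of the enlighten call from files.
final class CalaisInputs {

    // Reads a whole UTF-8 text file into a String (no manual escaping needed).
    static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "article.txt" is a hypothetical file name for the text to analyze.
        String paramsXML = readText(Paths.get("calaisParams.xml"));
        String content = readText(Paths.get("article.txt"));

        // paramsXML and content would then be passed, together with your
        // licenseId, to the enlighten method of the stub generated from the WSDL.
        System.out.println("Loaded " + paramsXML.length() + " chars of parameters and "
                + content.length() + " chars of content");
    }
}
```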

If you’re like me, you’re obviously more interested in the algorithms behind the scenes. To learn more about the methods/algorithms involved, you can read about morphological analysis, POS tagging and shallow parsing. On the Open Calais website, they also mention in a discussion that they have developed their own rule-based system with their own programming language. They are also using lexicons.

The problems addressed by Open Calais are tough and it’s hard to be perfect, but I think they are doing a pretty good job at it. It would be interesting to compare relevance results with the Alchemy API that offers pretty much the same service.

The Trick To Write A Fast (Universal) Java URL Expander

Monday, September 7th, 2009

140 characters. Means something to you?

This is about how Twitter (and micro-blogging) was born. Even if some profane Firefox extensions try to work around this limit, you may have trouble sticking to the rule when you need to insert (long) URLs.

And here come URL shortening services.

Pretty simple: The long URL http://philippeadjiman.com/blog/2009/09/01/can-you-guess-what-is-the-hottest-trend-of-google-hot-trends/ becomes http://bit.ly/miUkz that will nicely fit in your next tweet.

Now everyone wants to shorten URLs. Here is a list of 90+ URL shortening services (!!), without counting the ones that you can build yourself.

How can we (developers) survive in this jungle if we want to retrieve the real, expanded version of those tons of URLs?

Well, a naive Java version would be:

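import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public String naiveURLExpander(String address) throws IOException {
        URL url = new URL(address);
        URLConnection conn = url.openConnection();
        // Opening the stream follows every redirect and fetches the target page.
        InputStream in = conn.getInputStream();
        // After the redirects, getURL() reports the final (expanded) address.
        String result = conn.getURL().toString();
        in.close();
        return result;
    }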

Nice. It works. But it is terribly slow.
Why? Because when you analyze what happens behind the scenes, the HTTP response header for the newly created short URL contains the line

HTTP/1.1 301 Moved

If you check the status code definitions of the HTTP protocol, you will see that this means the URL has moved permanently and that the new one is given in the Location field of the HTTP header. In other words, the above Java code behaves exactly like your browser: it follows the redirection and fetches the target page, which is terribly slow.

So here is the trick:

  1. Use an HttpURLConnection object so that you can tell it, via the setInstanceFollowRedirects method, not to follow redirects automatically (as a browser would) when connecting.
  2. Extract the Location value from the HTTP header.

Here you go:

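import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URL;

public String expandShortURL(String address) throws IOException {
        URL url = new URL(address);

        // Using a proxy may increase latency, so bypass it.
        HttpURLConnection connection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        // Do not follow the redirect: only the response header is needed.
        connection.setInstanceFollowRedirects(false);
        connection.connect();
        // The expanded URL is the redirect target (null if there is no Location field).
        String expandedURL = connection.getHeaderField("Location");
        connection.getInputStream().close();
        return expandedURL;
    }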

If you are more of a PHP guy, I saw a similar post that explains how to do it using PHP and curl.

Note that for the sake of conciseness, I do not handle errors in the code. Also, since I cannot guarantee that all the URL shortening services in the world use this exact approach (but I think most of them do), to make the code really universal you just have to handle the case where the Location field is null. A further improvement would be to find some heuristics to detect whether the input URL is a real one (I mean not a short one), which would avoid calling the openConnection() bottleneck method uselessly.
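
Both ideas can be sketched without touching the network. The code below is only one possible approach, not a definitive one: the class name, the method names and the (deliberately tiny) list of shortener hosts are my own assumptions. The first method is a cheap pre-check on the host name, the second falls back to the input URL when no Location value was returned.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch: a pre-check and a null-safe fallback around the trick above.
final class ShortUrlHeuristics {

    // Hypothetical, deliberately tiny list; a real one would be much longer.
    private static final Set<String> KNOWN_SHORTENERS =
            new HashSet<>(Arrays.asList("bit.ly", "tcrn.ch"));

    // Returns true only when the host belongs to a known shortening service,
    // so that long URLs never trigger a useless connection.
    static boolean looksShortened(String address) {
        try {
            String host = URI.create(address).getHost();
            return host != null && KNOWN_SHORTENERS.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URL
        }
    }

    // Falls back to the input when the service sent no Location value;
    // otherwise resolves the value against the input, which also covers
    // a relative Location.
    static String orOriginal(String address, String locationHeader) {
        if (locationHeader == null || locationHeader.isEmpty()) {
            return address;
        }
        return URI.create(address).resolve(locationHeader).toString();
    }
}
```

A caller would typically check looksShortened(address) first, and only then pass the result of connection.getHeaderField("Location") to orOriginal.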

Finally, if some URL shortening services are not robust enough to check their own URLs, you may also have to deal with the corner case of “transitive shortening” (I’m sure there will always be some curious people who try to shorten an already shortened URL…). Update: check this example: http://bit.ly/4XzVxm points to http://tcrn.ch/6c8AU4, which is itself another short URL!
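
One way to handle this corner case is to repeat the single-hop expansion until no further redirect is reported, with a hop limit and a loop guard. The sketch below is only an illustration under assumptions: the class name and the singleHop parameter are hypothetical, and singleHop stands for any function that returns the Location value for one URL (for instance the expandShortURL method above, wrapped to deal with its IOException) or null when there is no redirect.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Illustrative sketch: follows chains of short URLs one hop at a time.
final class TransitiveExpander {

    // singleHop returns the redirect target of one URL, or null if there is none.
    static String expandFully(String address, UnaryOperator<String> singleHop, int maxHops) {
        Set<String> seen = new HashSet<>(); // guards against redirect loops
        String current = address;
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            String next = singleHop.apply(current);
            if (next == null) {
                break; // no Location value: current is the final URL
            }
            current = next;
        }
        return current;
    }
}
```

The hop limit keeps a misbehaving service from stalling the whole expansion; its value is a tuning choice rather than something dictated by the shortening services.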

Also, to achieve real performance, such code should be multithreaded. If you have to expand millions of URLs, you would probably need to use many machines. A time limit should also be added to avoid overly long connections, with a mechanism similar to a TimerTask.
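
Here is a rough single-machine sketch of those two points, to be read as one option among others. Note that for the time limit I swapped the TimerTask idea for the built-in setConnectTimeout and setReadTimeout methods of the connection, which is simpler but not the same mechanism. The class name, the method names and the expander parameter are assumptions of mine; expander stands for any single-URL expansion function.

```java
import java.net.HttpURLConnection;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Illustrative sketch: expands many URLs in parallel on a fixed thread pool.
final class ParallelExpander {

    // Bounds both the connect and the read phase; call it before connect().
    static void applyTimeouts(HttpURLConnection connection, int millis) {
        connection.setConnectTimeout(millis);
        connection.setReadTimeout(millis);
    }

    // Maps every input URL to its expansion (null when the expansion failed).
    static Map<String, String> expandAll(List<String> addresses,
                                         UnaryOperator<String> expander,
                                         int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String address : addresses) {
                Callable<String> task = () -> expander.apply(address);
                futures.add(pool.submit(task));
            }
            Map<String, String> result = new LinkedHashMap<>(); // keeps input order
            for (int i = 0; i < addresses.size(); i++) {
                try {
                    result.put(addresses.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    result.put(addresses.get(i), null); // this expansion threw
                }
            }
            return result;
        } finally {
            pool.shutdown(); // stop accepting work; the pool threads then exit
        }
    }
}
```

Since the work is dominated by network waits rather than CPU, the pool can reasonably hold many more threads than there are cores; the right number is something to measure.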

Note that this trick makes the code 5 to 6 times faster. When it comes to dealing with millions of short URLs, it makes a difference.