Rants about Java and other internet technologies by Sam Pullara

2008 Olympic Medal Counts by Population

There are obviously a lot of ways to measure how well a country did at the Olympics. This post takes the view that we should look at how many people the country had to draw on in order to send the athletes to China to compete. There are a lot of problems with this, including expats competing for their home country, vast disparities in wealth between countries, and the relative interest in the Olympic Games across cultures. One of the things that jumps out immediately is that island nations that draw on a larger related population do very well in the games. They have likely inherited not only the interest in the competition but are also wealthy enough to train and compete in the games.

One of the interesting things in putting this together was that I eventually settled on PowerSet as the easiest way to look up the population of a country.  Both Yahoo and Google will give shortcuts for many countries, but they don’t do it for all of them.  Though PowerSet gets this population data through Freebase, Freebase itself doesn’t have a great search interface.

If I were going to declare an overall country winner for the games, I would likely choose Australia.  I’ve highlighted the top-10 total medal winners in the table in blue, and Australia is far ahead of anyone else in that top 10 on a people / medal basis.

 

 

Yuil is dead! 4hoursearch is now online.

As this was really just a demonstration of the power of Yahoo! BOSS, I have brought the site back as a demonstration site. Additionally, Yahoo! is making the source code to the new site available so anyone with a knack for Python, HTML and CSS can take a swipe at making a better search experience.  In order to make a nice UI I teamed up with another Sam, Sam Lind.  I put together the skeleton using Yahoo!’s amazing YUI tools and he created the look and feel.  Please try it out and take advantage of Yahoo!’s open search API:

 

 

Why 4hoursearch?  It took 4 hours to write the initial code, 4 hours for it to go from unknown to 20 hits / second, 4 hours looking for a domain name and 4 hours to build the brand new UI.  Fortunately, it won’t take 4 hours to find something with it :)

If you want the classic list of links, now enhanced with SearchMonkey results, you can always start here.

Yahoo! BOSS is easy — meet Yuil

Updated Yet Again: Relaunched as 4hoursearch including the source code. See this blog entry.

Updated Again: Yuil is dead. However, you can always get the same great search results here.

Updated: Using Glue I was able to add some simple category functionality.

I’m sure everyone saw the recent announcement of a new search engine, Cuil. I thought I would have a little fun with it and put together a quick parody of it by mashing up their UI and Yahoo!’s search results. As usual, the biggest problems I had were related to my pathetic Python skills. I’d love to add the category stuff in (Yahoo! has that info as you can see in search assist) but BOSS doesn’t yet have that in the API.  But it does have web and image search and even search suggestions. Here is the one, the only, the amazing:

It was great fun to hack together. Check out the BOSS APIs. Maybe I should have converted the UI to YUI as well…

Better Javadoc results using SearchMonkey

When you are searching for things like java.util.HashMap, one of the issues that you run into is that the search engine will give you the result with the highest rank, which more often than not is the 1.4.2 version of the documentation.  I’ve moved on from that version of Java and would much rather see results for version 6.  I actually did this plugin back in December for the first SearchMonkey hackday and won “most useful” as it could be extended to any type of versioned documentation you might find on the web.  Today I’ll also include my plugin for MySQL but I’ll use Java as the example.

Here is the normal search result that you get on Yahoo:

Normal search result

What I would like to do is give the user some more options.  Eventually I expect that SearchMonkey might allow per-user preferences, but in the interim, I’ll produce links for 1.4.2, 1.5, 1.6 and a link to the class’s package page:

Enhanced search result

This gives you direct access to other versions of the class’s documentation from the search result page without having to qualify your search terms or scroll through pages of results looking for the one most relevant to you as a developer.  To create this enhanced result, go to the SearchMonkey Developer Tool and create a new application.  Choose Enhanced Result rather than Infobar.  The URL pattern that I used was “*.java.sun.com/*”.  Obviously the real work is done in the PHP code for the appearance of the enhanced result:

public static function getOutput() {
    $ret = array();

    /* the identifier is the result URL; pull the class path out of it */
    $classname = Data::get('yahoo:index/dc:identifier');
    $pattern = "/.*\/docs\/api\/(.*\/[A-Z].*)\.html/";
    if (preg_match($pattern, $classname, $matches)) {
        $classname = $matches[1];
        $link = $classname;
        $classname = str_replace("/", ".", $classname);
    } else {
        /* not a class page, leave the result alone */
        return $ret;
    }

    /* pull the package reference out */
    if (preg_match("/(.*)\.([^.]+)/", $classname, $matches)) {
        $packagename = $matches[1];
    }

    /* change the title to the name of the class */
    $ret['title'] = $classname;

    // Deep links - up to 4
    $ret['links'][0]['text'] = "1.6.0";
    $ret['links'][0]['href'] = "http://java.sun.com/javase/6/docs/api/" . $link . ".html";
    $ret['links'][1]['text'] = "1.5.0";
    $ret['links'][1]['href'] = "http://java.sun.com/j2se/1.5.0/docs/api/" . $link . ".html";
    $ret['links'][2]['text'] = "1.4.2";
    $ret['links'][2]['href'] = "http://java.sun.com/j2se/1.4.2/docs/api/" . $link . ".html";
    $ret['links'][3]['text'] = $packagename;
    $ret['links'][3]['href'] = "http://java.sun.com/javase/6/docs/api/" . str_replace(".", "/", $packagename) . "/package-summary.html";

    return $ret;
}
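
If you want to play with the regular expressions outside of SearchMonkey, here is a rough Python sketch of the same URL rewriting. It is not part of the plugin: the function name javadoc_links and the VERSION_ROOTS table are made up for illustration, and it only mirrors the string handling, not the SearchMonkey Data API.

```python
import re

# Same idea as the PHP pattern: capture the class path after /docs/api/
API_PATH = re.compile(r".*/docs/api/(.*/[A-Z].*)\.html")

# Hypothetical table of (label, documentation root) pairs taken from the links above
VERSION_ROOTS = [
    ("1.6.0", "http://java.sun.com/javase/6/docs/api/"),
    ("1.5.0", "http://java.sun.com/j2se/1.5.0/docs/api/"),
    ("1.4.2", "http://java.sun.com/j2se/1.4.2/docs/api/"),
]


def javadoc_links(url):
    """Return a title and up to four deep links, or None if url is not a class page."""
    match = API_PATH.match(url)
    if not match:
        return None
    path = match.group(1)                    # e.g. java/util/HashMap
    classname = path.replace("/", ".")       # e.g. java.util.HashMap
    package = classname.rpartition(".")[0]   # everything before the last dot
    links = [(label, root + path + ".html") for label, root in VERSION_ROOTS]
    package_url = VERSION_ROOTS[0][1] + package.replace(".", "/") + "/package-summary.html"
    links.append((package, package_url))
    return {"title": classname, "links": links}


result = javadoc_links("http://java.sun.com/j2se/1.4.2/docs/api/java/util/HashMap.html")
for text, href in result["links"]:
    print(text, href)
```

Adding another documentation version would then just mean adding a row to the table, though SearchMonkey itself caps you at four deep links.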

Once that is done, you confirm that you are finished and you will then see these enhanced results when you use alpha.search.yahoo.com.  Here are links to my applications that you can import into your own developer environment:

javadoc-smapp

mysql-smapp

 

Idiomatic Python?

I’ve been working my way through compiling Java into Python code, but the Python back end of my compiler (my brain) isn’t that good. I would call my stage of Python development the “magic incantation” stage. This is the stage where you really aren’t comfortable yet with the way things work in a new language but you can still get things done by miming other developers. I’ve also had some help from some friends on Twitter: @lhl, @precipice and @jkwatson.  My distributed information system is now getting some redundancy.  Little did they know that I was doing parallel invocations of identical requests for reliability and incrementally higher performance, with the results verified using a quorum of responders.

Here is my first service that I am porting. It takes an RSS feed (in JSON format from Pipes) and combines all the entries from each day into a single entry:

import logging
import wsgiref.handlers

from datetime import date
from google.appengine.ext import webapp
from django.utils import simplejson

class DayBinPipesWebService (webapp.RequestHandler):
    def post(self):
        now = date.today()
        now = now.strftime("%m/%d/%Y")
        data = self.request.get("data")
        items = simplejson.loads(data)["items"]
        # group the items by publication day, leaving out today's
        bins = {}
        for item in items:
            published = item["y:published"]
            updateDay = "%(month)02d/%(day)02d/%(year)04d" % published
            if now != updateDay:
                bin = bins.get(updateDay, [])
                bin.append(item)
                bins[updateDay] = bin
        # build one combined entry per day
        entries = []
        for bin in bins.items():
            dayDate = bin[0]
            binEntries = bin[1]
            first = binEntries[0].copy()
            first["description"] = ""
            for e in binEntries:
                first["description"] += "<p><a href='%(link)s'>%(title)s</a><br>%(description)s</p>" % e
            first["title"] = "Items from " + dayDate
            first["link"] = ""
            entries.append(first)
        # webapp only sends what is in the headers dictionary
        self.response.headers["Content-Type"] = "application/json"
        simplejson.dump(entries, self.response.out)

How would you write this in idiomatic Python as opposed to my rudimentary translation? Would you change the whole design?
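
To get the discussion started, here is one possible direction, sketched as a plain function in current Python so that it runs without App Engine. Treat it as a guess at "more idiomatic" rather than the answer: the name bin_by_day and the today parameter are made up for the sketch, and it assumes the y:published fields are numbers or numeric strings.

```python
from collections import defaultdict
from datetime import date


def bin_by_day(items, today=None):
    """Combine all items published on the same day into one entry, skipping today."""
    today_key = (today or date.today()).strftime("%m/%d/%Y")

    # defaultdict removes the get/append/store dance
    bins = defaultdict(list)
    for item in items:
        p = item["y:published"]
        day = "%02d/%02d/%04d" % (int(p["month"]), int(p["day"]), int(p["year"]))
        if day != today_key:
            bins[day].append(item)

    entries = []
    for day, day_items in bins.items():
        entry = dict(day_items[0])  # shallow copy of the first item of the day
        entry["title"] = "Items from " + day
        entry["link"] = ""
        entry["description"] = "".join(
            "<p><a href='%(link)s'>%(title)s</a><br>%(description)s</p>" % e
            for e in day_items
        )
        entries.append(entry)
    return entries
```

Pulling the logic out of the request handler also makes it easy to test with a fixed date, for example bin_by_day(items, today=date(2008, 4, 14)).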

Tivo targeted advertising



Tivo targeted advertising, originally uploaded by Sam Pullara.

This looks like it might be both effective and also something that TV advertisers would like to buy.

Using Google App Engine to Extend Yahoo! Pipes

Update: A commenter pointed out that you can

from django.utils import simplejson

instead of including it. Makes this even easier.

Yahoo! Pipes has always been a great tool for manipulating data, but often you have to go through great contortions to get it to do what you want because of its very simple data flow programming model.  Google’s App Engine opens up the possibility of extending Yahoo! Pipes in very interesting ways through Pipes’ Web Service operator.  Currently this operator sees little use, as it requires you to be running an external server somewhere on the internet that is always available for the Pipe execution, which is quite a high barrier to entry for the typical Pipes developer. Here is what a Pipe that uses the Web Service operator looks like, along with our example Pipe:

Web Service Pipes Example 

With the launch of Google App Engine there is now a very simple way to get code up on the internet quickly in order to include arbitrary processing in the interior of your Pipes.

To demonstrate how this works, let’s first build a very simple web service that simply mirrors the data that it receives from Pipes.   If you don’t have a Google App Engine account you can still follow along by downloading the SDK and running everything locally, though your machine will have to be accessible from the public internet if you want Pipes to send you requests.

First create a new application directory:

mkdir pipes-mirror
cd pipes-mirror 

Now create an application descriptor called app.yaml:

application: javarants
version: 1
runtime: python
api_version: 1

handlers:
- url: /.*
  script: pipes.py

This application descriptor basically tells Google how to deploy your application. Your application name should match an application name that you create within the GAE administration tool:

Application Name

Now we need to process the data coming from Pipes. Pipes is going to pass this web service some data in JSON format and we need to parse it. GAE doesn’t include ‘simplejson’ in the Python container so you are going to have to include it with your application. I downloaded simplejson-1.8.1 and symbolically linked its simplejson directory into my application directory. When the request comes in, the JSON data will be in the ‘data’ parameter, so we are going to pull it out, parse it, grab the items array and write it back over the wire in pipes.py:

import simplejson
import wsgiref.handlers

from google.appengine.ext import webapp

class MirrorPipesWebService (webapp.RequestHandler):
    def post(self):
        # Pipes sends its JSON in the "data" parameter
        data = self.request.get("data")
        obj = simplejson.loads(data)
        obj = obj["items"]
        # webapp only sends what is in the headers dictionary
        self.response.headers["Content-Type"] = "application/json"
        simplejson.dump(obj, self.response.out)

def main():
    application = webapp.WSGIApplication([('/mirror', MirrorPipesWebService)],
                                         debug=True)
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()
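
To make the shape of the request and response concrete, here is a small sketch of the round trip in current Python using only the standard library, with no App Engine involved. It assumes that Pipes posts a form-encoded body whose data field holds the JSON document; the function names build_pipes_request and mirror are made up for the sketch.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_pipes_request(items):
    """Form-encode a body the way this sketch assumes Pipes posts it."""
    return urlencode({"data": json.dumps({"items": items})})


def mirror(body):
    """Do what the handler does: read `data`, parse it, return the items as JSON."""
    data = parse_qs(body)["data"][0]
    return json.dumps(json.loads(data)["items"])


items = [{"title": "Hello", "link": "http://example.com/"}]
response = mirror(build_pipes_request(items))
print(response)
```

Any real processing would go between the loads and the dumps; the mirror just hands the items array straight back.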

Now you should have a directory structure that looks a lot like this:

-rw-r--r--@ 1 sam  sam  106 Apr 13 18:55 app.yaml
-rw-r--r--  1 sam  sam  559 Apr 13 19:28 pipes.py
lrwxr-xr-x  1 sam  sam   47 Apr 13 17:40 simplejson -> /Users/sam/Software/simplejson-1.8.1/simplejson

Now that we have all the pieces we can deploy the application to GAE with a simple command from the GAE SDK:

appcfg.py update .

At this point you should be able to replace my web service URL that you find in my example Pipe with your application URL which will be

http://[application name].appspot.com/mirror

and get the same results as mine.

What kind of uses can you put this great power to? I currently run a web service that combines RSS entries from the same day into a single entry, and I have it deployed on my own server. I will likely port that to GAE as it doesn’t require a lot of CPU and it is a pain having to administer it. In fact, most of the functionality that you see in a service like FeedBurner would be easy to build on top of this framework. More exotic use cases can be found on Y! Pipes itself, where at least one person uses web services to pass in photo URLs and return the coordinates of human faces in the images.

JPA 2.0 with Criteria

(see: JSR 317 Persistently Improving)

I love the idea of adding a criteria API to JPA. The only thing I hope that they do differently from Hibernate is to implement that API in addition to string queries.  In Gauntlet we had issues where we wanted to use EJB-QL for selecting the right data and then a criteria-like API for applying security and filtering constraints on the query.  We ended up writing a criteria-like API that augmented the WHERE clause of the query to get the behavior that we needed (as described here).  For example, you could do this:

Query q = em.createQuery("SELECT p FROM Project p");
q.addExpression(Expression.notEqual("id", 2));

Or something like that. This would give you the best of both worlds, where you have the expressiveness of the textual query and the ability to further hone that query programmatically.
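
To show what augmenting the WHERE clause can look like, here is a toy sketch in Python. It is not the Gauntlet code and not a proposed JPA API: the class AugmentedQuery and its methods are hypothetical, and it assumes a simple query with at most one WHERE and no ORDER BY, GROUP BY or subqueries.

```python
import re


class AugmentedQuery:
    """Hypothetical wrapper: a textual query plus extra AND-ed conditions."""

    def __init__(self, text):
        self.text = text
        self.extra = []   # condition strings with positional placeholders
        self.params = []  # values bound to those placeholders, in order

    def not_equal(self, path, value):
        self.params.append(value)
        self.extra.append("%s <> ?%d" % (path, len(self.params)))
        return self

    def render(self):
        if not self.extra:
            return self.text
        added = " AND ".join(self.extra)
        parts = re.split(r"\s+WHERE\s+", self.text, maxsplit=1, flags=re.IGNORECASE)
        if len(parts) == 2:
            # parenthesize the original condition so an OR in it keeps its meaning
            return "%s WHERE (%s) AND %s" % (parts[0], parts[1], added)
        return "%s WHERE %s" % (self.text, added)


q = AugmentedQuery("SELECT p FROM Project p").not_equal("p.id", 2)
print(q.render(), q.params)
```

The textual part stays readable and the security or filtering layer can keep adding constraints without knowing how the original query was written.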

Macworld 2008 followup

Let’s see:

iPhone: failed to mention 1.1.4 update but the SDK is coming in Feb. The rest nope.  C

AppleTV/iTunes: Movie Rentals yes, missed buy on-iPhone support. AppleTV 2.0 — no computer required. Flickr + DVD + HD + Dolby 5.1, software upgrade! price drop! :) A

MacBook Air: “The World’s Thinnest Notebook” 0.16″ to 0.76″, 13.3″ screen, SSD option, 1.6/1.8 Core 2 Duo, 5 hours battery life. B

Blu-ray: Nothing. F

Monitors: Nothing. F

Other MacBooks: Nothing. F

Obviously the crazy prediction also didn’t make it, not even a wireless option for the MacBook Air…

Macworld 2008 predictions

UPDATE 1/13/08: This looks like confirmation of a new ultralight portable with some sort of wireless connectivity.

Might as well put up my predictions for what will be announced at Macworld this year. I think it will be a good one.

  • iMac, Mac Mini, MacBook and MacBook Pro lines upgraded to Penryn processors
    • Same prices, better specs
  • Ultra portable MacBook Pro ($2-3K)
    • 64G SSD drive standard (32G and 128G available), no DVD drive
    • 12″ OLED screen
    • 8-12 hour battery life
  • iPhone
    • SDK: Still WebKit-based but with native JavaScript APIs for the phone
    • Better connectivity
    • Video recording
  • Monitors
    • Same panel as the new 30″ Dell
    • Built-in iSights
    • Same prices
  • Blu-ray
    • Added as an option for Mac Pro and MacBook Pro
    • Support for playing Blu-ray movies added to 10.5.2
  • AppleTV / iTunes
    • Movie rentals
    • Software upgrade for current devices
    • Possibly a new version with a Blu-ray player

All these I consider relatively expected and if they don’t show up, some people will be disappointed. Now for the truly outrageous prediction:

APPLE WILL WIMAX ENABLE THEIR LAPTOPS AND IPHONE BY PLACING NODES IN THEIR APPLE STORES AND STARBUCKS.

That would be the most awesome thing ever and hopefully that is what is implied with “Something is in the air…”. The other possibility is just 3G enabled systems or both for those times when you are out of range :)
