Using Google App Engine to Extend Yahoo! Pipes

Update: A commenter pointed out that you can

from django.utils import simplejson

instead of including it. Makes this even easier.

Yahoo! Pipes has always been a great tool for manipulating data, but you often have to go through great contortions to get it to do what you want because of its very simple data flow programming model. Google’s App Engine opens up the possibility of extending Yahoo! Pipes in very interesting ways through Pipes’ Web Service operator. Currently this operator sees little use because it requires you to run an external server somewhere on the internet that is always available when the Pipe executes, which is quite a high barrier to entry for the typical Pipes developer. Here is what a Pipe that uses the Web Service operator looks like, along with our example Pipe:

With the launch of Google App Engine there is now a very simple way to get code up on the internet quickly in order to include arbitrary processing in the interior of your Pipes.

To demonstrate how this works, let’s first build a very simple web service that mirrors the data it receives from Pipes. If you don’t have a Google App Engine account you can still follow along by downloading the SDK and running everything locally, though your machine will have to be accessible from the public internet if you want Pipes to send you requests.

First create a new application directory:

mkdir pipes-mirror
cd pipes-mirror

Now create an application descriptor called app.yaml:

application: javarants
version: 1
runtime: python
api_version: 1
handlers:
- url: /.*

This application descriptor tells Google how to deploy your application. Your application name should match an application name that you create within the GAE administration tool.

Now we need to process the data coming from Pipes. Pipes is going to pass this web service some data in JSON format, and we need to parse it. GAE doesn't include 'simplejson' in the Python container, so you are going to have to include it with your application. I downloaded simplejson-1.8.1 and symbolically linked its simplejson directory into my application directory. When the request comes in, the JSON data will be in the 'data' parameter, so we are going to pull it out, parse it, grab the items array, and write it back over the wire as JSON:

import simplejson
import wsgiref.handlers
from google.appengine.ext import webapp

class MirrorPipesWebService(webapp.RequestHandler):
    def post(self):
        # Pipes sends its JSON payload in the "data" form parameter.
        data = self.request.get("data")
        obj = simplejson.loads(data)
        # Keep only the items array and echo it back unchanged.
        obj = obj["items"]
        self.response.headers["Content-Type"] = "application/json"
        simplejson.dump(obj, self.response.out)

def main():
    application = webapp.WSGIApplication([('/mirror', MirrorPipesWebService)],
                                         debug=True)
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()

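To see what the handler does to a payload without deploying anything, here is a minimal sketch of the same parse-and-mirror step as a plain function. It assumes the standard library json module in place of simplejson (the two share the loads/dumps interface), and the sample item fields are made up for illustration rather than taken from a real Pipe.

```python
import json


def mirror_items(data):
    """Parse a Pipes-style JSON payload and return its "items" array as JSON."""
    obj = json.loads(data)
    return json.dumps(obj["items"])


# Hypothetical payload: an object with an "items" array, as the handler expects.
sample = json.dumps({"items": [{"title": "first"}, {"title": "second"}]})
mirrored = mirror_items(sample)
print(mirrored)
```

The function name mirror_items is only a label for this sketch. Keeping the transformation separate from the request handler like this makes it easy to try out more interesting processing before putting it behind a URL.
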
Now you should have a directory structure that looks a lot like this:

-rw-r--r--@ 1 sam  sam  106 Apr 13 18:55 app.yaml
-rw-r--r-- 1 sam sam 559 Apr 13 19:28
lrwxr-xr-x 1 sam sam 47 Apr 13 17:40 simplejson -> /Users/sam/Software/simplejson-1.8.1/simplejson

Now that we have all the pieces, we can deploy the application to GAE with a simple command from the GAE SDK: update .

At this point you should be able to replace the web service URL in my example Pipe with your application URL, which will be

http://[application name]

and get the same results as I do.

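Before wiring the service into a Pipe, it can help to send it a request by hand. The following is a rough sketch of how such a request body might be built, assuming the handler reads a form field named data as described above. The helper name build_pipes_body and the SERVICE_URL placeholder are hypothetical, and nothing here contacts the network.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_pipes_body(items):
    """Form-encode items the way the handler expects: JSON in a "data" field."""
    payload = json.dumps({"items": items})
    return urlencode({"data": payload}).encode("ascii")


body = build_pipes_body([{"title": "hello"}])

# Decode the body again to check what the handler would find in "data".
seen = parse_qs(body.decode("ascii"))["data"][0]
print(seen)

# To actually send it, something along these lines should work once
# SERVICE_URL (a placeholder, not a real address) points at your handler:
#   from urllib.request import urlopen
#   print(urlopen(SERVICE_URL, data=body).read())
```

If the service is mirroring correctly, the response should be the JSON array that was passed in under items.
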
What kinds of uses can you put this great power to? I currently run a web service, deployed on my own server, that combines RSS entries from the same day into a single entry. I will likely port that to GAE, as it doesn't require a lot of CPU and it is a pain to administer. In fact, most of the functionality that you see in a service like FeedBurner would be easy to build on top of this framework. More exotic use cases can be found on Y! Pipes itself, where at least one person uses web services to pass in photo URLs and return the coordinates of human faces in the images.
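As an illustration of the kind of processing that could sit behind the Web Service operator, here is a small sketch of combining entries from the same day. It is not the service mentioned above, only a guess at the general shape: the "date", "title", and "description" field names are assumptions, and real Pipes items may carry their dates in a different field and format.

```python
from collections import OrderedDict


def combine_by_day(items):
    """Merge items that share a "date" value into one entry per day."""
    days = OrderedDict()
    for item in items:
        days.setdefault(item["date"], []).append(item)

    combined = []
    for date, group in days.items():
        combined.append({
            "date": date,
            "title": "%d entries for %s" % (len(group), date),
            # One line per original entry title, in arrival order.
            "description": "\n".join(entry["title"] for entry in group),
        })
    return combined


# Made-up items for illustration only.
entries = [
    {"date": "2008-04-13", "title": "a"},
    {"date": "2008-04-13", "title": "b"},
    {"date": "2008-04-14", "title": "c"},
]
result = combine_by_day(entries)
print(result)
```

A function like this could replace the line that simply echoes the items array in the mirror handler, with the rest of the request handling left unchanged.
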