Opendap proxies

I've been playing with node.js lately. Since 2007 I've been saying that Javascript is going to be the language for web development before 2012, and judging by AppJet (now bought by Google), Jaxer and node.js, it looks like we're heading in the right direction. Writing server-side code in Javascript makes everything much simpler: you only need to know one language, and you can easily share code and data objects between the browser and the server. Combined with the fact that Javascript is a really nice language to program in, I think this will become the new standard before the world ends in 2012. :-P
 
This month I wrote an Opendap accelerator using node.js. It works as a proxy to a remotely served dataset: the server connects to the dataset, loads all the data into memory, and serves it as a new dataset. Memory usage is constant, and the new server is up to 30x faster than my Python server. For small datasets (like in-situ measurements) it is amazing!
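
The node.js code isn't included here, but the core trick (fetch the dataset once, then answer every request by slicing it in memory) can be sketched in Python. This is a minimal sketch under stated assumptions, not the accelerator itself: it assumes the dataset has already been downloaded into a dict of one-dimensional lists (`DATASET`, filled with made-up sample values), and it only understands Opendap projections of the form `var`, `var[index]`, `var[start:stop]` and `var[start:stride:stop]`. The names `DATASET`, `hyperslab` and `answer` are hypothetical.

```python
import re

# Hypothetical in-memory copy of a remote dataset: variable name -> 1-D data.
# In a real accelerator this would be filled once, at startup, by an Opendap client.
DATASET = {
    "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "temp": [14.1, 14.3, 14.8, 15.2, 15.0, 14.7, 14.2, 13.9, 13.5, 13.4],
}

# Matches "name", "name[i]", "name[start:stop]" or "name[start:stride:stop]".
_PROJECTION = re.compile(r"^(\w+)(?:\[(\d+)(?::(\d+))?(?::(\d+))?\])?$")


def hyperslab(projection, dataset=DATASET):
    """Return the data selected by one projection, e.g. 'temp[0:2:6]'.

    Opendap hyperslabs are start:stride:stop with an inclusive stop,
    so they map onto Python slices as [start:stop + 1:stride].
    """
    match = _PROJECTION.match(projection.strip())
    if match is None:
        raise ValueError("unsupported projection: %r" % projection)
    var, first, second, third = match.groups()
    data = dataset[var]
    if first is None:                 # "temp": the whole variable
        return list(data)
    start = int(first)
    if second is None:                # "temp[3]": a single index
        return data[start:start + 1]
    if third is None:                 # "temp[0:4]": start:stop
        stride, stop = 1, int(second)
    else:                             # "temp[0:2:6]": start:stride:stop
        stride, stop = int(second), int(third)
    return data[start:stop + 1:stride]


def answer(query, dataset=DATASET):
    """Answer a comma-separated projection list entirely from memory."""
    return {p.strip().split("[")[0]: hyperslab(p, dataset)
            for p in query.split(",")}
```

For example, `answer("time[0:2:6],temp[3]")` returns `{'time': [0, 2, 4, 6], 'temp': [15.2]}`. Since nothing touches the network after startup, every request costs only a slice of data that is already in memory.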
 
The server itself is pretty simple: 250 lines of code for a server that implements the full Opendap spec (though not all data structures are supported). The interesting fact is that when I first wrote the server I had no intention of using it as a proxy; my idea was to use it as a server loading data from disk. But since I had already written an Opendap client in Javascript, I realized I could combine the two and have an accelerator proxy that is extremely easy to configure and deploy.
 
One deficiency of the server is that it only supports basic responses for data and metadata. Fancy responses like an HTML form to retrieve the data or a KML response to visualize it in Google Earth are not supported. But we can use Pydap (my Python server) itself as a proxy! The Pydap proxy works by redirecting requests for selected responses (like DDS/DAS for metadata, or DODS for data) directly to the node.js server, while for other responses (like KML) it connects to the node.js server as an Opendap client and builds the response from the dataset object.
 
The code for this is pretty simple: just look at the last block of lines in the code below. If the requested response is on the list of those that should be passed upstream, just return a 303 status pointing to the node.js server; otherwise load the dataset from node.js as an Opendap client, and let the appropriate response (HTML, e.g.) take care of the rest of the work.

This way, requests for the responses on that list (DDS and DAS for metadata, DODS for data) are passed directly to the node.js server. Other requests, like KML or HTML, are built by accessing the remote dataset, downloading the data and constructing the appropriate response. We get the speed of the node.js server when downloading data, and also all the responses that Pydap has (KML, WMS, NetCDF, etc.).

import re
from configobj import ConfigObj
from paste.httpexceptions import HTTPSeeOther
from pydap.handlers.lib import BaseHandler
from pydap.exceptions import OpenFileError
from pydap.client import open_url

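class Handler(BaseHandler):
    """Serve a .url file by proxying the remote Opendap dataset it points to."""

    # Pydap uses this handler for any file ending in .url.
    extensions = re.compile(r"^.*\.url$", re.IGNORECASE)

    def __init__(self, filepath):
        self.filepath = filepath

    def parse_constraints(self, environ):
        try:
            # file_error=True makes ConfigObj raise instead of silently
            # returning an empty config when the file does not exist.
            config = ConfigObj(self.filepath, file_error=True)
            url = config['dataset']['url']
            # as_list() guarantees a list even if only one response is listed.
            pass_ = config['dataset'].as_list('pass')
        except Exception:
            message = 'Unable to open file %s.' % self.filepath
            raise OpenFileError(message)

        response = environ['pydap.response']
        if response in pass_:
            # Redirect (303) to the node.js server, keeping the constraint
            # expression so it returns only the requested subset.
            location = url + '.' + response
            query = environ.get('QUERY_STRING', '')
            if query:
                location += '?' + query
            raise HTTPSeeOther(location)
        else:
            # Load the dataset from node.js as an Opendap client and let
            # Pydap build the requested response (HTML, KML, ...).
            return open_url(url)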
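
Outside Pydap, the same redirect-or-build decision can be sketched as a plain WSGI application using only the Python standard library. This is a hypothetical illustration, not Pydap's handler API: `UPSTREAM` and `PASS` stand in for the `url` and `pass` entries of the `[dataset]` section read above, and `build_response` is a placeholder for the real work Pydap does when it opens the dataset as a client and renders KML, HTML, WMS and so on.

```python
# Hypothetical values mirroring the [dataset] section of a .url file.
UPSTREAM = "http://localhost:8080/example"   # the node.js accelerator (assumed URL)
PASS = {"dds", "das", "dods"}                 # responses sent straight upstream


def build_response(url, response):
    """Placeholder for 'open the dataset as a client and render the response'."""
    return ("%s response built from %s" % (response, url)).encode("utf-8")


def application(environ, start_response):
    path = environ.get("PATH_INFO", "")
    # The response type is the extension after the last dot: .dds, .kml, ...
    _, dot, response = path.rpartition(".")
    if not dot or not response:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown response"]
    if response in PASS:
        location = UPSTREAM + "." + response
        query = environ.get("QUERY_STRING", "")
        if query:
            # Keep the constraint expression so upstream slices the data.
            location += "?" + query
        start_response("303 See Other", [("Location", location)])
        return [b""]
    # Anything else is built locally from the upstream dataset.
    body = build_response(UPSTREAM, response)
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server('', 8001, application).serve_forever()` will serve it; a request for `/example.dds` then comes back as a 303 redirect to the upstream server, while `/example.kml` is answered by the proxy itself. Using 303 rather than proxying the bytes means the heavy data transfer never passes through the slower Python process.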

This is why I love vi

How about deleting everything up to and including the next slash? "df/" (delete, find character, slash). Delete the current paragraph? "dip" (delete inner paragraph). Delete the current word? "diw" (delete inner word).
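
To make the semantics concrete, here is a toy Python model of two of these commands acting on a single line of text with a cursor index. It is a simplification, not how vi is implemented: real vi works across lines, and its idea of a "word" depends on the `iskeyword` option, whereas this sketch just treats letters, digits and underscores as word characters. The function names are hypothetical.

```python
def delete_find(line, cursor, char):
    """Model of 'df<char>': delete from the cursor up to and including the
    next occurrence of char to the right. Returns (new_line, new_cursor)."""
    target = line.find(char, cursor + 1)   # f searches after the cursor
    if target == -1:                       # not found: nothing happens
        return line, cursor
    new = line[:cursor] + line[target + 1:]
    return new, min(cursor, max(len(new) - 1, 0))


def _kind(ch):
    """Classify a character the way 'iw' groups them (simplified)."""
    if ch.isspace():
        return "blank"
    if ch.isalnum() or ch == "_":
        return "word"
    return "punct"


def delete_inner_word(line, cursor):
    """Model of 'diw': delete the run of same-kind characters under the
    cursor (a word, a run of punctuation, or a run of blanks)."""
    if not line:
        return line, cursor
    kind = _kind(line[cursor])
    start = cursor
    while start > 0 and _kind(line[start - 1]) == kind:
        start -= 1
    end = cursor
    while end + 1 < len(line) and _kind(line[end + 1]) == kind:
        end += 1
    new = line[:start] + line[end + 1:]
    return new, min(start, max(len(new) - 1, 0))
```

For example, `delete_find("path/to/file", 0, "/")` returns `("to/file", 0)`, and `delete_inner_word("foo bar baz", 5)` removes "bar", leaving `"foo  baz"`.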

All the commands can be combined, which is why it's so powerful. "d" is delete. "dd" deletes the current line, so "5dd" deletes 5 lines. Likewise, "d5l" deletes 5 characters to the right (l = move right; hjkl is used for movement, but arrow keys also work, so it's "delete 5 right"), and "d5k" (or d5 followed by arrow up) deletes the current line and the five lines above it.
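
The composability comes from vi's command grammar: an optional count, an operator, another optional count, then a motion (or the operator doubled to mean "whole lines"). When both counts are given they multiply, so "2d3l" is the same as "d6l". Below is a toy parser for that grammar, limited to the `d` operator and the motions mentioned in this post; the regex, `MOTIONS` table and `parse` function are hypothetical and only meant to show the structure.

```python
import re

# [count] d [count] (d | motion): motion is one of hjkl or %, a find (f<char>),
# or a text object (iw / ip). A toy subset of vi's grammar.
_COMMAND = re.compile(r"^(\d*)d(\d*)(d|[hjkl%]|f.|i[wp])$")

MOTIONS = {
    "h": "left", "j": "down", "k": "up", "l": "right",
    "%": "matching bracket", "d": "whole lines",
    "iw": "inner word", "ip": "inner paragraph",
}


def parse(command):
    """Split a delete command into (count, description of the motion)."""
    match = _COMMAND.match(command)
    if match is None:
        raise ValueError("not a delete command: %r" % command)
    before, after, motion = match.groups()
    # Counts before the operator and before the motion multiply.
    count = int(before or 1) * int(after or 1)
    if motion.startswith("f"):
        return count, "up to and including %r" % motion[1]
    return count, MOTIONS[motion]
```

For example, `parse("5dd")` gives `(5, 'whole lines')` and `parse("d5k")` gives `(5, 'up')`. Because operators and motions are independent pieces, every new motion you learn works with every operator you already know.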

If you're standing on a bracket such as (), {} or [] (or <>, once you add it with ":set matchpairs+=<:>"), d% will delete it and everything up to (and including) its matching counterpart (the matching one, not just the next one, in case things are nested). Very useful when programming.
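
Finding "the matching one, not the next one" is the classic depth-counting search. A simplified Python sketch, assuming the cursor is already on a bracket; unlike vim, it treats every bracket character as significant, even inside string literals. The function names are hypothetical.

```python
# Bracket pairs; vim only pairs <> after ":set matchpairs+=<:>".
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in PAIRS.items()}


def find_match(text, pos):
    """Return the index of the bracket matching text[pos], or -1."""
    ch = text[pos]
    if ch in PAIRS:
        same, other, step = ch, PAIRS[ch], 1       # scan forwards
    elif ch in CLOSERS:
        same, other, step = ch, CLOSERS[ch], -1    # scan backwards
    else:
        return -1
    depth = 0
    i = pos
    while 0 <= i < len(text):
        if text[i] == same:
            depth += 1          # entering a (possibly nested) pair
        elif text[i] == other:
            depth -= 1          # leaving one
            if depth == 0:
                return i        # back at the starting level: the match
        i += step
    return -1                   # unbalanced brackets


def delete_matching(text, pos):
    """Model of 'd%': delete from the cursor through the matching bracket."""
    match = find_match(text, pos)
    if match == -1:
        return text
    start, end = min(pos, match), max(pos, match)
    return text[:start] + text[end + 1:]
```

For example, `delete_matching("f(a(b)c)d", 1)` returns `"fd"`: the inner `(b)` raises the depth to 2 and back to 1, so the scan skips past the first `)` and stops at the outer one.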

The learning curve is steep, and the rewards are great.