Opendap proxies

I've been playing with node.js lately. Since 2007 I've been saying that JavaScript would be the language for web development before 2012, and looking at AppJet (since bought by Google), Jaxer and node.js it looks like we're heading in the right direction. Writing server-side code in JavaScript makes everything much simpler: you only need to know one language, and you can easily share code and data objects between the browser and the server. Combined with the fact that JavaScript is a really nice language to program in, I think this will become a new standard before the world ends in 2012. :-P

This month I wrote an Opendap accelerator using node.js. It works as a proxy to a remotely served dataset: the server connects to the dataset, loads all the data into memory, and serves it as a new dataset. Memory usage is constant, and the new server is up to 30x faster than my Python server. For small datasets (like in-situ measurements) it is amazing!
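The load-once, serve-from-memory idea can be sketched in a few lines. This is only a Python illustration of the pattern, not the node.js accelerator itself; `CachingProxy`, `fetch_remote` and the sample data are hypothetical names made up for the example.

```python
class CachingProxy:
    """Fetch a remote dataset once, then answer every request from memory."""

    def __init__(self, fetch_remote):
        # fetch_remote is a hypothetical callable that returns
        # a dict mapping variable names to lists of values.
        self._fetch_remote = fetch_remote
        self._data = None
        self.remote_calls = 0

    def _load(self):
        # Only the first request pays the cost of talking to the remote server.
        if self._data is None:
            self._data = self._fetch_remote()
            self.remote_calls += 1
        return self._data

    def get(self, name, start=0, stop=None):
        """Return values from start to stop, with stop inclusive."""
        values = self._load()[name]
        end = len(values) if stop is None else stop + 1
        return values[start:end]


def fake_remote():
    # Stand-in for the slow upstream server.
    return {"var1": list(range(100))}


proxy = CachingProxy(fake_remote)
first = proxy.get("var1", 0, 9)     # triggers the single remote fetch
second = proxy.get("var1", 10, 12)  # served from memory
```

However many requests come in, the upstream server is contacted once; everything after that is a slice of data already held in memory.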

The server itself is pretty simple: 250 lines of code for a server that implements the full Opendap spec (though not all data structures are supported). The interesting fact is that when I first wrote the server I had no intention of using it as a proxy; my idea was to use it as a server loading data from disk. But since I had already written an Opendap client in Javascript, I realized I could combine the two and have an accelerator proxy that is extremely easy to configure and deploy.

One deficiency of the server is that it only supports basic responses for data and metadata. Fancy responses like an HTML form to retrieve the data or a KML response to visualize it in Google Earth are not supported. But we can use Pydap (my Python server) itself as a proxy! Our Pydap proxy works by redirecting specified responses (like DDS/DAS for metadata, or DODS for data) directly to the node.js server, while for other responses (like KML) the proxy will connect as an Opendap client and build the response from the dataset object.

The code for this is pretty simple: just look at the last block of the handler code further down in this post. If the requested response is on the list of those that we should pass upstream, the handler just returns a 303 status pointing to the node.js server; otherwise it loads the dataset from node.js as an Opendap client and lets the appropriate response (HTML, e.g.) take care of the rest of the work. This way, requests for

http://server.example.com/dataset1.dds

http://server.example.com/dataset1.das

http://server.example.com/dataset1.dods?var1[0:1:9]

would be passed directly to the node.js server. These are requests for metadata and data. Other requests like

http://server.example.com/dataset1.wms

would be built by accessing the remote dataset, downloading the data and constructing the appropriate response. This way we can benefit from the speed of the node.js server when downloading data, and also benefit from all the responses that Pydap has (KML, WMS, NetCDF, etc.).
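The routing decision for the example URLs above can be sketched as a small pure function. This is an illustration, not Pydap code: `route`, `UPSTREAM` and `PASSTHROUGH` are hypothetical names, the upstream address is invented, and carrying the constraint expression over to the redirect location is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def route(request_url, upstream, passthrough):
    """Return ("redirect", location) or ("load", upstream) for a request."""
    parts = urlsplit(request_url)
    # "/dataset1.dods" -> response "dods"
    _, _, response = parts.path.rpartition(".")
    if response in passthrough:
        location = upstream + "." + response
        if parts.query:
            # Assumption: the constraint expression is kept on the redirect.
            location += "?" + parts.query
        return ("redirect", location)
    # Anything else is built locally from the remote dataset.
    return ("load", upstream)


UPSTREAM = "http://node.example.com/dataset1"  # hypothetical node.js server
PASSTHROUGH = ["dds", "das", "dods"]

requests = [
    "http://server.example.com/dataset1.dds",
    "http://server.example.com/dataset1.dods?var1[0:1:9]",
    "http://server.example.com/dataset1.wms",
]
decisions = [route(url, UPSTREAM, PASSTHROUGH) for url in requests]
for url, decision in zip(requests, decisions):
    print(url, "->", decision)
```

The first two requests come back as redirects to the node.js server, while the WMS request is marked as one to build locally.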

import re

from configobj import ConfigObj
from paste.httpexceptions import HTTPSeeOther

from pydap.handlers.lib import BaseHandler
from pydap.exceptions import OpenFileError
from pydap.client import open_url


class Handler(BaseHandler):

    # this handler is used for files ending in ".url"
    extensions = re.compile(r"^.*\.url$", re.IGNORECASE)

    def __init__(self, filepath):
        self.filepath = filepath

    def parse_constraints(self, environ):
        try:
            config = ConfigObj(self.filepath)
        except Exception:
            message = 'Unable to open file %s.' % self.filepath
            raise OpenFileError(message)

        # upstream dataset URL and the responses to pass through
        url = config['dataset']['url']
        pass_ = config['dataset']['pass']

        response = environ['pydap.response']
        if response in pass_:
            # forward to the requested response (303 See Other)
            raise HTTPSeeOther(url + '.' + response)
        else:
            # load a dataset and return it
            return open_url(url)
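The handler above reads a `.url` file with a `[dataset]` section holding a `url` and a `pass` key. The actual file is not shown here, so the layout below is a guess based on those two lookups, and the upstream address is invented. To keep the sketch free of dependencies it uses the standard library `configparser` as a stand-in for ConfigObj and splits the comma-separated list by hand; `read_proxy_config` is a hypothetical helper.

```python
from configparser import ConfigParser

# Hypothetical contents of a file such as "dataset1.url".
EXAMPLE = """
[dataset]
url = http://node.example.com/dataset1
pass = dds, das, dods
"""


def read_proxy_config(text):
    """Return (upstream_url, list_of_passthrough_responses)."""
    parser = ConfigParser()
    parser.read_string(text)
    section = parser["dataset"]
    # Split the comma-separated value into a clean list of response names.
    passthrough = [item.strip() for item in section["pass"].split(",") if item.strip()]
    return section["url"], passthrough


url, pass_ = read_proxy_config(EXAMPLE)
print(url, pass_)
```

One thing to watch for: ConfigObj turns a comma-separated value into a list, but a single value such as `pass = dods` stays a plain string, and `response in pass_` then becomes a substring test. Writing a trailing comma (`pass = dods,`) should keep it a list.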

Roberto De Almeida
