Writing an OGC web service in Python using WSGI

I’m working on a new implementation of the WMS response for Pydap. The response turns any served dataset into a full WMS server when you append .wms to its URL. This is useful for visualizing datasets in an embedded OpenLayers or Google Maps widget, for example.

There are three basic OGC web services (WMS, WFS and WCS), and they behave very similarly. For WMS, the first request a client issues asks the server for its capabilities:

http://example.com/dataset.wms?SERVICE=WMS&REQUEST=GetCapabilities

The server returns an XML document describing the available layers, the image formats it supports, and so on. A WMS server then returns images of the dataset, depending on the request. Here's a hypothetical request for an image:

http://example.com/dataset.wms?
    SERVICE=WMS&
    REQUEST=GetMap&
    BBOX=-180,-90,180,90&
    LAYERS=SST&
    FORMAT=image/png
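On the client side, the same GetMap request can be built with the standard library instead of by hand. This is a small sketch: BASE_URL and getmap_url are hypothetical names, and the parameter values are the ones from the example above. It also parses the URL back to show what the server will see.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical dataset URL; the WMS response is reached via the .wms suffix.
BASE_URL = "http://example.com/dataset.wms"


def getmap_url(layers, bbox=(-180, -90, 180, 90), fmt="image/png"):
    """Build a WMS GetMap URL carrying the parameters from the example."""
    params = {
        "SERVICE": "WMS",
        "REQUEST": "GetMap",
        "BBOX": ",".join(str(v) for v in bbox),
        "LAYERS": layers,
        "FORMAT": fmt,
    }
    # urlencode percent-escapes reserved characters such as "," and "/";
    # the server decodes them again when it parses the query string
    return BASE_URL + "?" + urlencode(params)


url = getmap_url("SST")
print(url)
# Round trip: the server sees the original, unescaped values
print(parse_qs(urlsplit(url).query))
```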

Some of these query string parameters are compulsory (e.g., SERVICE=WMS), some are optional with defaults (e.g., BBOX), and others must take one of a list of possible values (e.g., LAYERS must name a valid layer). Handling all of this in WSGI (using something like WebOb) can take a lot of work, so I wrote a decorator to simplify the process.
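To see the bookkeeping a decorator can remove, here is a rough sketch of checking the three kinds of parameters by hand on an already-parsed query string. The rule tables and the check_getmap function are hypothetical, using the values from the example request; it returns a status code plus either an error message or the arguments to use.

```python
# Rules for the three kinds of parameters (hypothetical values from the example)
COMPULSORY = {"SERVICE": "WMS"}                     # must be present, fixed value
CHOICES = {"LAYERS": ["SST", "SLP"]}                # must be present, one of these
DEFAULTS = {"BBOX": "-180,-90,180,90", "FORMAT": "image/png"}  # optional


def check_getmap(params):
    """Validate a flat dict of query parameters by hand.

    Returns (400, message) on error, or (200, arguments) on success.
    """
    for name, expected in COMPULSORY.items():
        if params.get(name) != expected:
            return 400, 'Invalid or missing parameter "%s".' % name
    for name, allowed in CHOICES.items():
        if name not in params:
            return 400, 'Missing parameter "%s".' % name
        if params[name] not in allowed:
            return 400, 'Invalid parameter "%s".' % name
    # start from the defaults, then override with what the client sent
    args = dict(DEFAULTS)
    args.update({k: params[k] for k in DEFAULTS if k in params})
    args.update({k: params[k] for k in CHOICES})
    return 200, args


print(check_getmap({"SERVICE": "WMS", "LAYERS": "SST"}))
```

Every new operation would need its own copy of these tables and checks, which is the repetition the decorator is meant to absorb.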

Using the decorator, we can write our WSGI app like this:

class WMSServer(object):
    def __init__(self):
        pass

    def __call__(self, environ, start_response):
        # I'll talk about this later
        pass

    @wxsrequest(SERVICE='WMS', LAYERS=['SST', 'SLP'])
    def GetMap(self, LAYERS, BBOX='-180,-90,180,90', FORMAT='image/png'):
        pass

The wxsrequest decorator takes care of all the bookkeeping. It will ensure that:

  1. SERVICE=WMS is set in the request URL.
  2. LAYERS is set, and is either SST or SLP.

It will also map the query string parameters onto the function signature, passing LAYERS, BBOX and FORMAT as keyword arguments, so we don’t need to extract them from the query string ourselves.

What does __call__ look like? This is our dispatcher:

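from webob import Request
from webob.exc import HTTPBadRequest, HTTPInternalServerError

def __call__(self, environ, start_response):
    req = Request(environ)
    # REQUEST names the operation to run, e.g. GetCapabilities or GetMap
    request = req.GET.pop('REQUEST', None)
    method = getattr(self, request, None) if request else None
    if request is None:
        res = HTTPBadRequest('Missing parameter "REQUEST".')
    elif request.startswith('_') or not callable(method):
        # unknown operation, or an attempt to reach a private/special method
        res = HTTPBadRequest('Invalid parameter for "REQUEST".')
    else:
        try:
            res = method(req)
        except Exception:
            # hide internals; a traceback could be added here when debugging
            res = HTTPInternalServerError()
    return res(environ, start_response)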

If REQUEST is missing from the query string, the dispatcher returns a 400 Bad Request saying so; the same happens if REQUEST is set to a value that doesn't correspond to a method of our class. Any other error is returned as a 500 Internal Server Error, and this is where we could optionally add the traceback to the response to give more information.

Assuming everything goes well, the WebOb request object is passed to the corresponding method, which is wrapped by the decorator. The decorator checks that the query string is valid and unpacks the parameters into keyword arguments for the method call.
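The dispatch rules can be exercised without a server. The sketch below is a hypothetical stand-in that uses only the standard library (urllib.parse and wsgiref) instead of WebOb, so it can run anywhere. MiniWMS, respond and call are illustrative names, and GetLegendGraphic is a stub that always fails so the 500 branch can be seen.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def respond(start_response, status, body):
    """Send a plain-text response with the given status line."""
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


class MiniWMS:
    """Hypothetical stdlib-only stand-in for WMSServer's dispatcher."""

    def GetCapabilities(self, params):
        return "200 OK", "capabilities for %s" % params.get("SERVICE", "?")

    def GetLegendGraphic(self, params):
        # not implemented in this sketch, so it exercises the 500 branch
        raise NotImplementedError

    def __call__(self, environ, start_response):
        # keep the last value of each repeated parameter
        query = parse_qs(environ.get("QUERY_STRING", ""))
        params = {k: v[-1] for k, v in query.items()}
        request = params.pop("REQUEST", None)
        method = getattr(self, request, None) if request else None
        if request is None:
            return respond(start_response, "400 Bad Request",
                           'Missing parameter "REQUEST".')
        if request.startswith("_") or not callable(method):
            return respond(start_response, "400 Bad Request",
                           'Invalid parameter for "REQUEST".')
        try:
            status, body = method(params)
        except Exception:
            status, body = "500 Internal Server Error", "Internal Server Error"
        return respond(start_response, status, body)


def call(app, query):
    """Run a WSGI app on a fake GET request; return (status, body)."""
    environ = {"QUERY_STRING": query}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], body.decode("utf-8")


app = MiniWMS()
print(call(app, "SERVICE=WMS&REQUEST=GetCapabilities"))
print(call(app, "SERVICE=WMS"))
```

Rejecting names that start with an underscore keeps clients from reaching methods like __init__ or __call__ through the REQUEST parameter.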

The final piece here is our decorator:

import functools
import inspect

from webob.exc import HTTPBadRequest

def wxsrequest(**required):
    def decorator(method):
        spec = inspect.getfullargspec(method)
        # positional arguments without a default value are compulsory
        n = len(spec.args) - len(spec.defaults or [])

        @functools.wraps(method)
        def out(self, req):
            # check that all required arguments were passed
            for arg in spec.args[1:n]:  # skip self
                if arg not in req.GET:
                    return HTTPBadRequest('Missing parameter "%s".' % arg)

            # check that the fixed parameters are present and valid
            for arg, value in required.items():
                if not isinstance(value, (list, tuple)):
                    value = [value]
                if arg not in req.GET:
                    return HTTPBadRequest('Missing parameter "%s".' % arg)
                elif req.GET[arg] not in value:
                    return HTTPBadRequest('Invalid parameter "%s".' % arg)

            # drop query string parameters the method doesn't accept
            kwargs = {k: v for (k, v) in req.GET.items()
                      if k in spec.args[1:]}
            return method(self, **kwargs)

        return out
    return decorator

The decorator uses the inspect module to analyze the method signature and ensure that all required parameters are present and that the fixed ones have valid values. It also strips any query string parameters that the method doesn't accept; this removes the need to write our methods with a **kwargs catch-all for unrelated parameters that may be in the URL.
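Here is what that signature analysis yields for the GetMap method, as a standalone sketch. It uses inspect.getfullargspec, since inspect.getargspec was removed in Python 3.11; both expose the same args and defaults fields. The query dict is a hypothetical, already-parsed query string, and GetMap simply returns its arguments so the result is visible.

```python
import inspect


# Same signature as the GetMap method above; returns its arguments
def GetMap(self, LAYERS, BBOX='-180,-90,180,90', FORMAT='image/png'):
    return LAYERS, BBOX, FORMAT


spec = inspect.getfullargspec(GetMap)
n = len(spec.args) - len(spec.defaults or ())
required = spec.args[1:n]  # no default: must come from the query string
optional = spec.args[n:]   # have defaults in the signature

# Hypothetical parsed query string with two unrelated parameters
query = {"SERVICE": "WMS", "LAYERS": "SST", "FORMAT": "image/jpeg",
         "TIME": "2012-01-01"}
# keep only what GetMap accepts (skipping self); SERVICE and TIME are dropped
kwargs = {k: v for k, v in query.items() if k in spec.args[1:]}

print(required, optional)    # ['LAYERS'] ['BBOX', 'FORMAT']
print(GetMap(None, **kwargs))  # ('SST', '-180,-90,180,90', 'image/jpeg')
```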

Xargs to run commands in parallel

I recently found out that xargs has options to parallelize its work, and I finally had a good reason to try them. I'm processing log files for the last year, and each day is its own standalone task. My workstation has 1 CPU with 6 cores, hyperthreaded to give 12 logical cores. So I asked xargs to run the processing script on 6 daily log files per invocation (-n 6) and to run 10 processes in parallel (-P 10). Zoom!
ls 2012*.tar | xargs -n 6 -P 10 process_log_files.py
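For comparison, the same batching can be written in Python. This is a swapped-in technique (a thread pool driving subprocesses), not what xargs does internally. chunks and run_batches are hypothetical helpers, and the example assumes process_log_files.py is executable and on PATH, as the command line above implies.

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split items into consecutive groups of at most `size` (like xargs -n)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batches(command, files, per_call=6, jobs=10):
    """Run `command` on groups of `per_call` files, `jobs` at a time (like -P).

    Threads are enough here because each one only waits on a child process.
    Returns the exit status of every batch, in batch order.
    """
    batches = chunks(sorted(files), per_call)  # ls also lists files sorted
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda batch: subprocess.call(command + batch),
                             batches))


if __name__ == "__main__":
    # Assumes process_log_files.py is executable and on PATH
    statuses = run_batches(["process_log_files.py"], glob.glob("2012*.tar"))
    print("failed batches:", sum(1 for s in statuses if s != 0))
```

Passing file names as a list also sidesteps the word-splitting that ls | xargs applies to names containing spaces.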

Cool, I didn't know that xargs could parallelize its work.