OPeNDAP authentication

I just read this interesting article about authentication and APIs. The author recommends not using HTTP Basic authentication for APIs, since it can easily be eavesdropped on, and suggests that if you want to use HTTP instead of HTTPS there's a good compromise between security and ease of implementation called HMAC-SHA:

"Essentially the request is hashed with as shared secret as a key, a nice side effect of this is you can add in the time of the request as a parameter, thereby sending a different signature each time, making it possible to expire keys, and prevent replay attacks."

Lately I've been thinking again about authentication for OPeNDAP clients, and how it would be possible to secure access to private datasets at Marinexplore. One problem here is that there are many different implementations of OPeNDAP clients (like R, Matlab, or Ferret), over which we have no control. These applications use libraries like libdap to handle the connection, and usually the only way of authenticating is by adding the credentials to the URL in the form http://{username}:{password}@example.com/dataset.

There are two reasons why I don't like this solution. First, if the server supports only HTTP Basic, the credentials will be sent unencrypted without any warning, and the only way to know if HTTP Digest is supported is by checking the server headers -- not something most users would do. But more importantly, since we don't know the user's password it's not possible to display a URL that can be copied and pasted for OPeNDAP access. Instead, we need to show the URL and instruct users to insert their password, which they have likely forgotten.

The solution we use is to create, for each user, an OPeNDAP token that is included in the dataset URL path. This allows the URL to be copied and pasted, and allows the URL to be shared between users. And if the URL falls into the wrong hands it's possible to revoke the OPeNDAP token and get a new one. An important detail is that the token has to be in the URL path, not in the query string, since most clients do not let you specify additional query parameters in the URL (Pydap is an exception).
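Since OPeNDAP servers are typically WSGI apps, a rough sketch of the idea looks like the middleware below. This is not our actual implementation; the /opendap/{token}/ URL layout and the VALID_TOKENS store are invented for illustration (in production the lookup would hit a database and check for revocation):

import re

# Hypothetical token store; in production this would be a database
# lookup that also checks whether the token has been revoked.
VALID_TOKENS = {"3f9a8b7c2e": "alice"}

class TokenMiddleware(object):
    """Check an OPeNDAP token embedded in the URL path.

    Expects URLs like /opendap/{token}/dataset and forwards the rest
    of the path to the wrapped OPeNDAP application.
    """
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        match = re.match(r"^/opendap/(\w+)(/.*)$", environ.get("PATH_INFO", ""))
        if match and match.group(1) in VALID_TOKENS:
            # Strip the token so the wrapped app sees a normal dataset path.
            environ["PATH_INFO"] = match.group(2)
            return self.app(environ, start_response)
        start_response("403 Forbidden", [("Content-type", "text/plain")])
        return [b"Invalid or revoked token.\n"]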

Reading the HMAC-SHA authentication article I wondered if we could use it for our datasets, by creating the signatures on the website with a long expiration for the timestamps. But HMAC-SHA also has the problem that it requires additional arguments to be added to the query string, so it doesn't work with most of the available clients.
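The signing itself would be easy with Python's standard library. Here's a rough sketch of the scheme the article describes (the message layout and the parameter names are my own, for illustration):

import hashlib
import hmac
import time

def sign_request(path, secret, expires_in=3600):
    # The signature covers the path plus an expiration timestamp, so
    # the server can reject tampered URLs as well as expired ones.
    expires = int(time.time()) + expires_in
    message = "%s:%d" % (path, expires)
    signature = hmac.new(
        secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return expires, signature

expires, signature = sign_request("/dataset.nc", "my-shared-secret")
url = "http://example.com/dataset.nc?expires=%d&signature=%s" % (expires, signature)

But both expires and signature end up in the query string, which is exactly what most OPeNDAP clients can't handle.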

Moore’s Law and the Origin of Life

Sharov and Gordon say their interpretation also explains the Fermi paradox, which raises the question: if the universe is filled with intelligent life, why can't we see evidence of it?

However, if life takes 10 billion years to evolve to the level of complexity associated with humans, then we may be among the first, if not the first, intelligent civilisation in our galaxy. And this is the reason why when we gaze into space, we do not yet see signs of other intelligent species.

Interesting article, combining panspermia, the Fermi paradox and the origin of life. This has always been "my solution" to the Fermi paradox: we're the first.

Streaming data via OPeNDAP

Last month I finally got myself a Raspberry Pi, when I attended PyCon here in Santa Clara. This weekend I decided to play with it and with a concept I've been working on for the new release of Pydap (available at my repo): streaming real-time data via OPeNDAP. While the Pydap server has always had the capability of streaming infinite datasets, as far as I know there were no clients that could process the data stream in real time. This is a feature I've wanted to implement for a long time, but it required a lot of refactoring, involving changes in both the HTTP library and the XDR unpacking. The current repo has a simpler implementation of the client, and I finally managed to get it working.

In order to put it to use I created a special Pydap server on my Raspberry Pi, streaming data from a temperature sensor. The sensor was connected to the Raspberry Pi following this tutorial, and measurements are read using the RPi.GPIO library. We then create a Pydap dataset with a Sequence variable, normally used to represent sequential data from a database or a CSV file. The major difference here is that the Sequence data is created on the fly by reading from the sensor, i.e., the dataset is materialized when requested instead of being stored in memory or on disk. The server will stream the data as fast as the client can consume it, since the whole process is based on Python generators.

The code itself is pretty simple. All we need to do is create the SensorData class, which defines a generator yielding tuples with the values. The get_temperature() function can be derived from the tutorial.

import time

from pydap.model import *
from pydap.handlers.lib import IterData, BaseHandler

class SensorData(IterData):
    """
    Sensor data as a structured array-like object.

    """
    def gen(self):
        # Yield one record per reading; the server streams each record
        # to the client as soon as it is produced.
        while True:
            timestamp = time.time()
            # get_temperature() reads the sensor via RPi.GPIO;
            # see the tutorial referenced above.
            temperature, voltage = get_temperature()
            yield timestamp, voltage, temperature
            time.sleep(0.1)

dataset = DatasetType('roberrypi')
seq = dataset['sensor'] = SequenceType('sensor')
seq['time'] = BaseType('time', units='seconds since 1970-01-01')
seq['voltage'] = BaseType('voltage', units='mV')
seq['temperature'] = BaseType('temperature', units='deg C')
seq.data = SensorData('sensor', seq.keys())

if __name__ == '__main__':
    app = BaseHandler(dataset)
    from werkzeug.serving import run_simple
    run_simple('0.0.0.0', 8080, app, use_reloader=True, threaded=True)

Running the script will create a server on http://localhost:8080/, and you can check the typical OPeNDAP responses at http://localhost:8080/.{das,dds,dods}. If you want to look at the data itself you can check the ASCII response at http://localhost:8080/.asc, or use the development version of Pydap to access it like any other dataset.
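If you just want to watch the stream from Python without Pydap, something like this should work (a sketch using the requests library; it prints raw lines without parsing them):

import requests

# Read the streaming ASCII response line by line, printing each
# record as the server produces it.
response = requests.get("http://localhost:8080/.asc", stream=True)
for line in response.iter_lines():
    if line:
        print(line)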

I also created a simple Flask application that reads the data stream from the ASCII response and plots it on a real-time graph using Smoothie Charts. You can see it working here: http://69.181.252.12:5000/ (the data takes a while to load because the browser accumulates a certain number of bytes before generating events). Update: I'm now running a server at http://vps.dealmeida.net:5000/ (DDS|DAS) serving the CPU load and the number of bytes transferred. There's a static page on a different server showing the data at http://dealmeida.net/opendap-streaming/. The plot reads the data from the binary (dods) response using CORS-enabled XHR and parses it using some black magic. I'll post about this later.
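The Flask side is essentially a small proxy around the stream. Here's a sketch of the idea, not the actual app: the /latest endpoint and the comma-separated parsing of the .asc records are assumptions for illustration.

import threading

import requests
from flask import Flask, jsonify

app = Flask(__name__)
latest = {"temperature": None}

def consume(url):
    # Follow the streaming ASCII response and keep only the most
    # recent reading; assumes records are comma-separated values.
    response = requests.get(url, stream=True)
    for line in response.iter_lines():
        parts = line.split(b",")
        if len(parts) == 3:
            try:
                latest["temperature"] = float(parts[2].strip().decode("ascii"))
            except (ValueError, UnicodeDecodeError):
                pass  # skip headers and malformed lines

@app.route("/latest")
def latest_reading():
    return jsonify(latest)

if __name__ == "__main__":
    worker = threading.Thread(
        target=consume, args=("http://localhost:8080/.asc",))
    worker.daemon = True
    worker.start()
    app.run(port=5000)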


Using ctags in vim with a Python virtualenv

Yesterday I was talking about Python programming at home with the guys who came for PyCon. We were discussing terminal background colors, favorite editors, web frameworks, and so on, and I mentioned that I have a hard time navigating deep Python code in vim, which is what I use for development. Goran reminded me of ctags, which I first learned about some 15 years ago. It turns out it's pretty simple to use with vim to search Python code.

I just had to make some modifications because I run multiple virtualenvs. I added the following two lines to my ~/.vimrc file after installing ctags:

map <S-F11> :!ctags -R -f $VIRTUAL_ENV/tags $VIRTUAL_ENV/lib/python2.7/site-packages<CR>
set tags=$VIRTUAL_ENV/tags

Pressing shift-F11 will index all the source in the virtualenv. I can then jump to any class definition by pressing ctrl-] while the cursor is over its name. Pressing ctrl-t brings me back. Works great. One thing I missed is that it doesn't seem to be able to jump to variable definitions, only classes and functions/methods. Looks like I would need something like pycscope for that.

Posthaven

If you can read this, the migration went fine. I have moved my blog from Posterous to Posthaven. I'm still developing my own blog engine, which will hopefully be the last one. Stay tuned.

GoPro Evolution: From 35mm Film To America's Fastest-Growing Camera Company

“Consumers are no longer spending their money on point-and-shoot cameras–pocket cameras–because they already have that in the form of their smartphone,” says Woodman. “So they have disposable income for something like a GoPro, which is highly differentiated from a smartphone.”

This is exactly why I bought a GoPro. I have a smartphone with an excellent camera that's always with me. And when I go diving, biking or snowboarding I can use the GoPro.

How to build a news app that never goes down and costs you practically nothing

t’s an opinionated template for building client-side apps, lovingly maintained by Chris, which provides a skeleton for bootstrapping projects that can be served entirely from flat files.

Briefly, it ships with:

For a more detailed rundown of the structure, check out the README.

There’s a lot of work that went into this app template and a fair amount of discipline after each project we do to continue to maintain it. With every project we learn something new, so we backport these things accordingly to ensure our app template is in tip-top shape and ready to go for the next project.

Really cool template for creating static apps using Flask/Jinja, LESS/JST, Bootstrap and Fabric.

Interview: Makai Ocean Engineering

As I mentioned above, multiresolution data refers to data that is stored using level-of-detail or LOD technology. You can think of this data as being stored in a ‘pyramid’ – with the top of the pyramid representing one single low/coarse resolution chunk of data, and the bottom of the pyramid representing many high/fine resolution chunks of data. When you view the data from far away, you will only see a low resolution chunk. As you get closer and closer, higher resolution data is paged into your screen.

The term streaming simply refers to the fact that the data can be streamed from any location – not only your local hard drive. The data can sit on a remote server in the multiresolution / LOD format, and your client software will request data to be streamed in as needed. Depending on your connection speed, streaming over a network can be nearly as fast as paging from your local hard drive.
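Just to make the pyramid idea concrete, here's a toy sketch of the level selection arithmetic. This is my own illustration, not Makai's code, and it assumes each level up the pyramid halves the resolution of the one below:

import math

def pyramid_level(view_resolution, base_resolution, num_levels):
    # view_resolution: meters per pixel needed on screen.
    # base_resolution: meters per cell at the finest (bottom) level.
    # Returns the coarsest level that still resolves the view, where
    # level 0 is the top (coarsest) and num_levels - 1 the bottom.
    if view_resolution <= base_resolution:
        return num_levels - 1
    levels_up = int(math.log(view_resolution / base_resolution, 2))
    return max(0, num_levels - 1 - levels_up)

print(pyramid_level(1000.0, 10.0, 8))  # far away: a coarse level
print(pyramid_level(10.0, 10.0, 8))    # close up: the finest level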

Interesting company. And they're using OPeNDAP, WFS, WMS and SOS.