SixLinks Wiki

Archive of SixLinks.org wiki content, 2008–2009

Data format and Storage

Basics

So this page details the specs of how exactly we're storing all that bloody data. It is VERY subject to revision until it's nailed down. Take right now, when I'm making it up on the spot.

Data. There are four forms we need to archive.

.Sixml format

The file is composed of groups of datasets. Each dataset contains:

The file also has a header for the data, applicable to all datasets (this means that each date's set must be in identical order).
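To make the ordering rule concrete, here is a rough sketch of how a reader might pair each dataset's values with the shared header by position. This page doesn't pin down the actual markup, so the element names (`sixml`, `header`, `field`, `dataset`, `value`), the `date` attribute, and the sample values are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the real .sixml element names are not specified here.
# The sample values are placeholders, not real data.
SAMPLE = """\
<sixml>
  <header>
    <field>label</field>
    <field>amount</field>
  </header>
  <dataset date="2008-01-01">
    <value>alpha</value>
    <value>10</value>
  </dataset>
  <dataset date="2009-01-01">
    <value>alpha</value>
    <value>12</value>
  </dataset>
</sixml>
"""


def read_datasets(xml_text):
    """Return one dict per dataset, matching values to header fields by position."""
    root = ET.fromstring(xml_text)
    fields = [f.text for f in root.find("header").findall("field")]
    records = []
    for dataset in root.findall("dataset"):
        values = [v.text for v in dataset.findall("value")]
        # The shared header only works if every dataset has the same shape.
        if len(values) != len(fields):
            raise ValueError(
                "dataset %r has %d values, header has %d fields"
                % (dataset.get("date"), len(values), len(fields))
            )
        record = dict(zip(fields, values))
        record["date"] = dataset.get("date")
        records.append(record)
    return records


records = read_datasets(SAMPLE)
```

Because values are matched to fields purely by position, a dataset with its values in a different order would be read silently wrong. The length check only catches missing or extra values, which is why the identical-order rule matters.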

Transformations

Whenever possible, the data transforms from raw to sixml should be automated, and the code preserved. We're pretty platform agnostic on this. All code is written in Python and stored in the 'transformationApps' folder.
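As a rough idea of what one of these transformation scripts could look like, here is a minimal sketch that turns raw CSV text into a sixml-style document. It assumes the raw data has a `date` column first and one row per dataset. The function name `csv_to_sixml` and the element names (`sixml`, `header`, `field`, `dataset`, `value`) are hypothetical, since the real layout isn't specified on this page.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_sixml(raw_csv):
    """Convert raw CSV text to a hypothetical sixml string.

    Assumes the first CSV row names the columns and the first column is the date.
    """
    rows = list(csv.reader(io.StringIO(raw_csv)))
    if not rows:
        raise ValueError("raw data is empty")

    column_names = rows[0]
    root = ET.Element("sixml")

    # One header for the whole file, applicable to every dataset.
    header = ET.SubElement(root, "header")
    for name in column_names[1:]:
        ET.SubElement(header, "field").text = name

    # One dataset per row, values written in header order.
    for row in rows[1:]:
        if len(row) != len(column_names):
            raise ValueError("row %r does not match the header" % (row,))
        dataset = ET.SubElement(root, "dataset", date=row[0])
        for cell in row[1:]:
            ET.SubElement(dataset, "value").text = cell

    return ET.tostring(root, encoding="unicode")


# Placeholder input, not real data.
sixml_text = csv_to_sixml("date,label,amount\n2008-01-01,alpha,10\n")
```

A script like this would sit in the 'transformationApps' folder next to the raw file it was written for, so the transform can be rerun whenever the raw data changes.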

Transformations from sixml to visualization XML must be automated, and will be part of the back-end visualization code. To begin, we'll have the viz code regenerate manually (one of us hits the button when new data goes live), but eventually this system could be automated. Importantly, transforms will not be done in real time on the live site. For real-time, updated data, we'll be pulling from the database. If we run into a graph that really does update very often, we can re-evaluate this system.
