Basics
This page details the specs of how exactly we're storing all that data. It is VERY subject to revision until it's nailed down; right now, I'm making it up on the spot.
There are three forms of data we need to archive:
- Raw Data - This is the PDF, Excel sheet, whatever where we get the raw data. We must maintain a local copy of this at all times.
- Parsed Data / .sixml files - This is SixLink's data format. It's described below.
- Vis Data - These are (mostly) .xml files that the actual charts and such use. They include formatting information, and just the data we need to display.
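To make the three stages concrete, here's a minimal sketch of one possible directory layout for the archive. The directory names and the helper function are placeholders, not a settled convention:

```python
from pathlib import Path

# Hypothetical layout for the three archive stages; the actual paths
# and naming scheme are still up for discussion.
STAGES = {
    "raw": Path("archive/raw"),        # original PDFs, Excel sheets, etc.
    "parsed": Path("archive/sixml"),   # .sixml files (SixLink's format)
    "vis": Path("archive/vis"),        # .xml files the charts consume
}

def stage_path(stage: str, name: str) -> Path:
    """Return where a file named `name` would live for a given stage."""
    return STAGES[stage] / name

print(stage_path("parsed", "energy.sixml"))
```

The point is just that every dataset should exist in all three forms, and the raw copy never gets deleted.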
.sixml format
The file is composed of groups of datasets. Each dataset's data portion contains:
- header row. [ min ... x ... max ]
- values. [ n ... y ... m ] (note that "n" and "m" can have a value of "rel", which indicates that they should be scaled to the relative mins and maxes of the set.)
- goodness. [ 0 ... z ... 100 ] Goodness is a value judgement we make about how good a particular column is relative to others. It's used in display (color, size, etc.). For example, wind power might have a goodness of 90, while coal would be down at 7.
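An in-memory model of one dataset might look like the sketch below. The field names are assumptions (the schema isn't final); the only rule it enforces is the one stated above, that goodness values stay in 0..100:

```python
from dataclasses import dataclass

# Hypothetical model of one .sixml dataset; field names are placeholders.
@dataclass
class Dataset:
    header: list[str]          # column names
    values: list[list[float]]  # rows of numeric data
    goodness: list[int]        # per-column score, 0..100

    def __post_init__(self):
        # Enforce the documented goodness range.
        for g in self.goodness:
            if not 0 <= g <= 100:
                raise ValueError(f"goodness {g} outside 0..100")

ds = Dataset(
    header=["wind", "coal"],
    values=[[12.0, 80.0], [15.0, 75.0]],
    goodness=[90, 7],  # wind scores well, coal much less so
)
```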
Transformations
Whenever possible, the data transforms from raw to sixml should be automated, and the code preserved. We're pretty platform agnostic on this; I'd say they should be done in Python, Ruby, or PHP.
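As one example of such a transform, here's a sketch in Python that turns a CSV dump into a sixml-shaped XML document. The element names (`sixml`, `dataset`, `header`, `row`, `goodness`) and the default goodness of 50 are placeholders, since the real schema isn't fixed yet:

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_sixml(csv_text: str, goodness: dict[str, int]) -> str:
    """Convert a CSV dump into a (hypothetical) .sixml document."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    root = ET.Element("sixml")
    ds = ET.SubElement(root, "dataset")
    ET.SubElement(ds, "header").text = ",".join(header)
    for row in data:
        ET.SubElement(ds, "row").text = ",".join(row)
    # Per-column goodness; columns we haven't judged default to a neutral 50.
    ET.SubElement(ds, "goodness").text = ",".join(
        str(goodness.get(col, 50)) for col in header
    )
    return ET.tostring(root, encoding="unicode")

raw = "wind,coal\n12,80\n15,75\n"
print(csv_to_sixml(raw, {"wind": 90, "coal": 7}))
```

Checking the script into the repo next to the raw file means anyone can re-run the transform when the source updates.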
Transformations from sixml to visualization xml must be automated, and will be a part of the back-end visualization code. To begin, we'll regenerate the viz data manually (one of us hits the button when new data goes live), but eventually this system could be automated. Importantly, transforms will not be done real-time on the live site. For real-time, updated data, we'll be pulling from the database. If we run into a graph that really does update very often, we can re-evaluate this system.
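One step the sixml-to-vis transform will need is resolving the "rel" bounds described above: when a bound is "rel", it gets replaced by the dataset's own min or max before scaling for display. A minimal sketch, with function names and shapes that are assumptions:

```python
def resolve_bounds(values, lo, hi):
    """Return concrete (lo, hi) bounds, expanding "rel" to the data's own range."""
    if lo == "rel":
        lo = min(values)
    if hi == "rel":
        hi = max(values)
    return float(lo), float(hi)

def scale(values, lo, hi):
    """Normalize values into [0, 1] for display, given (possibly "rel") bounds."""
    lo, hi = resolve_bounds(values, lo, hi)
    span = (hi - lo) or 1.0  # guard against divide-by-zero on flat data
    return [(v - lo) / span for v in values]

print(scale([10, 15, 20], "rel", "rel"))  # [0.0, 0.5, 1.0]
```

Since this runs at regeneration time rather than on the live site, it can afford to walk the whole dataset to find the mins and maxes.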