Data Barter: Real-life data interactions
As fellows, we've learnt a lot already and done a number of interesting things with Frictionless Data (FD) tools and specifications. This time around, the goal was to model real-life data interactions; trade each other's data and share on the experience working with them. We barter-traded data packages. Specifically, we aimed to:
- understand each other's data,
- recreate a data package with the partner's raw data &
- validate the data package with goodtables
My focus was to reproduce Monica's work, starting from her data pacakge backwards.
Understanding the data
For any reliable attempt on foreign data, you may agree with me, the best place to begin is the metadata. I understand the structure of a data package .json
file, thanks to FD fellowship, and so I know that the first place to look at it the area containing the overall metadata of the whole data set. This is usually found at the bottom of the file, that is, if the handler cared to include any metadata. Be like Monica, she provided metadata. The overall metadata directed to the article associated with the data and provided the license, among other things. I found the field metadata to be very informative, for instance, a field label Day could mean a lot of things in and of itself. However, the entries "title": "Experimental day
and "description": "Experiment ran for 63 days"
contextualises the heading with clarity.
Recreating the data package
To do this you, at least, need the data. This was available. Monica had the associated data in GitHub; in a public repository and under a CC-BY-4.0
license, making it easily accessible
. To reach the data the I copy-pasted the link, the value of the path
key, which contains the resource identifier in a web client. The uniform resource identifier (URI
) makes the data findable
because it is uniquely associated with only that single item universally. The same link was used inside the Data Package Creator. Here is the package I recreated.
Validate the data package
On validating the data package I came across two warnings regarding row limits for the Algae.csv and Tiles.csv tables. There was one type format
error for the Density
field in the Algae data set. The error was occasioned by having the type
as integer
rather than number
, as the field contain floats. On rectifying this in the data package I recreated the type error
disappeared. I raised an issue with the solution on GitHub. The CSV
and the JSON
data formats allowed for interoperability
given we, Monica and I, work on different software and hardware platforms.
Conclusion
Exchanging data packages and working backwards from them is an important test in the illustration of the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, consequently promoting collaboration. By trying to reproduce Monica's work I was able to capture an error, which I highlighted for her attention, thus improved the work. Exactly how science is supposed to work!