Data Barter: Real-life data interactions
As fellows, we've learnt a lot already and done a number of interesting things with Frictionless Data (FD) tools and specifications. This time around, the goal was to model real-life data interactions: to trade each other's data and share our experience of working with it. We barter-traded data packages. Specifically, we aimed to:
- understand each other's data,
- recreate a data package with the partner's raw data, and
- validate the data package with goodtables.
Understanding the data
For any reliable attempt at working with foreign data, you may agree with me, the best place to begin is the metadata. I understand the structure of a data package .json file, thanks to the FD fellowship, and so I know that the first place to look is the area containing the overall metadata of the whole data set. This is usually found at the bottom of the file, that is, if the handler cared to include any metadata. Be like Monica: she provided metadata. The overall metadata pointed to the article associated with the data and provided the license, among other things. I found the field metadata to be very informative. For instance, a field labelled Day could mean a lot of things in and of itself. However, the entries "title": "Experimental day" and "description": "Experiment ran for 63 days" contextualise the heading with clarity.
Recreating the data package
To do this you need, at the very least, the data. This was available. Monica had the associated data on GitHub, in a public repository and under a CC-BY-4.0 license, making it easily accessible. To reach the data, I copy-pasted the link, the value of the path key, which contains the resource identifier, into a web client. The uniform resource identifier (URI) makes the data findable because it is uniquely associated with only that single item universally. The same link was used inside the Data Package Creator. Here is the package I recreated.
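I used the Data Package Creator web tool for this step, but the shape of the result can be sketched in a few lines of Python. This is a rough illustration, not the tool's actual output: the function name build_descriptor, the package and resource names, and the example.org URL are all hypothetical. Only the CC-BY-4.0 license and the Density field of the Algae table are taken from the package described here.

```python
import json


def build_descriptor(name, resource_name, csv_url, fields):
    """Assemble a minimal data package descriptor for one remote CSV file."""
    return {
        "name": name,
        "licenses": [{"name": "CC-BY-4.0"}],
        "resources": [
            {
                "name": resource_name,
                "path": csv_url,  # the URI that makes the raw data findable
                "format": "csv",
                "schema": {"fields": fields},
            }
        ],
    }


if __name__ == "__main__":
    descriptor = build_descriptor(
        name="recreated-package",                   # hypothetical name
        resource_name="algae",                      # hypothetical name
        csv_url="https://example.org/Algae.csv",    # placeholder, not the real link
        fields=[{"name": "Density", "type": "number"}],
    )
    # Serialise to the JSON text that would be saved as datapackage.json.
    print(json.dumps(descriptor, indent=2))
```

Because the path is a URL rather than a local file name, anyone holding the descriptor can reach the same raw data without a copy being sent alongside it.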
Validating the data package
On validating the data package I came across two warnings regarding row limits for the Algae.csv and Tiles.csv tables. There was one type format error for the Density field in the Algae data set. The error was caused by the field type being declared as integer rather than number, although the field contains floats. Once I rectified this in the data package I recreated, the type error disappeared. I raised an issue with the solution on GitHub. The CSV and JSON data formats allowed for interoperability, given that we, Monica and I, work on different software and hardware platforms.
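The integer versus number mismatch is easy to reproduce by hand. The sketch below is not how goodtables works internally; it is a simplified stand-in, under the assumption that a column is integer only if every non-empty cell parses as a whole number, and number if every cell at least parses as a float. The function name infer_numeric_type and the sample values are hypothetical.

```python
def infer_numeric_type(values):
    """Return 'integer', 'number', or 'string' for a column of raw CSV cells."""

    def parses(cast, text):
        # True if the cast accepts the text, False if it raises ValueError.
        try:
            cast(text)
            return True
        except ValueError:
            return False

    # Ignore empty cells so that missing values do not decide the type.
    cells = [v.strip() for v in values if v.strip() != ""]
    if cells and all(parses(int, c) for c in cells):
        return "integer"
    if cells and all(parses(float, c) for c in cells):
        return "number"
    return "string"


if __name__ == "__main__":
    declared = "integer"                  # what the descriptor claimed
    sample = ["12", "0.75", "3.5", ""]    # made-up cells, not the real data
    inferred = infer_numeric_type(sample)
    if inferred != declared:
        print(f"declared {declared!r} but the values look like {inferred!r}")
```

A single decimal value is enough to push the column from integer to number, which is why the error only goes away once the declared type is changed in the descriptor.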
Exchanging data packages and working backwards from them is an important test that illustrates the overall goal of the Frictionless Data initiative. Remember, FD seeks to facilitate and promote open and reproducible research, and consequently to promote collaboration. By trying to reproduce Monica's work I was able to catch an error, which I highlighted for her attention, thus improving the work. Exactly how science is supposed to work!