Data package creation

A Data Package is a format that lets you put your data collection, and the relevant information that gives it context, in one container before you share it. In other words, it is a simple container format for describing a coherent collection of data in a single 'package', consisting of the data files themselves and the metadata that describes them.

Find out more in the Data Package specifications. Data Package Creator, meanwhile, is an online service that facilitates the creation and editing of data packages. The service automatically generates a datapackage.json file for you as you add and edit the data that is part of your data collection.

Getting the Data

Being a soil science student, I felt that using soil data would help me better understand the process of packaging data for future use. I found data on the impact of fertilizer recommendations on yield and thought it would be great to use. However, this wasn't such a good idea: I got many error messages and had a lot of cleaning up to do before the data suited the tabular format accepted by the Data Package Creator (create.frictionlessdata.io). Also, if you want to create a data package from someone else's data, the data should either carry a licence that allows reuse, or you should ask for permission to use it.

Afterwards, I moved on to a different data set that was more straightforward and easier to work with. That data was on the infection prevalence of 'Ca. Anaplasma camelii' in camels and camel keds, evaluated in different seasons within a year. The error I received with the soil data is shown below.

screenshot of error

What I learned from the errors was that 1) the data was really too large for this exercise, 2) it had several headings, which made it difficult for the Data Package Creator to pick which heading to use, and 3) the application is case sensitive when you enter the metadata. I therefore had two options: clean the data, either manually or using Python, or use different data. I decided to use an entirely different data set.
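For anyone who would rather take the Python route, here is a minimal sketch of what that clean-up could look like. It assumes that "several headings" means two stacked header rows, with blank cells where a top-level heading spans several columns. The function name `flatten_headers`, the column names, and the values are all made up for illustration and are not taken from the soil data.

```python
import csv
import io


def flatten_headers(rows, header_rows=2):
    """Merge the first `header_rows` rows into a single lowercase header row."""
    width = max(len(row) for row in rows[:header_rows])
    levels = []
    for level_index, row in enumerate(rows[:header_rows]):
        cells = [cell.strip() for cell in row] + [""] * (width - len(row))
        if level_index < header_rows - 1:
            # A blank cell under a spanning heading inherits the heading to its left.
            filled, last = [], ""
            for cell in cells:
                last = cell or last
                filled.append(last)
            cells = filled
        levels.append(cells)

    names = []
    for column in zip(*levels):
        parts = [part for part in column if part]
        # Lowercase names avoid case mismatches when typing metadata later.
        names.append("_".join(parts).lower().replace(" ", "_"))
    return [names] + rows[header_rows:]


# Hypothetical two-row header, invented for this example.
sample = "Plot,Yield,,Fertilizer\n,2019,2020,Rate\nA1,3.2,3.9,50\n"
cleaned = flatten_headers(list(csv.reader(io.StringIO(sample))))
for row in cleaned:
    print(",".join(row))
```

The result has one header row followed by the data rows, which is the shape a tabular tool expects.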

Creating the Data Package

We are going to use the Data Package Creator to create our data package. To start with, we make sure our tabular data is in CSV format, then load the CSV file into the Data Package Creator. Once it is loaded, we check whether all fields are correctly labelled (with data types such as integer or string). Then we add the metadata for the data and validate. Voilà! This one went through, as shown below.

screenshot of validate
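To get a feel for what the Data Package Creator builds behind the scenes, here is a rough sketch that assembles a minimal descriptor by hand using only the Python standard library. It is not how the online service works internally. The helper names (`infer_type`, `build_descriptor`), the package name, the file path, the column names, and the numbers are all hypothetical. The type check only distinguishes integers from strings, as in the step above.

```python
import csv
import io
import json
import re

INTEGER_PATTERN = re.compile(r"[+-]?\d+")


def infer_type(values):
    """Return "integer" if every non-empty value is a whole number, else "string"."""
    non_empty = [value.strip() for value in values if value.strip()]
    if non_empty and all(INTEGER_PATTERN.fullmatch(value) for value in non_empty):
        return "integer"
    return "string"


def build_descriptor(csv_text, package_name, resource_path):
    """Build a minimal descriptor with one tabular resource."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [record[index] for record in records if index < len(record)]
        fields.append({"name": name, "type": infer_type(column)})
    return {
        "name": package_name,
        "resources": [
            {
                "name": package_name,
                "path": resource_path,
                "schema": {"fields": fields},
            }
        ],
    }


# Made-up rows for illustration only; these are not the real camel data.
sample_csv = "season,camels_tested,camels_positive\nwet,40,12\ndry,35,9\n"
descriptor = build_descriptor(sample_csv, "camel-prevalence", "camel-prevalence.csv")
print(json.dumps(descriptor, indent=2))
```

The printed JSON has the same overall shape as a datapackage.json file: a package name, a list of resources, and for each resource a schema listing its fields and their types.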

Validation

Once the data is validated, you can download the datapackage.json file and share it with your network.
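Before sharing, it can be reassuring to double-check that the field names in the downloaded descriptor match the CSV header exactly, since case matters. The sketch below is one simple way to do that with the standard library. The function `find_problems`, the descriptor text, and the header are invented for the example, and the check is far less thorough than the validation the Data Package Creator performs.

```python
import json


def find_problems(descriptor, headers_by_path):
    """List mismatches between schema field names and the actual CSV headers."""
    problems = []
    resources = descriptor.get("resources") or []
    if not resources:
        problems.append("descriptor has no resources")
    for resource in resources:
        path = resource.get("path")
        fields = resource.get("schema", {}).get("fields", [])
        declared = [field.get("name") for field in fields]
        actual = headers_by_path.get(path)
        if actual is None:
            problems.append(f"no data file supplied for {path!r}")
        elif declared != actual:
            # The comparison is case sensitive, so "Prevalence" != "prevalence".
            problems.append(
                f"{path!r}: schema fields {declared} do not match CSV header {actual}"
            )
    return problems


# Hypothetical descriptor with a deliberate case mismatch in one field name.
descriptor_text = """
{
  "name": "camel-prevalence",
  "resources": [
    {
      "name": "camel-prevalence",
      "path": "camel-prevalence.csv",
      "schema": {
        "fields": [
          {"name": "season", "type": "string"},
          {"name": "Prevalence", "type": "integer"}
        ]
      }
    }
  ]
}
"""
headers = {"camel-prevalence.csv": ["season", "prevalence"]}
problems = find_problems(json.loads(descriptor_text), headers)
for problem in problems:
    print(problem)
```

An empty list means the names line up. In this example the capitalised "Prevalence" is reported as a mismatch.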