Better Data: One resource at a time - My fellowship experience

Getting into the Frictionless data fellowship

My background is largely in molecular biology, particularly infection diagnostics targeting arthropod viruses, bacteria and protozoa. I have a relatively shorter bioinformatics experience, but this is the direction am passionate to build my research occupation in. I first heard about Frictionless data from the African Carpentries instructors’ mailing list. It was the inaugural fellowship call that had been shared by Anelda. I caught it at the nick of time; deadline submission! By the way, you can watch for annual calls and other interesting stuff by following @frictionlessd8a. The call for the second cohort just closed in June and was open from late April. The fellowship starts in September.

On-boarding

Lilly arranged the first-time meeting to usher me into the fellow, after a few email correspondence. I got introduced Jo Barrat who patiently took me through my paces completing logistical preliminaries. I was really looking forward to getting started. The on-boarding enabled acquaintance with the rest of the fellows, awesome people. I was excited!

Context

Overall, the world is in search of and is promoting better ways to work with data, whether it is collecting data or accessibility or novel ways to analyse high-throughput data or dedicated workflows to publish data alongside accustomed scientific publishing or moving/working with data across frameworks or merely storage and security. All these, plus other factors, provide avenues to exhaustively interrogate data in multiple ways, thus promoting improved data usefulness. This has been arguably under-appreciated in times past. Frictionless data, through its Progressive Data Toolkit and with the help of organisations like OKF and funding by Sloan Foundation, is dedicated to alleviating hindrances to some of the aforementioned efforts. People empowerment is a core resource to the #BetterData dream.

The fellowship

An aspect of any research is the collection of data, which is applied to test hypotheses under study. The underlying importance of data, good data for that matter, in research is therefore unquestionable. Approaches to data analysis may differ from field to field, yet there are conventional principles that do not discriminate fields; such are the targets to Frictionless Data. I jump at the opportunity to learn ways to ramp up my data workflow efficiency, with a touch of research openness and reproducibility.

The journey took off withdrawing a meticulous roadmap, which I found very helpful, and seem to end with this – sharing my experience. In between exciting things happened. In case one was coming in a little rusty with their basic Python/R, they were catered for early on, though you didn’t exactly need them to use the tools. To say, literally, ZERO programming skills prerequisite. There were a plethora of resources, and help from the fellows, not to mention from the ever welcoming Lilly. The core sections of the Fellowship were prefaced by grasping basic components like the JSON schema data interchange format. Following were the core tools and their specifications. The Data Package Creator tool is impressively emphatic on capturing metadata, a backbone theme for reproducibility. I found Table Schema and Schema specifications initially confusing. Other fellows and I have previously shared on the Data Package Creator and GoodTables, tools for creating and validating data packages respectively. These tools are very progressive, continually incorporating feedback from the community, including fellows, to improve user experience. So don’t be surprised at a few changes since the fellows’ blogs. In fact, a new entrant, which I only knew of recently, is the DataHub tool - “Is a useful solution for sharing datasets, and discovering high-quality datasets that others have produced”. I am yet to check it out.

Besides the main focus of the fellowship, I got to learn a lot covering organisational skills and tools such as GitHub projects, Toggl for time-monitoring, general remote working, among others. I got introduced to new communities/initiatives such as PREreview; my first time to participate in open research reviewing. The fellows were awesome to work with and Lilly Winfree provided the best mentorship.

Sometimes problems are foreseen and contingencies planned, other times unforeseen surprises rear their heads into our otherwise “perfect” plan. Guess what? You nailed it! COVID-19. Such require adaptability akin to that of the fictional El Professor in the Money Heist. Since we could not organise the in-person seminar and/or workshops as part of the fellowship, we collectively adopted a virtual workshop. It went amazingly well.

What next

Acquired knowledge and skills become more useful when implemented. My goal is to apply them in every opportune opening and to keep learning other integrative tools. Yet, there is also this about knowledge; it is to be spread. I hope to compensate for suspended social sessions and to keep engagement with @frictionlessd8a. To continue open and reproducible research advocacy.

Conclusion

Tools that need minimal to no coding experience support well the adoption of good data hygiene practices, more so in places with scanty coding expertise. The FD tools will surely help your workflows with some greasing regardless of your coding proficiency, especially for tabular data. This is especially needful seeing the deluge of data persistently churned out from various sources. Frictionless Data is for everyone working with data; researcher, data scientist or data engineers. The ultimate goal is to work with data in an open and reproducible way, which is consistent with modern scientific research practice. A concerted approach is also key, I am glad to have represented Africa in the fellowship. Do not hesitate to reach out if you think I can be resourceful to your cause.