Science & Society

Researchers develop a new open-source system to manage and share complex datasets

Data is at the heart of science: scientists track velocities, measure the light coming from stars, analyze heart rates and cholesterol levels, and scan the human brain for electrical impulses.

But sharing that information with other researchers, with editors of peer-reviewed journals, or with funders is often difficult. The software may be proprietary and prohibitively expensive to buy. It may take a long period of training before a person can operate and understand the software. Or the company that made the software may have gone out of business.

A research team has developed an open-source data management system that the researchers hope will solve all of those problems. The scientists described their system today in the journal PLOS ONE.

“We wanted to create a file format and a dataset model that would encapsulate the majority of datasets we work on, on all the instruments in a lab,” said Philip Grandinetti, professor of chemistry at The Ohio State University and senior author of the paper. “There’s this long-standing problem, pervasive among scientists, that you buy a multimillion-dollar instrument and the companies that make that instrument have their proprietary format, and it’s a nightmare to share with anyone else.”

Large datasets are difficult to share, partly because the software is often proprietary and partly because the files are often so large that they are hard to send by email or through a cloud-based server. And even when the files can be exported in a file type that can be shared, important metadata (the information that explains what the dataset is) is often lost.

Their system, which Grandinetti and colleagues named the “Core Scientific Data Model,” is designed to share complex datasets easily, without massive files that consume a great deal of bandwidth and hard drive space, and without losing metadata. Consider a dataset that includes air temperature, air pressure, wind speed, and solar flux: this system can handle it. Or consider the measurements and color of light coming from a star in a distant galaxy: this system can handle that, too.

“You need a dataset that is incredibly flexible in its ability to hold all those things in one file format without losing information,” Grandinetti said. “So the idea is we created a model that we thought was flexible enough to do that.”

The Ohio State University group, in collaboration with Professor Thomas Vosegaard at the University of Aarhus in Denmark and Dr. Dominique Massiot at the University of Orléans in France, built software that runs on a Mac or PC. They posted it online and made the code open source (meaning anyone can view, use, and download it for free). Publishing in PLOS ONE was also deliberate: the journal is freely accessible to anyone.

The researchers also hope the system could offer a simple, free way to bring many types of data together in one place.

“We study multiple datasets as scientists—and as a scientist myself, I’d like to be able to get the data from all those files and put them together in a way that I can work with,” said Deepansh Srivastava, a postdoctoral researcher in Grandinetti’s group.

“Instead of looking for data and plucking it from datasets, if we could simply export it as this one file type—as a core scientific data file type—we’d be able to work in a common system.”

Disclaimer: The views, suggestions, and opinions expressed here are the sole responsibility of the experts. No News Feed Central journalist was involved in the writing and production of this article.
