Data Sharing

Sharing datasets between research groups has the potential to unleash a vast array of scientific collaborations. Exchanges of data may enable one team to validate another team's work, resulting in improvements to the final published product. Teams may also wish to use other groups' datasets as input or boundary conditions for model simulations.

Challenges often arise when trying to share data across teams:

  • people may be reluctant to share data before it has been published by their own group because they may wish to be the first ones recognized for creating a specific product.
  • a lack of metadata can result in confusion and unintentional misuse of data for a specific application.
  • some teams may publish data using proprietary data formats for which other teams may not have licenses.
  • datasets may be of such high volume that they cannot be transferred across teams using conventional approaches (e.g. e-mail or FTP).

Data Sharing Agreements

Similar to agreements for authorship, it is crucial to have early discussions to establish data sharing agreements. The most important component of data sharing is to encourage everyone to commit to a minimum set of metadata standards. Code sharing websites such as GitHub provide an excellent forum for each team to post rich documentation on specific datasets, such as this Ocean Modeling data product generated by a group from HiMAT.
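
A commitment to minimum metadata can be made checkable. The following is a minimal Python sketch, not part of any HiMAT tooling: the field names in REQUIRED_FIELDS and the example record are hypothetical, and a real data sharing agreement would define its own list.

```python
# Hypothetical minimum metadata fields; a real agreement would define its own list.
REQUIRED_FIELDS = (
    "title",
    "contact",
    "variables",
    "units",
    "spatial_extent",
    "temporal_extent",
    "license",
)


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if field not in record or record[field] in ("", None, [], {})
    ]


# Illustrative record only; the values do not describe a real dataset.
example = {
    "title": "Example ocean model output",
    "contact": "team@example.org",
    "variables": ["temperature", "salinity"],
    "units": {"temperature": "degC", "salinity": "psu"},
    "spatial_extent": "",  # present but empty, so it is reported as missing
}

print(missing_metadata(example))
# ['spatial_extent', 'temporal_extent', 'license']
```

A check like this could run before a dataset is posted, so that gaps are caught by the team that produced the data rather than by the team trying to use it.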

HiMAT data sharing TRIZ

About midway through HiMAT, participants in a TRIZ activity were invited to explore "When it comes to sharing your data and the results of your work with each other and the external world what is the worst possible result?" Teams identified scenarios such as incorrect interpretation of model output variables, conveying the wrong advice to regional stakeholders, and generating massive inefficiencies due to incomplete metadata. From here, teams began to brainstorm possible solutions, including the development of a team-wide data atlas to document the various modeling products, and adherence to data sharing agreements when researchers use each other's products.