6.6 Data licenses

Choosing an approach to structure the inventory of your data assets

Why should I do this?

Data licenses enable the responsible and ethical use of data, which is a crucial aspect of the evolving landscape of information exchange.

 

How do data licenses relate to FAIR?

 

Reusability: Licensing your own dataset, if it is being released as part of a product, provides legal conditions for the ways other people can re-use it.

 

Download this data licenses factsheet for more insights.

 

What is a data license?

A data license is the legal framework for granting permission to access, use and share data, as well as to monetize it.

 

1) If you are a Program Officer (PO), you may want to share this page directly with your grantee, so they can act on it.

2) If you are a grantee, ensure you have technical team members involved in this process. While the content is accessible to both technical and non-technical members, technical expertise will be required to make decisions for the investment in this step.

3) If you have not already done so, download ‘Project SIS’ or ‘Waterways’. These illustrative scenarios provide examples of how each theme is navigated, and they are frequently referred to across the content in Step 6 to help you understand how different aspects within a theme are applied.

 

Things to consider for your investment:

©Gates Archive/Mansi Midha
  • Refer to the illustrative scenario that you have downloaded to see how this has been considered.
  • Ensure any work notes or decisions taken are being documented, as this would be useful to refer to at later stages or for someone new joining the team.

Only the specific theme-related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

TPP data that we onboard, such as the data received from Visual Crossing, must be used in a legal way, as dictated by the datasets’ licenses. We have examined the license for use of Visual Crossing data (in its terms of use: https://www.visualcrossing.com/weather-services-terms) to ensure the data can be used for our purposes.

 

In doing so, we identified the type of membership we require and the cost this would incur.

We are in the process of exploring whether information on the Normalized Difference Vegetation Index (NDVI) of regions in Dataland’s highlands would be valuable to include in the SIS. If so, this data would be bought as an asset from a commercial satellite source, in which case data licensing problems may arise: sharing proprietary data of satellite companies with the final users of SIS may require a specific legal contract between SoilScience and said companies. If that is not possible, NDVI data will not be onboarded to the project at all.

Similarly, consent is required from farmers to collect data from their land (with regard to soil samples) and their experiences (with regard to interviews).

 

We will draft a data-sharing agreement for farmers to sign in order to provide consent, with its terms published on the project website as part of the license/terms of use for SIS.
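As a rough illustration of how signed agreements could gate onboarding, the sketch below keeps only records from farmers who have given consent. It is a minimal example under stated assumptions: the record fields (`sample_id`, `farmer_id`, `ph`), the identifiers and the `with_consent` helper are all hypothetical and are not taken from the scenario.

```python
# Hypothetical soil-sample records; the field names are assumptions.
soil_samples = [
    {"sample_id": "S-001", "farmer_id": "F-01", "ph": 6.4},
    {"sample_id": "S-002", "farmer_id": "F-02", "ph": 5.9},
    {"sample_id": "S-003", "farmer_id": "F-03", "ph": 7.1},
]

# Hypothetical set of farmers who have signed the data-sharing agreement.
signed_agreements = {"F-01", "F-03"}


def with_consent(records, consenting_farmers):
    """Return only the records whose farmer has signed the agreement."""
    return [r for r in records if r["farmer_id"] in consenting_farmers]


onboarded = with_consent(soil_samples, signed_agreements)
for record in onboarded:
    print(record["sample_id"], record["farmer_id"])
```

Filtering at the point of onboarding means that data without recorded consent never enters later stages of the project.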

 

2. Data analysis

Preliminary analysis of the SIS’s coverage of Dataland, as well as the identification of general trends, will be done with Python or R notebooks. Because these are third-party tools, the licenses for TPP datasets must be consulted to understand their restrictions for use. For Visual Crossing data, analysis with third-party software is allowed in its terms of use. Proprietary commercial satellite data, if used, will likely have different restrictions.

 

3. Data products

We will need to investigate how to license the SIS data to researchers beyond Dataland’s MOA and our project partners, both of whom we will have data-sharing agreements with. One option is an open license such as Creative Commons Attribution 4.0 (CC BY 4.0), under which we retain the right to attribution but the data is otherwise free to use for any sort of project. Another option is a non-commercial license (Creative Commons BY-NC 4.0).

 

The latter may be preferable to fit the overall community aims of SoilScience, but user research and some further stakeholder interviews would be required to judge what license to choose.
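Whichever license is eventually chosen, it can help to record it in machine-readable form alongside the data. The sketch below is illustrative only: the `LICENSES` table, the `build_metadata` helper and the dataset name `sis-soil-data` are hypothetical, while the SPDX identifiers and Creative Commons URLs are the published ones for the two licenses discussed above.

```python
import json

# The two candidate licenses discussed above, keyed by SPDX identifier.
LICENSES = {
    "CC-BY-4.0": {
        "title": "Creative Commons Attribution 4.0 International",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
    "CC-BY-NC-4.0": {
        "title": "Creative Commons Attribution-NonCommercial 4.0 International",
        "url": "https://creativecommons.org/licenses/by-nc/4.0/",
    },
}


def build_metadata(dataset_name: str, spdx_id: str, attribution: str) -> dict:
    """Return a small metadata record stating the license of a dataset."""
    if spdx_id not in LICENSES:
        raise ValueError(f"Unknown license: {spdx_id}")
    return {
        "name": dataset_name,
        "license": {"id": spdx_id, **LICENSES[spdx_id]},
        "attribution": attribution,
    }


# Hypothetical example values for the SIS scenario.
record = build_metadata("sis-soil-data", "CC-BY-NC-4.0", "SoilScience")
print(json.dumps(record, indent=2))
```

Publishing a record like this with each release lets re-users see the license terms without having to search the project website.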

Only the specific theme-related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

Satellite data that we collect from TPPs must be used in a legal way, as dictated by the datasets’ licenses. Depending on the license, we may not have the authority to use data in certain ways. In the best case, we might only have to credit the original data authors (as is the case with CC BY 4.0 licenses); whereas in the worst case, licenses may have ‘No Derivatives’ clauses that could prohibit the publishing of the topographical map as its own resource (as is the case with CC BY-NC-ND 4.0 licenses). Similar to licensing, we must understand the means of authorization to get access to TPP satellite data. Some datasets will require fees for access, which must be budgeted for. Thankfully, SoilScience has a professional network that may be able to help with fee waivers. Finally, consent is required from LFs to collect their data via interviews.
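The difference between these licenses can be captured in a small lookup that is checked during onboarding. This is a minimal sketch, not legal advice: the `LicenseTerms` class, the `CC_TERMS` table and the `can_publish_derived_product` helper are hypothetical names, and whether a particular map counts as a derivative depends on the license text itself. CC BY-NC 4.0 is included for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    attribution_required: bool
    commercial_use_allowed: bool
    derivatives_allowed: bool


# Headline terms of three Creative Commons 4.0 licenses.
CC_TERMS = {
    "CC BY 4.0": LicenseTerms(True, True, True),
    "CC BY-NC 4.0": LicenseTerms(True, False, True),
    "CC BY-NC-ND 4.0": LicenseTerms(True, False, False),
}


def can_publish_derived_product(license_name: str) -> bool:
    """Return True if sharing an adapted version of the data is permitted."""
    terms = CC_TERMS.get(license_name)
    if terms is None:
        # Unknown or proprietary license: treat as not permitted until reviewed.
        return False
    return terms.derivatives_allowed


for name in CC_TERMS:
    print(name, "->", can_publish_derived_product(name))
```

Defaulting to "not permitted" for any license that is not in the table forces a manual review of proprietary terms before a dataset is used in a published product.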

 

2. Data processing

Work with the satellite data will most likely be done with third-party tools. Besides the standardization script in Python, we will also use GIS software for the map presentation. Data licenses can sometimes prohibit the use of third-party tools on the data, so the terms of use for the satellite data must be checked.

 

3. Data analysis

Data will be analyzed using R or Python notebooks. These are both third-party tools, so licenses for the satellite data must be checked. This should be done in the ‘Data processing’ stage, but is worth considering here too.

 

4. Data products

We will need to make sure we are allowed to publish the satellite data in the topographical map. If the data is licensed with a ‘No Derivatives’ clause, that would not be permitted. This should have been addressed during data onboarding, so it should no longer be a concern by the time of product release.

The theme of data licenses can be important at different stages of your project, whether or not you expect that to be the case. To help you incorporate them into your project planning, this section provides suggestions about where you should think about the theme, structured using the stages from the Data Value Chain (DVC).

 

The DVC is a way of viewing the process of running a project from the point of view of the data, thereby identifying how it is onboarded, processed, enriched, analyzed and released in a product. In doing so, the DVC shows the moving parts in project implementations, making it a useful framework regarding the general steps of any project working with data.

 

 

Data sharing among stakeholders is key to the planning and implementation of projects and programs. Before data is shared, data quality issues relating to FAIR are paramount.
