6.7 Data privacy and security

Choosing an approach to structure the inventory of your data assets

Why should I do this?

To protect personal information and to prevent unauthorized access or misuse.

 

How does data privacy and security relate to FAIR?

 

  • Accessibility: Login systems support accessibility by letting authorized people work securely while data privacy constraints are respected, which keeps access inclusive of stakeholders’ privacy requirements. It is important to recognize that accessibility does not mean ‘opening datasets to all’, but rather ‘opening datasets to the right people’.
  • Reusability: A dataset can be released as part of a product for reuse only once it has been cleared of all sensitive data and therefore adheres to privacy and security requirements. If privacy and security are not considered when releasing your product, you are personally liable for any resulting damage.

Download this data privacy and security factsheet for more insights.

What is data privacy and security?

 

Data privacy and security refers to the protection of personal information, ensuring that sensitive data is handled, stored and processed securely to prevent unauthorized access or misuse. It encompasses a range of principles and practices that safeguard individuals’ privacy in the digital age.

1) If you are a Program Officer (PO), you may want to share this page directly with your grantee, so they can act on it.

2) If you are a grantee, ensure you have technical team members involved in this process. While the content is accessible to both technical and non-technical members, technical expertise will be required to make decisions for the investment in this step.

3) If you have not already done so, download the ‘Project SIS’ or ‘Waterways’ illustrative scenario. The scenarios provide examples of how each theme is navigated and are frequently referred to across the content in Step 6 to help you understand how different aspects within a theme are applied.

 

Things to consider for your investment:

©Gates Archive/Mansi Midha
  • Refer to the illustrative scenario that you have downloaded to see how this has been considered.
  • Ensure any work notes or decisions taken are being documented, as this would be useful to refer to at later stages or for someone new joining the team.

Only the specific theme-related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

The number of stakeholders in Project SIS calls for close attention to the privacy and security of data. For onboarding, SoilScience researchers will be responsible for bringing in TPP data, while on-the-ground data collection is split between the GSS teams and the project partners (PPs). Security of data was a concern raised during the initial proposal of the project, given the vulnerable status of farmers in Dataland. Malicious actors who came across their interviews or soil sample test results could potentially take advantage of the farmers through fraudulent schemes or predatory competition. To avoid this problem, only PPs and authorized staff at the MOA will be able to onboard raw collected data to the project’s centralized repository. They will also be responsible for ensuring deletion of any data stored on unsecured local devices (like those used to record interviews) and for keeping oversight of GSS and farmers.

This arrangement will be governed by a simple login system. As per the CDM, the centralized repository to which data will be onboarded is a private GitHub repository, with access provided only to SoilScience, PPs, and authorized staff at the MOA. SoilScience will audit the work of all parties in data collection to ensure privacy and security.

 

2. Data processing

All PII will be removed from the data received from Dataland, as recommended in the data privacy and security workbook. This includes deletion of names, contact details, and specific locations (although latitude and longitude will be retained for dataset linkage in the next stage before deletion). For the sake of version control, this processing is not done by overwriting the raw data sources, but by making a ‘clean’ copy of each one. Once the processing is complete, however, the raw data sources will be deleted from the repository.
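The ‘clean copy’ step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual tooling: the column names in PII_COLUMNS and the function name write_clean_copy are assumptions, since the scenario does not specify a schema or file format (CSV is assumed here).

```python
import csv
from pathlib import Path

# Hypothetical column names; the real schema is not given in the scenario.
# Latitude and longitude are deliberately NOT listed, because they are kept
# for dataset linkage in the next stage.
PII_COLUMNS = {"farmer_name", "phone", "email", "village"}


def write_clean_copy(raw_path, clean_path, pii_columns=PII_COLUMNS):
    """Write a copy of raw_path without the PII columns.

    The raw file is only read, never modified. Returns the kept column names.
    """
    raw_path, clean_path = Path(raw_path), Path(clean_path)
    if raw_path.resolve() == clean_path.resolve():
        raise ValueError("clean copy must not overwrite the raw source")
    with raw_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [c for c in (reader.fieldnames or []) if c not in pii_columns]
        with clean_path.open("w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" drops every field that is not in `kept`.
            writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
    return kept
```

Deleting the raw sources afterward would be a separate, deliberate step once the clean copies have been checked.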

 

3. Data enrichment

After the datasets are linked, the latitude and longitude columns will have served their purpose. Although it would be nice to keep their measurements for each soil sample and its corresponding meteorological data points for reproducibility, releasing latitude and longitude in the final SIS dataset would put farmers at risk. However, to retain a level of geographic granularity, latitude and longitude will be used to categorize entries in the dataset by sub-regions in the Dataland Highlands. We will likely use sub-regions based on the data points’ relation to landmark geographic features—for example, some data will be categorized as ‘West side of Mt. Data’. Once these categorizations are made, latitude, longitude, and linking fields can be removed from the dataset. It is likely that we will delete these closer to time of product release so that we can validate our work and link further datasets if required.
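As a rough sketch of this categorization, the Python below replaces precise coordinates with a coarse label relative to a landmark and then drops the precise fields. The landmark coordinates, the field names (latitude, longitude, link_id) and the four-way split are all assumptions made for illustration; the scenario only gives ‘West side of Mt. Data’ as an example label.

```python
# Hypothetical landmark coordinates for "Mt. Data"; real values would come
# from the project team.
MT_DATA_LAT, MT_DATA_LON = 10.0, 40.0


def sub_region(lat, lon):
    """Map a coordinate to a coarse label relative to the landmark."""
    # Use whichever axis the point is furthest from the landmark on.
    if abs(lon - MT_DATA_LON) >= abs(lat - MT_DATA_LAT):
        side = "West" if lon < MT_DATA_LON else "East"
    else:
        side = "South" if lat < MT_DATA_LAT else "North"
    return f"{side} side of Mt. Data"


def generalize_location(record, drop=("latitude", "longitude", "link_id")):
    """Return a copy of record with a sub_region label and without the
    precise location and linking fields."""
    out = {k: v for k, v in record.items() if k not in drop}
    out["sub_region"] = sub_region(
        float(record["latitude"]), float(record["longitude"])
    )
    return out
```

Because generalize_location returns a new record, the coordinates remain available in the working copy until the team decides to delete them closer to release.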

 

4. Data analysis

More high-level analysis will be run on the data, as well as the entire repository, to ensure it is clean and, importantly, that farmers’ details have been deleted properly. Only authorized researchers at SoilScience will be allowed to do so, although Dataland’s MOA will be consulted on findings afterward.
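One way such a check could be partly automated is a simple scan for leftover PII, sketched below in Python. The forbidden column names and the two regular expressions are illustrative assumptions, not part of the scenario, and pattern matching of this kind is only a heuristic: it can miss names and free-text details, and it can flag harmless values, so it would complement a manual review rather than replace it.

```python
import re

# Hypothetical names of columns that should no longer exist at this stage.
FORBIDDEN_COLUMNS = {"farmer_name", "phone", "email", "latitude", "longitude"}

# Rough patterns for values that look like e-mail addresses or phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def audit_rows(rows):
    """Return (row_index, column, reason) findings for a list of dict rows."""
    findings = []
    for i, row in enumerate(rows):
        for column, value in row.items():
            if column.lower() in FORBIDDEN_COLUMNS:
                findings.append((i, column, "forbidden column"))
            text = str(value)
            if EMAIL_RE.search(text):
                findings.append((i, column, "email-like value"))
            elif PHONE_RE.search(text):
                findings.append((i, column, "phone-like value"))
    return findings
```

An empty result means only that none of these particular patterns were found.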

Only the specific theme related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

Our main concern with regard to privacy and security is ensuring that the data is only accessible to core people in the project. Although the data is not necessarily sensitive, we would like to keep things organized and safe, especially if required by the satellite data licenses.

 

To do so, we are designing a login system with four levels, as suggested in the instructions.

  • Uploading: WRO researchers can upload their collected ground field data, and edit it if required. Similarly, Project Partners (PPs) can upload transcripts from the interviews with farmers.
  • Uploading and analyzing: SoilScience researchers can upload satellite data and edit it. They can also view and edit other data in the repository and use all of the datasets to build the topographical map and empirical analysis.
  • Content management: The PO at SoilScience will be the only person with authorization to delete files in the repo and change the overall folder structure.
  • Viewing: Consumers of the final outputs (i.e., users of the topographical map) will only have permission to view the map.
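The four levels above can be expressed as a small permission table. The Python sketch below is one possible reading of the list, not the project's implementation: the permission names, the LEVELS mapping and the is_allowed helper are assumptions, and granting the content manager view access to the repository is an added assumption, since the list only mentions deleting files and changing the folder structure.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW_MAP = auto()     # view the final topographical map
    UPLOAD = auto()       # upload new data
    EDIT_OWN = auto()     # edit data you uploaded yourself
    VIEW_REPO = auto()    # view other data in the repository
    EDIT_ALL = auto()     # edit other data in the repository
    ANALYZE = auto()      # build the map and run the analysis
    DELETE = auto()       # delete files in the repository
    RESTRUCTURE = auto()  # change the overall folder structure


LEVELS = {
    "uploading": Permission.UPLOAD | Permission.EDIT_OWN,
    "uploading_and_analyzing": (
        Permission.UPLOAD | Permission.EDIT_OWN | Permission.VIEW_REPO
        | Permission.EDIT_ALL | Permission.ANALYZE
    ),
    # VIEW_REPO here is an assumption; the text only names delete/restructure.
    "content_management": (
        Permission.VIEW_REPO | Permission.DELETE | Permission.RESTRUCTURE
    ),
    "viewing": Permission.VIEW_MAP,
}


def is_allowed(level, needed):
    """True if the given level grants every permission in `needed`.

    Unknown levels are granted nothing.
    """
    granted = LEVELS.get(level, Permission(0))
    return (granted & needed) == needed
```

Keeping the mapping in one place makes it easy to confirm, for example, that only the content management level can delete files.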

 

2. Data enrichment

Building the points from latitude and longitude could potentially create a privacy risk. Although interview data will likely not be sensitive in nature, it might still put farmers in vulnerable positions. Further PII, like the farmers’ names, could have similar effects and must therefore be identified. Via a contract, we will have consent from interviewed farmers to publish this information, but we believe education is far more important: farmers should know what their interviews are contributing to and, moreover, how they might be at risk if their data is online. Our project partners in Waterways are experienced in providing this sort of education.

 

3. Data analysis

The only people authorized to perform the data analysis will be researchers at SoilScience. This will once again be governed by our login system.

The theme of data privacy and security can be important at different stages of your project, whether or not you expect that to be the case. To help you incorporate it into your project planning, this section provides suggestions about where you should think about the theme, structured using the stages from the Data Value Chain (DVC).

 

The DVC is a way of viewing the process of running a project from the point of view of the data, identifying how it is onboarded, processed, enriched, analyzed and released in a product. In doing so, the DVC shows the moving parts in project implementations, making it a useful framework for the general steps of any project working with data.

 

 

 

The moment you visit a farmer in her field and explain why you're here and why you would like, for example, to take a soil sample to analyze it, she has to give consent to it. If you miss out on that, you know, you might not be able to put the data out in the public.

Christian Witt, Program Officer, Bill & Melinda Gates Foundation
