6.3 Common Data Models (CDMs)

Establishing a standardized, consistent and shared data structure designed to organize and describe data in a way that facilitates interoperability and data integration across various applications and domains

Why should I do this?

To establish a common understanding and representation of data, reducing the complexity and effort required for data sharing and integration.

 

How do CDMs relate to FAIR?

 

  • Interoperability: Mapping the infrastructure of your project in a CDM ensures everyone on the team knows where data comes from, and at what point in the project datasets and systems are linked together.
  • Accessibility: A CDM gives the project a clear architecture that keeps all researchers on the project team on the same page, which can mitigate problems such as language barriers.

Download this CDM factsheet for more insights.

 

What is a CDM?

A CDM is a standardized, consistent and shared data structure designed to organize and describe data in a way that facilitates interoperability and data integration across various applications and domains. The goal of a CDM is to establish a common understanding and representation of data, reducing the complexity and effort required for data sharing and integration.
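As a minimal sketch of what "a shared data structure" can look like in practice, the Python below defines one common record type and two small adapters that translate differently shaped inputs into it. Everything here is an illustrative assumption rather than part of any real project: the `YieldRecord` fields, the two source layouts, and the column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YieldRecord:
    """One row in the hypothetical common model shared by all sources."""
    village: str
    crop: str
    yield_kg_per_ha: float
    source: str


def from_survey(row: dict) -> YieldRecord:
    # Hypothetical field survey: yield is reported in tonnes per hectare,
    # so it is converted to the common unit (kg per hectare).
    return YieldRecord(
        village=row["Village"].strip().title(),
        crop=row["Crop"].strip().lower(),
        yield_kg_per_ha=float(row["yield_t_ha"]) * 1000.0,
        source="survey",
    )


def from_satellite(row: dict) -> YieldRecord:
    # Hypothetical satellite product: different column names, same meaning.
    return YieldRecord(
        village=row["admin_name"].strip().title(),
        crop=row["crop_class"].strip().lower(),
        yield_kg_per_ha=float(row["est_yield_kg_ha"]),
        source="satellite",
    )


records = [
    from_survey({"Village": " north bank ", "Crop": "Rice", "yield_t_ha": "3.5"}),
    from_satellite({"admin_name": "NORTH BANK", "crop_class": "rice",
                    "est_yield_kg_ha": 3200}),
]
```

Once both sources are expressed as `YieldRecord` values, they share names, units and spelling conventions, so they can be compared or combined without source-specific handling.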

 

1) If you are a Program Officer (PO), you may want to share this page directly with your grantee, so they can act on it.

2) If you are a grantee, ensure you have technical team members involved in this process. While the content is accessible to both technical and non-technical members, technical expertise will be required to make decisions for the investment in this step.

3) If you have not already downloaded the illustrative scenarios, ‘Project SIS’ or ‘Waterways’, consider doing so, as they provide examples of how each theme is navigated. These scenarios are referred to frequently across the content in Step 6 to help you understand how different aspects within a theme are applied.

 

Things to consider for your investment:

  • Refer to the illustrative scenario that you have downloaded to see how this has been considered.
  • Ensure any work notes and decisions are documented, as they will be useful to refer to at later stages or for someone new joining the team.

Only the specific theme-related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

Given that the aim of this project is to combine data from TPPs and GSS, we will design a CDM as a blueprint for our data management. Although our data will be geographic in nature, Project SIS will not be producing a topographic map. Instead, the eventual output will be a dataset.

 

 

2. Data enrichment

The CDM is consulted to ensure that data is being stored in the right folders within the repository before and after enrichment.
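A check of this kind could be sketched as follows. This is only an illustration under stated assumptions: the folder names in `CDM_FOLDERS`, the stage labels, and the helper `is_in_expected_folder` are hypothetical and do not come from Project SIS.

```python
from pathlib import PurePosixPath

# Hypothetical folder layout recorded in the CDM: one folder per stage.
CDM_FOLDERS = {
    "raw": "data/raw",
    "enriched": "data/enriched",
}


def is_in_expected_folder(path: str, stage: str) -> bool:
    """Return True if `path` sits under the folder the CDM assigns to `stage`."""
    expected = PurePosixPath(CDM_FOLDERS[stage])
    # Compare whole path components, so "data/raw_backup" does not
    # count as being inside "data/raw".
    return expected in PurePosixPath(path).parents
```

For example, `is_in_expected_folder("data/raw/survey_2023.csv", "raw")` is true, while the same file checked against the `"enriched"` stage is not. An unknown stage name raises a `KeyError`, which flags a stage that the model does not describe.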

Only the specific theme-related content has been highlighted here. To get a feel for the scenario, read here.

 

1. Data onboarding

Before any data is actually onboarded, we will design a CDM that will govern the flows of data within our project. We can split the project into two sub-sections: the building of a comprehensive topographic map (to present data we collect) and the publishing of an academic paper with analysis of the data, which should provide recommendations for crop selection in Waterways.

 

  • A CDM can be very simple, listing the elements of the research and the relationships to be tested in a quantitative research experiment. As in Scenario A, this might be the predominant question that the scientists are asking.
  • Where the situation is highly complex, as is increasingly the case now that the climate has become very unpredictable, the data model needs to be more open. In Scenario B, a geographical data model was layered over the ground map of the area, the satellite and field data, the canal, and interviews with the farmers, to arrive at the real picture of the contrasting crop yields across villages.
  • In either case, the data model is a foundation for the research that can be expanded to deal with related situations or new elements to be researched. The basic frame of a data model does not change if the experiment is repeated in a different location or with slightly changed parameters. If a new data source is found, such as additional satellite data, the data model allows it to be included in the research seamlessly.
  • The data model, when properly chosen, is at the heart of a FAIR approach. Data is findable, accessible, interoperable and reusable to a large extent because the CDM allows an experiment to communicate and relate its results, both to other researchers and to the wider array of stakeholders.
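The idea of a simple model that lists elements and relationships, and that can take on a new data source without changing its basic frame, can be sketched in a few lines of Python. The entity names, relationship labels and the `add_source` helper below are hypothetical and are not taken from either scenario.

```python
# A very small data model: the elements of the research and how they relate.
# All names are illustrative assumptions.
model = {
    "entities": {"village", "crop_yield", "field_survey"},
    "relationships": {
        ("field_survey", "measures", "crop_yield"),
        ("crop_yield", "located_in", "village"),
    },
}


def add_source(model: dict, name: str, relation: str, target: str) -> dict:
    """Return a new model with one more source attached to an existing entity."""
    if target not in model["entities"]:
        raise ValueError(f"unknown entity: {target}")
    # Existing entities and relationships are kept as they are;
    # the new source is only added alongside them.
    return {
        "entities": model["entities"] | {name},
        "relationships": model["relationships"] | {(name, relation, target)},
    }


extended = add_source(model, "satellite_imagery", "measures", "crop_yield")
```

The extended model contains everything the original did plus one new entity and one new relationship, which is the sense in which the frame stays the same while the research grows.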

The theme of CDMs can be important at different stages of your project, whether or not you expect that to be the case. To help you incorporate it into your project planning, this section provides suggestions about where you should think about the theme, structured using the stages of the Data Value Chain (DVC).

 

The DVC is a way of viewing a project from the point of view of its data, identifying how data is onboarded, processed, enriched, analyzed and released in a product. In doing so, the DVC shows the moving parts of project implementation, making it a useful framework for the general steps of any project working with data.

 

 

It’s important we don’t lose any more data.

Martin Parr, Director, Data Policy & Practice, CABI
