How Ceres2030 used machine learning to create an evidence map for agricultural research

When we knew how interventions were described in agricultural research, we could set about analyzing our sample of articles to find and classify specific interventions.

We found synonyms by looking at hypernyms and hyponyms, the broader-term and narrower-term relationships in semantics (for example, lemon is a hyponym of fruit, and fruit is a hypernym of lemon). We then classified the interventions into four broad categories (technical, socioeconomic, ecosystem, unclassified) and, more specifically, into 995 narrow intervention concepts.
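To make the hypernym and hyponym idea concrete, here is a minimal sketch of walking a term hierarchy to find narrower terms and a broad category. The taxonomy entries and the function names (`ancestors`, `broad_category`, `hyponyms`) are hypothetical and chosen for illustration; they are not the Ceres2030 vocabulary or code, and the ecosystem category is simply left empty in this toy data.

```python
# Hypothetical toy taxonomy: each term maps to its hypernym (the broader term).
# The entries are illustrative only and are not taken from the Ceres2030 data.
HYPERNYM_OF = {
    "drip irrigation": "irrigation",
    "sprinkler irrigation": "irrigation",
    "irrigation": "water management",
    "rainwater harvesting": "water management",
    "water management": "technical",
    "microcredit": "credit",
    "credit": "financial services",
    "financial services": "socioeconomic",
}

# Broad categories named in the text ("unclassified" is the fallback).
BROAD_CATEGORIES = {"technical", "socioeconomic", "ecosystem"}


def ancestors(term):
    """Return the chain of hypernyms from a term up to the top of the hierarchy."""
    chain = []
    while term in HYPERNYM_OF:
        term = HYPERNYM_OF[term]
        chain.append(term)
    return chain


def broad_category(term):
    """Return the broad category a term rolls up to, or 'unclassified'."""
    top = ([term] + ancestors(term))[-1]
    return top if top in BROAD_CATEGORIES else "unclassified"


def hyponyms(term):
    """Return every known term that sits below the given term, at any depth."""
    return sorted(t for t in HYPERNYM_OF if term in ancestors(t))


print(hyponyms("irrigation"))
print(broad_category("microcredit"))
print(broad_category("agroforestry"))
```

A real pipeline would more likely draw these relationships from a lexical resource or a domain thesaurus than from a hand-written dictionary; the lookup logic would stay much the same.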

This is a much more targeted approach to uncovering important research. It gives us a wholly new way to classify and organize science so that it is accessible to an audience interested in policy-relevant research.

Topic modeling and gaps in evidence

Natural language processing enabled us to unify and explore data, even though the data came from many different places. We used topic modeling, a way of exploring text to see what it has in common with other text in the same corpus, to establish a baseline from which we could map the evolution of research from 2008 to 2018.

We can see the topics where there was a high level of research (the darker the blue in the image below, the greater the density of research papers) or where evidence and research were limited or missing. We can also create comprehensive research baselines to see the volume of research by topic, by funder, by country, and the potential relevancy of the research.
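As a rough illustration of a research baseline, the sketch below counts records by any combination of fields and flags expected topics with little or no research. The records, the field names (`topic`, `year`, `country`), and the helper names are assumptions made for this example; they are not the Ceres2030 schema or data.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative, not real data.
records = [
    {"topic": "irrigation", "year": 2008, "country": "Kenya"},
    {"topic": "irrigation", "year": 2018, "country": "India"},
    {"topic": "credit", "year": 2018, "country": "Kenya"},
]


def baseline(records, *fields):
    """Count records for each combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def gaps(counts, expected_keys, threshold=1):
    """Return the expected keys whose count falls below the threshold."""
    return [key for key in expected_keys if counts[key] < threshold]


by_topic = baseline(records, "topic")
by_topic_year = baseline(records, "topic", "year")

# Topics we expected to see but found no records for.
expected = [("irrigation",), ("credit",), ("storage",)]
print(gaps(by_topic, expected))  # [('storage',)]
```

The same `baseline` call works for funder or country once those fields are present on each record.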

Having created a way of finding and classifying interventions in agriculture, we could automate bringing in new research, taking us closer to the possibility of real-time analysis of research for policy relevance. We used an open-source tool (Elastic Stack) to visualize queries and results. This helps make all this information accessible, visualizable, and shareable. It is also easy to add new sources of information.

How we assess the quality of evidence

Evidence syntheses, like scoping and systematic reviews, bring all the studies on a particular issue or intervention together to evaluate what they mean. It’s a process with specific steps designed to minimize bias and to ensure rigor and transparency, so that someone else could replicate the process and reach the same conclusion.

Each of the eight intervention research teams is supported by research synthesis experts. The first critical step is for the authors to create a protocol for each review. This is the roadmap setting out how the review is going to be done, how the reviewers will decide what studies or data to include or exclude in the review, and how those studies and data will be reviewed.

One particular issue facing agricultural research is that it has fewer randomized controlled trials than, say, medicine, and needs to be inclusive of many different kinds of evidence and data. This makes scientific appraisal more difficult. For this reason, we are taking a mixed-methods approach, which combines quantitative and qualitative evidence on complex and pressing questions, and which has been successful in previous agricultural systematic reviews. Groups with deep experience in mixed-methods reviews, such as the Campbell Collaboration and the Center for Evidence-Based Agriculture, have worked closely with us to explore appropriate methods and offer expert advice.

Once each research team reached a consensus on the protocol, it was published, and it cannot be changed afterward. Publishing ahead of doing the review protects its replicability and transparency. It also gives us a chance to share what we are doing and to allow for scientific dissent as part of the process.

All the protocols have been uploaded to the OSF open-science platform. You can find links to these on each question page.

 

Evidence synthesis

1. Formulate a research question

2. Search for similar systematic reviews

3. Identify all relevant evidence bases

4. Develop and test search strategies

5. Write inclusion and exclusion criteria

6. Publish protocol

7. Execute searching and screen results

8. Conduct quality of evidence assessment

9. Review and synthesize results
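Steps 5 and 7 can be sketched as a set of named rules applied to each candidate study, with the reasons for every decision recorded so that someone else could repeat the screening. The `Study` fields and the specific criteria below are hypothetical examples; the actual criteria are defined in each team's published protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    title: str
    year: int
    population: str
    design: str


# Hypothetical criteria for illustration only; real ones come from the protocol.
INCLUDE = {
    "published 2008-2018": lambda s: 2008 <= s.year <= 2018,
    "concerns smallholder farmers": lambda s: s.population == "smallholder",
}
EXCLUDE = {
    "opinion piece without data": lambda s: s.design == "opinion",
}


def screen(study):
    """Return (included, reasons); reasons lists every rule that blocked inclusion."""
    reasons = [f"fails inclusion: {name}"
               for name, rule in INCLUDE.items() if not rule(study)]
    reasons += [f"meets exclusion: {name}"
                for name, rule in EXCLUDE.items() if rule(study)]
    return (not reasons, reasons)


candidates = [
    Study("Irrigation trial", 2012, "smallholder", "field trial"),
    Study("Commentary on subsidies", 2005, "smallholder", "opinion"),
]
for study in candidates:
    included, reasons = screen(study)
    print(study.title, "->", "include" if included else "exclude", reasons)
```

Keeping the rules as named entries, fixed before screening starts, mirrors the purpose of publishing the protocol first: the criteria cannot quietly shift once results are in view.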

Using machine learning to find policy interventions to end hunger

Given the amount of information available, relying only on keyword searching doesn’t work. If we want to understand the fullness of human knowledge, we need to incorporate new methods of discovery that account for the way we describe similar things in different ways.

Over the past decade, there have been enormous advances in artificial intelligence that enable computers to analyze the way we use language. This involves training a computer program to recognize relationships between words, so that it can capture the different ways people describe similar things.

We used machine learning and natural language processing (NLP) to create and analyze a preliminary dataset of ~50,000 articles and reports (2008-2018) about smallholder farmers from science journals and research and development organizations. We used a variety of search terms, such as small-scale food producers, rural farmers, and subsistence and contract farmers.
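The search-term step on its own, before any machine learning is applied, might look like the sketch below, which records which of several equivalent phrases appear in a piece of text. The term list follows the examples given above, with "subsistence and contract farmers" expanded into two phrases as an assumption; the function name `matched_terms` is hypothetical.

```python
import re

# Search terms taken from the examples in the text; the exact list used to
# build the dataset is not given, so treat this as an illustrative subset.
SEARCH_TERMS = [
    "smallholder farmers",
    "small-scale food producers",
    "rural farmers",
    "subsistence farmers",
    "contract farmers",
]

# One case-insensitive pattern that matches any of the phrases literally.
PATTERN = re.compile("|".join(re.escape(t) for t in SEARCH_TERMS), re.IGNORECASE)


def matched_terms(text):
    """Return the distinct search terms found in the text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})


print(matched_terms("Rural farmers and contract farmers in Malawi"))
print(matched_terms("Urban retail supply chains"))
```

Literal matching like this only finds the phrases someone thought to list, which is the limitation that motivates the language-based methods described in this section.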

To increase coverage of materials published in low- and middle-income countries, we included the full table of contents from the African Journal of Biotechnology, African Journal of Agricultural Research, African Journal of Food, Agriculture, Nutrition and Development, African Crop Science Journal, Indian Journal of Agronomy, and the Indian Journal of Agricultural Economics.