We provide an overview of our work on enhancing the reproducibility, explainability, and interoperability of scientific experiments and of machine learning and deep learning models across interdisciplinary domains, including biomedicine, biodiversity, biology, and data science.
Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels: First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles' full text, locating them on GitHub, and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, applying essentially the same methodology to PubMed Central twice over the span of two years. Out of 27,271 notebooks from 2,660 GitHub repositories associated with 3,467 articles, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
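As a minimal sketch of the re-run step (not the exact pipeline in the repository linked below), a notebook can be executed with nbclient and its new textual outputs compared against the stored ones; the notebook file name used here is hypothetical:

```python
# Minimal sketch of re-running a notebook and comparing its new outputs to the
# stored ones. Assumes `nbformat` and `nbclient` are installed and a matching
# Python kernel is available; the file name below is a placeholder.
import copy
import nbformat
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError

def rerun_and_compare(path, timeout=600):
    original = nbformat.read(path, as_version=4)
    rerun = copy.deepcopy(original)
    client = NotebookClient(rerun, timeout=timeout, kernel_name="python3")
    try:
        client.execute()  # raises CellExecutionError on the first failing cell
    except CellExecutionError as exc:
        return {"status": "exception", "error": str(exc).splitlines()[0]}

    def text_outputs(nb):
        # Keep only the textual part of each code cell's outputs for a rough comparison.
        return [
            [out.get("text", "") or str(out.get("data", {}).get("text/plain", ""))
             for out in cell.get("outputs", [])]
            for cell in nb.cells if cell.cell_type == "code"
        ]

    same = text_outputs(original) == text_outputs(rerun)
    return {"status": "identical" if same else "different"}

print(rerun_and_compare("analysis.ipynb"))
```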
Relevant Publications:
Code: https://github.com/fusion-jena/computational-reproducibility-pmc
Initial run data: https://doi.org/10.5281/zenodo.6802158
Re-run data: https://doi.org/10.5281/zenodo.8226725
ReproduceMeON is an ontology network for the reproducibility of scientific studies. The ontology network, which includes foundational and core ontologies, aims to bring together different aspects of the provenance of scientific studies from various applications to support their reproducibility. The repository documents the development process of ReproduceMeON and the design methodology for developing core ontologies for the provenance of scientific experiments and machine learning using a semi-automated approach. It also provides a systematic literature review covering provenance, scientific experiments, machine learning, microscopy, and computational and scientific workflows, as well as the state-of-the-art ontologies used for the development of ReproduceMeON. Ontology matching techniques are used to select and develop a core ontology for each sub-domain and to link it to other ontologies in that sub-domain.
Relevant Publications:
The REPRODUCE-ME Data Model is a generic data model for representing scientific experiments together with their provenance information. Its aim is to capture the general elements of scientific experiments to support their understandability and reproducibility. An Experiment is the central element of the REPRODUCE-ME data model. The model consists of eight components: Data, Agent, Activity, Plan, Step, Setting, Instrument, and Material. The REPRODUCE-ME Data Model forms the basis of the REPRODUCE-ME ontology, which extends PROV-O and P-Plan and represents the complete picture of an experiment, describing the path it took from design to result. We aim to enable end-to-end reproducibility of scientific experiments by capturing and representing the complete provenance of a scientific experiment using the REPRODUCE-ME ontology.
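A minimal rdflib sketch of how an experiment could be described with these components follows; the class and property IRIs used here (Experiment, Step, hasStep under the reproduceme namespace) are illustrative assumptions, and the ontology documentation linked below gives the authoritative terms:

```python
# Illustrative sketch of describing an experiment in the spirit of the REPRODUCE-ME
# model using rdflib. The exact class/property IRIs are assumptions here; see
# https://w3id.org/reproduceme/ for the ontology, which extends PROV-O and P-Plan.
from rdflib import Graph, Namespace, RDF

REPR = Namespace("https://w3id.org/reproduceme#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/lab/")

g = Graph()
g.bind("repr", REPR)
g.bind("prov", PROV)

g.add((EX.exp1, RDF.type, REPR.Experiment))       # the central element of the model
g.add((EX.exp1, PROV.wasAttributedTo, EX.alice))  # Agent who ran the experiment
g.add((EX.step1, RDF.type, REPR.Step))            # one Step of the experiment's plan
g.add((EX.step1, PROV.used, EX.microscope))       # Instrument used in this step
g.add((EX.exp1, REPR.hasStep, EX.step1))          # assumed linking property

print(g.serialize(format="turtle"))
```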
Relevant Publications:
Ontology Documentation: https://w3id.org/reproduceme/
ProvBook is an extension of Jupyter Notebook for capturing and viewing the provenance of a notebook over the course of time. It allows users to share a notebook along with its provenance in RDF and to convert the RDF back into a notebook. We use the REPRODUCE-ME ontology, which extends PROV-O and P-Plan, to describe the provenance of a notebook. This helps scientists compare their previous results with current ones, check whether the experiments produce the expected results, and query the sequence of executions using SPARQL. The notebook data in RDF can be used in combination with the experiments that used those notebooks, helping to track the complete path of a scientific experiment.
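As an illustration of such a SPARQL query, the sketch below orders the executions recorded in a ProvBook RDF export by their start time; it uses only generic PROV-O terms (which REPRODUCE-ME extends), and the file name and the modeling of cell executions as prov:Activity instances are assumptions rather than ProvBook's exact vocabulary:

```python
# Minimal sketch of querying a ProvBook RDF export with SPARQL via rdflib.
# Only generic PROV-O terms are used; the actual REPRODUCE-ME predicates for
# cell executions may differ, and the input file name is a placeholder.
from rdflib import Graph

g = Graph()
g.parse("notebook_provenance.ttl", format="turtle")

query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?execution ?start ?end WHERE {
    ?execution a prov:Activity ;
               prov:startedAtTime ?start ;
               prov:endedAtTime   ?end .
}
ORDER BY ?start
"""

for row in g.query(query):
    print(row.execution, row.start, row.end)
```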
Relevant Publications:
CAESAR is a framework for the end-to-end provenance management of scientific experiments. This collaborative framework allows scientists to capture, manage, query and visualize the complete path of a scientific experiment consisting of computational and non-computational steps in an interoperable way.
The “Reproducibility Crisis”, where researchers have difficulty reproducing published results, currently affects several disciplines. To understand the underlying problem in the context of this crisis, it is important to first know the research practices followed in each discipline and the factors that hinder reproducibility. We performed an exploratory study by conducting a survey, addressed to researchers from a range of disciplines, on scientific experiments and research practices related to reproducibility. The survey findings point to a reproducibility crisis and a strong need for sharing data, code, methods, steps, and both negative and positive results. Insufficient metadata, a lack of publicly available data, and incomplete information in study methods are considered to be the main reasons for poor reproducibility. The survey results also address a wide range of research questions on the reproducibility of scientific results.
Relevant Publications:
Data Availability: http://doi.org/10.5281/zenodo.3862597
Analysis: https://mybinder.org/v2/gh/fusion-jena/ReproducibilitySurvey/master
ReproduceMeGit is a visualization tool for analyzing the reproducibility of Jupyter notebooks. It helps repository users and owners reproduce the notebooks in any GitHub repository and directly analyze and assess their reproducibility. The tool reports the number of notebooks that were successfully reproduced, the number that resulted in exceptions, the number whose results differed from the original notebooks, and more. Each notebook in the repository, along with the provenance information of its execution, can also be exported to RDF through the integration of the ProvBook tool.
Relevant Publications:
MLProvLab is a JupyterLab extension to track, manage, compare, and visualize the provenance of machine learning notebooks. The tool is designed to help data scientists and ML practitioners automatically identify the relationships between data and models in ML scripts. It efficiently and automatically tracks provenance metadata, including the datasets and modules used. It also lets users compare different runs of ML experiments, helping them make informed decisions. The tool helps researchers and data scientists collect more information about their experiments and interact with it.
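The sketch below is not MLProvLab itself; it merely illustrates the kind of run metadata such a tool can capture automatically (module versions, dataset checksums, and a timestamp) so that different runs of an ML notebook can later be compared. The module list and dataset path are placeholders:

```python
# Illustrative sketch (not MLProvLab's implementation or format) of capturing
# provenance metadata for one experiment run: module versions, dataset checksums,
# and a timestamp, recorded so runs can be compared later.
import datetime
import hashlib
import json
import sys
from importlib import metadata

def capture_run_metadata(dataset_paths, modules=("numpy", "pandas")):
    def sha256(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    return {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "modules": {m: metadata.version(m) for m in modules},
        "datasets": {p: sha256(p) for p in dataset_paths},
    }

# Example: record the provenance of one run next to its results.
record = capture_run_metadata(["data/train.csv"])
print(json.dumps(record, indent=2))
```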
Relevant Publications:
Machine learning (ML) is an increasingly important scientific tool supporting decision making and knowledge generation in numerous fields. As a result, it becomes ever more important that the results of ML experiments are reproducible. Unfortunately, that is often not the case; ML, like many other disciplines, faces a reproducibility crisis. In this paper, we describe our goals and initial steps in supporting the end-to-end reproducibility of ML pipelines. We investigate which factors beyond the availability of source code and datasets influence the reproducibility of ML experiments, and we propose ways to apply FAIR data practices to ML workflows.
Relevant Publications:
Deep learning models have transformed various scientific fields, including medical image analysis, drug design, speech recognition, and material inspection. While these models are widely used, their internal mechanisms remain complex and not well understood, hindering their validation and improvement. Recent research emphasizes the need for understanding model behavior and addressing biases within them. Regulations like the General Data Protection Regulation advocate for transparent algorithmic decisions. This highlights the importance of interpretability in AI models, making it crucial rather than optional. The project aims to develop interpretability methods that leverage domain knowledge, offering human-understandable explanations extracted directly from neural networks. It integrates Knowledge Graphs to enhance interpretation and accuracy, focusing on an application related to plant disease classification, essential for sustainable agriculture in a changing climate.
Biodiversity is the variety of life on Earth, including its evolutionary, ecological, and cultural processes. It is important to understand where biodiversity occurs, how it changes over time, and the factors that drive these changes. To do this, we need to describe and integrate the conditions and measures of biodiversity. We present a core ontology for biodiversity to establish a link between foundational and domain-specific ontologies. Furthermore, we present two gold-standard corpora for Named Entity Recognition (NER) and Relation Extraction (RE) generated from the metadata and abstracts of biodiversity datasets. These corpora can be used as evaluation benchmarks for the development of new computer-supported tools that require machine learning or deep learning techniques. The underlying ontology defining the classes and relations used to annotate these corpora is also described.
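As a small illustration of using such a corpus as an NER evaluation benchmark, the sketch below scores a token-aligned IOB prediction against a gold annotation with the seqeval package; the entity types and sentence are made up, and the corpora's actual distribution format may differ:

```python
# Minimal sketch of benchmarking an NER system against a gold-standard corpus.
# The IOB2-tagged example is fabricated for illustration; the real corpora may
# use different entity types and file formats. Requires the `seqeval` package.
from seqeval.metrics import classification_report

# One gold sentence and one (imperfect) system prediction, token-aligned IOB2 tags.
gold = [["B-Taxon", "I-Taxon", "O", "B-Location", "O"]]
pred = [["B-Taxon", "I-Taxon", "O", "O",          "O"]]

print(classification_report(gold, pred))
```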
Relevant Publications:
Code:
Acknowledgements
This research is supported in part by the Deutsche Forschungsgemeinschaft (DFG) through Project Z2 of the CRC/TRR 166 'High-end light microscopy elucidates membrane receptor function - ReceptorLight', by the Carl Zeiss Foundation through the project 'A Virtual Werkstatt for Digitization in the Sciences (K3)' within the program line 'Breakthroughs: Exploring Intelligent Systems for Digitization - explore the basics, use applications', and by the University of Jena through IMPULSE funding (IP 2020-10).