Algorithms — MIDRC

Downloadable tools.

Last updated November, 2024

Algorithms, methods, and code are continuously being developed within MIDRC. Please see below for a current list.

Check back often for updates!

See our ‘Web-based tools’ page for ‘plug-and-play’ tools for cohort building, indexing, performance metric selection, AI reliability, and more.

Featured:

The MIDRC representativeness exploration and comparison tool (REACT) is an open-source tool designed to assess and quantify the differences between medical imaging datasets. By analyzing acquisition types, disease variations, and/or demographic attributes, this calculator helps researchers ensure that their datasets are representative of intended populations, aiding in the development of more reliable AI models in healthcare. The tool is hosted on GitHub, where users can access the code, contribute, and collaborate to improve its functionality.

Learn more about MIDRC-REACT in this short demo video, a MIDRC seminar recording, or the peer-reviewed publication.

The MIDRC DICOM harmonization mapping tool is an open-source utility aimed at standardizing and harmonizing DICOM files across diverse medical imaging datasets. By aligning and normalizing metadata and image attributes, including unstructured character string fields, this mapping table facilitates cohort selection in the MIDRC data commons and supports the integration and interoperability of imaging data, providing consistency and quality in large-scale research initiatives. It is available on GitHub, where users can access it and submit suggestions for additional LOINC terminology.

Learn more in this MIDRC seminar recording.

RadGraph is a tool designed to support the development of AI models that understand and evaluate radiology reports. Combining a Python-based labeling tool with an annotated dataset, RadGraph enables researchers to extract medical terms and their relationships from unstructured text. It serves as a foundation for training and refining natural language processing (NLP) models specifically tailored to healthcare applications. RadGraph is hosted on PhysioNet; accessing the tool requires user credentials.

Explore RadGraph’s code and additional resources on its GitHub page. Pre-trained models and details about RadGraph are available on HuggingFace. RadGraph is described in several peer-reviewed publications, which provide deeper insights into its development and applications:
ACL 2024 Findings: Overview of RadGraph’s advancements in radiology NLP.
EMNLP 2022 Findings: Key use cases and evaluation of RadGraph’s dataset.

The Stanford de-identifier base is a pre-trained model developed by Stanford's AIMI Center to automatically remove or obscure personally identifiable information (PII) from medical text. Hosted on Hugging Face, this tool is designed to ensure patient privacy by de-identifying sensitive data in clinical reports, enabling researchers to safely use and share medical documents for research and AI development without compromising confidentiality. The model can be easily integrated into workflows to enhance data privacy practices in healthcare.

Read more about the de-identifier in this peer-reviewed publication.

The RSNA DICOM anonymizer is a free, open-source tool for curating, de-identifying, and transferring imaging datasets. The Anonymizer program has versions for major OS platforms (macOS, Windows, Linux Ubuntu) and is designed to perform "on-prem" de-identification of imaging datasets for use in research. Written in Python, using widely adopted libraries for processing medical images, Anonymizer is designed to be extended for specific project needs.

The generalized stratified sampling tool on GitHub is a resource for researchers looking to implement advanced sampling techniques in medical imaging studies. This tool offers a framework for stratified sampling, which helps ensure that samples are representative of various subgroups within a dataset. It supports the development of more robust and generalizable models by improving the distribution and representativeness of sampled data, making it easier to analyze and interpret complex imaging datasets effectively.

Read more about the stratified sampling approach in this peer-reviewed publication.

Complete list:

CRP=Collaborative Research Project, TDP=Technology Development Project, AIRWG=AI Reliability Working Group

The MIDRC representativeness exploration and comparison tool (REACT) is a tool designed to compare the representativeness of biomedical data. By leveraging the Jensen-Shannon distance (JSD) measure, this tool provides insights into the representativeness of datasets within the biomedical field. It also supports monitoring the representativeness of datasets over time by assessing the representativeness of historical data. Developed and utilized by MIDRC, this tool assesses how representative the data within the open data commons are of the US population. Additionally, it can be generalized by users for other representativeness needs, such as assessing the similarity of distributions across multiple attributes in different biomedical datasets.
Available at https://github.com/MIDRC/MIDRC-REACT
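As a rough illustration of the measure named above, the following sketch computes the Jensen-Shannon distance between two categorical distributions using only the Python standard library. It is not taken from the MIDRC-REACT code: the function name `jensen_shannon_distance`, the choice of base-2 logarithms, and the example age-group proportions are all assumptions made for illustration.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    p and q are sequences of non-negative weights over the same categories.
    They are normalized here, so raw counts are accepted as well.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total weight")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution used as the common reference point.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; terms with a_i == 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical age-group proportions (illustrative values only).
dataset = [0.10, 0.30, 0.40, 0.20]
reference = [0.25, 0.25, 0.25, 0.25]
distance = jensen_shannon_distance(dataset, reference)
```

With base-2 logarithms the distance ranges from 0 (identical distributions) to 1 (no overlap), so smaller values indicate that the dataset is closer to the reference population for that attribute.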
This algorithm uses multi-dimensional stratified sampling in which several variables of interest (such as demographics, imaging acquisition systems, or imaging protocols) are used sequentially to divide the data into numerous strata, each representing a unique combination of variables. Within each resulting stratum, patients are assigned to a specific dataset. This algorithm was developed and is used by MIDRC to separate data into either the open data commons or the sequestered data commons. However, as shared here by MIDRC, it can be generalized by users for other stratified sampling needs, e.g., dividing your own dataset into two separate sets: one for training and one for testing.
Code:
COVID-specific: https://github.com/MIDRC/Stratified_Sampling
General: https://github.com/MIDRC/Generalized_Stratified_Sampling
Publication:
N. Baughan, H. M. Whitney, K. Drukker, B. Sahiner, T. Hu, G. H. Kim, M. McNitt-Gray, K. J. Myers, M. L. Giger, “Sequestration of imaging studies in MIDRC: Stratified sampling to balance demographic characteristics of patients in a multi-institutional data commons.” Journal of Medical Imaging, Vol. 10, Issue 6, 064501 (November 2023). https://doi.org/10.1117/1.JMI.10.6.064501.
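The stratified splitting described in this entry can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the MIDRC repositories: the function name `stratified_split`, the record layout, the variable names `sex` and `age_group`, and the 80/20 proportion are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(patients, variables, fraction_a=0.8, seed=0):
    """Split patient records into two sets, stratum by stratum.

    patients   -- list of dicts, one per patient
    variables  -- keys that define strata (each unique combination is a stratum)
    fraction_a -- approximate fraction of each stratum assigned to set A
    """
    strata = defaultdict(list)
    for patient in patients:
        key = tuple(patient[v] for v in variables)
        strata[key].append(patient)

    rng = random.Random(seed)
    set_a, set_b = [], []
    for key in sorted(strata, key=repr):  # fixed order for reproducibility
        members = strata[key]
        rng.shuffle(members)
        cut = round(len(members) * fraction_a)
        set_a.extend(members[:cut])
        set_b.extend(members[cut:])
    return set_a, set_b


# Hypothetical cohort: 10 patients for each sex / age-group combination.
patients = [
    {"id": f"{sex}-{age}-{i}", "sex": sex, "age_group": age}
    for sex in ("F", "M")
    for age in ("<50", "50+")
    for i in range(10)
]
train, test = stratified_split(patients, ["sex", "age_group"], fraction_a=0.8)
```

Because the split is made inside each stratum, both resulting sets keep roughly the same mix of the chosen variables as the full cohort.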
Task-based sampling begins with the identification of cases relevant for a specific task and target population demographic characteristics (such as age range, COVID status, and imaging modality). Then, optimized quota sampling is conducted by randomly sampling cases until the maximum category margin (Baughan et al. 2022) is less than a pre-specified value.
Publication:
N. Baughan et al., “Task-Based Sampling of the MIDRC Sequestered Data Commons for Algorithm Performance Evaluation,” presented at the Annual Meeting of the American Association of Physicists in Medicine, 2022, E257–E258.
Code:
https://github.com/MIDRC/task-based-sampling
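The quota-sampling loop described in this entry might look roughly like the sketch below. The exact definition of the maximum category margin is given in the cited abstract, not on this page; here it is assumed to be the largest absolute difference between a category's proportion in the sample and its target proportion. The function names, the `age_group` attribute, and the tolerance are likewise hypothetical.

```python
import random
from collections import Counter


def max_category_margin(sample, attribute, target):
    """Largest absolute gap between sample and target proportions.

    This definition is an assumption made for illustration.
    """
    counts = Counter(case[attribute] for case in sample)
    n = len(sample)
    categories = set(target) | set(counts)
    return max(abs(counts.get(c, 0) / n - target.get(c, 0.0)) for c in categories)


def quota_sample(cases, attribute, target, size, tolerance, seed=0, max_draws=10_000):
    """Redraw random samples until the margin falls below the tolerance."""
    rng = random.Random(seed)
    for _ in range(max_draws):
        sample = rng.sample(cases, size)
        if max_category_margin(sample, attribute, target) < tolerance:
            return sample
    raise RuntimeError("no sample met the tolerance within max_draws")


# Hypothetical pool of task-relevant cases with one attribute of interest.
cases = [{"id": i, "age_group": "18-49" if i < 100 else "50+"} for i in range(200)]
target = {"18-49": 0.5, "50+": 0.5}
sample = quota_sample(cases, "age_group", target, size=40, tolerance=0.05)
```

Capping the number of draws keeps the loop from running forever when the tolerance cannot be met with the available cases.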
A document-level classifier for radiology reports to help find relevant cases, as well as create large numbers of labels for computer vision models.
Code: https://huggingface.co/StanfordAIMI/covid-radbert
Publication: https://pubmed.ncbi.nlm.nih.gov/36323915/
An automated de-identification pipeline for radiology reports that detects protected health information (PHI) entities and replaces them with realistic surrogates "hiding in plain sight." Our model outperformed all de-identifiers as well as human labelers when it was compared on all test sets of i2b2 2014 data. It enables accurate and automatic de-identification of radiology reports.
Code: https://huggingface.co/StanfordAIMI/stanford-deidentifier-base
Publication: https://www.ncbi.nlm.nih.gov/pubmed/36416419
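To make the "hiding in plain sight" idea concrete, the sketch below shows only the replacement step: given character spans that a detection model has flagged as PHI, each span is swapped for a realistic surrogate. It does not call the Stanford model; the function name `replace_with_surrogates`, the entity labels, the sample report, and the surrogate values are hypothetical.

```python
def replace_with_surrogates(text, entities, surrogates):
    """Replace detected PHI spans with surrogate values.

    entities   -- list of (start, end, label) character spans, assumed non-overlapping
    surrogates -- dict mapping each label to a replacement string
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        pieces.append(text[cursor:start])   # keep the text before the span
        pieces.append(surrogates[label])    # substitute the surrogate
        cursor = end
    pieces.append(text[cursor:])            # keep the remaining text
    return "".join(pieces)


report = "Patient John Smith was seen on 03/14/2021 for chest pain."
name, date = "John Smith", "03/14/2021"
# Spans as a detection model might return them (hypothetical output).
entities = [
    (report.index(name), report.index(name) + len(name), "PATIENT"),
    (report.index(date), report.index(date) + len(date), "DATE"),
]
surrogates = {"PATIENT": "Alan Reyes", "DATE": "07/02/2020"}
deidentified = replace_with_surrogates(report, entities, surrogates)
```

Because surrogates look like real values, any PHI that the detector misses is harder to tell apart from the replaced entries.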
RadBERT is a transformer that was continuously pre-trained on radiology reports from a BioBERT initialization.
Code: https://huggingface.co/StanfordAIMI/RadBERT
RadGraph is a tool designed to support the development of AI models that understand and evaluate radiology reports. Combining a Python-based labeling tool with an annotated dataset, RadGraph enables researchers to extract medical terms and their relationships from unstructured text. It serves as a foundation for training and refining natural language processing (NLP) models specifically tailored to healthcare applications. RadGraph is hosted on PhysioNet; accessing the tool requires user credentials.
Explore RadGraph’s code and additional resources on its GitHub page. Pre-trained models and details about RadGraph are available on HuggingFace.
Code: https://physionet.org/content/radgraph/1.0.0/
Publications: ACL 2024 Findings: Overview of RadGraph’s advancements in radiology NLP, and EMNLP 2022 Findings: Key use cases and evaluation of RadGraph’s dataset.
An end-to-end pipeline for the classification of chest X-rays.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_Emory-ResNet50
Disease trajectory prediction using X-rays and EHR data. This model predicts a label for each chest X-ray.
Code:
https://github.com/amaratariq/COVID19_GNN_public
The model is trained with JSRT data and the corresponding lung masks. The training images are enhanced and resized to 256 x 256 before being fed to the network. The model was trained at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.
Code:
https://github.com/MIDRC/COVID19_Lung_Segmentation_CXR_OSU-UNet
The model is trained with CT sequences and the corresponding lung masks. The training images are enhanced and resized to 256 x 256 before being fed to the network. The model was trained at The Ohio State University Wexner Medical Center, Department of Radiology, using Python and the TensorFlow Keras API on an NVIDIA Quadro GV100 system with CUDA/CuDNNv9 dependencies.
Code: https://github.com/MIDRC/COVID19_Lung_Segmentation_CT_OSU-UNet
ViLMedic is a modular framework for vision and language multimodal research in the medical field.
This library contains reference implementations of state-of-the-art vision and language architectures, referred to as “blocks”, and full solutions for multimodal medical tasks using one or several blocks.
Code: https://vilmedic.app/, https://github.com/jbdel/vilmedic
RoentGen is a generative vision-language model to create chest x-rays based on radiological text inputs.
Code: https://stanfordmimi.github.io/RoentGen/
A classification model for chest X-rays.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_DenseNet
The American College of Radiology developed a chest x-ray classification algorithm by training on the labeled CXR MIDRC data.
Code: https://github.com/MIDRC/COVID19_Lung_Classification_CXR_ACR
Notebooks and materials for cohort building for MIDRC Grand Challenges.
Materials for ThoraxAI: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2022_COVIDx
Materials for mRALE Mastermind: https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2023_mRALE%20Mastermind
MIDRC AI Interface for Covid (MAIIC) provides an interface for easy prototyping and testing of AI algorithms for AI researchers and physicians.
Code: https://github.com/MIDRC/COVID19_CRP10-AIInterface
MIDRC collaborators at Argonne National Laboratory developed the Advanced Privacy Preserving Federated Learning (APPFL) framework for federated learning scenarios in which data privacy can be maintained across communication through differential privacy.
Code:
Coming soon!
Documentation:
https://appfl.readthedocs.io/en/latest/index.html
Publication: https://ieeexplore.ieee.org/abstract/document/9835407?casa_token=a8LeXteVcDwAAAAA:nsOtdixRKhxx7ua0qjTckBFiaWOxL4gt-wlmfCLAnCibLu-cs40U6AtrLKn5eXT-JtnBlg
The MIDRC-LOINC mapping table serves as a tool for standardizing DICOM metadata, particularly for secondary research endeavors such as AI studies. By translating DICOM image terms into LOINC codes and Long Common Names, this resource streamlines cohort selection based on essential attributes like body region and contrast presence. Its regular updates, managed by the MIDRC Data Quality and Harmonization subcommittee, ensure ongoing relevance and utility for the broader research community.
Code:
https://github.com/MIDRC/midrc_dicom_harmonization
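A lookup of this kind can be sketched as follows. The layout of the real mapping table is not described on this page, so the key fields (modality and study description), the normalization step, the function names, and the example rows are all assumptions. The codes shown are placeholders, not real LOINC codes.

```python
def normalize(value):
    """Collapse case and whitespace so free-text DICOM strings compare reliably."""
    return " ".join(value.upper().split())


# Hypothetical mapping rows: (Modality, StudyDescription) -> (code, long common name).
# The codes below are placeholders, not real LOINC codes.
MAPPING = {
    ("CR", "CHEST 2 VIEWS"): ("EXAMPLE-0001", "XR Chest 2 Views"),
    ("CT", "CT CHEST WO CONTRAST"): ("EXAMPLE-0002", "CT Chest WO contrast"),
}


def map_study(modality, study_description):
    """Return (code, long common name) for a study, or None if unmapped."""
    key = (normalize(modality), normalize(study_description))
    return MAPPING.get(key)


mapped = map_study("cr", "  Chest  2 views ")
unmapped = map_study("MR", "Brain")
```

Returning `None` for unmapped studies makes it easy to collect unmatched descriptions, which could then be proposed as additions to the table.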
Jupyter or R notebooks that demonstrate how to build cohorts via queries and access associated metadata and files in MIDRC using Python or R code.
Code: https://github.com/MIDRC/tutorial_notebooks
Where to find in the data portal: https://data.midrc.org/resource-browser
The RSNA DICOM Anonymizer is a free, open-source tool for curating, de-identifying, and transferring imaging datasets. The Anonymizer program has versions for major OS platforms (macOS, Windows, Linux Ubuntu) and is designed to perform "on-prem" de-identification of imaging datasets for use in research. Written in Python, using widely adopted libraries for processing medical images, Anonymizer is designed to be extended for specific project needs.
Code: https://github.com/RSNA/Anonymizer
The ThoraxAI challenge task was the classification of portable chest radiographs.
First place: Ran Zhang, Dalton Griner, Guang-Hong Chen
Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/winner_algorithm_description.md
Second place: Mathieu Goulet
Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/runner_up_algorithm_description.md
Third place: Finn Behrendt
Code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2022_COVIDx/Winning%20Challenge%20Submissions/third_place_algorithm_description.md
mRALE Mastermind Challenge results:
1st place: Ian Pan (Brigham and Women’s Hospital)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/winner_algorithm_description.md
2nd place: Ran Zhang (University of Wisconsin-Madison)
code: not currently available, pending potential regulatory approval
3rd place: Finn Behrendt (University of Technology Hamburg)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/third_place_algorithm_description.md
4th place: Team: Christian Mattjie, Luis Vinicius de Moura, Rafaela Cappelari Ravazio, Otavio Parraga, Luca Silveira Kupssinskü, Adilson Medronha, and Rodrigo Coelho Barros (Pontificia Universidade Católica do Rio Grande do Sul)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/fourth_place_algorithm_description.md
5th place: Yijie Yuan (Johns Hopkins Medical)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/fifth_place_algorithm_description.md
6th place: Team: Cohen Archbold, Imran Abdullah-Al-Zubaer, Atik Ahamed (University of Kentucky)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/sixth_place_algorithm_description.md
7th place: Mathieu Goulet (Centre régional intégré de cancérologie)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/seventh_place_algorithm_description.md
8th place: Team: Yifan Wu, Hayden Gunraj, Chengzong Zhao, Yuhao Chen, Alexander Wong, Pengcheng Xi (University of Waterloo)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/eighth_place_algorithm_description.md
9th place: Team: Stanley Liang, Sameer Antani, Zhiyun Xue, Sivaramakrishnan Rajaraman, Feng Yang (NIH National Library of Medicine, Computational Health Research Branch)
code: https://github.com/MIDRC/COVID19_Challenges/blob/main/Challenge_2023_mRALE%20Mastermind/Winning%20Challenge%20Submissions/ninth_place_algorithm_description.md
The XAI Challenge aimed to advance explainable AI for medical image analysis. Participants were tasked with developing and training explainable artificial intelligence/machine learning (AI/ML) models to classify frontal-view MIDRC portable chest radiographs (CXRs) for the presence of lung opacities associated with any type of pneumonia. Models were evaluated against the reference standard for the validation and test datasets. The AI/ML output for each CXR was 1) a likelihood that the patient presented with pneumonia of any type, and 2) an 'explainability' map of the same size as the input image, interpretable as the probability of presence of lung opacity at each pixel. This Challenge used Docker as a containerization solution.
Trained models from participants available at https://github.com/MIDRC/MIDRC_Grand_Challenges/tree/main/Challenge_2024_XAI

Questions? Check out our answers to frequently asked questions!

How to acknowledge 1) MIDRC funded research and 2) use of data downloaded from the MIDRC Data Commons

Complete list:

AIRWG: Representativeness exploration and comparison tool (REACT)

TDP3d: Stratified sampling for dataset splitting

TDP3d: Task-based sampling

CRP1: Detection of phrases in radiology reports

CRP1: De-identification of radiology reports

CRP1: RadBERT

CRP1: RadGraph

CRP2: End-to-end pipeline for the classification of chest X-rays for COVID-19

CRP2: GNN model for disease trajectory prediction on chest X-rays

CRP2: Chest X-ray lung segmentation model based on U-Net

CRP2: Lung Segmentation Model for CT

CRP4: ViLMedic: A framework for research at the intersection of vision and language in medical AI

CRP4: Roentgen: Vision-language foundation model for chest X-ray generation

CRP4: Diagnosis based on a DenseNet architecture

CRP5: Classifier for lung chest radiographs

CRP9: ThoraxAI and mRALE Mastermind cohort building

CRP10: Interface for easy prototyping and testing of AI algorithms

CRP10: Advanced privacy preserving federated learning framework

CRP12: LOINC mapping

Gen3: Example cohort building notebooks

RSNA: RSNA anonymizer

AI models from MIDRC ThoraxAI challenge participants

AI models from the MIDRC mRALE Mastermind Challenge

AI models from the MIDRC XAI Challenge
