Algorithms, methods, code.

Last updated October 2024

Algorithms, methods, and code are continuously being developed within MIDRC. Please see below for a current list.

Check back often for updates!

See our ‘Tools’ page for ‘plug-and-play’ tools for cohort building, indexing, performance metric selection, bias awareness, and more.

Featured:

The MIDRC diversity calculator is an open-source tool designed to assess and quantify the diversity within medical imaging datasets. By analyzing demographic attributes, this calculator helps researchers ensure that their datasets are representative of diverse populations, aiding in the development of more equitable and inclusive AI models in healthcare. The tool is hosted on GitHub, where users can access the code, contribute, and collaborate to improve its functionality.

Learn more about the diversity calculator in this short demo video, a MIDRC seminar recording, or the peer-reviewed publication.
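As a rough illustration of how demographic representativeness can be quantified, the sketch below compares the category proportions of a dataset against a reference population. It assumes the Jensen-Shannon distance as the measure, which is only one plausible choice and is not confirmed here as the calculator's own method. The function names and the example data are hypothetical and are not taken from the MIDRC repository.

```python
import math
from collections import Counter


def proportions(values, categories):
    """Return the share of each category, in the order given by `categories`."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no values fall into the given categories")
    return [counts[c] / total for c in categories]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they do not overlap.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


# Hypothetical example: sex distribution in a dataset vs. a reference population.
categories = ["Female", "Male"]
dataset = ["Female"] * 30 + ["Male"] * 70
reference = [0.5, 0.5]
distance = jensen_shannon_distance(proportions(dataset, categories), reference)
```

A smaller distance suggests the dataset is closer to the reference population for that attribute; the same comparison can be repeated per attribute (for example age group or race).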

The MIDRC DICOM harmonization mapping tool is an open-source utility aimed at standardizing and harmonizing DICOM files across diverse medical imaging datasets. By aligning and normalizing metadata and image attributes, including unstructured character string fields, this mapping table facilitates cohort selection in the MIDRC data commons and supports the integration and interoperability of imaging data, providing consistent quality in large-scale research initiatives. It is available on GitHub, where users can access it and submit suggestions for additional LOINC terminology.

Learn more in this MIDRC seminar recording.
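The idea of mapping unstructured character string fields to a standard vocabulary can be sketched as a normalize-then-look-up step. This is a minimal sketch under stated assumptions: the table entries, the placeholder labels, and the function names are hypothetical, and the labels are deliberately not real LOINC codes or entries from the MIDRC mapping table.

```python
import re

# Hypothetical mapping table: normalized free-text description -> standardized label.
# The labels are placeholders, NOT real LOINC codes or MIDRC table entries.
MAPPING = {
    "chest pa and lateral": "PLACEHOLDER-XR-CHEST-2-VIEWS",
    "ct chest wo contrast": "PLACEHOLDER-CT-CHEST-WO-CONTRAST",
}


def normalize(description):
    """Lowercase the text and collapse punctuation and whitespace runs to single spaces."""
    text = description.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def harmonize(description):
    """Return the standardized label for a free-text description, or None if unmapped."""
    return MAPPING.get(normalize(description))


# Differently formatted site-specific strings resolve to the same label.
label_a = harmonize("Chest  PA and Lateral")
label_b = harmonize("CT_CHEST_WO_CONTRAST")
unmapped = harmonize("Knee 3 views")
```

Descriptions that return `None` are the kind of unmapped values for which a new terminology suggestion could be submitted.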

RadGraph is a dataset and toolset designed to extract and label entities and relations from radiology reports, facilitating the development of natural language processing (NLP) models in the medical field. It includes annotated radiology reports and a Python-based tool for generating ground truth labels, which are crucial for training AI models to accurately interpret medical language. RadGraph is hosted on PhysioNet, where researchers can access the dataset, utilize the toolset, and contribute to advancing medical NLP research. Credentialed access is required.

Read more about RadGraph in this peer-reviewed publication.

The Stanford de-identifier base is a pre-trained model developed by Stanford's AIMI Center to automatically remove or obscure personally identifiable information (PII) from medical text. Hosted on Hugging Face, this tool is designed to ensure patient privacy by de-identifying sensitive data in clinical reports, enabling researchers to safely use and share medical documents for research and AI development without compromising confidentiality. The model can be easily integrated into workflows to enhance data privacy practices in healthcare.

Read more about the de-identifier in this peer-reviewed publication.

The RSNA DICOM anonymizer is a free, open-source tool for curating, de-identifying, and transferring imaging datasets. The Anonymizer program has versions for major OS platforms (macOS, Windows, Ubuntu Linux) and is designed to perform on-premises de-identification of imaging datasets for use in research. Written in Python and built on widely adopted libraries for processing medical images, Anonymizer is designed to be extended for specific project needs.

The generalized stratified sampling tool on GitHub is a resource for researchers looking to implement advanced sampling techniques in medical imaging studies. This tool offers a framework for stratified sampling, which helps ensure that samples are representative of various subgroups within a dataset. It supports the development of more robust and generalizable models by improving the distribution and diversity of sampled data, making it easier to analyze and interpret complex imaging datasets effectively.

Read more about the stratified sampling tool in this peer-reviewed publication.
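To make the stratified sampling idea described above concrete, here is a minimal sketch of proportional allocation across strata defined by one or more attributes. It is an illustration only and not the code of the MIDRC tool: the function name, the record fields, and the rule of keeping at least one record per stratum are assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, n, seed=0):
    """Draw about n records, allocating to each stratum in proportion to its size.

    Each stratum contributes at least one record (an assumption of this sketch),
    so the total may differ slightly from n when strata are small or numerous.
    """
    rng = random.Random(seed)

    # Group records by the combination of stratification attributes.
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    total = len(records)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = min(len(members), max(1, round(n * len(members) / total)))
        sample.extend(rng.sample(members, k))  # sample without replacement
    return sample


# Hypothetical cohort: 60 female and 40 male patients, split evenly by age group.
records = [
    {"id": i,
     "sex": "F" if i < 60 else "M",
     "age_group": "<50" if i % 2 == 0 else "50+"}
    for i in range(100)
]
sample = stratified_sample(records, ["sex", "age_group"], n=10)
```

In this example the four strata hold 30, 30, 20, and 20 records, so a sample of 10 draws 3, 3, 2, and 2 records and preserves the 60/40 sex split of the full cohort.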


Complete list:

(CRP = Collaborative Research Project, TDP = Technology Development Project, BDWG = Bias and Diversity Working Group)

Questions? Check out our answers to frequently asked questions!

How to acknowledge 1) MIDRC funded research and 2) use of data downloaded from the MIDRC Data Commons