Tools, algorithms, methods, code.

Last updated November, 2024

Algorithms, methods, and code are continuously being developed within MIDRC. Please see below for a current list.

Check back often for updates!

See our ‘Web-based tools’ page for ‘plug-and-play’ tools for cohort building, indexing, performance metric selection, bias awareness, and more.

Featured:

The MIDRC diversity calculator is an open-source tool designed to assess and quantify the diversity within medical imaging datasets. By analyzing demographic attributes, this calculator helps researchers ensure that their datasets are representative of diverse populations, aiding in the development of more equitable and inclusive AI models in healthcare. The tool is hosted on GitHub, where users can access the code, contribute, and collaborate to improve its functionality.

Learn more about the diversity calculator in this short demo video, a MIDRC seminar recording, or the peer-reviewed publication.
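To illustrate what "quantifying diversity" can mean in practice, the sketch below compares the demographic make-up of a dataset against a reference population using the Jensen-Shannon distance. This is only one possible measure and is an assumption here, not a description of how the MIDRC diversity calculator is implemented; the function names, category labels, and reference proportions are all hypothetical.

```python
import math
from collections import Counter


def proportions(labels, categories):
    """Return the share of each category among the given labels."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall in the given categories")
    return [counts[c] / total for c in categories]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance with base-2 logs.

    0 means the two distributions are identical; 1 means they do not overlap.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(jsd, 0.0))


# Hypothetical example: a dataset that is 30% female, 70% male,
# compared against an assumed 50/50 reference population.
categories = ["Female", "Male"]
dataset_labels = ["Female"] * 30 + ["Male"] * 70
reference = [0.5, 0.5]

dataset = proportions(dataset_labels, categories)
print(jensen_shannon_distance(dataset, reference))
```

A value near 0 suggests the dataset mirrors the reference population for that attribute; larger values flag attributes that may deserve a closer look.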

The MIDRC DICOM harmonization mapping tool is an open-source utility for standardizing and harmonizing DICOM files across diverse medical imaging datasets. By aligning and normalizing metadata and image attributes, including unstructured character string fields, its mapping table facilitates cohort selection in the MIDRC data commons and supports the integration and interoperability of imaging data, promoting consistent quality in large-scale research initiatives. It is available on GitHub, where users can access it and suggest additional LOINC terminology.

Learn more in this MIDRC seminar recording.
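The general idea of a mapping table for unstructured string fields can be sketched as follows. This is a minimal illustration, not the MIDRC tool itself: the table entries, the standardized labels, and the function names are all hypothetical, and the real mapping table on GitHub uses its own structure and LOINC terminology.

```python
import re

# Hypothetical mapping table: normalized free text -> standardized label.
# The labels are placeholders, not actual LOINC codes or MIDRC entries.
MAPPING = {
    "chest pa": "XR Chest PA",
    "cxr pa": "XR Chest PA",
    "ct chest wo contrast": "CT Chest W/O contrast",
}


def normalize(text):
    """Lower-case the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def harmonize(description, mapping):
    """Return the standardized label for a free-text description, or None if unmapped."""
    return mapping.get(normalize(description))


for raw in ["CHEST_PA", "  CXR-PA ", "CT Chest wo contrast", "Knee 2 views"]:
    print(repr(raw), "->", harmonize(raw, MAPPING))
```

Descriptions that return None are the ones a curator would review and, where appropriate, propose as new entries for the table.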

RadGraph is a tool designed to support the development of AI models that understand and evaluate radiology reports. Combining a Python-based labeling tool with an annotated dataset, RadGraph enables researchers to extract medical terms and their relationships from unstructured text. It serves as a foundation for training and refining natural language processing (NLP) models specifically tailored to healthcare applications. RadGraph is hosted on PhysioNet; accessing it requires user credentials.

Explore RadGraph’s code and additional resources on its GitHub page. Pre-trained models and details about RadGraph are available on Hugging Face. RadGraph is described in peer-reviewed publications that provide deeper insight into its development and applications: ACL 2024 Findings (an overview of RadGraph’s advancements in radiology NLP) and EMNLP 2022 Findings (key use cases and evaluation of RadGraph’s dataset).

The Stanford de-identifier base is a pre-trained model developed by Stanford's AIMI Center to automatically remove or obscure personally identifiable information (PII) from medical text. Hosted on Hugging Face, this tool is designed to ensure patient privacy by de-identifying sensitive data in clinical reports, enabling researchers to safely use and share medical documents for research and AI development without compromising confidentiality. The model can be easily integrated into workflows to enhance data privacy practices in healthcare.

Read more about the de-identifier in this peer-reviewed publication.

The RSNA DICOM anonymizer is a free, open-source tool for curating, de-identifying, and transferring imaging datasets. The Anonymizer program, available for major OS platforms (macOS, Windows, Ubuntu Linux), is designed to perform "on-prem" de-identification of imaging datasets for use in research. Written in Python and built on widely adopted libraries for processing medical images, Anonymizer is designed to be extended for specific project needs.

The generalized stratified sampling tool on GitHub is a resource for researchers looking to implement advanced sampling techniques in medical imaging studies. This tool offers a framework for stratified sampling, which helps ensure that samples are representative of various subgroups within a dataset. It supports the development of more robust and generalizable models by improving the distribution and diversity of sampled data, making it easier to analyze and interpret complex imaging datasets effectively.

Read more about the stratified sampling tool in this peer-reviewed publication.
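As a rough illustration of the stratified sampling idea behind the generalized stratified sampling tool, the sketch below groups records by one or more attributes and draws the same fraction from each group. It is a simplified example under stated assumptions, not the tool's actual algorithm: the function name, record fields, and the rule of keeping at least one record per stratum are all hypothetical choices.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, fraction, seed=0):
    """Draw roughly `fraction` of the records from every stratum.

    A stratum is a unique combination of the values found under `strata_keys`.
    Each stratum contributes at least one record so that small subgroups
    are not dropped entirely.
    """
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in strata_keys)].append(rec)

    sample = []
    for key in sorted(groups):  # sorted for a deterministic stratum order
        members = groups[key]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical records with two demographic attributes.
records = (
    [{"id": i, "sex": "F", "age_group": "<50"} for i in range(80)]
    + [{"id": 80 + i, "sex": "M", "age_group": "50+"} for i in range(20)]
)
subset = stratified_sample(records, ["sex", "age_group"], fraction=0.2)
print(len(subset))
```

Because each stratum is sampled at the same rate, the subset keeps approximately the same subgroup proportions as the full dataset, which is the property that makes stratified samples representative.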


Complete list:

(CRP = Collaborative Research Project, TDP = Technology Development Project, BDWG = Bias and Diversity Working Group)

Questions? Check out our answers to frequently asked questions!

How to acknowledge 1) MIDRC-funded research and 2) use of data downloaded from the MIDRC Data Commons