Annotations.

Last updated December 2024

Annotations are available in MIDRC, produced both by expert human annotators and by AI-based helper tools.

Note that the availability of third-party annotations does not automatically imply their endorsement by MIDRC.

Annotations are often required for supervised learning and can serve as ground truth for validation and testing.  Annotations may consist of labels created by a human expert, metadata extracted from a linked record (e.g., an EMR), or values derived from an algorithm.  

  • They are applicable at various levels of granularity, from the entire imaging exam to individual pixels in an image.  

  • An annotation can take various forms, such as free text, a measurement, or a region (e.g., a bounding box); a minimal illustrative record is sketched after this list.  

  • Annotations can provide a reference standard to supplement the images themselves and can be used to train an AI model and subsequently to test the performance of trained models.  

  • Annotations may be produced internally by MIDRC or submitted from external sources, either along with contributed imaging data or subsequently as part of an external project.

  • Expert annotations may originate from MIDRC research activities or the Grand Challenge Workgroup (GCWG).  
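
As a purely illustrative sketch of the kinds of records described above, an annotation could be represented as follows; the field names and values here are hypothetical and do not correspond to the MIDRC data model.

    from dataclasses import dataclass
    from typing import Optional

    # Hypothetical illustration only -- these field names are NOT the MIDRC schema.
    @dataclass
    class AnnotationRecord:
        level: str                 # e.g., "imaging_study", "image", or "pixel"
        form: str                  # e.g., "free_text", "measurement", "bounding_box", "segmentation"
        value: object              # label text, numeric value, ROI coordinates, or a mask reference
        source: str                # "human_expert", "linked_record", or "algorithm"
        image_ref: Optional[str] = None   # identifier of the annotated image or study

    # Example: a study-level ordinal label assigned by a human expert.
    example = AnnotationRecord(level="imaging_study", form="ordinal_label",
                               value=14, source="human_expert",
                               image_ref="STUDY_UID_PLACEHOLDER")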

Although most of the MIDRC data is publicly available, in some instances annotations may be temporarily withheld (for the purposes of a Challenge).  All publicly available annotations will be linked to the public training imaging data made available in Gen3 whenever possible. 

Below is a list of public annotation sets available on the Gen3 data portal, linked to the images and downloadable for your use.  Some annotations can be viewed directly in the OHIF viewer, but in some instances they use a non-standard format that the viewer does not support.
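
One possible way to download a linked annotation file programmatically is the Gen3 Python SDK (pip install gen3) together with an API key from your data.midrc.org profile page. The sketch below is a minimal example under those assumptions; the object_id is a placeholder, and exact SDK signatures and return values may vary between SDK versions.

    import requests
    from gen3.auth import Gen3Auth
    from gen3.file import Gen3File

    # Assumes an API key file ("credentials.json") downloaded from your MIDRC profile page.
    auth = Gen3Auth(refresh_file="credentials.json")
    files = Gen3File(auth_provider=auth)

    # Placeholder: copy a real object_id (data GUID) from the data explorer or a manifest.
    object_id = "PASTE_OBJECT_ID_HERE"

    # Request a presigned URL for the file, then download it locally.
    presigned = files.get_presigned_url(object_id)
    response = requests.get(presigned["url"])
    with open("downloaded_annotation_file", "wb") as out:
        out.write(response.content)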

MIDRC annotations.

  • The training set for the MIDRC mRALE Mastermind Challenge was annotated by expert radiologists and is currently available through GitHub as well as in the data portal (see below). This dataset contains 2,079 portable chest radiography exams of COVID-19-positive patients.

    Description:

    To provide a reference standard assessment of disease severity on chest radiographs, a simplified version of the Radiographic Assessment of Lung Edema (RALE) score is provided. This grading scale was originally validated for pulmonary edema assessment in acute respiratory distress syndrome and incorporates the extent and density of alveolar opacities on chest radiographs. The grading system is relevant to COVID-19 patients because chest radiograph findings tend to involve multifocal alveolar opacities, and many hospitalized COVID-19 patients develop acute respiratory distress syndrome. (A worked scoring sketch follows this entry.)

    Image type annotated:

    Portable chest radiographs

    Annotation method:

    Retrospective expert

    Annotation type:

    Ordinal label

    Annotation level:

    Imaging study

    Availability:

    Searchable in the data explorer: yes

    https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2023_mRALE%20Mastermind

    Publication:

    In preparation

    Acknowledgment:

    Many thanks to the expert radiologists who participated in the annotation of this dataset.
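
    For orientation only, the sketch below illustrates an mRALE-style computation; it assumes a convention in which each lung receives an extent score (0-4) and a density score (1-3) whose product is summed over both lungs. The exact scoring rules used for the Challenge are defined in the Challenge materials linked above, so treat this as an assumption, not the official definition.

        # Illustrative mRALE-style score: per-lung extent (0-4) times density (1-3),
        # summed over both lungs. Consult the Challenge documentation for exact rules.
        def mrale_score(per_lung_scores):
            """per_lung_scores: iterable of (extent, density) tuples, one per lung."""
            total = 0
            for extent, density in per_lung_scores:
                if not 0 <= extent <= 4:
                    raise ValueError("extent must be between 0 and 4")
                if extent and not 1 <= density <= 3:
                    raise ValueError("density must be between 1 and 3 when opacities are present")
                total += extent * density if extent else 0
            return total

        # Right lung: extent 3, density 2; left lung: extent 2, density 1 -> 3*2 + 2*1 = 8
        print(mrale_score([(3, 2), (2, 1)]))  # 8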


Third party annotations.

  • The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD), a collection of COVID-related imaging datasets and expert annotations to support research and education.

    Description:

    MIDRC-RICORD dataset 1a was created through a collaboration between the RSNA and the Society of Thoracic Radiology (STR). Pixel-level volumetric segmentation with clinical annotations by thoracic radiology subspecialists was performed for all COVID-positive thoracic computed tomography (CT) imaging studies, using a labeling schema coordinated with other international consensus panels and COVID data annotation efforts.

    MIDRC-RICORD dataset 1a consists of 120 thoracic computed tomography (CT) scans from four international sites annotated with detailed segmentation and diagnostic labels.

    Patient Selection: Patients at least 18 years of age with a positive diagnosis of COVID-19.

    1. 120 Chest CT examinations (axial series only, any protocol).

    2. Annotations comprise the following (restated as a small Python structure after this entry):

    a) Detailed segmentation of affected regions;

    b) Image-level labels (Infectious opacity, Infectious TIB/micronodules, Infectious cavity, Noninfectious nodule/mass, Atelectasis, Other noninfectious opacity)

    c) Exam-level diagnostic labels (Typical, Indeterminate, Atypical, Negative for pneumonia, Halo sign, Reversed halo sign, Reticular pattern w/o parenchymal opacity, Perilesional vessel enlargement, Bronchial wall thickening, Bronchiectasis, Subpleural curvilinear line, Effusion, Pleural thickening, Pneumothorax, Pericardial effusion, Lymphadenopathy, Pulmonary embolism, Normal lung, Infectious lung disease, Emphysema, Oncologic lung disease, Non-infectious inflammatory lung disease, Non-infectious interstitial, Fibrotic lung disease, Other lung disease)

    d) Exam-level procedure labels (With IV contrast, Without IV contrast, QA- inadequate motion/breathing, QA- inadequate insufficient inspiration, QA- inadequate low resolution, QA- inadequate incomplete lungs, QA- inadequate wrong body part/modality, Endotracheal tube, Central venous/arterial line, Nasogastric tube, Sternotomy wires, Pacemaker, Other support apparatus).

    Image type annotated:

    Chest CT

    Annotation method:

    Retrospective expert (external to MIDRC)

    Annotation type:

    Categorical labels and segmentations

    Annotation level:

    Imaging study and image-level

    Availability:

    Searchable in the data explorer: no

    https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742

    Publication:

    https://pubs.rsna.org/doi/full/10.1148/radiol.2021203957
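
    Purely as an organizational aid, the label groups listed above can be restated as a small Python structure; the grouping keys below are illustrative only and are not part of the dataset's own metadata format.

        # Illustrative restatement of the MIDRC-RICORD 1a label groups listed above.
        # The dictionary keys are not part of the dataset's metadata format.
        RICORD_1A_LABELS = {
            "image_level": [
                "Infectious opacity", "Infectious TIB/micronodules", "Infectious cavity",
                "Noninfectious nodule/mass", "Atelectasis", "Other noninfectious opacity",
            ],
            "exam_level_diagnostic": [
                "Typical", "Indeterminate", "Atypical", "Negative for pneumonia",
                "Halo sign", "Reversed halo sign",
                # ... remaining diagnostic labels from the list above ...
            ],
            "exam_level_procedure": [
                "With IV contrast", "Without IV contrast", "Endotracheal tube",
                # ... remaining procedure and QA labels from the list above ...
            ],
        }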

  • The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD), a collection of COVID-related imaging datasets and expert annotations to support research and education.

    Description:

    This dataset was created through a collaboration between the RSNA and the Society of Thoracic Radiology (STR). Clinical annotation by thoracic radiology subspecialists was performed for all COVID-positive chest radiography (CXR) imaging studies using a labeling schema based upon guidelines for reporting classification of COVID-19 findings in CXRs (see Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language, Journal of Thoracic Imaging).

    The RSNA International COVID-19 Open Annotated Radiology Database (RICORD) consists of 998 chest x-rays from 361 patients at four international sites annotated with diagnostic labels.

    Patient Selection: Patients at least 18 years of age with a positive diagnosis of COVID-19.

    998 Chest x-ray examinations from 361 patients.

    Annotations with labels:

    Classification

    Typical Appearance: Multifocal bilateral, peripheral opacities and/or opacities with rounded morphology, with a lower lung-predominant distribution (required feature; must be present with either or both of the first two opacity patterns)

    Indeterminate Appearance: Absence of typical findings AND unilateral, central, or upper lung-predominant distribution of airspace disease

    Atypical Appearance: Pneumothorax or pleural effusion, pulmonary edema, lobar consolidation, solitary lung nodule or mass, diffuse tiny nodules, cavity

    Negative for Pneumonia: No lung opacities

    Airspace Disease Grading

    The lungs are divided on the frontal chest x-ray into 3 zones per lung (6 zones total). The upper zone extends from the apices to the superior hilum, the mid zone spans the superior to inferior hilar margins, and the lower zone extends from the inferior hilar margins to the costophrenic sulci. A grade is required whenever the study is not negative for pneumonia (this rule is restated as a short function after this entry):

    Mild: Opacities in 1-2 lung zones

    Moderate: Opacities in 3-4 lung zones

    Severe: Opacities in >4 lung zones

    Image type annotated:

    Chest radiographs

    Annotation method:

    Retrospective expert (external to MIDRC)

    Annotation type:

    Categorical labels

    Annotation level:

    Imaging study

    Availability:

    Searchable in the data explorer: no

    https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281

    Publication:

    https://pubs.rsna.org/doi/full/10.1148/radiol.2021203957
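
    The zone-based grading described above maps to a simple rule. The sketch below restates it in Python; the function name and the input convention (number of opacified zones out of six) are illustrative.

        # Restates the airspace disease grading rule described above.
        def airspace_disease_grade(opacified_zones: int) -> str:
            """Map the number of opacified lung zones (0-6) to an airspace disease grade."""
            if not 0 <= opacified_zones <= 6:
                raise ValueError("a frontal chest x-ray is divided into 6 lung zones")
            if opacified_zones == 0:
                return "No grade (negative for pneumonia)"
            if opacified_zones <= 2:
                return "Mild"
            if opacified_zones <= 4:
                return "Moderate"
            return "Severe"

        print(airspace_disease_grade(3))  # "Moderate"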

  • Machine-generated annotations based on information produced retrospectively to the subject’s care using automated means.

    Description:

    The Smart Imagery Framing and Truthing (SIFT) system is an image annotation software system that assists annotators in identifying abnormalities (or diseases) and their corresponding boundaries, in order to train and test machine learning and deep learning artificial intelligence.

    SIFT is based on Multi-task, Optimal-recommendation, and Max-predictive Classification and Segmentation (MOM ClasSeg) technologies to detect and delineate 65 different types of abnormal regions of interest (ROIs) on chest x-ray images, provide a score for each detected ROI, and offer recommended abnormality labels for each ROI.

    The MOM ClasSeg system, which integrates a Mask R-CNN and a Decision Fusion Network, was developed on a training dataset of over 300,000 adult chest x-rays, containing over 240,000 confirmed abnormal images (with over 300,000 confirmed ROIs corresponding to 65 different abnormalities) and over 67,000 normal (i.e., “no finding”) images.

    Image type annotated:

    Chest radiographs

    Annotation method:

    Retrospective machine (AI) generated

    Annotation type:

    DICOM SEG objects and categorical labels (a sketch of how a SEG object can be inspected follows this entry).

    Annotation level:

    Image

    Availability:

    Searchable in the data explorer: yes
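
    Because these annotations are distributed as DICOM SEG objects, a downloaded file can be inspected with pydicom. The sketch below is a generic example (the file name is a placeholder) and is not specific to the SIFT outputs; some attributes may be absent in a given file.

        import pydicom

        # Placeholder path: a DICOM SEG object downloaded from the data portal.
        seg = pydicom.dcmread("downloaded_seg_object.dcm")
        print(seg.Modality)  # "SEG" for a segmentation object

        # Each labeled region is described by an item of the Segment Sequence.
        for segment in seg.SegmentSequence:
            print(segment.SegmentNumber, segment.SegmentLabel)

        # The segmentation frames themselves are exposed as a pixel array
        # (number of frames x rows x columns for a multi-frame SEG).
        print(seg.pixel_array.shape)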

  • Description:

    The annotations in this dataset contain analysis results produced using off-the-shelf open-source tools, specifically the BodyPartRegression and lungmask tools. The accuracy of the annotations produced by these tools was not verified by experts.

    The BodyPartRegression algorithm is described in this preprint: https://arxiv.org/abs/2110.09148.

    The lungmask algorithm is described in this publication: https://doi.org/10.1186/s41747-020-00173-2.

    Radiomics features were extracted for the segmentations using the open-source pyradiomics library, described in this publication: https://doi.org/10.1158/0008-5472.CAN-17-0339.

    The application of these tools to the selected MIDRC data is described in this publication: https://doi.org/10.1117/12.2653606 (a minimal usage sketch follows this entry).

    Image type annotated:

    CT (389 CT series)

    Annotation method:

    Retrospective machine (AI) generated

    Annotation type:

    Volumetric segmentations, slice-level body part assignment

    Annotation level:

    Pixel-based

    Links to source projects:

    https://github.com/MIC-DKFZ/BodyPartRegression

    https://github.com/JoHof/lungmask

    Availability:

    Searchable in the data explorer: no

    Publication: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12469/2653606/Integrating-deep-learning-algorithm-for-the-lung-segmentation-with-body/10.1117/12.2653606.short?SSO=1
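
    For orientation, the sketch below shows how the lungmask and pyradiomics tools named above are typically invoked on a CT volume. It is not the exact pipeline used to produce these annotations, the input file name is a placeholder, and API details may differ between tool versions.

        import SimpleITK as sitk
        from lungmask import mask as lungmask_mask   # https://github.com/JoHof/lungmask
        from radiomics import featureextractor       # pyradiomics

        # Placeholder input: a CT series already converted to a single volume file.
        ct_image = sitk.ReadImage("ct_series.nii.gz")

        # Lung segmentation with the default lungmask model (API may differ by version).
        lung_labels = lungmask_mask.apply(ct_image)       # per-voxel lung labels as a numpy array
        lung_mask = sitk.GetImageFromArray(lung_labels)
        lung_mask.CopyInformation(ct_image)
        sitk.WriteImage(lung_mask, "lung_mask.nii.gz")

        # Radiomics feature extraction over one lung label using pyradiomics defaults.
        extractor = featureextractor.RadiomicsFeatureExtractor()
        features = extractor.execute("ct_series.nii.gz", "lung_mask.nii.gz", label=1)
        for name, value in features.items():
            print(name, value)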


How to find/view segmentation annotations.

To find and view DICOM annotations in MIDRC's integrated OHIF viewer:

  1. Go to the data explorer at data.midrc.org/explorer

  2. Select the "Imaging Studies" main tab

  3. Select the "Annotations" filters tab

  4. Under the filter "Datafile Annotation Name" select an annotation type you’re interested in such as: midrc_bpr_landmarks, midrc_bpr_regions, midrc_lung_measures, midrc_lung_segs etc.

  5. Click the "Browse in DICOM Viewer" button in the explorer table to view a particular study.

  6. When the OHIF viewer tab opens, scroll down to the bottom of the list of imaging series, and double-click on one of the series labeled "SEG". 

  7. When a message asks "Do you want to open this Segmentation?", click "Yes".


Questions? Check out our answers to frequently asked questions!

See also: how to acknowledge 1) MIDRC-funded research and 2) use of data downloaded from the MIDRC Data Commons.