Annotations.

Last updated January 2025

MIDRC provides annotations produced both by expert human annotators and by “Helper AI”.

Note that the availability of third-party annotations does not automatically imply their endorsement by MIDRC.

This diagram illustrates how large language models (LLMs) can be used to automate labeling at the exam, series, and image levels. A Helper AI algorithm can extract, correct, and normalize the values in the DICOM header to a common vocabulary that is both machine- and human-readable, and return them in a JSON payload. In parallel, another pre-trained LLM can be prompted to extract the relevant features from the clinical report and return them in a structured format (e.g., a table, as shown) mapped to a standard coding system such as SNOMED CT or ICD-10. Human experts can then spot-check or audit the automated process.
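A minimal Python sketch of the last two steps, assuming a hypothetical payload schema: each Helper AI result arrives as a JSON object whose required keys (`modality`, `body_part`, `series_description_normalized`) are illustrative names, not a MIDRC specification. The code rejects malformed payloads and draws a reproducible random sample of records for expert spot-checks.

```python
import json
import random
from typing import Dict, List

# Hypothetical required keys for one normalized series; not a MIDRC schema.
REQUIRED_FIELDS = {"modality", "body_part", "series_description_normalized"}


def parse_payload(raw: str) -> Dict:
    """Parse one Helper AI JSON payload and check that required keys are present.

    Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    """
    record = json.loads(raw)
    if not isinstance(record, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS.difference(record)
    if missing:
        raise ValueError(f"payload missing fields: {sorted(missing)}")
    return record


def sample_for_audit(records: List[Dict], fraction: float, seed: int = 0) -> List[Dict]:
    """Return a reproducible random subset of records for expert spot-checks."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = round(len(records) * fraction)
    if fraction > 0 and records:
        k = max(1, k)  # audit at least one record whenever auditing is requested
    return random.Random(seed).sample(records, k)
```

Fixing the seed makes the audit sample repeatable, so a reviewer can later re-derive exactly which records were checked.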

Annotations are often a required element of supervised learning and can serve as ground truth for validation and testing. Annotations may consist of labels created by a human expert, metadata extracted from a linked record (e.g., an EMR), or values derived from an algorithm, that is, from “Helper AI”.

The term “Helper AI” generally refers to methods in which AI tools automate portions of the data curation process. These include algorithms that analyze the data in the DICOM header, and even evaluate the pixel data, to check and correct errors and normalize the information to a common vocabulary. Many text fields in the DICOM header contain errors or use a non-standard vocabulary that varies by site, and even by device within an organization. Helper AI methods can map exam and series descriptions to a catalog of standardized naming conventions such as the RadLex Playbook or the LOINC catalog. Helper AI can also verify and correct other parameters, such as image orientation, modality, pulse sequence descriptors, the body part(s) included in the image, the protocol, and even the presence of contrast material. Normalizing these values automatically and at scale makes it easier for researchers to select cohorts of exams from multiple sources without prior knowledge of the naming conventions each source used.
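The sketch below illustrates rule-based normalization of free-text series descriptions in Python. The cleaning rules, keyword sets, and output names (for example `CT CHEST WITHOUT CONTRAST`) are hypothetical placeholders, not actual RadLex Playbook or LOINC entries; a production Helper AI would rely on a much richer model or lookup.

```python
import re
from typing import Optional

# Hypothetical keyword sets; a real Helper AI would use a much richer vocabulary.
CHEST_TERMS = {"chest", "thorax", "lung", "lungs"}
NO_CONTRAST_TERMS = {"without", "wo", "noncontrast", "nc"}
WITH_CONTRAST_TERMS = {"with", "contrast", "con", "iv", "enhanced"}


def clean(description: str) -> str:
    """Lower-case, expand slash abbreviations, and collapse punctuation to spaces."""
    text = description.lower()
    # Expand "w/o" before "w/" so that "w/o" is not read as "with".
    text = text.replace("w/o", " without ").replace("w/", " with ")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def normalize_series(modality: str, description: str) -> Optional[str]:
    """Map a free-text series description to an illustrative standardized name.

    Returns None when the description does not mention a chest term.
    """
    tokens = set(clean(description).split())
    if not tokens & CHEST_TERMS:
        return None
    name = f"{modality.strip().upper()} CHEST"
    # Check "no contrast" first, since "without contrast" also contains "contrast".
    if tokens & NO_CONTRAST_TERMS:
        return name + " WITHOUT CONTRAST"
    if tokens & WITH_CONTRAST_TERMS:
        return name + " WITH CONTRAST"
    return name  # contrast status not stated in the description


if __name__ == "__main__":
    series = [
        {"site": "A", "modality": "CT", "description": "CT CHEST W/O CON"},
        {"site": "B", "modality": "CT", "description": "Thorax noncontrast"},
        {"site": "C", "modality": "CT", "description": "chest ct w/ contrast"},
    ]
    cohort = [
        s for s in series
        if normalize_series(s["modality"], s["description"]) == "CT CHEST WITHOUT CONTRAST"
    ]
    print([s["site"] for s in cohort])  # ['A', 'B']
```

Because `CT CHEST W/O CON` from one site and `Thorax noncontrast` from another map to the same normalized name, the cohort is selected with a single comparison, without knowing either site's naming convention.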

Annotations in MIDRC: 

  • Are applicable at various levels of granularity, from the entire imaging exam to individual pixels in an image.  

  • Can take various forms, such as free text, a measurement, or a region (e.g., a bounding box).  

  • Can provide a reference standard to supplement the images themselves, and can be used to train an AI model and subsequently test the performance of trained models.  

  • May be produced within MIDRC, or submitted by external sources either along with contributed imaging data or later as part of an external project.

  • May originate from MIDRC research activities or the Grand Challenge Workgroup (GCWG).  
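One way to capture these properties in code is a small record type. The Python sketch below is hypothetical: the field names, the `Level` values, and the `source` strings are illustrative and do not describe MIDRC's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Level(Enum):
    """Granularity at which an annotation applies (illustrative values)."""
    EXAM = "exam"
    SERIES = "series"
    IMAGE = "image"
    PIXEL = "pixel"


@dataclass(frozen=True)
class BoundingBox:
    """Rectangular region in pixel coordinates, from (x_min, y_min) to (x_max, y_max)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self):
        if self.x_max < self.x_min or self.y_max < self.y_min:
            raise ValueError("max corner must not precede min corner")

    @property
    def area(self) -> int:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


@dataclass(frozen=True)
class Annotation:
    target_uid: str                         # hypothetical: UID of the annotated exam/series/image
    level: Level
    value: Union[str, float, BoundingBox]   # free text, a measurement, or a region
    source: str                             # e.g. "expert" or "helper_ai" (illustrative)
    units: Optional[str] = None             # only meaningful for measurements
```

A single `value` field with a union type keeps exam-level free text, measurements, and regions in one collection, at the cost of callers checking the value's type before use.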

Because most MIDRC data are publicly available, some annotations may be temporarily withheld (for example, for use in a Challenge). All publicly available annotations will be linked to the public training imaging data in Gen3 whenever possible. 

Below is a list of public annotation sets that are available on the Gen3 data portal, linked to the images and downloadable for your use. Some annotations can be viewed directly in the OHIF viewer; others use a non-standard format that the viewer does not support.

MIDRC annotations.

  • The training set for the MIDRC mRALE Mastermind Challenge was annotated by expert radiologists; the annotations are currently available through GitHub as well as in the data portal (see below). This dataset contains 2079 portable chest radiography exams of COVID-19-positive patients.

    Description:

    To provide a reference standard assessment of disease severity on chest radiographs, a simplified version of the Radiographic Assessment of Lung Edema (RALE) score is provided. This grading scale was originally validated for assessing pulmonary edema in acute respiratory distress syndrome and incorporates the extent and density of alveolar opacities on chest radiographs. It is relevant to COVID-19 patients because their chest radiograph findings tend to involve multifocal alveolar opacities, and many hospitalized COVID-19 patients develop acute respiratory distress syndrome.

    Image type annotated:

    Portable chest radiographs

    Annotation method:

    Retrospective expert

    Annotation type:

    Ordinal label

    Annotation level:

    Imaging study

    Availability:

    Searchable in the data explorer: yes

    https://github.com/MIDRC/COVID19_Challenges/tree/main/Challenge_2023_mRALE%20Mastermind

    Publication:

    In preparation

    Acknowledgment:

    Many thanks to the expert radiologists who participated in the annotation of this dataset.
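The original RALE score assigns each of four lung quadrants an extent score (0-4) and a density score (1-3) and sums the four products (range 0-48). The simplified score used for this challenge is assumed here to apply the same extent × density product per lung, giving a range of 0-24; consult the challenge repository linked above for the authoritative definition. The Python sketch accepts any number of regions, so it covers both layouts.

```python
from typing import Sequence, Tuple


def rale_style_score(regions: Sequence[Tuple[int, int]]) -> int:
    """Sum extent x density over regions (quadrants or lungs).

    Each region is (extent, density) with extent in 0-4 and density in 1-3.
    A region with extent 0 contributes 0 whatever its density.
    """
    total = 0
    for extent, density in regions:
        if not 0 <= extent <= 4:
            raise ValueError(f"extent must be 0-4, got {extent}")
        if not 1 <= density <= 3:
            raise ValueError(f"density must be 1-3, got {density}")
        total += extent * density
    return total


# Assumed per-lung layout: right lung extent 3, density 2; left lung extent 1, density 1.
print(rale_style_score([(3, 2), (1, 1)]))  # 7
```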


Third party annotations.

  • The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD), a collection of COVID-related imaging datasets and expert annotations to support research and education.

    Description:

    MIDRC-RICORD dataset 1a was created through a collaboration between the RSNA and the Society of Thoracic Radiology (STR). Pixel-level volumetric segmentation with clinical annotations by thoracic radiology subspecialists was performed for all COVID-positive thoracic computed tomography (CT) imaging studies, using a labeling schema coordinated with other international consensus panels and COVID data annotation efforts.

    MIDRC-RICORD dataset 1a consists of 120 thoracic CT scans from four international sites, annotated with detailed segmentations and diagnostic labels.

    Patient Selection: Patients at least 18 years of age with a positive diagnosis of COVID-19.

    1. 120 Chest CT examinations (axial series only, any protocol).

    2. Annotations comprising:

    a) Detailed segmentation of affected regions;

    b) Image-level labels (Infectious opacity, Infectious TIB/micronodules, Infectious cavity, Noninfectious nodule/mass, Atelectasis, Other noninfectious opacity)

    c) Exam-level diagnostic labels (Typical, Indeterminate, Atypical, Negative for pneumonia, Halo sign, Reversed halo sign, Reticular pattern w/o parenchymal opacity, Perilesional vessel enlargement, Bronchial wall thickening, Bronchiectasis, Subpleural curvilinear line, Effusion, Pleural thickening, Pneumothorax, Pericardial effusion, Lymphadenopathy, Pulmonary embolism, Normal lung, Infectious lung disease, Emphysema, Oncologic lung disease, Non-infectious inflammatory lung disease, Non-infectious interstitial, Fibrotic lung disease, Other lung disease)

    d) Exam-level procedure labels (With IV contrast, Without IV contrast, QA- inadequate motion/breathing, QA- inadequate insufficient inspiration, QA- inadequate low resolution, QA- inadequate incomplete lungs, QA- inadequate wrong body part/modality, Endotracheal tube, Central venous/arterial line, Nasogastric tube, Sternotomy wires, Pacemaker, Other support apparatus).

    Image type annotated:

    Chest CT

    Annotation method:

    Retrospective expert (external to MIDRC)

    Annotation type:

    Categorical labels and segmentations

    Annotation level:

    Imaging study and image

    Availability:

    Searchable in the data explorer: no

    https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=80969742

    Publication:

    https://pubs.rsna.org/doi/full/10.1148/radiol.2021203957

  • The Radiological Society of North America (RSNA) assembled the RSNA International COVID-19 Open Radiology Database (RICORD) collection of COVID-related imaging datasets and expert annotations to support research and education.

    Description:

    This dataset was created through a collaboration between the RSNA and the Society of Thoracic Radiology (STR). Clinical annotation by thoracic radiology subspecialists was performed for all COVID-positive chest radiography (CXR) imaging studies, using a labeling schema based on guidelines for reporting and classifying COVID-19 findings on CXRs (see Review of Chest Radiograph Findings of COVID-19 Pneumonia and Suggested Reporting Language, Journal of Thoracic Imaging).

    The RSNA International COVID-19 Open Annotated Radiology Database (RICORD) consists of 998 chest X-rays from 361 patients at four international sites, annotated with diagnostic labels.

    Patient Selection: Patients at least 18 years of age with a positive diagnosis of COVID-19.

    998 chest X-ray examinations from 361 patients.

    Annotations with labels:

    Classification:

      • Typical Appearance: multifocal bilateral, peripheral opacities and/or opacities with rounded morphology, with a lower lung-predominant distribution (required feature; must be present with either or both of the first two opacity patterns).

      • Indeterminate Appearance: absence of typical findings AND unilateral, central, or upper lung-predominant distribution of airspace disease.

      • Atypical Appearance: pneumothorax or pleural effusion, pulmonary edema, lobar consolidation, solitary lung nodule or mass, diffuse tiny nodules, or cavity.

      • Negative for Pneumonia: no lung opacities.

    Airspace Disease Grading:

    On a frontal chest X-ray, each lung is divided into 3 zones (6 zones total). The upper zone extends from the apices to the superior hilum. The mid zone spans the superior to inferior hilar margins. The lower zone extends from the inferior hilar margins to the costophrenic sulci. One of the following grades is required unless the study is labeled Negative for Pneumonia:

      • Mild: opacities in 1-2 lung zones

      • Moderate: opacities in 3-4 lung zones

      • Severe: opacities in >4 lung zones

    Image type annotated:

    Chest radiographs

    Annotation method:

    Retrospective expert (external to MIDRC)

    Annotation type:

    Categorical labels

    Annotation level:

    Imaging study

    Availability:

    Searchable in the data explorer: no

    https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281

    Publication:

    https://pubs.rsna.org/doi/full/10.1148/radiol.2021203957

  • Machine-generated annotations produced retrospectively (after the subject’s care) by automated means.

    Description:

    Smart Imagery Framing and Truthing (SIFT) is an image annotation software system that helps annotators identify abnormalities (or diseases) and their boundaries in order to train and test machine learning and deep learning artificial intelligence models.

    SIFT is based on Multi-task, Optimal-recommendation, and Max-predictive Classification and Segmentation (MOM ClasSeg) technologies, which detect and delineate regions of interest (ROIs) for 65 different abnormalities on chest X-ray images, provide a score for each detected ROI, and give recommendations of possible abnormalities for each ROI.

    The MOM ClasSeg system, which integrates Mask R-CNN with a Decision Fusion Network, was developed on a training dataset of over 300,000 adult chest X-rays, comprising over 240,000 confirmed abnormal images (with over 300,000 confirmed ROIs corresponding to 65 different abnormalities) and over 67,000 normal (i.e., “no finding”) images.

    Image type annotated:

    Chest radiographs

    Annotation method:

    Retrospective machine (AI) generated

    Annotation type:

    DICOM SEG objects and categorical labels

    Annotation level:

    Image

    Availability:

    Searchable in the data explorer: yes

  • Description:

    The annotations in this dataset contain analysis results produced using off-the-shelf open-source tools, specifically BodyPartRegression and lungmask. The accuracy of the annotations produced by these tools was not verified by experts.

    The BodyPartRegression algorithm is described in this preprint: https://arxiv.org/abs/2110.09148.

    The lungmask algorithm is described in this publication: https://doi.org/10.1186/s41747-020-00173-2.

    Radiomics features were extracted from the segmentations using the open-source pyradiomics library, described in this publication: https://doi.org/10.1158/0008-5472.CAN-17-0339.

    The application of these tools to the selected MIDRC data is described in this publication: https://doi.org/10.1117/12.2653606.

    Image type annotated:

    CT (389 CT series)

    Annotation method:

    Retrospective machine (AI) generated

    Annotation type:

    Volumetric segmentations, slice-level body part assignment

    Annotation level:

    Pixel-based

    Links to source projects:

    https://github.com/MIC-DKFZ/BodyPartRegression

    https://github.com/JoHof/lungmask

    Availability:

    Searchable in the data explorer: no

    Publication:

    https://www.spiedigitallibrary.org/conference-proceedings-of-spie/12469/2653606/Integrating-deep-learning-algorithm-for-the-lung-segmentation-with-body/10.1117/12.2653606.short
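The airspace disease grading used in the RICORD chest radiograph annotations maps the number of involved lung zones (out of six) to a severity grade. A minimal Python sketch of that mapping follows; returning `None` for zero involved zones, which corresponds to the Negative for Pneumonia case, is our own convention rather than part of the labeling schema.

```python
from typing import Optional


def airspace_grade(involved_zones: int) -> Optional[str]:
    """Map the number of involved lung zones (of 6) to a RICORD-style grade.

    Returns None when no zone is involved (no lung opacities).
    """
    if not 0 <= involved_zones <= 6:
        raise ValueError("a frontal chest radiograph has 6 lung zones")
    if involved_zones == 0:
        return None
    if involved_zones <= 2:
        return "Mild"       # 1-2 zones
    if involved_zones <= 4:
        return "Moderate"   # 3-4 zones
    return "Severe"         # more than 4 zones
```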


How to find/view segmentation annotations.

Find and view DICOM annotations that are viewable in MIDRC's integrated OHIF viewer:

  1. Go to the data explorer at data.midrc.org/explorer

  2. Select the "Imaging Studies" main tab

  3. Select the "Annotations" filters tab

  4. Under the filter "Datafile Annotation Name", select an annotation type of interest, such as midrc_bpr_landmarks, midrc_bpr_regions, midrc_lung_measures, or midrc_lung_segs.

  5. Click the "Browse in DICOM Viewer" button in the explorer table to view a particular study.

  6. When the OHIF viewer tab opens, scroll down to the bottom of the list of imaging series, and double-click on one of the series labeled "SEG". 

  7. When a message asks "Do you want to open this Segmentation?", click "Yes".
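Once a lung segmentation such as midrc_lung_segs has been downloaded and converted to a 3-D integer label array (a step not shown here), simple measurements can be derived from it. The Python sketch below computes per-label volume from voxel counts and voxel spacing. The label values 1 and 2 for the two lungs are an assumption, so check the segment definitions in the SEG object before relying on them; this is not necessarily how the midrc_lung_measures values were computed.

```python
from typing import Dict, Sequence

import numpy as np


def label_volumes_ml(mask: np.ndarray, spacing_mm: Sequence[float],
                     labels: Sequence[int] = (1, 2)) -> Dict[int, float]:
    """Return the volume, in millilitres, of each label in a 3-D label mask.

    spacing_mm is the voxel size along each array axis, in millimetres.
    The default labels (1, 2) for the two lungs are an assumption.
    """
    if mask.ndim != 3 or len(spacing_mm) != 3:
        raise ValueError("expected a 3-D mask and three spacing values")
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # 1 mL = 1000 mm^3
    return {label: int(np.count_nonzero(mask == label)) * voxel_ml for label in labels}
```

Counting voxels and multiplying by the voxel volume is exact for the discretized mask; the result still depends on the accuracy of the underlying segmentation, which for these machine-generated annotations was not expert-verified.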

