MIDRC data diversity.

Last updated March 17, 2024

The COVID-19 pandemic highlighted the extent of health inequity, as it disproportionately affected specific racial and ethnic minority groups. As a multi-institutional resource initially funded to provide a rapid response to COVID-19 and now pivoting to cancer, MIDRC aims to accelerate the transfer of knowledge and innovation in medical imaging research, enabling equitable and fair AI for medical image analysis.

MIDRC data contributing sites in the US

A diverse data collection and curation strategy, together with the mitigation of bias in data analysis within the MIDRC commons, is critical to yielding ethical AI algorithms that produce trustworthy results for all groups. MIDRC strives to mitigate bias in its study population, data collection, curation, and analysis by:

  • pursuing efforts to ensure the data set is representative of the population of the region or country of origin, and actively seeking data contributions from rural and under-represented community hospitals and smaller healthcare systems;

  • developing ethical and trustworthy machine learning algorithms that account for and reduce data bias. Bias can be mitigated by carefully selecting the data points used to train, validate, and test these algorithms, by assessing the algorithms’ outputs for bias and fairness, and by monitoring their performance after deployment;

  • enabling the study of sources of bias (such as selection bias, missing information and non-response bias, detection bias, spectrum bias, and healthcare access bias) and of the impact of bias on machine learning, decision making, and health disparities;

  • pursuing targeted funding to increase data diversity, conduct bias assessments, and develop methods to mitigate algorithmic bias. Learn more about MIDRC data diversity and our diversity calculator….
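
Two of the ideas in the list above, checking how representative a data set is and assessing an algorithm's outputs per group, can be illustrated with a minimal Python sketch. This is not MIDRC's diversity calculator or its method: the function names, the placeholder categories "A", "B", "C", the reference proportions, and the choice of Jensen-Shannon distance and per-group sensitivity are all assumptions made for illustration, and the data is a toy example.

```python
import math
from collections import Counter, defaultdict


def proportions(labels, categories):
    """Share of each category among the labels (0.0 for absent categories)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions.

    0.0 means identical distributions; 1.0 means no overlap at all.
    """
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def sensitivity_by_group(y_true, y_pred, groups):
    """True-positive rate per group, computed over cases whose true label is 1."""
    hits, positives = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


# Toy example with hypothetical categories and reference proportions.
categories = ["A", "B", "C"]
dataset_labels = ["A"] * 70 + ["B"] * 20 + ["C"] * 10
reference = [0.5, 0.3, 0.2]  # assumed population shares, not real data

observed = proportions(dataset_labels, categories)
print("observed shares:", observed)
print("distance from reference:", jensen_shannon_distance(observed, reference))

# Toy predictions: compare sensitivity between two hypothetical groups.
rates = sensitivity_by_group(
    y_true=[1, 1, 1, 1, 0, 1],
    y_pred=[1, 1, 0, 1, 0, 0],
    groups=["X", "X", "X", "Y", "Y", "Y"],
)
print("sensitivity by group:", rates)
```

A smaller distance suggests the data set's composition is closer to the reference population, and a large gap in per-group sensitivity is one possible signal that an algorithm's outputs warrant a closer fairness review. Either number is only a starting point and would need to be interpreted alongside the sources of bias listed above.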

MIDRC seeks to 1) provide an unbiased, representative health data resource for all, lowering the possibility of statistical fallacies and representational errors, and 2) develop and share tools, machine learning algorithms, and analytical methods for the discovery, visualization, and understanding of diverse data sets. MIDRC strives to strengthen the reproducibility and applicability of data-centered research, enhancing the understanding of the etiology, epidemiology, and treatment of COVID-19 and cancer, and ultimately of a wide variety of health conditions.

Read an interview with Maryellen Giger, PhD, which delves into the creation of the MIDRC imaging repository, how its data can be used to develop and evaluate AI algorithms, ways that bias can be introduced—and potentially mitigated—in medical imaging models, and what the future may hold.