MIDRC bias and diversity working group.
List of potential biases along the medical imaging AI/ML pipeline from data acquisition to model deployment.
Data collection biases.
-
Summary
Data generation bias is introduced when data (i) come from limited acquisition sources, (ii) are collected under differing standard processes, or (iii) are duplicated due to repeat collection or acquisition.
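One of these sources, duplicated data from repeat collection, can be screened for with a simple content hash over the raw pixel data. The sketch below is illustrative only; the image IDs and byte strings are hypothetical stand-ins for real pixel buffers.

```python
import hashlib

def content_key(pixel_bytes: bytes) -> str:
    """Hash raw pixel data so re-exported copies of the same image collide."""
    return hashlib.sha256(pixel_bytes).hexdigest()

def find_duplicates(images: dict[str, bytes]) -> dict[str, list[str]]:
    """Group image IDs by identical pixel content; groups larger than one are duplicates."""
    groups: dict[str, list[str]] = {}
    for image_id, pixels in images.items():
        groups.setdefault(content_key(pixels), []).append(image_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Hypothetical toy data: 'scan_a' and 'scan_c' share identical pixel content.
images = {"scan_a": b"\x00\x01\x02", "scan_b": b"\x00\x01\x03", "scan_c": b"\x00\x01\x02"}
dupes = find_duplicates(images)
```

Hashing catches only bit-identical repeats; near-duplicates (re-acquisitions, resampled exports) need perceptual or metadata-based matching.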
-
Synthetic data is often offered as a mitigation technique to reduce bias that may be present in real data, but do not assume that synthetic data generation can automatically remedy biased data collection.
-
Sample selection bias is caused by choosing data non-randomly, so that a subset of the data is systematically excluded. If exclusions are not carefully described and justified, severe bias may occur in both model training and testing.
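A toy illustration of how a systematic exclusion distorts a summary statistic: here, dropping one exam type (e.g., portable exams, which may skew toward sicker inpatients) halves the apparent disease prevalence. All numbers are hypothetical.

```python
# Hypothetical cohort: portable exams are enriched for positives.
cohort = (
    [{"portable_exam": True,  "positive": True}]  * 40 +
    [{"portable_exam": True,  "positive": False}] * 10 +
    [{"portable_exam": False, "positive": True}]  * 20 +
    [{"portable_exam": False, "positive": False}] * 80
)

def prevalence(cases):
    """Fraction of positive cases in a list of case records."""
    return sum(c["positive"] for c in cases) / len(cases)

full = prevalence(cohort)                                             # 60/150 = 0.4
selected = prevalence([c for c in cohort if not c["portable_exam"]])  # 20/100 = 0.2
```

A model trained or evaluated on the selected subset would see a population with different prevalence and case mix than the full cohort.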
-
The collected data, and therefore the models developed or tested on them, may show bias if different social groups are managed differently at an institution. One needs to carefully review patient management at every institution that provides data to minimize institutional bias.
-
Popularity bias occurs when current trends influence patients' decision making about medical imaging, which in turn affects data collection.
-
Population bias in medical imaging occurs when the characteristics of the population from which data is used for training an algorithm are different from the characteristics of the population for which data is used for testing or decision making from an algorithm. The characteristics may include biological differences, demographic differences, social differences (such as the impact of socioeconomic differences on access to health care), or technical differences in image acquisition that correlate with demographic differences.
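One way to surface such a mismatch is to compare the demographic group distributions of the training and deployment populations, for example with the total variation distance (0 means identical distributions). This is a minimal sketch; the group labels and counts are hypothetical.

```python
from collections import Counter

def group_fractions(labels):
    """Fraction of each demographic group in a dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two group distributions (0 = identical, 1 = disjoint)."""
    groups = set(p) | set(q)
    return 0.5 * sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in groups)

# Hypothetical demographic labels for training vs. deployment populations.
train = ["A"] * 80 + ["B"] * 20
deploy = ["A"] * 50 + ["B"] * 50
shift = total_variation(group_fractions(train), group_fractions(deploy))  # 0.3
```

A nonzero distance does not by itself prove the model is biased, but a large value flags a train/deployment mismatch that warrants subgroup performance analysis.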
-
Temporal bias is (a) bias that arises from differences in populations and behaviors over time, (b) bias that arises from the use of data that is not representative of diagnostic clinical data, or (c) bias that arises from the correlation between reader performance and the state of knowledge of the disease. These biases are problematic because algorithms may not generalize over time as the global course of the disease, individual patient trajectories, and the state of clinical knowledge evolve.
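A simple first check for temporal drift is to stratify the data by acquisition period and compare a basic statistic such as label prevalence across periods. The records below are hypothetical; in practice the stratification would use acquisition dates from the study metadata.

```python
from collections import defaultdict

# Hypothetical records: (acquisition_year, positive_label).
records = ([(2020, True)] * 30 + [(2020, False)] * 70 +
           [(2022, True)] * 60 + [(2022, False)] * 40)

by_year = defaultdict(list)
for year, positive in records:
    by_year[year].append(positive)

# Label prevalence per acquisition year; a large gap between years
# signals temporal drift worth investigating before pooling the data.
prevalence = {year: sum(labels) / len(labels) for year, labels in by_year.items()}
```

The same stratified comparison can be applied to image acquisition parameters or demographics, not only labels.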
-
The patient data used for training/tuning/testing a machine learning algorithm may not be representative of the patient population to which the algorithm is intended to be applied. Such bias often arises from collecting data by convenience and availability, without sufficient consideration of the clinical task and patient population.