A Primer on Machine Learning
Machine learning (ML) and other artificial intelligence (AI) tools have become a staple of both medical and mainstream news as these techniques are applied to increasingly complex challenges. Much like the term "big data," these terms are loosely applied to a wide variety of projects, so it is important to understand the fundamentals and the situations where ML can be applied effectively.
Within computer science, AI is a category of techniques for computer reasoning and problem solving. ML, often described as a subset of AI, focuses on programming that allows a computer to build a model of the world and revise that model as new data is encountered. ML is further subdivided into supervised and unsupervised learning. Supervised learning projects are common in radiology: a program is trained on a series of cases with known diagnoses, then used to predict the diagnosis in a new dataset where the diagnosis is withheld. Unsupervised learning is more challenging, as the program must take a stream of data and discover structure in it on its own, without labeled examples.
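The supervised workflow described above can be sketched in a few lines. This is a minimal illustration, not a clinical tool: the "features" (nodule diameter and attenuation) and the training cases are invented for the example, and the classifier is a simple nearest-neighbor rule standing in for far more sophisticated models.

```python
import math

# Hypothetical training cases: (feature vector, diagnosis label).
# The features (diameter in mm, mean attenuation in HU) are made-up
# numeric descriptors used purely for illustration.
training_cases = [
    ((4.0, -60.0), "benign"),
    ((5.5, -40.0), "benign"),
    ((18.0, 35.0), "malignant"),
    ((22.0, 40.0), "malignant"),
]

def predict(features):
    """1-nearest-neighbor: return the label of the closest training case."""
    nearest = min(training_cases, key=lambda case: math.dist(case[0], features))
    return nearest[1]

# A new case with the diagnosis withheld: the trained model predicts it.
print(predict((20.0, 30.0)))  # -> malignant
```

The key point is the division of labor: the labeled cases "teach" the model, and prediction on a withheld case tests whether it generalizes.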
In many radiology applications, the central challenge is assembling a useful dataset. For supervised learning, a series of cases with validated diagnoses is needed to teach the program, and the series must be large enough to encompass the variation in the disease or process being studied. Even for unsupervised learning, a dataset is needed to validate the program. This can be difficult for rare pathologies or for diseases with a wide range of imaging appearances.
For ML, and most programming, to work, data either needs to be highly structured and regular or a function must be created to normalize it; for instance, a condition must always be referred to with the same sequence of characters (e.g., "diabetes," not "DMt2" or "diabetes mellitus"). Data also has to be stored in a format that allows easy retrieval, such as a table or a series of comma-separated values. When data is scattered through an image full of noise, artifacts, and other pathology, or buried in reports hedged with qualifiers, it can be hard for a machine to find and interpret.
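The normalization step and the comma-separated storage mentioned above can be sketched together. The synonym map here is a toy assumption (real terminology mapping would use a vocabulary such as SNOMED CT); the point is simply that every variant collapses to one canonical string before the data is used.

```python
import csv
import io

# Hypothetical synonym map: every variant of a condition collapses to
# one canonical string so a program can match records reliably.
CANONICAL = {
    "dmt2": "diabetes",
    "diabetes mellitus": "diabetes",
    "diabetes": "diabetes",
}

def normalize(term):
    """Map a free-text condition to its canonical form.
    Assumption: unknown terms are passed through lowercased."""
    key = term.strip().lower()
    return CANONICAL.get(key, key)

# Structured storage: a small comma-separated table is trivial to parse,
# unlike the same facts buried in free-text report prose.
raw = "patient_id,condition\n001,DMt2\n002,Diabetes Mellitus\n"
rows = list(csv.DictReader(io.StringIO(raw)))
conditions = {normalize(row["condition"]) for row in rows}
print(conditions)  # -> {'diabetes'}
```

After normalization, both records refer to the same condition, so a query for "diabetes" finds them all.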
This is an exciting time, as the available tools are rapidly multiplying, allowing us to be safer and more accurate. Given such change, many radiologists are understandably concerned about their future, and some trainees are even steered by colleagues toward other specialties. However, just as radiology today looks nothing like it did 50 years ago, the same will be true in 50 years. Imaging is increasingly central to medical care, and while the role of the radiologist will change, we will need to focus on guiding our medical colleagues and patients through the implications of imaging findings for diagnosis, care, and prognosis.
There will be a growing need to synthesize input from supporting tools, verify that algorithms are not confused by artifact, reconcile the imaging with the clinical context, and make personalized recommendations for future testing or treatment. It is crucial to understand that computers are most readily applied to lower-complexity, repetitive tasks, particularly when there is a good signal-to-noise ratio, the scope is narrow, and the condition has a consistent appearance. The areas of radiology that already have applications in this arena, such as pulmonary nodules or mammographic masses, may leave you underwhelmed, but this will change.
If you are looking for more information, a variety of resources are available for those with interest, including radiology informatics fellowships, SIIM and the Journal of Digital Imaging, the JACR informatics resource center, and online courses on platforms such as Coursera.
By Nathan Cross, MD, CIIP, Neuroradiology Fellow at University of Pennsylvania