A Human-Centered Evaluation of a Deep Learning System Deployed in Clinics for the Detection of Diabetic Retinopathy


A review of the paper for research, educational, and recommendation purposes.


Diabetes is a growing problem around the world, including in Southeast Asia. As of 2016, 9.6% of Thailand’s population was living with diabetes, comparable to 9.1% of the population in the United States. With diabetes come complications, including diabetic retinopathy (DR), a condition in which chronically high blood sugar damages blood vessels in the retina, the thin layer at the back of the eye responsible for sensing light and sending signals to the brain. These blood vessels can leak or hemorrhage, causing vision distortion or loss. DR is one of the leading causes of vision impairment in the world and causes 5% of cases of blindness worldwide, excluding refractive errors. In Thailand, 34% of patients with diabetes have low vision or blindness in either eye. In the early stages of DR, a patient often has no symptoms, so it is important for people living with diabetes to be screened regularly: this is the stage in which damage can be reversed, and progression of DR can be stopped or significantly reduced by blood sugar control. Early detection is key to initiating timely treatment and mitigating the risk of blindness. Since 2013, the Ministry of Health in Thailand has had a goal of screening 60% of its diabetic population for DR. However, reaching this goal is a challenge due to a shortage of clinical specialists. In Thailand, there are 1500 ophthalmologists, including 200 retinal specialists, who provide ophthalmic care to approximately 4.5 million patients with diabetes, a ratio of about 1:3000, roughly double the ratio in the United States. The shortage of doctors limits the ability to screen patients and also creates a treatment backlog for those found to have DR. As a result, nurses conduct DR screenings when patients come in for diabetes check-ups, taking photos of the retina and sending them to an ophthalmologist for review.
Our team has developed a deep learning algorithm that can provide an assessment of diabetic retinopathy, bypassing the need to wait weeks for an ophthalmologist to review the retinal images [20]. This algorithm has been shown to have specialist-level accuracy (>90% sensitivity and specificity) for the detection of referable cases of diabetic retinopathy. In a large-scale, retrospective study comparing the algorithm to human graders, the deep learning algorithm showed a significant reduction in the false negative rate (by 23%) at the cost of a slightly higher false positive rate (2%). Currently, there are no requirements for AI systems to be evaluated through observational clinical studies, nor is it common practice. This is a problem because the success of a deep learning model does not rest solely on its accuracy, but also on its ability to improve patient care. Prospective studies that evaluate deep learning models within a clinical environment are beginning to emerge. These studies are designed to provide additional evidence of model accuracy (sensitivity and specificity), but they are not sufficient to evaluate true clinical effectiveness, that is, impact on patient care. Nor do they explore the socio-environmental factors that affect model performance in the wild. Furthermore, as Yang and colleagues note, when HCI researchers attempt to study AI systems in a hospital or clinic, they are often prevented from fully embedding into clinical workflows and from evaluating systems using authentic patient data.
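
For readers less familiar with the accuracy terms used above, the following is a minimal Python sketch of how sensitivity, specificity, and the false negative and false positive rates are derived from screening counts. The class name, function names, and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Counts from comparing a grader's calls against a reference standard.

    Hypothetical container for illustration; not from the paper.
    """
    true_positives: int   # referable DR, flagged as referable
    false_negatives: int  # referable DR, missed
    true_negatives: int   # no referable DR, correctly cleared
    false_positives: int  # no referable DR, flagged anyway


def sensitivity(c: ScreeningCounts) -> float:
    """Share of referable cases that were correctly flagged."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ScreeningCounts) -> float:
    """Share of non-referable cases that were correctly cleared."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


def false_negative_rate(c: ScreeningCounts) -> float:
    """Share of referable cases that were missed (1 - sensitivity)."""
    return c.false_negatives / (c.true_positives + c.false_negatives)


def false_positive_rate(c: ScreeningCounts) -> float:
    """Share of non-referable cases that were flagged (1 - specificity)."""
    return c.false_positives / (c.true_negatives + c.false_positives)


# Made-up example: 1000 screened patients, 100 of them with referable DR.
example = ScreeningCounts(
    true_positives=92,
    false_negatives=8,
    true_negatives=855,
    false_positives=45,
)

print(f"sensitivity: {sensitivity(example):.2f}")
print(f"specificity: {specificity(example):.2f}")
print(f"false negative rate: {false_negative_rate(example):.2f}")
print(f"false positive rate: {false_positive_rate(example):.2f}")
```

With these invented counts both sensitivity and specificity exceed 90%, which is the threshold the authors describe as specialist-level. The sketch also makes the trade-off explicit: lowering the false negative rate means fewer missed cases, while a higher false positive rate means more patients referred unnecessarily.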

This paper contributes the first human-centered observational study of a deep learning system deployed directly in clinical care with patients. Through field observations and interviews at eleven clinics across Thailand, we explored the expectations and realities that nurses encounter in bringing a deep learning model into their clinical practices. First, we outline typical eye-screening workflows and challenges that nurses experience when screening hundreds of patients. Then, we explore the expectations nurses have for an AI-assisted eye screening process. Next, we present a human-centered, observational study of the deep learning system used in clinical care, examining nurses’ experiences with the system, and the socio-environmental factors that impacted system performance. Finally, we conclude with a discussion around applications of HCI methods to the evaluation of deep learning algorithms in clinical environments.


🔘 Paper page: dl.acm.org/doi/abs/10.1145/3313831.3376718


Emma Beede, Elizabeth Baylor, Fred Hersch, Anna Iurchenko, Lauren Wilcox, Paisan Ruamviboonsuk, Laura M. Vardoulakis
