How to Read Articles That Use Machine Learning

Users' Guides to the Medical Literature

Educational Objective
To understand the value of machine learning models for clinical care.
Abstract

In recent years, many new clinical diagnostic tools have been developed using complex machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated through the same 3-step process: derivation, validation, and establishment of clinical effectiveness. Machine learning–based tools should additionally be assessed for the type of machine learning model used and whether that model is appropriate for the input data type and the size of the data set. Machine learning models also generally have prespecified settings, called hyperparameters, that must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard, and the rigor of the reference standard must be assessed, for example, whether it is a universally accepted gold standard or grading by experts.
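The separation the abstract draws between derivation data, hyperparameter-tuning data, and validation data can be made concrete with a short sketch. The Python example below is illustrative only: the logistic regression model, the synthetic features, and the regularization hyperparameter C are assumptions for demonstration, not the methods of any study discussed in this article.

```python
# A minimal sketch of the workflow the abstract describes: derive a model on a
# training set, tune hyperparameters on an independent tuning set, and only
# then evaluate once on a held-out validation set against the reference
# standard. All names and data are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 20))                          # stand-in input features
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # stand-in reference-standard labels

# Three independent splits: train (derivation), tune (hyperparameters), validation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_tune, X_val, y_tune, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Hyperparameter tuning: choose the regularization strength that performs best
# on the tuning set, never on the validation set.
best_auc, best_c = -1.0, None
for c in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_tune, model.predict_proba(X_tune)[:, 1])
    if auc > best_auc:
        best_auc, best_c = auc, c

# The final model, with the chosen hyperparameter, is evaluated exactly once on
# the validation set against the reference-standard labels.
final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
val_auc = roc_auc_score(y_val, final_model.predict_proba(X_val)[:, 1])
print(f"chosen C={best_c}, validation AUC={val_auc:.3f}")
```

When reading a published study, the analogous question is whether the validation set was truly held out from both model derivation and hyperparameter tuning, and how rigorously its labels (the reference standard) were established.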

Article Information

Corresponding Author: Yun Liu, PhD, Google Health, 3400 Hillview Ave, Palo Alto, CA 94304 (liuyun@google.com).

Author Contributions: Dr Liu had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Liu and Chen contributed equally to this article.

Concept and design: All authors.
Acquisition, analysis, or interpretation of data: Liu, Chen, Peng.
Drafting of the manuscript: Liu, Chen.
Critical revision of the manuscript for important intellectual content: All authors.
Administrative, technical, or material support: Liu, Chen, Peng.
Supervision: Liu, Chen, Peng.
Other - machine learning expertise: Liu, Chen, Krause.

Conflict of Interest Disclosures: Dr Liu reported holding machine learning patents for each of the following: analyzing retinal fundus photographs and analyzing histopathology slides (status pending or granted), analyzing skin conditions (status pending), and analyzing physiological signals (status granted); and being an employee of Google and holding Alphabet stock as part of the compensation package. Dr Chen reported being an employee of Google and holding Alphabet stock as part of the compensation package. Dr Krause reported receiving personal fees from Stanford University outside the submitted work, being an employee of Google, and holding Alphabet stock as part of the compensation package. Dr Peng reported holding patents for each of the following: predicting cardiovascular risk factors from retinal fundus photographs using deep learning, fundus imagery machine learning systems, health predictions from histopathology slides, and pathology heatmap predictions (status pending for all); and being an employee of Google and holding Alphabet stock as part of the compensation package.
