Application of Statistical Learning to Identify Omicron Mutations in SARS-CoV-2 Viral Genome Sequence Data From Populations in Africa and the United States

Educational Objective
To identify the key insights or developments described in this article
1 Credit CME
Key Points

Question  Could the SARS-CoV-2 Omicron variant have been detected earlier with existing surveillance data and a state-of-the-art statistical learning strategy?

Findings  In this case series of 2698 Omicron cases in Africa and 12 141 Omicron cases in the United States, a statistical learning strategy showed that Omicron was expanding in both Africa and the United States and that this expansion could be tracked over time. The results indicated that Omicron could have been detected 20 days earlier in Africa; similarly, 8 Omicron cases had been detected in the United States by November 25, 2021, before the official US Centers for Disease Control and Prevention declaration.

Meaning  These findings suggest that novel data analytics, such as a statistical learning strategy, may have applications for surveillance of SARS-CoV-2 variants.


Importance  With SARS-CoV-2 viral genome sequences now collected in a timely manner, it is important to apply efficient data analytics that detect emerging variants as early as possible.

Objective  To evaluate the application of a statistical learning strategy (SLS) to improve early detection of novel SARS-CoV-2 variants using viral sequence data from global surveillance.

Design, Setting, and Participants  This case series applied an SLS to viral genomic sequence data collected from 63 686 individuals in Africa and 531 827 individuals in the United States with SARS-CoV-2. Data were collected from January 1, 2020, to December 28, 2021.

Main Outcomes and Measures  The outcome was an indicator of the Omicron variant derived from viral sequences. Treating this indicator as an outcome collected over time, the SLS used a generalized additive model to estimate locally averaged Omicron caseload percentages (OCPs) over time, both to characterize Omicron expansion and to estimate when the OCP exceeded 10%, 25%, 50%, and 75% of the caseload. Additionally, an unsupervised learning technique was applied to visualize Omicron expansions, and the temporal and spatial distributions of Omicron cases were investigated.
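The measure above (a smoothed Omicron caseload percentage over collection time, then the first date it passes 10%, 25%, 50%, and 75%) can be illustrated with a small, dependency-free sketch. The article fits a generalized additive model; this sketch instead swaps in a Gaussian kernel (Nadaraya-Watson) local average of the per-sequence 0/1 Omicron indicator, so its estimates will differ from a fitted GAM. The function names, the integer day index, the input layout of (day, is_omicron) pairs, and the default bandwidth are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def daily_counts(records):
    """Tally total and Omicron-indicated sequences per collection day.

    records: iterable of (day, is_omicron) pairs; day is an integer index,
    for example datetime.date.toordinal() (an assumed input layout).
    """
    total, omicron = Counter(), Counter()
    for day, is_omicron in records:
        total[day] += 1
        if is_omicron:
            omicron[day] += 1
    return total, omicron


def smoothed_ocp(total, omicron, bandwidth=3.0):
    """Locally averaged Omicron caseload percentage (OCP) at each observed day.

    Gaussian-kernel (Nadaraya-Watson) average of the per-sequence 0/1
    indicator; a stand-in for the article's GAM smooth, not a GAM fit.
    """
    days = sorted(total)
    ocp = {}
    for t in days:
        num = den = 0.0
        for d in days:
            w = math.exp(-0.5 * ((d - t) / bandwidth) ** 2)
            num += w * omicron[d]  # kernel-weighted Omicron cases near day t
            den += w * total[d]    # kernel-weighted sequenced cases near day t
        ocp[t] = 100.0 * num / den
    return ocp


def first_crossing(ocp, threshold):
    """Earliest day whose smoothed OCP reaches `threshold` percent, else None."""
    for t in sorted(ocp):
        if ocp[t] >= threshold:
            return t
    return None
```

For example, `[first_crossing(ocp, p) for p in (10, 25, 50, 75)]` returns the estimated crossing day for each of the four thresholds named above. The bandwidth plays a role loosely analogous to a GAM's smoothing penalty: small values follow day-to-day sampling noise, while larger values smooth more aggressively and can shift the estimated crossing dates.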

Results  In total, there were 2698 cases of Omicron in Africa and 12 141 in the United States. The SLS found that Omicron was detectable in South Africa as early as December 31, 2020. With 10% OCP as a threshold, it may have been possible to declare Omicron a variant of concern as early as November 4, 2021, in South Africa. In the United States, the SLS suggested that the first case was detectable on November 21, 2021.

Conclusions and Relevance  The application of SLS illustrates how the Omicron variant may have emerged and expanded in Africa and the United States. Earlier detection could help the global effort in disease prevention and control. To optimize early detection, efficient data analytics, such as SLS, could assist in the rapid identification of new variants as soon as they emerge, whether or not lineages have been designated, using viral sequence data from global surveillance.

CME Disclosure Statement: Unless noted, all individuals in control of content reported no relevant financial relationships. If applicable, all relevant financial relationships have been mitigated.

Article Information

Accepted for Publication: July 21, 2022.

Published: September 7, 2022. doi:10.1001/jamanetworkopen.2022.30293

Correction: This article was corrected on October 6, 2022, to fix errors in Figure 2.

Open Access: This is an open access article distributed under the terms of the CC-BY License. © 2022 Zhao LP et al. JAMA Network Open.

Corresponding Author: Lue Ping Zhao, PhD, Public Health Sciences Division (lzhao@fredhutch.org), and Lawrence Corey, MD, Vaccine and Infectious Disease Division (lcorey@fredhutch.org), Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109.

Author Contributions: Dr Zhao had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Zhao, Lybrand, Payne, Corey.

Acquisition, analysis, or interpretation of data: Zhao, Lybrand, Gilbert, Madeleine, Cohen, Geraghty, Jerome.

Drafting of the manuscript: Zhao, Lybrand, Payne.

Critical revision of the manuscript for important intellectual content: Zhao, Lybrand, Gilbert, Madeleine, Cohen, Geraghty, Jerome, Corey.

Statistical analysis: Zhao, Lybrand.

Obtained funding: Gilbert, Geraghty, Jerome, Corey.

Administrative, technical, or material support: Geraghty, Jerome.

Supervision: Payne, Cohen, Geraghty, Jerome, Corey.

Conflict of Interest Disclosures: Dr Gilbert reported grants from the National Institutes of Health National Institute of Allergy and Infectious Diseases for statistical work on COVID-19 vaccine efficacy trials during the conduct of the study. No other disclosures were reported.

Funding/Support: This research was funded by grants UM1 AI68614 and UM1 AI068635 from the National Institutes of Health National Institute of Allergy and Infectious Diseases.

Role of the Funder/Sponsor: The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Additional Information: All sequence data analyzed here are publicly available at GISAID (https://www.gisaid.org/).

AMA CME Accreditation Information

Credit Designation Statement: The American Medical Association designates this Journal-based CME activity for a maximum of 1.00 AMA PRA Category 1 Credit(s)™. Physicians should claim only the credit commensurate with the extent of their participation in the activity.

Successful completion of this CME activity, which includes participation in the evaluation component, enables the participant to earn up to:

  • 1.00 Medical Knowledge MOC points in the American Board of Internal Medicine's (ABIM) Maintenance of Certification (MOC) program;
  • 1.00 Self-Assessment points in the American Board of Otolaryngology – Head and Neck Surgery’s (ABOHNS) Continuing Certification program;
  • 1.00 MOC points in the American Board of Pediatrics’ (ABP) Maintenance of Certification (MOC) program;
  • 1.00 Lifelong Learning points in the American Board of Pathology’s (ABPath) Continuing Certification program; and
  • 1.00 CME points in the American Board of Surgery’s (ABS) Continuing Certification program

It is the CME activity provider's responsibility to submit participant completion information to ACCME for the purpose of granting MOC credit.
