Evaluation of Evidence of Statistical Support and Corroboration of Subgroup Claims in Randomized Clinical Trials

Educational Objective
To evaluate how often subgroup claims reported in the abstracts of randomized clinical trials (RCTs) are actually supported by statistical evidence (P < .05 from an interaction test) and corroborated by subsequent RCTs and meta-analyses.
1 Credit CME
Key Points

Question  How often are subgroup claims reported in the abstracts of randomized clinical trials supported by a statistically significant interaction test result and corroborated by subsequent randomized clinical trials and meta-analyses?

Findings  In this meta-epidemiological survey, a minority of subgroup claims (46 of 117) in the abstracts of randomized clinical trials were supported by their own data. Only 5 of these 46 subgroup findings had at least 1 subsequent corroboration attempt, and none of the corroboration attempts had a statistically significant P value from an interaction test.

Meaning  Claims of subgroup differences in randomized clinical trials are typically spurious or chance findings.

Abstract

Importance  Many published randomized clinical trials (RCTs) make claims for subgroup differences.

Objective  To evaluate how often subgroup claims reported in the abstracts of RCTs are actually supported by statistical evidence (P < .05 from an interaction test) and corroborated by subsequent RCTs and meta-analyses.

Data Sources  This meta-epidemiological survey examines data sets of trials with at least 1 subgroup claim, including Subgroup Analysis of Trials Is Rarely Easy (SATIRE) articles and Discontinuation of Randomized Trials (DISCO) articles. We used Scopus (updated July 2016) to search for English-language articles citing each of the eligible index articles with at least 1 subgroup finding in the abstract.

Study Selection  Articles with a subgroup claim in the abstract with or without evidence of statistical heterogeneity (P < .05 from an interaction test) in the text and articles attempting to corroborate the subgroup findings.

Data Extraction and Synthesis  Study characteristics of trials with at least 1 subgroup claim in the abstract were recorded. Two reviewers extracted the data necessary to calculate subgroup-level effect sizes, standard errors, and the P values for interaction. For individual RCTs and meta-analyses that attempted to corroborate the subgroup findings from the index articles, trial characteristics were extracted. The Cochran Q test was used to reevaluate heterogeneity with the data from all available trials.
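
As an illustration of the calculations named above, the following is a minimal sketch of an interaction test between 2 independent subgroup estimates and of the Cochran Q statistic with inverse-variance weights. It is not the authors' code. The function names and the example effect sizes (hypothetical log hazard ratios with their standard errors) are assumptions made for illustration, and the estimates are assumed to be on an additive scale with approximately normal sampling distributions.

```python
import math


def normal_two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def interaction_test(est_a, se_a, est_b, se_b):
    """Z test for the difference between two independent subgroup estimates.

    Estimates are assumed to be on an additive scale (eg, log hazard ratios).
    Returns the z statistic and its two-sided P value.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    return z, normal_two_sided_p(z)


def cochran_q(estimates, std_errors):
    """Cochran Q statistic with inverse-variance weights.

    Returns Q and its degrees of freedom (number of estimates minus 1).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return q, len(estimates) - 1


# Hypothetical subgroup-level log hazard ratios and standard errors.
est_a, se_a = -0.50, 0.20
est_b, se_b = -0.10, 0.25

z, p_interaction = interaction_test(est_a, se_a, est_b, se_b)
q, df = cochran_q([est_a, est_b], [se_a, se_b])

print(f"z = {z:.3f}, P for interaction = {p_interaction:.3f}")
print(f"Cochran Q = {q:.3f} on {df} degree(s) of freedom")
```

With exactly 2 subgroups, Q equals the square of the z statistic, so the 2 approaches give the same P value. With more than 2 estimates, Q is compared against a chi-square distribution with the returned degrees of freedom.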

Main Outcomes and Measures  The number of subgroup claims in the abstracts of RCTs, the number of subgroup claims in the abstracts of RCTs with statistical support (subgroup findings), and the number of subgroup findings corroborated by subsequent RCTs and meta-analyses.

Results  Sixty-four eligible RCTs made a total of 117 subgroup claims in their abstracts. Of these 117 claims, only 46 (39.3%) in 33 articles had evidence of statistically significant heterogeneity from a test for interaction. In addition, out of these 46 subgroup findings, only 16 (34.8%) ensured balance between randomization groups within the subgroups (eg, through stratified randomization), 13 (28.3%) entailed a prespecified subgroup analysis, and 1 (2.2%) was adjusted for multiple testing. Only 5 (10.9%) of the 46 subgroup findings had at least 1 subsequent pure corroboration attempt by a meta-analysis or an RCT. In all 5 cases, the corroboration attempts found no evidence of a statistically significant subgroup effect. In addition, all effect sizes from meta-analyses were attenuated toward the null.
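
The proportions reported above can be rechecked from the counts alone. The short sketch below only redoes that arithmetic; the helper name `pct` and the dictionary labels are assumptions introduced for illustration.

```python
def pct(numerator, denominator):
    """Percentage rounded to 1 decimal place."""
    return round(100 * numerator / denominator, 1)


total_claims = 117       # subgroup claims in the abstracts of 64 RCTs
subgroup_findings = 46   # claims with P < .05 from an interaction test

# Counts reported in the Results, all out of the 46 subgroup findings.
counts = {
    "balance ensured within subgroups": 16,
    "prespecified subgroup analysis": 13,
    "adjusted for multiple testing": 1,
    "at least 1 corroboration attempt": 5,
}

print(f"statistically supported: {pct(subgroup_findings, total_claims)}%")
for label, count in counts.items():
    print(f"{label}: {pct(count, subgroup_findings)}%")
```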

Conclusions and Relevance  A minority of subgroup claims made in the abstracts of RCTs are supported by their own data (ie, a significant interaction effect). For those that have statistical support (P < .05 from an interaction test), most fail to meet other best practices for subgroup tests, including prespecification, stratified randomization, and adjustment for multiple testing. Attempts to corroborate statistically significant subgroup differences are rare; when done, the initially observed subgroup differences are not reproduced.
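
To illustrate why adjustment for multiple testing matters, the sketch below computes the chance of at least 1 false-positive interaction test when several subgroup tests are run at a nominal threshold of .05. It assumes the tests are independent and that no true subgroup effect exists. The number of tests is hypothetical, and the Bonferroni threshold shown is one common adjustment, not necessarily the method used in any of the surveyed trials.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under the Bonferroni adjustment."""
    return alpha / n_tests


n_tests = 10  # hypothetical number of subgroup interaction tests in one trial

fwer = familywise_error_rate(n_tests)
threshold = bonferroni_threshold(n_tests)

print(f"chance of at least 1 false positive: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {threshold:.4f}")
```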

Article Information

Corresponding Author: John P. A. Ioannidis, MD, DSc, Stanford University, 1265 Welch Rd, Medical School Office Bldg, Room X306, Stanford, CA 94305 (jioannid@stanford.edu).

Accepted for Publication: November 7, 2016.

Published Online: February 13, 2017. doi:10.1001/jamainternmed.2016.9125

Author Contributions: Drs Wallach and Ioannidis had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Wallach, Sullivan, Steyerberg, Ioannidis.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Wallach, Sullivan, Sainani.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Wallach, Sullivan, Trepanowski, Sainani.

Study supervision: Wallach, Ioannidis.

Conflict of Interest Disclosures: None reported.

Funding/Support: The Meta-Research Innovation Center at Stanford (METRICS) is supported by a grant from the Laura and John Arnold Foundation. This work was conducted with support from the Stanford Clinical and Translational Science Award to Spectrum, an independent center within Stanford University that supports health-related research activities across Stanford University (grant UL1 TR001085 from the National Institutes of Health [NIH]). Dr Trepanowski is supported by grant T32 HL007034 from the NIH. Dr Steyerberg is partly supported by the PRICES project (grant U01 NS086294 from the NIH). Dr Ioannidis is supported by an unrestricted gift from Sue and Bob O’Donnell to the Stanford Prevention Research Center.

Role of the Funder/Sponsor: The funders were not involved in any aspect related to the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Additional Contributions: Benjamin Kasenda, MD, PhD (Basel Institute for Clinical Epidemiology and Biostatistics, University Hospital Basel, Basel, Switzerland), and the Discontinuation of Randomized Trials (DISCO) study group shared the articles and some raw data for the DISCO articles with at least 1 subgroup claim anywhere in the text. Xin Sun, PhD (Chinese Evidence-Based Medicine Center, West China Hospital, Sichuan University, Chengdu, China), and the Subgroup Analysis of Trials Is Rarely Easy (SATIRE) study group provided the names of the SATIRE articles with at least 1 subgroup claim anywhere in the text. Drs Kasenda and Sun did not receive any compensation for sharing their data.
