Statistical significance is often treated as the most important indicator of a difference between groups. However, statistical significance alone provides an incomplete assessment of difference because a p value depends on both sample size and effect size. Consequently, a sufficiently large sample can yield a statistically significant p value even in the absence of a substantively meaningful effect. Effect size, by contrast, measures the magnitude of the difference between two groups and is not sensitive to sample size.
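The point above can be illustrated with a minimal simulation sketch (all values are hypothetical, and the p value uses a normal approximation to the two-sample t test): with a very large sample, a trivial true effect still produces a statistically significant p value, while the effect size correctly stays near zero.

```python
import math
import random

random.seed(0)

def cohens_d(a, b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    sp = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / sp

def two_sample_p(a, b):
    """Approximate two-sided p value (normal approximation, equal group sizes)."""
    z = cohens_d(a, b) * math.sqrt(len(a) / 2)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

n = 50_000  # very large sample per group
treat = [random.gauss(0.03, 1) for _ in range(n)]  # tiny true effect (d = 0.03)
ctrl = [random.gauss(0.00, 1) for _ in range(n)]

print(f"d = {cohens_d(treat, ctrl):.3f}, p = {two_sample_p(treat, ctrl):.2e}")
```

Reporting the effect size alongside the p value makes clear that the difference, while statistically detectable, is negligible in magnitude.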
The distinction between statistical significance and effect size becomes important in intervention research because effect sizes can serve as a measure of an intervention's impact and cost-effectiveness. They also facilitate straightforward benchmarking against current best practices. The outcomes information that effect sizes provide is highly valuable to federal agencies, state governments, private industry, and local programs evaluating the efficacy or scalability of interventions. This meeting will focus on the computation and application of effect sizes in research on children and families and will include related presentations and discussions.
The meeting will convene federal staff and researchers with an interest in expanding and improving the use of effect sizes in intervention research. The ultimate goals of the meeting are to 1) better understand the function and calculation of effect sizes; 2) identify the promises and challenges involved in using them in analyses and in communicating intervention outcomes to policymakers; and 3) promote best practices for communicating intervention outcomes, as measured by effect sizes, to stakeholders.
The discussion session focused on standardizing the language used when reporting effect sizes and the information presented in journals and reports. To assist in reporting, one panelist suggested collaborating with the publishing editors of professional associations to establish guidelines, rather than working with individual journal editors, because disciplines differ both in how they report effect sizes and in how they provide context for interpretation. When calculating effect sizes, common errors can arise in multilevel models when researchers use statistical software packages without understanding the underlying formulas or functions, particularly when the variance component used for standardization is unknown or unclear. The panelists emphasized that effect sizes are not necessary when outcomes are reported in natural units, though they may still be useful for comparisons across different measures. Percentage change may be an appropriate reporting mechanism when a measure has a natural zero point, such as dollars earned, but it is of limited use for psychological outcomes (e.g., cognitive or affective measures). In conclusion, the suggested guidelines for reporting results included clearly defining the target population, describing the comparability of measures so that variability can be interpreted in context, and preparing tables containing standard deviations and regression coefficients.
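The multilevel pitfall noted above can be sketched numerically (all variance figures here are hypothetical): in a two-level design, the total outcome variance splits into between-cluster and within-cluster components, and standardizing an impact estimate by a different component yields a different effect size.

```python
import math

# Hypothetical two-level (e.g., students within classrooms) variance split.
impact = 2.0        # estimated treatment impact, in outcome points (assumed)
var_between = 4.0   # between-cluster variance component (assumed)
var_within = 12.0   # within-cluster variance component (assumed)

sd_total = math.sqrt(var_between + var_within)  # SD of the full outcome
sd_within = math.sqrt(var_within)               # SD within clusters only

print(f"effect size on total SD:  {impact / sd_total:.3f}")
print(f"effect size on within SD: {impact / sd_within:.3f}")
```

Software defaults may silently use one component or the other, which is why the panelists urged researchers to know which variance their package standardizes against.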
The implications of measurement error differ depending on whether outcomes are measured in natural or standardized units. One panelist noted that psychological outcomes are typically measured in arbitrary units, whereas economic indicators are typically measured in natural units. Measurement error in an outcome measured in natural units will inflate standard errors and weaken significance tests but will not bias coefficient estimates; measurement error in an outcome measured in standardized units is more problematic because it biases the coefficient estimates themselves. In addition to reporting measurement procedures, it is important to identify the intervention study design (e.g., random assignment versus pre-post), since design influences the magnitude of the effect size that can be expected. Standardized tests provide useful benchmarks against which to contextualize outcomes, and how those benchmarks are used depends on the goals of the comparison. When calculating an effect size, researchers compare an impact to a standard deviation, but there are many standard deviations to choose from, depending on the research question. Possible reference points include criterion reference points; social, normative, and moral standards; outcomes; and cost. Though usually not presented in detail, the cost of an intervention is an important factor, though not the only one, in interpreting the magnitude of an effect size. In terms of policy implications, benefits for one population do not rule out negative effects for another; analyses must probe for potential subgroup differences in effects.
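The choice of standardizer can be made concrete with a minimal sketch (the scores below are hypothetical): dividing the same mean impact by the pooled SD (Cohen's d) versus the control-group SD (Glass's delta) gives different effect sizes, so the denominator is itself a research-question decision.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def effect_size(treat, ctrl, denominator="pooled"):
    """Standardized mean difference with a choice of standardizer."""
    diff = _mean(treat) - _mean(ctrl)
    if denominator == "pooled":  # Cohen's d
        nt, nc = len(treat), len(ctrl)
        sp = math.sqrt(((nt - 1) * _sd(treat) ** 2 +
                        (nc - 1) * _sd(ctrl) ** 2) / (nt + nc - 2))
        return diff / sp
    if denominator == "control":  # Glass's delta
        return diff / _sd(ctrl)
    raise ValueError(f"unknown denominator: {denominator}")

# Hypothetical outcome scores for illustration only.
treat = [12, 14, 15, 16, 18, 20]
ctrl = [10, 11, 12, 13, 14, 15]
print(f"pooled SD:  {effect_size(treat, ctrl, 'pooled'):.3f}")
print(f"control SD: {effect_size(treat, ctrl, 'control'):.3f}")
```

In practice a normative SD from a standardized test could serve as yet another denominator; whichever is used should be reported explicitly so readers can interpret the estimate in context.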
Panelists indicated that intervention effects can be translated into financial benefits but emphasized that this translation introduces an additional form of uncertainty: beyond the statistical uncertainty reflected in confidence intervals, projecting the long-term benefits of an intervention requires numerous assumptions. Panelists therefore recommended examining how results change as key assumptions are varied. A theoretical framework can justify examining multiple outcomes; however, complications may arise when the outcomes all come from the same family. When using a developmental trajectory approach to calculate benefits, panelists noted that early outcomes may be difficult to quantify in financial terms. Finally, when reporting effect size results, authors should take into account that stakeholders, such as policymakers and the public, vary in their ability to understand and interpret research results.
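The recommendation to vary key assumptions can be sketched as a simple sensitivity analysis (the gain, horizon, and discount rates below are all hypothetical): projecting the present value of an annual earnings gain under several discount-rate assumptions shows how strongly the monetized benefit depends on choices outside the statistical model.

```python
def present_value(annual_gain, years, discount_rate):
    """Present value of a constant annual benefit over a fixed horizon."""
    return sum(annual_gain / (1 + discount_rate) ** t
               for t in range(1, years + 1))

# Hypothetical projection: $500/year earnings gain over 20 years.
for rate in (0.03, 0.05, 0.07):
    pv = present_value(annual_gain=500.0, years=20, discount_rate=rate)
    print(f"discount rate {rate:.0%}: present value ${pv:,.0f}")
```

Reporting the projection across a range of discount rates, rather than a single figure, makes the assumption-driven uncertainty visible to stakeholders.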
8:00 – 8:15
Naomi Goldstein, OPRE
Lauren Supplee, OPRE
8:15 – 10:15
James Griffin, NICHD
Slide Deck: What are effect sizes and why do we need them?
Larry Hedges, Northwestern University
Slide Deck: How should we calculate effect sizes? What are common mistakes in the calculation of effect sizes? How does research design influence the calculation of effect sizes?
Howard Bloom, MDRC
Slide Deck: Issues in calculating average effect sizes in meta-analyses
Rebecca Maynard, University of Pennsylvania
Belinda Sims, NIDA
Slide Deck: Factors to consider in the interpretation of effect sizes
Carolyn Hill, Georgetown University
Slide Deck: How measurement of outcomes affects the interpretation and understanding of effect sizes
Margaret Burchinal, University of North Carolina
Slide Deck: Contextualizing effect sizes within substrata of the population, including differences by gender, race, ethnicity, community context, baseline risk, and developmental stage
Hendricks Brown, University of South Florida
1:30 – 3:00
Cheryl Boyce, NIMH
Slide Deck: What is best practice when using effect sizes to convey information about prevention and intervention research to practitioners and policymakers?
H. Steven Leff, Human Services Research Institute
Greg Duncan, Institute for Policy Research, Northwestern University
Harris Cooper, Duke University
Lauren Supplee, OPRE
Child Development Perspectives
Volume 2, Issue 2, December 2008
Special Section: The Application of Effect Sizes in Research on Children and Families (pages 164–166)
Abstracts available at http://link.springer.com/journal/11121/14/2
Introduction to the Special Section: The Application of Effect Sizes in Research on Children and Families
Lauren H. Supplee
What Are Effect Sizes and Why Do We Need Them?
Larry V. Hedges
Empirical Benchmarks for Interpreting Effect Sizes in Research
Carolyn J. Hill, Howard S. Bloom, Alison Rebeck Black and Mark W. Lipsey
How Measurement Error Affects the Interpretation and Understanding of Effect Sizes
Margaret R. Burchinal
The Search for Meaningful Ways to Express the Effects of Interventions
Averaging Effect Sizes Within and Across Studies of Interventions Aimed at Improving Child Outcomes
Nianbo Dong, Rebecca A. Maynard and Irma Perez-Johnson
Examining How Context Changes Intervention Impact: The Use of Effect Sizes in Multilevel Mixture Meta-Analysis
C. Hendricks Brown, Wei Wang and Irwin Sandler