Why is replication important in experimental design?

The repetition of study procedures is an appealing definition of replication because it often corresponds to what researchers do when conducting a replication—i.e., faithfully repeating the original methods and procedures as closely as possible. But the reason for doing so is not that repeating procedures defines replication. Replications often repeat procedures because theories are too vague and methods too poorly understood to productively conduct replications and advance theoretical understanding otherwise [8]. We propose an alternative definition for replication that is more inclusive of all research and more relevant to the role of replication in advancing knowledge.

Replication is a study for which any outcome would be considered diagnostic evidence about a claim from prior research. To be a replication, two things must be true: outcomes consistent with a prior claim would increase confidence in the claim, and outcomes inconsistent with a prior claim would decrease confidence in the claim.
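
One way to make this diagnosticity concrete is a toy Bayesian reading (my illustration, not a formalism from the original article): if a successful outcome is more likely when the claim is true than when it is false, observing a success raises confidence in the claim, and a failure lowers it. A minimal sketch, with assumed probabilities:

```python
def posterior(prior, p_outcome_if_true, p_outcome_if_false):
    """Bayes' rule: updated probability of the claim after one outcome."""
    numerator = prior * p_outcome_if_true
    return numerator / (numerator + (1 - prior) * p_outcome_if_false)

prior = 0.50       # starting confidence in the claim
power = 0.80       # assumed: P(replication succeeds | claim is true)
false_pos = 0.05   # assumed: P(replication succeeds | claim is false)

print(f"after a success: {posterior(prior, power, false_pos):.2f}")          # ~0.94
print(f"after a failure: {posterior(prior, 1 - power, 1 - false_pos):.2f}")  # ~0.17
```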

This symmetry promotes replication as a mechanism for confronting prior claims with new evidence; declaring that a study is a replication is therefore a theoretical commitment.

Replication provides the opportunity to test whether existing theories, hypotheses, or models can predict outcomes that have not yet been observed. Successful replications increase confidence in those models; unsuccessful replications decrease confidence and spur theoretical innovation to improve or discard the model. As a theoretical commitment, replication implies precommitment to taking all outcomes seriously. And because replication is defined by theoretical expectations, not everyone will agree that one study is a replication of another.

Moreover, it is not always possible to make precommitments about the diagnosticity of a study as a replication, often for the simple reason that the study outcomes are already known. Once outcomes are known, skeptics of a claim can dismiss an unwelcome result post hoc as not being a fair test, which can unproductively retard research progress by dismissing replication counterevidence. At the same time, replications can fail to meet their intended diagnostic aims because of an error or malfunction in the procedure that is identifiable only after the fact. When there is uncertainty about the status of claims and the quality of methods, there is no easy way to distinguish between motivated and principled reasoning about evidence.

At its best, science minimizes the impact of ideological commitments and reasoning biases by being an open, social enterprise. To achieve that, researchers should be rewarded for articulating their theories clearly and a priori so that they can be productively confronted with evidence [4, 6].

Better theories are those that make it clear how they can be supported and challenged by replication. Repeated replication is often necessary to resolve confidence in a claim, and, invariably, researchers will have plenty to argue about even when replication and precommitment are normative practices.

Theory advances in fits and starts with conceptual leaps, unexpected observations, and a patchwork of evidence. That is okay; it is fuzzy at the frontiers of knowledge. The dialogue between theory and evidence facilitates identification of contours, constraints, and expectations about the phenomena under study.

Replicable evidence provides anchors for that iterative process. If evidence is replicable, then theory must eventually account for it, even if only to dismiss it as irrelevant because of invalidity of the methods.

For example, the claims that there are more obese people in wealthier countries compared with poorer countries on average and that people in wealthier countries live longer than people in poorer countries on average could both be highly replicable.

All theoretical perspectives about the relations between wealth, obesity, and longevity would have to account for those replicable claims.

There is no such thing as exact replication. We cannot reproduce an earthquake, era, or election, but replication is not about repeating historical events. Replication is about identifying the conditions sufficient for assessing prior claims. Replication can occur in observational research when the conditions presumed essential for observing the evidence recur, such as when a new seismic event has the characteristics deemed necessary and sufficient to observe an outcome predicted by a prior theory or when a new method for reassessing a fossil offers an independent test of existing claims about that fossil.

Even in experimental research, original and replication studies inevitably differ in some aspects of the sample—or units—from which data are collected, the treatments that are administered, the outcomes that are measured, and the settings in which the studies are conducted [11]. Individual studies do not provide comprehensive or definitive evidence about all conditions for observing evidence about claims.

The gaps are filled with theory. A single study examines only a subset of units, treatments, outcomes, and settings. The study was conducted in a particular climate, at particular times of day, at a particular point in history, with a particular measurement method, using particular assessments, with a particular sample. Rarely do researchers limit their inference to precisely those conditions. If they did, scientific claims would be historical claims because those precise conditions will never recur.

If a claim is thought to reveal a regularity about the world, then it inevitably generalizes to situations that have not yet been observed. The fundamental question is: of the innumerable variations in units, treatments, outcomes, and settings, which ones matter? Time of day for data collection may be expected to be irrelevant for a claim about personality and parenting, but critical for a claim about circadian rhythms and inhibition.

When theories are too immature to make clear predictions, repetition of original procedures becomes very useful. Using the same procedures is an interim solution for not having clear theoretical specification of what is needed to produce evidence about a claim.

Replication is not about the procedures per se, but using similar procedures reduces uncertainty about the universe of possible units, treatments, outcomes, and settings that could be important for the claim. However, not every generalizability test is a replication. The generalizability space is large because of theoretical immaturity; there are many conditions in which the claim might be supported, but failures in those conditions would not discredit the original claim.

A figure in the original article illustrates this: for underspecified theories, there is a large space of conditions for which the claim may or may not be supported—the theory does not provide clear expectations. Tests across that space are generalizability tests, and testing replicability is a subset of testing generalizability. As theory specification improves, usually interactively with repeated testing, the generalizability and replicability spaces converge. Failures to replicate or generalize (gray tests) shrink the space by identifying boundary conditions, while successful replications and generalizations (colored tests) expand the replicability space and improve theoretical specification of when replicability is expected.

Successful replication provides evidence of generalizability across the conditions that inevitably differ from the original study; unsuccessful replication indicates that the reliability of the finding may be more constrained than recognized previously.

Repeatedly testing replicability and generalizability across units, treatments, outcomes, and settings facilitates improvement in theoretical specificity and future prediction.

This experience prompted the study and the report Evolution of Translational Omics: Lessons Learned and the Way Forward (Institute of Medicine, 2012), which in turn led to new guidelines for omics research at the National Cancer Institute.

Around the same time, in a case that came to light in the Netherlands, social psychologist Diederik Stapel had gone from manipulating to fabricating data over the course of a career, with dozens of fraudulent publications. Similarly, highly publicized concerns about misconduct by Cornell University professor Brian Wansink highlight how consistent failure to adhere to best practices for collecting, analyzing, and reporting data—intentional or not—can blur the line between helpful and unhelpful sources of non-replicability.

A subsequent report, Fostering Integrity in Research (National Academies of Sciences, Engineering, and Medicine, 2017), emerged in this context, and several of its central themes are relevant to the questions posed in this report. According to the definition adopted by the U.S. federal government, research misconduct is fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results. The federal policy requires that research institutions report all allegations of research misconduct in federally supported projects. Other detrimental research practices (see National Academies of Sciences, Engineering, and Medicine, 2017) include failing to follow sponsor requirements or disciplinary standards for retaining data, authorship misrepresentation other than plagiarism, refusing to share data or methods, and misleading statistical analysis that falls short of falsification.

In addition to the behaviors of individual researchers, detrimental research practices also include actions taken by organizations, such as failure on the part of research institutions to maintain adequate policies, procedures, or capacity to foster research integrity and assess research misconduct allegations, and abusive or irresponsible publication practices by journal editors and peer reviewers.

Just as information on rates of non-reproducibility and non-replicability in research is limited, knowledge about research misconduct and detrimental research practices is scarce.

As discussed above, new analyses of retraction trends have shed some light on the frequency of fraud and misconduct. Allegations and findings of misconduct increased from the mid-2000s to the mid-2010s but may have leveled off in the past few years. Analysis of retractions of scientific articles in journals may also shed light on the problem (Steen et al., 2013).

One analysis of biomedical articles found that misconduct was responsible for more than two-thirds of retractions (Fang et al., 2012). As mentioned earlier, a wider analysis of all retractions of scientific papers found about one-half attributable to misconduct or fraud (Brainard, 2018), and others have found some differences according to discipline (Grieneisen and Zhang, 2012).

One theme of Fostering Integrity in Research is that research misconduct and detrimental research practices form a continuum of behaviors (National Academies of Sciences, Engineering, and Medicine, 2017). While current policies and institutions aimed at preventing and dealing with research misconduct are certainly necessary, detrimental research practices likely arise from some of the same causes and may cost the research enterprise more than misconduct does: resources wasted on the fabricated or falsified work, resources wasted on following up on this work, harm to public health due to treatments based on acceptance of incorrect clinical results, reputational harm to collaborators and institutions, and more.

No branch of science is immune to research misconduct, and the committee did not find any basis to differentiate the relative level of occurrence across fields. Some, but not all, researcher misconduct has been uncovered through reproducibility and replication attempts, which are the self-correcting mechanisms of science. From the available evidence, documented cases of researcher misconduct are relatively rare, as suggested by a retraction rate of approximately 4 in 10,000 scientific papers (Brainard, 2018). The overall extent of non-replicability is an inadequate indicator of the health of science.

One of the pathways by which the scientific community confirms the validity of a new scientific discovery is by repeating the research that produced it. When a scientific effort fails to independently confirm the computations or results of a previous study, some fear that it may be a symptom of a lack of rigor in science, while others argue that such an observed inconsistency can be an important precursor to new discovery.

Concerns about reproducibility and replicability have been expressed in both scientific and popular media. As these concerns came to light, Congress requested that the National Academies of Sciences, Engineering, and Medicine conduct a study to assess the extent of issues related to reproducibility and replicability and to offer recommendations for improving rigor and transparency in scientific research.

Reproducibility and Replicability in Science defines reproducibility and replicability and examines the factors that may lead to non-reproducibility and non-replicability in research.

Unlike the typical expectation of reproducibility between two computations, expectations about replicability are more nuanced, and in some cases a lack of replicability can aid the process of scientific discovery. This report provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.

Acknowledging the different approaches to assessing replicability across scientific disciplines, the committee emphasizes eight core characteristics and principles, among them that attempts at replication of previous results are conducted following the methods and using similar equipment and analyses as described in the original study, or under sufficiently similar conditions (Cova et al.).

The concept of replication between two results is inseparable from uncertainty, as is also the case for reproducibility (discussed in Chapter 4). Any determination of replication between two results needs to take account of both proximity (i.e., how close one result is to the other) and uncertainty (i.e., the variability within each result). To assess replicability, one must first specify exactly what attribute of a previous result is of interest.

For example, is only the direction of a possible effect of interest? Is the magnitude of effect of interest? Is surpassing a specified threshold of magnitude of interest?

Depending on the selected criteria (e.g., whether both results reach statistical significance), two studies could be judged to have replicated or not. For example, one study may yield a p-value just below the 0.05 threshold and a replication attempt a p-value just above it; by a significance criterion alone, the second study would be declared a failure to replicate. However, if the second study's p-value is only slightly larger, the two results may be entirely consistent with each other, differing only in where they fall relative to an arbitrary cutoff.

Rather than focus on an arbitrary threshold such as statistical significance, it would be more revealing to consider the distributions of observations and to examine how similar these distributions are.

This examination would include summary measures, such as proportions, means, standard deviations (or uncertainties), and additional metrics tailored to the subject matter. A figure in the original report illustrates the issue with using statistical significance as the attribute of comparison: two results that differ in statistical significance would nonetheless be considered to have replicated under a proximity-uncertainty criterion.
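
To make the proximity-uncertainty criterion concrete, here is a minimal Python sketch (my illustration, not code from the report), assuming each study reports an effect estimate and a standard error. It asks whether the two estimates agree within their combined uncertainty, and shows two hypothetical studies on opposite sides of p = 0.05 that nonetheless replicate by this standard:

```python
import math

def replicated_by_proximity(est1, se1, est2, se2, z=1.96):
    """Proximity-uncertainty check: do two estimates agree within
    their combined uncertainty? z = 1.96 gives a ~95% criterion."""
    difference = abs(est1 - est2)
    combined_se = math.sqrt(se1**2 + se2**2)
    return difference <= z * combined_se

# Hypothetical studies: study 1 is "significant" (p ~ .03), study 2 is
# not (p ~ .10), yet the two estimates are clearly compatible.
study1 = (0.40, 0.18)  # (effect estimate, standard error)
study2 = (0.30, 0.18)
print(replicated_by_proximity(*study1, *study2))  # True
```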

Approaches to assessing non-replicability rates include direct and indirect assessments of replicability; perspectives of researchers who have studied replicability; surveys of researchers; and retraction trends.

This section discusses each of these lines of evidence.

Assessments of Replicability

The most direct method to assess replicability is to perform a study following the original methods of a previous study and to compare the new results to the original ones. Based on the content of the studies collected by the committee, one can observe that the majority are in the social and behavioral sciences (including economics) or in biomedical fields, that methods of assessing replicability are inconsistent, and that the reported replicability percentages depend strongly on the methods used.

The collected studies include, among others (listing approach; field; source; and outcome, where recoverable from the report's table):

Direct; experimental psychology (Klein et al.): the mean effect sizes were halved.

Direct; experimental psychology (Patil et al.).

Direct; experimental psychology (Camerer et al.).

Direct; empirical economics (Dewald et al.).

Direct; economics (Duvendack et al.).

Direct; chemistry (Park et al.).

Indirect; biology (Reproducibility Project: Cancer Biology): a large-scale project to replicate key results in 29 cancer papers published in Nature, Science, Cell, and other high-impact journals. Of the first five replications published, two replicated important parts of the original papers, one did not replicate, and two were uninterpretable.

Direct; psychology, statistical checks (Nuijten et al.).

Indirect; engineering, computational fluid dynamics (Mesnard and Barba): full replication studies of previously published results on bluff-body aerodynamics, using four different computational methods; the main result was replicated in three of the four computational efforts.

Direct; psychology (Luttrell et al.): randomly assigned participants to a version closer to the original study or to the Ebersole et al. adaptation; the original study replicated when the original procedures were followed more closely, but not when the Ebersole et al. procedures were used.

Direct; psychology (Wagenmakers et al.).

Direct; psychology (Noah et al.): conducted a replication in which participants were randomly assigned to be videotaped or not.

Direct; psychology (Alogna et al.): replicated the original study; the effect size was much larger when the original study was replicated more faithfully (the first set of replications had inadvertently introduced a change in the procedure).

Perspectives of Researchers Who Have Studied Replicability

Several experts who have studied replicability within and across fields of science and engineering provided their perspectives to the committee.

Retraction Trends

Retractions of published articles may be related to their non-replicability.

Non-Replicability That Is Potentially Helpful to Science

Non-replicability is a normal part of the scientific process and can be due to the intrinsic variation and complexity of nature, the scope of current scientific knowledge, and the limits of current technologies. The susceptibility of any line of scientific inquiry to sources of non-replicability depends on many factors, including factors inherent to the system under study, such as: the complexity of the system; understanding of the number of, and relations among, variables within the system; the ability to control those variables; levels of noise within the system (or signal-to-noise ratios); mismatch between the scale of the phenomena and the scale at which they can be measured; the stability across time and space of the underlying principles; and the fidelity of the available measures to the underlying system under study.

A figure in the original report depicts a spectrum of studies with varying degrees of controllability and complexity; examples from engineering, physics, and psychology illustrate combinations of the two that affect susceptibility to non-replication.

Unhelpful Sources of Non-Replicability

Non-replicability can also be the result of human error or poor researcher choices. We consider here a selected set of such avoidable sources of non-replication: publication bias; misaligned incentives; inappropriate statistical inference; poor study design; errors; and incomplete reporting of a study. We discuss each source in turn.

Publication Bias

Both researchers and journals want to publish new, innovative, ground-breaking research.

A funnel-chart figure in the original report shows the estimated coefficient plotted against its standard error (a) when all hypothetical study experiments are reported and (b) when only statistically significant results are reported.
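
As a rough illustration of that effect, here is a small Python simulation (my sketch, with assumed parameters, not code from the report): many noisy experiments estimate a true effect of zero, and filtering to statistically significant results leaves a set of estimates biased away from zero.

```python
import random
import statistics

random.seed(1)
TRUE_EFFECT = 0.0  # assumed: there is no real effect

def run_experiment(n=20):
    """One small experiment: returns (effect estimate, standard error)."""
    sample = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
    estimate = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    return estimate, se

results = [run_experiment() for _ in range(2000)]
significant = [(e, s) for e, s in results if abs(e) > 1.96 * s]

print(f"mean |estimate|, all runs:         "
      f"{statistics.mean(abs(e) for e, _ in results):.3f}")
print(f"mean |estimate|, significant only: "
      f"{statistics.mean(abs(e) for e, _ in significant):.3f}")
# The second value is far larger: publication bias in miniature.
```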

Misaligned Incentives

Academic incentives—such as tenure, grant money, and status—may influence scientists to compromise on good research practices (Freeman).

Inappropriate Statistical Inference

Confirmatory research is research that starts with a well-defined research question and a priori hypotheses before collecting data; confirmatory research can also be called hypothesis-testing research.
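
By contrast, exploratory analysis that tests many outcomes and reports only the most significant one inflates the false-positive rate, one form of inappropriate statistical inference. A minimal Python simulation (my illustration, with assumed parameters) shows the inflation:

```python
import random

random.seed(2)

def significant_under_null(n=30):
    """One outcome measured under the null hypothesis (true effect = 0).
    Returns True if a naive z-test calls it 'significant' at ~5%."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    return abs(mean / (sd / n ** 0.5)) > 1.96

def best_of(k, runs=2000):
    """Fraction of simulated studies reporting a 'significant' finding
    when k outcomes are tested and only the best one is reported."""
    return sum(any(significant_under_null() for _ in range(k))
               for _ in range(runs)) / runs

print(f"1 preregistered outcome: {best_of(1):.1%}")   # close to 5%
print(f"best of 10 outcomes:     {best_of(10):.1%}")  # roughly 40%
```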

Poor Study Design

Before conducting an experiment, a researcher must make a number of decisions about study design.

Incomplete Reporting of a Study

During the course of research, researchers make numerous choices about their studies.

Fraud and Misconduct

At the extreme, sources of non-replicability that do not advance scientific knowledge—and do much to harm science—include misconduct and fraud in scientific research.

Further entries from the committee's collection of replication studies include: a group of 20 research teams that performed replication studies of 40 experimental philosophy studies; replications of 78 previously published associations between the Big Five personality traits and consequential life outcomes; 23 laboratories that conducted replications of a standardized ego-depletion protocol based on a sequential-task paradigm by Sripada et al.; an attempt by researchers from Bayer HealthCare to validate data on potential drug targets obtained in 67 projects by copying models exactly or by adapting them to internal needs; and replication attempts in preclinical oncology studies (Begley and Ellis).

Replication also matters outside large-scale projects. For example, researchers might want to replicate an earlier smoking study with younger smokers to see if they reach the same result. When studies are replicated and achieve the same or similar results as the original study, it gives greater validity to the findings.

When conducting a study or experiment, it is essential to have clearly defined operational definitions. In other words, what is the study attempting to measure? When replicating earlier research, experimenters will follow the same procedures but with a different group of participants. So what happens if the original results cannot be reproduced?

Does that mean that the experimenters conducted bad research or that, even worse, they lied or fabricated their data? In many cases, non-replicated research is caused by differences in the participants or in other extraneous variables that might influence the results of an experiment. For example, minor differences in things like the way questions are presented, the weather, or even the time of day the study is conducted might have an unexpected impact on the results of an experiment.

Researchers might strive to reproduce the original study perfectly, but variations are expected and often impossible to avoid. In 2015, a group of researchers published the results of their five-year effort to replicate 100 experimental studies previously published in three top psychology journals. The results were less than stellar: only around a third of the replication attempts reproduced the original statistically significant results. As one might expect, these dismal findings caused quite a stir. So why are psychology results so difficult to replicate?

Writing for The Guardian, John Ioannidis suggested that there are a number of reasons why this might happen, including competition for research funds and the powerful pressure to obtain significant results. There is little incentive to retest, so many results obtained purely by chance are simply accepted without further research or scrutiny.

The project authors suggested three potential reasons why the original findings could not be replicated: the original result may have been a false positive, the replication attempt may have produced a false negative, or both results may be accurate but reflect subtle differences in methods or conditions between the two studies. The Nobel Prize-winning psychologist Daniel Kahneman has suggested that, because published studies are often too vague in describing the methods used, replications should involve the authors of the original studies in order to more carefully mirror the methods and procedures used in the original research.

While some might be tempted to look at the results of such replication projects and assume that psychology is rubbish, many suggest that these findings actually help make psychology a stronger science. Human thought and behavior are remarkably subtle and ever-changing subjects of study, so variation is to be expected when observing diverse populations and participants. Some research findings might be wrong, but digging deeper, pointing out the flaws, and designing better experiments help strengthen the field.
