What is the difference between anecdotal evidence and the results of an experiment?

There were two versions of each article: one with an anecdote and one without. The presentation order and the article version were randomized, such that each participant read one article with an anecdote and the other article without an anecdote. After reading each article, participants completed a comprehension check by responding to one multiple-choice question about the article. Participants then rated each article's evidence strength and persuasiveness and how likely they would be to implement the intervention in a hypothetical classroom (e.g., "Based on this study, how likely is it that you would incorporate physical exercise into your lessons?").
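
The counterbalancing described above can be illustrated with a minimal sketch in R; the article labels and column names here are illustrative assumptions, not the authors' materials.

```r
# Sketch of the counterbalanced design: each participant reads both articles
# in a random order, and the anecdote is attached to exactly one of the two
# articles (which one is randomized per participant).
set.seed(1)
assign_participant <- function(id) {
  articles <- sample(c("exercise", "tidy_classroom"))  # random presentation order
  data.frame(participant   = id,
             position      = 1:2,
             article       = articles,
             with_anecdote = sample(c(TRUE, FALSE)))   # one TRUE, one FALSE
}
design <- do.call(rbind, lapply(1:4, assign_participant))
design  # each participant sees one anecdote article and one without
```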

We also asked participants to explain their reasoning in an open-ended response for the evidence strength and incorporate likelihood ratings. At the end of the survey, participants provided basic background information, including gender, age, and the highest level of statistics class taken. The media articles used in this study were fictional and described research studies on the effectiveness of two educational interventions: learning while exercising and learning in a tidy classroom (see Appendix 1 for example articles).

The exercise intervention article was adapted from a previous study showing that exercising while studying improves second language learning compared to not exercising (Liu et al.). The articles started with a brief introduction or an anecdotal story related to the study, followed by a brief description of the research study.

We deliberately planted experimental flaws in the scientific methods described in each article. Specifically, in the description of the exercise study, participants were assigned to the exercise or control group according to their preferences (non-random assignment), and performance was measured via self-report rather than quantitatively (invalid measure). In the description of the tidy classroom study, the participant groups were unequal, such that half of the participants came from a math class and half came from an English class before taking a math exam (participant confound), and participants were primed to believe that being in a messy room might hurt their performance before taking the exam (priming confound).

In the anecdote versions of the articles, the anecdote consisted of a single story that favored the new teaching intervention.

For the exercise intervention article, the story featured two Chinese boys learning English as a second language, with one boy who exercised while studying outperforming the other boy, who did not exercise while studying, on an English vocabulary test.

For the tidy classroom intervention article, the story was about a boy whose messy desk negatively impacted his mood and interfered with his ability to do math homework. The no-anecdote versions of the articles included descriptive text related to the topic of each intervention that was similar in length to the anecdotal stories. We hypothesized that participants would give higher ratings for the article that included an anecdote in terms of evidence strength, persuasiveness and likelihood of implementing the learning intervention.

Additionally, we predicted that participants would be less likely to mention methodological flaws in their open-ended responses about the article that included an anecdote.

Error bars represent the within-subjects standard error of the mean (Cousineau). We next asked to what extent the evidence strength and likelihood of implementing ratings were correlated for each intervention (Fig.). For example, for evidence ratings of 3, most participants gave incorporate likelihood ratings of 4 or 5 for the tidy classroom intervention, whereas most participants gave incorporate likelihood ratings of 4 for the exercise intervention.
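
The within-subjects error bars mentioned above follow the Cousineau approach of removing between-participant variability before computing the standard error. Below is a minimal sketch of that computation; the data frame and column names are assumptions, and the optional Morey correction factor is omitted.

```r
# Cousineau-style within-subjects standard error:
# 1) center each rating on the participant's own mean,
# 2) add back the grand mean,
# 3) compute the per-condition standard error of the normalized ratings.
# `d` is assumed to have columns: participant, condition, rating.
d$rating_norm <- d$rating - ave(d$rating, d$participant) + mean(d$rating)
within_se <- tapply(d$rating_norm, d$condition,
                    function(x) sd(x) / sqrt(length(x)))
within_se
```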

Together, these data suggest that flawed scientific evidence was underweighed in decisions to implement the tidy classroom intervention. Correlations between ratings of evidence strength (x-axis) and likelihood of implementing the intervention (y-axis) for the exercise (blue) and tidy classroom (orange) interventions.

The width of the bubbles represents the number of participants for each data point. Two raters (including one author) coded the open-ended responses.

Specifically, we were interested in measuring how frequently participants mentioned certain aspects of the study (e.g., the planted methodological flaws). Because so few participants noticed the other specific flaws in the articles, we did not statistically analyze the effect of anecdotes on those responses.

Together, these findings suggest that participants were more likely to notice methodological flaws in the tidy classroom article than in the exercise intervention article, perhaps because the flaws were more obvious. To test whether intervention effects on evidence strength and persuasiveness ratings were indirectly impacted by the salience of the flaws in the studies, we conducted mediation analyses in which intervention type was the independent variable, mentions of study flaws served as the mediator, and evidence strength or persuasiveness rating was the dependent variable.
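
A hedged sketch of this kind of mediation analysis using the R mediation package follows; the variable names and model choices are assumptions for illustration, not the authors' exact code.

```r
library(mediation)

# Mediator model: does intervention type predict whether a flaw was mentioned?
# `flaw_mentioned` is assumed to be coded 0/1 and `intervention` as
# 0 = exercise, 1 = tidy classroom.
med_model <- glm(flaw_mentioned ~ intervention, family = binomial, data = d)

# Outcome model: evidence strength as a function of intervention type and
# flaw mentions.
out_model <- lm(evidence_strength ~ intervention + flaw_mentioned, data = d)

# Estimate direct and indirect (mediated) effects with bootstrapped CIs.
fit <- mediate(med_model, out_model,
               treat = "intervention", mediator = "flaw_mentioned",
               boot = TRUE, sims = 5000)
summary(fit)
```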

This analysis revealed that the effects of intervention type on both evidence strength and persuasiveness ratings were significantly mediated by study flaw detection (Fig.). Mediation analysis showing direct and indirect effects (via detection rates of study flaws) of intervention type on evidence strength ratings (a) and on persuasiveness ratings (b). For explanations about decisions to implement the intervention (i.e., the incorporate likelihood ratings), we coded the aspects of the article that participants referenced. Additionally, if participants mentioned their personal experience or prior beliefs (e.g., their own opinions about the intervention), we coded the response as referencing personal experience rather than the study itself.

Thus, although participants were more likely to notice methodological flaws in the description of the tidy classroom intervention study, they were also more likely to cite their own personal experiences, opinions, and beliefs as reasons for adopting the tidy classroom than the exercise intervention.

We thus speculated that the two interventions may have differed in their baseline plausibility. Participants responded to three questions about each intervention (presentation order was randomized) using a Likert scale, including how much experience they themselves had had with the intervention and how plausible they found it.

Thus, baseline differences in personal experience with and plausibility of the interventions could explain why participants in Experiment 1 tended to implement the tidy classroom intervention, despite acknowledging that the evidence in favor of the technique was weak. Our findings are consistent with previous work showing that personal experiences and prior beliefs are weighted heavily in decisions (Garcia-Retamero et al.).

In particular, people struggle to update their beliefs when presented with new, compelling evidence that conflicts with their initial belief (Ecker et al.). In fact, the opposite may occur; people may cling to their initial beliefs more strongly when faced with conflicting evidence, perhaps by discounting the data (Lord et al.).

Here we find that even when people do not discount the evidence (i.e., they acknowledge that it is weak), they may nonetheless act in line with their prior beliefs. Contrary to our prediction, we did not find that the presence of anecdotes affected either evaluations of or decisions based on low-quality research in Experiment 1. The lack of an anecdote effect was surprising, given that a substantial amount of research reviewed above has established the relative importance of anecdotes in both evidence evaluation and decision-making.

In particular, we were unable to replicate previous findings that the presence of anecdotes reduced evidence evaluation and scientific reasoning (Rodriguez, Rhodes, et al.). However, further inspection revealed that there were substantial differences between the current study and the Rodriguez, Rhodes, et al. study.

First, whereas we included anecdote presence as a within-subjects factor, anecdote presence was completely between-subjects in the Rodriguez, Rhodes, et al. study. Additionally, whereas participants in the current study rated only a single article in the presence of an anecdote, participants in the anecdote condition of the Rodriguez, Rhodes, et al. study saw an anecdote with every article they evaluated. Thus, it is possible that anecdotes may only influence evaluations of evidence quality under certain testing conditions.

However, because we collected plausibility ratings about the interventions from a separate sample, we were unable to conduct a mediation analysis on the data from Experiment 1. Thus, in Experiment 2, we first asked participants about their beliefs about the plausibility of these interventions before having them evaluate the flawed studies about the interventions. Second, because we did not anticipate that incorporate likelihood ratings would differ between the two learning interventions, we did not specifically test a range of interventions that systematically varied in their plausibility.

It is thus unclear whether our results are generalizable to other contexts beyond the tidy classroom and exercise interventions. In Experiment 2, we additionally tested whether participants would be more likely to incorporate other learning interventions considered to be highly plausible compared to less plausible interventions.

Third, the intervention effect on incorporate likelihood ratings in Experiment 1 may have been underpowered; thus, we wanted to replicate our findings with a larger sample size to achieve a sufficient level of power. To address these limitations, we conducted a second experiment with two parts.

In an initial pretest, our goal was to find two additional interventions that varied strongly in their baseline plausibility, to extend our findings from Experiment 1 to other contexts. In Experiment 1, we found that the exercise intervention was perceived as implausible for two reasons: participants did not believe it would be effective for improving learning, and they thought it was impractical.

Thus, in the pretest for Experiment 2 we tested three possible learning interventions that we thought might be perceived as both ineffective and impractical: napping at school, singing learned material, and doodling while learning.

For each intervention, participants used a Likert scale to respond to several questions, including (1) how effective they thought the intervention would be compared to a control condition and (2) how practical it would be to implement. Participants were asked about each of the four learning interventions in random order. Based on previous testing, we hypothesized that participants would rate the virtual reality intervention as highly effective for learning. We first conducted a one-way repeated measures ANOVA on intervention effectiveness ratings, with intervention type as a within-subjects factor.
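
A minimal sketch of this kind of one-way repeated measures ANOVA in base R is shown below; the long-format data frame and column names are assumptions.

```r
# One-way repeated measures ANOVA on effectiveness ratings, with intervention
# type (napping, singing, doodling, virtual reality) as a within-subjects factor.
# `pretest` is assumed to be in long format: participant, intervention, effectiveness.
pretest$participant  <- factor(pretest$participant)
pretest$intervention <- factor(pretest$intervention)

rm_anova <- aov(effectiveness ~ intervention + Error(participant/intervention),
                data = pretest)
summary(rm_anova)
```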

As predicted, the virtual reality intervention had the highest average effectiveness rating (see Table 3 for a summary of descriptive statistics for all intervention and question types). The virtual reality and doodling interventions had the highest average practicality ratings.

However, since we were mainly interested in differences in prior beliefs about the plausibility of these interventions, we only considered the effectiveness and practicality effects from this experiment. Because the differences in effectiveness and practicality were largest between the virtual reality and napping interventions, we chose to use these interventions in our replication of Experiment 1.

The virtual reality intervention was rated as the more effective and more practical intervention; thus, we chose to use it as a second example of a high plausible learning intervention, similar to the tidy classroom intervention from Experiment 1. Because the napping intervention was rated as significantly less effective and less practical than the virtual reality intervention, we chose to include it as a second example of a low plausible learning intervention, similar to the exercise intervention from Experiment 1.

The goals of Experiment 2 were threefold: first, we wanted to test the hypothesis that the effect of intervention type on incorporate likelihood ratings in Experiment 1 was mediated by prior beliefs about the plausibility of the interventions. Because we did not collect intervention plausibility ratings and article evaluation ratings from the same sample in Experiment 1, in Experiment 2 the same participants rated both their prior beliefs about the interventions and their evaluations of and decisions about the articles.

Second, we wanted to test whether our findings from Experiment 1 would extend to other learning interventions beyond the tidy classroom and exercise interventions. Finally, we wanted to replicate our findings from Experiment 1 with a larger sample size, given that the intervention effect on incorporate likelihood ratings was underpowered in Experiment 1. We used the pwr package to conduct an a priori power analysis to determine the target sample size. Participants completed an online survey using Qualtrics.
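
The power analysis itself is not reproduced in this text; the following is only an illustrative sketch of how the pwr package can be used for this purpose, with a placeholder effect size rather than the value the authors assumed.

```r
library(pwr)

# A priori power analysis for a paired (within-subjects) comparison.
# The effect size d = 0.3 is a placeholder assumption, not the paper's value.
pwr.t.test(d = 0.3,            # assumed effect size (Cohen's d)
           sig.level = 0.05,   # alpha
           power = 0.80,       # desired power
           type = "paired")    # returns the required number of participants
```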

The question and response formats were identical to those used in the pretest to Experiment 2, and the interventions were presented in random order. Next, participants read four fictitious articles (again presented in random order) that were similar in format to the non-anecdote articles used in Experiment 1. Each article featured one of the classroom interventions asked about in the prior belief pretest described above (see Appendix 2 for examples of the virtual reality and napping articles).

Participants responded to the same questions asked in Experiment 1: they rated each article in terms of its evidence strength, persuasiveness, and the likelihood that they would implement the intervention in a hypothetical classroom, and they explained their reasoning for evidence strength and incorporate likelihood ratings in an open-ended way.

The tidy classroom and exercise intervention articles were identical to the non-anecdote versions of the articles from Experiment 1. The two major methodological flaws we planted were that participants were assigned to groups based on skill level (non-random assignment) and that the number of participants in each group was uneven (unequal group sizes).

For the napping article, the description of the study was based on a study by Cabral et al. We hypothesized that participants would be more likely to incorporate the high plausible interventions than the low plausible interventions; furthermore, we expected to observe a dissociation between evidence strength ratings and decisions to incorporate the interventions, such that incorporate likelihood ratings would be higher than evidence strength ratings for the high plausible interventions (replicating findings from Experiment 1).

Additionally, we hypothesized that prior beliefs about effectiveness and practicality would mediate any effects of intervention plausibility on incorporate likelihood ratings. We thus replicated the effect of intervention type on incorporate likelihood ratings from Experiment 1. In contrast to Experiment 1, however, we found that evidence strength ratings were higher for the high plausible than low plausible interventions in Experiment 2.

We address possible explanations for this discrepancy later, in our analysis of open-ended explanations of evidence strength ratings (see Table 5 for a summary of descriptive statistics of open-ended responses). As shown in Fig., the effect of intervention type on incorporate likelihood ratings was significantly mediated by prior beliefs about both the effectiveness and practicality of the interventions. Mediation analysis showing direct and indirect effects of intervention type, as mediated by prior beliefs about intervention effectiveness (a) and intervention practicality (b), on incorporate likelihood ratings.

We next tested relationships between evidence strength and incorporate likelihood ratings as a function of intervention plausibility.

Thus, similar to Experiment 1, participants were more likely to incorporate high plausible learning interventions given their evidence strength ratings. In contrast to Experiment 1, we found that participants were less likely to incorporate low plausible learning interventions given their evidence strength ratings.
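
One way to quantify this dissociation is to compare the two rating types directly within each plausibility level. The sketch below uses a Wilcoxon signed-rank test for paired ordinal ratings as an illustrative choice, not the authors' reported analysis; the data frame and column names are assumptions.

```r
# Paired comparison of incorporate likelihood vs. evidence strength ratings,
# run separately for high and low plausible interventions.
# `d` is assumed to have one row per participant x plausibility level with
# columns: plausibility, incorporate_likelihood, evidence_strength.
for (level in c("high", "low")) {
  sub <- subset(d, plausibility == level)
  print(wilcox.test(sub$incorporate_likelihood, sub$evidence_strength,
                    paired = TRUE))
}
```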

Means of ratings for evidence strength and incorporate likelihood as a function of intervention plausibility for Experiment 2. We again confirmed that correlations between evidence strength and incorporate likelihood ratings differed as a function of intervention plausibility.

However, as in Experiment 1, for a given evidence strength rating, incorporate likelihood ratings were higher for the high plausible than low plausible interventions, as indicated by the higher overall trendline for the high plausible interventions.
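
A minimal sketch of the corresponding correlation analysis, computed separately by plausibility level, is shown below; the use of Spearman correlation for Likert ratings and the column names are assumptions.

```r
# Correlation between evidence strength and incorporate likelihood ratings,
# split by intervention plausibility.
by(d, d$plausibility, function(g) {
  cor.test(g$evidence_strength, g$incorporate_likelihood, method = "spearman")
})
```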

Together, these findings suggest that participants underweighed evidence strength as a factor in their decisions to implement learning interventions; rather, the plausibility of the learning interventions was the stronger predictor. Size of bubbles reflects the number of participants for each data point.

On average, the two raters achieved a high rate of agreement across coding categories. We first analyzed the number and types of methodological flaws that participants noticed for each intervention. Although we did not specifically manipulate the sample size as a methodological flaw, we nevertheless noticed that many participants cited sample size (either as being too low or as sufficiently high) as part of their explanations for their evidence strength ratings; thus, we also analyzed the number of participants who mentioned sample size in their explanations.
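
A hedged sketch of how inter-rater agreement can be computed for this kind of coding is shown below, using percent agreement plus Cohen's kappa from the irr package; the data frame and column names are assumptions.

```r
library(irr)

# `codes` is assumed to have one row per open-ended response, with the two
# raters' category codes in columns rater1 and rater2.
percent_agreement <- mean(codes$rater1 == codes$rater2) * 100
percent_agreement

kappa2(codes[, c("rater1", "rater2")])  # chance-corrected agreement
```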

We first analyzed evidence strength explanations for the low plausible interventions (exercise and napping). We next analyzed mentions of methodological flaws for the high plausible interventions (tidy classroom and virtual reality). Thus, evidence strength ratings were higher for high plausible than low plausible interventions despite the fact that there were no differences in flaw detection rates for the two types of interventions. This contrasts with our findings in Experiment 1, in which participants were more likely to notice flaws in the more plausible intervention (tidy classroom), and flaw detection rates significantly mediated the effect of intervention type on evidence strength ratings.

Here, we examined how people simultaneously weigh poor quality evidence (i.e., flawed scientific studies) and their prior beliefs when making decisions. In Experiment 1, we tested whether the presence of an anecdote would inflate the perceived quality of evidence and increase the likelihood that participants would act on flawed studies about two learning interventions: taking an exam in a tidy classroom and exercising while learning. At the same time, we found that participants were more likely to reference their personal experiences and beliefs when explaining their decision to implement the tidy classroom intervention.

Consistent with these findings, in a separate sample of participants, we found that people had stronger prior beliefs about the tidy classroom intervention, which participants rated as both more plausible and more likely to reflect their own personal experience than the exercise intervention.

Based on pretesting of prior beliefs about various learning interventions, we chose an additional low plausible intervention (napping to improve learning at school) and an additional high plausible intervention (using virtual reality to learn science). Consistent with our findings in Experiment 1, participants in Experiment 2 were more likely to implement the high plausible interventions (tidy classroom and virtual reality) than the low plausible interventions (exercise and napping).

Additionally, decisions to implement the interventions were significantly mediated by prior beliefs about both the effectiveness and practicality of the interventions. Importantly, we again found that perceptions of evidence quality were dissociated from decisions, such that implementation likelihood ratings were greater than evidence strength ratings for the high plausible interventions but lower than evidence strength ratings for the low plausible interventions.

We again confirmed that participants were more likely to mention prior beliefs and personal experience as the basis for their decision to implement high plausible interventions than low plausible interventions; additionally, participants were more likely to reference prior beliefs than the study itself as the basis for their decision, but only for the high plausible interventions.

One important difference between the two experiments was that while evidence strength ratings were lower for the more plausible intervention in Experiment 1, evidence strength ratings were higher for the more plausible interventions in Experiment 2. Additionally, whereas participants were more likely to identify specific methodological flaws in the more plausible study in Experiment 1, there were no differences in flaw detection rates for high versus low plausible interventions in Experiment 2.

Thus, it is unclear why participants rated the high plausible interventions as having greater evidence quality in Experiment 2. One possibility is that they did not evaluate the evidence as critically because of their prior beliefs. Additionally, the participant samples differed between the two experiments, with an undergraduate sample in Experiment 1 and Prolific participants in Experiment 2; thus, there may be baseline differences in the propensity to critically evaluate scientific evidence between these two samples.

However, the fact that we still observed a dissociation between perceived evidence quality and decisions to implement the learning interventions for both low and high plausible interventions in Experiment 2 suggests that evidence quality was again underweighed as a decision factor. The choice of an educational context for the present studies was deliberate. As Halpern argues, much of the fault for the gap between education research and practice lies in science communication.

A better understanding of how best to communicate the science of education, such that stakeholders will consider and also critically evaluate science-based recommendations, is crucial. However, increasing critical evaluation of evidence alone may not be sufficient, as our studies suggest that the high plausibility of a learning intervention can override low-quality evidence as a factor in hypothetical implementation decisions.

Here, we presented participants with conclusions that were not supported by the evidence; similar situations arise in real educational settings. For example, many educators continue to incorporate learning styles theory into their pedagogy, despite the consistent lack of evidence that teaching students in their preferred learning style improves learning.

However, further research is also necessary to address the issue highlighted by Halpern and Seidenberg: how to convince stakeholders to rely on high quality evidence in the face of personal beliefs supporting a view not consistent with the science. People might recognize flaws in a study and nonetheless choose to implement the recommendations based on the study, particularly if those recommendations are consistent with their prior beliefs. People may also struggle to critically evaluate education studies in particular because of strong prior beliefs about the effectiveness of certain learning interventions.

Consistent with our own findings, the findings of Newton and Miah imply that, for many educators, there is a disconnect between their belief about the scientific support for a learning theory and their practice. Educators may persist in using the learning styles theory despite their awareness of the strong body of evidence against the effectiveness of learning style interventions, possibly due to positive personal experiences with implementing the learning styles theory.

The present results are limited by use of only four exemplar scenarios in a single context—educational achievement. Future research should systematically consider the conditions under which flawed evidence is nonetheless considered to support implementation decisions in different domains.

Our focus in the present studies was to examine how the plausibility of an intervention influenced evaluation of flawed evidence and, ultimately, implementation likelihood. However, we did not explicitly test how other baseline conditions affect implementation likelihood, such as high quality scientific evidence (in the context of low or high plausibility) or no evidence at all.

Including the full set of possible conditions under a variety of controlled contexts is necessary for a more complete understanding of how scientific evidence and prior beliefs influence decision-making. Another limitation is that participants in Experiment 2 may have been biased by our initial questions asking about their beliefs about the plausibility and practicality of the learning interventions. Further work is necessary to test the extent to which prior belief assessments affect later critical analysis of evidence as well as evidence-based decisions.

A final limitation is that the implementation judgments used in our studies were hypothetical and perhaps not relevant to the participants in our study. To what extent might implementation decisions be influenced by anecdotes and prior beliefs when making actual decisions or at least hypothetical decisions that might be more relevant to the participants?

It is possible that individuals with more domain knowledge are generally more critical of evidence regardless of the presence of anecdotes; for example, teachers might be more likely to consider the possibility that coming from a math class to take a math test could present a confound, and they could weigh the flaws more heavily in their implementation judgments.

On the other hand, given the findings of Newton and Miah and our own findings, teachers might persist in implementing an intervention even if they acknowledge that it is backed by flawed science, particularly if the intervention jibes with their own personal experience or the experiences of other instructors.

In conclusion, our studies show that decisions to implement interventions backed by flawed scientific evidence are strongly influenced by prior beliefs about the intervention, particularly with regard to personal experience and plausibility. Moreover, identifying the flawed evidence behind the interventions was not enough to dissuade participants from implementing the interventions.

If there are flaws in the way that empirical data is collected, the research will not be considered valid.

The scientific method often involves lab experiments that are repeated over and over, and these experiments result in quantitative data in the form of numbers and statistics. However, that is not the only process used for gathering information to support or refute a theory.

There are two research methods used to gather empirical measurements and data: qualitative and quantitative. Qualitative research, often used in the social sciences, examines the reasons behind human behavior, according to Oklahoma State University. It involves data that can be found using the human senses.

This type of research is often done in the beginning of an experiment. Quantitative research involves methods that are used to collect numerical data and analyze it using statistical methods, according to the IT University of Copenhagen. Quantitative numerical data can be any data that uses measurements, including mass, size or volume, according to Midwestern State University, in Wichita Falls, Texas.

This type of research is often used at the end of an experiment to refine and test the previous research. Identifying empirical evidence in another researcher's experiments can sometimes be difficult. According to the Pennsylvania State University Libraries, there are several characteristics one can look for when determining whether evidence is empirical. The objective of science is for all empirical data gathered through observation, experience, and experimentation to be free of bias.

Anecdotal evidence draws on personal experience and isolated examples to suggest whether a particular claim or story is true or false. Since personal experience is the key basis for anecdotal evidence, it cannot, unlike scientific evidence, be verified independently.

When people strongly believe that their opinions are true, they tend to seek out only the information that confirms those beliefs. We encounter anecdotal evidence in everyday life, and almost anyone we meet can be its source: a family member, a neighbor, a shop cashier, a hairdresser, a taxi driver, and so on.
