The Three Criteria for the Evaluation of Social Research

According to Alan Bryman, three of the most prominent criteria for the evaluation of social research are reliability, replication, and validity.


Reliability is concerned with the question of whether the results of a study are repeatable. The term is commonly used in relation to the question of whether the measures that are devised for concepts in the social sciences (such as poverty, racial prejudice, deskilling, and religious orthodoxy) are consistent, given the different ways in which those concepts can be conceptualized. Reliability is particularly at issue in connection with quantitative research. The quantitative researcher is likely to be concerned with the question of whether a measure is stable or not. After all, if we found that IQ tests, which were designed as measures of intelligence, fluctuated, so that people’s IQ scores were often wildly different when the tests were administered on two or more occasions, we would be concerned about the IQ test as a measure. We would consider it an unreliable measure: we could not have faith in its consistency.
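The idea of stability can be made concrete with a small sketch. One common way of checking it is to correlate scores from two administrations of the same measure to the same people (often called test-retest reliability). The Python below is only an illustration under stated assumptions: the scores are invented, the function name pearson_r is ours, and the choice of Pearson's correlation as the stability index is an assumption rather than something the text prescribes.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Invented scores for the same five people on two occasions (not real data).
time_1 = [98, 105, 110, 120, 131]
stable_time_2 = [100, 104, 112, 119, 130]    # scores barely move
unstable_time_2 = [125, 96, 131, 101, 108]   # scores fluctuate wildly

stable_r = pearson_r(time_1, stable_time_2)
unstable_r = pearson_r(time_1, unstable_time_2)

print(f"stable measure:   r = {stable_r:.2f}")
print(f"unstable measure: r = {unstable_r:.2f}")
```

A correlation close to 1 between the two occasions suggests a stable measure; a low or negative one is the kind of fluctuation that would lead us to doubt the measure's consistency.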


The idea of reliability is very close to another criterion of research—replication and, more especially, replicability. It sometimes happens that researchers choose to replicate the findings of others. There may be a host of different reasons for doing so, such as a feeling that the original results do not match other evidence that is relevant to the domain in question. In order for replication to take place, a study must be capable of replication—it must be replicable.

This is a very obvious point: if a researcher does not spell out his or her procedures in great detail, replication is impossible. Similarly, in order for us to assess the reliability of a measure of a concept, the procedures that constitute that measure must be replicable by someone else. Ironically, replication in social research is quite rare. When Burawoy (1979) found that, by accident, he was conducting case study research in a US factory that had been studied three decades earlier by another researcher (Donald Roy), he thought about treating his own investigation as a replication. However, the low status of replication in academic life persuaded him to resist this option. He writes: “I knew that replicating Roy’s study would not earn me a dissertation, let alone a job. In academia, the real reward comes not from replication but from originality!” (Burawoy 2003: 650). Nonetheless, an investigation’s capacity to be replicated, that is, its replicability, is highly valued by many social researchers working within a quantitative research tradition.


A further and, in many ways, the most important criterion of research is validity. Validity concerns the integrity of the conclusions that are generated from a piece of research. As we shall do for reliability, we will be examining the idea of validity in greater detail in later chapters, but in the meantime, it is important to be aware of the main types of validity that are typically distinguished:

  • Measurement validity. Measurement validity applies primarily to quantitative research and to the search for measures of social scientific concepts. Measurement validity is also often referred to as construct validity. Essentially, it has to do with the question of whether a measure that is devised for a concept really does reflect the concept that it is supposed to be denoting. Does the IQ test really measure variations in intelligence? In the research by Kelley and De Graaf (1997), three concepts needed to be measured in order to test the hypotheses: national religiosity, religious orthodoxy, and family religious orientation. The question then is: do the measures really represent the concepts they are supposed to be tapping into? If they do not, the study’s findings will be questionable. It should be appreciated that measurement validity is related to reliability; if a measure of a concept is unstable in that it fluctuates, and hence is unreliable, it simply cannot provide a valid measure of the concept in question. In other words, the assessment of measurement validity presupposes that a measure is reliable. A measure that does not give a stable reading of the underlying concept cannot be valid, because a valid measure reflects the concept it is supposed to be measuring.
  • Internal validity. Internal validity relates mainly to the issue of causality. Internal validity concerns the question of whether a conclusion that incorporates a causal relationship between two or more variables holds water. If we suggest that x causes y, can we be sure that it is x that is responsible for the variation in y and not something else that is producing an apparent causal relationship? Kelley and De Graaf concluded that “the religious environment of a nation has a major impact on the beliefs of its citizens” (Kelley and De Graaf 1997: 654). Internal validity raises the question: can we be sure that national religiosity really does cause variation in religious orientation and that this apparent causal relationship is genuine and not produced by something else? In discussing issues of causality, it is common to refer to the factor that has a causal impact as the independent variable and the effect as the dependent variable. In the case of Kelley and De Graaf’s research, the “religious environment of a nation” was an independent variable, and “religious belief” was the dependent variable. Thus, internal validity raises the question: how confident can we be that the independent variable really is at least in part responsible for the variation that has been identified in the dependent variable?
  • External validity. External validity concerns the question of whether the results of a study can be generalized beyond the specific research context. In the research by Poortinga et al. (2004), data was collected from 229 respondents in Bude and 244 respondents in Norwich. Can their findings about attitudes toward the handling of the outbreak be generalized beyond these respondents? In other words, if the research were not externally valid, it would apply to the 473 respondents alone. If it were externally valid, we would expect it to apply more generally to the populations of these two towns at the time of the outbreak of the disease. It is in this context that the issue of how people are selected to participate in research becomes crucial. This is one of the main reasons why quantitative researchers are so keen on generating representative samples.
  • Ecological validity. Ecological validity is concerned with the question of whether social scientific findings are applicable to people’s everyday, natural social settings. As Cicourel (1982: 15) has put it: “Do our instruments capture the daily life conditions, opinions, values, attitudes, and knowledge base of those we study as expressed in their natural habitat?” This criterion is concerned with the question of whether social research sometimes produces findings that may be technically valid but have little to do with what happens in people’s everyday lives. If research findings are ecologically invalid, they are, in a sense, artefacts of the social scientist’s arsenal of data collection and analytic tools. The more a social scientist intervenes in natural settings or creates unnatural ones, such as a laboratory or even a special room to carry out interviews, the more likely it is that findings will be ecologically invalid. The findings deriving from a study using questionnaires may have measurement validity and a reasonable level of internal validity, and they may be externally valid in the sense that they can be generalized to other samples confronted by the same questionnaire, but the unnaturalness of the fact of having to answer a questionnaire may mean that the findings have limited ecological validity.
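The internal validity question raised in the list above (is it really x that is responsible for the variation in y, and not something else?) can be illustrated with a small sketch. The Python below uses invented records in which a hypothetical third variable z raises both x and y, while x has no effect on y at all. The variable names, the data, and the helper pearson_r are all assumptions made for illustration; comparing the overall correlation with the correlation within each level of z is only one simple way of probing such a relationship, not a method taken from the studies cited.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


# Invented records of the form (z, x, y). z is a hypothetical third variable
# that raises both x and y; x has no effect on y in this made-up data.
records = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
    (1, 5, 5), (1, 5, 6), (1, 6, 5), (1, 6, 6),
]

# Correlation between x and y across all cases, ignoring z.
overall_r = pearson_r([x for _, x, _ in records], [y for _, _, y in records])

# Correlation between x and y among cases that share the same value of z.
within_r = {}
for level in sorted({z for z, _, _ in records}):
    group = [(x, y) for z, x, y in records if z == level]
    within_r[level] = pearson_r([x for x, _ in group], [y for _, y in group])

print(f"overall: r = {overall_r:.2f}")
for level, r in within_r.items():
    print(f"z = {level}: r = {r:.2f}")
```

In this constructed example the overall correlation between x and y is strong, yet it disappears once cases with the same value of z are compared. That is the sense in which an apparent causal relationship may be produced by something else.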

Relationship with research strategy

One feature that is striking about most of the discussion so far is that it seems to be geared mainly to quantitative rather than qualitative research. Both reliability and measurement validity are essentially concerned with the adequacy of measures, which is most obviously a concern in quantitative research. Internal validity concerns the soundness of findings that specify a causal connection, an issue that is most commonly of concern to quantitative researchers. External validity may be relevant to qualitative research, but the whole question of representativeness of research subjects with which the issue is concerned has a more obvious application to the realm of quantitative research, with its preoccupation with sampling procedures that maximize the opportunity for generating a representative sample.

The issue of ecological validity relates to the naturalness of the research approach and seems to have considerable relevance to both qualitative and quantitative research. Some writers have sought to apply the concepts of reliability and validity to the practice of qualitative research (e.g., LeCompte and Goetz 1982; Kirk and Miller 1986; Peräkylä 1997), but others argue that the grounding of these ideas in quantitative research renders them inapplicable to or inappropriate for qualitative research.

Writers like Kirk and Miller (1986) have applied concepts of validity and reliability to qualitative research but have slightly changed the sense in which the terms are used. Some qualitative researchers propose that the studies they produce should be judged or evaluated according to different criteria from those used in relation to quantitative research. Lincoln and Guba (1985) propose that alternative terms and ways of assessing qualitative research are required. For example, they propose trustworthiness as a criterion of how good a qualitative study is. Each aspect of trustworthiness has a parallel with the quantitative research criteria:

  • Credibility, which parallels internal validity: how believable are the findings?
  • Transferability, which parallels external validity: do the findings apply to other contexts?
  • Dependability, which parallels reliability: are the findings likely to apply at other times?
  • Confirmability, which parallels objectivity: has the investigator allowed his or her values to intrude to a high degree?

Hammersley (1992a) occupies a kind of middle position here, in that, while he proposes validity as an important criterion (in the sense that an empirical account must be plausible and credible and should take into account the amount and kind of evidence used in relation to an account), he also proposes relevance as a criterion. Relevance is taken to be assessed from the vantage point of the importance of a topic within its substantive field or the contribution it makes to the literature on that field. The issues raised by these different views have to do with the different objectives that many qualitative researchers argue are distinctive to their craft. However, it should also be borne in mind that one of the criteria previously cited—ecological validity—may have been formulated largely in the context of quantitative research, but is in fact a feature in relation to which qualitative research fares rather well. Qualitative research often involves a naturalistic stance.

This means that the researcher seeks to collect data in naturally occurring situations and environments as opposed to fabricated, artificial ones. This characteristic probably applies particularly well to ethnographic research, in which participant observation is a prominent element of data collection, but it is sometimes suggested that it also applies to the sort of interview approach typically used by qualitative researchers, which is less directive than the kind used in quantitative research.

We might therefore expect much qualitative research to be stronger than quantitative investigations in terms of ecological validity. These issues in social research have been presented here because some of them will emerge in the discussion of research designs in the next section, and because they represent background considerations for some of the issues to be examined. They will be returned to later in the book.

