# Reliability and Validity

Reliability and validity are presented together here because they are related and are often confused with one another.

Please feel free to contact us if you have questions or suggestions for additions or improvements to this web page or to the MT-IBIS website.

# Reliability

Reliability is a property of a measure that refers to its statistical stability, or the degree to which repeated observations of the same phenomenon yield the same results.

In public health, we are sometimes interested in the actual number of health events, but more often we use observed measures such as birth or death rates to indicate the underlying risk of illness or disability in a population. However, observed measures of risk fluctuate even when the true underlying risk of disease does not. This variability usually arises from one or more of the following factors: 1) the health event is relatively rare, 2) the population is relatively small, and 3) the health events do not occur at regular time intervals.

Even for complete-count datasets, such as birth and death certificate data, random fluctuations over time can yield unreliable estimates. Consider low birth weight in a small community where, on average, one low birth weight infant is born each month. Health events such as low birth weight do not occur at regular time intervals; there is randomness in their timing. If three mothers in this community give birth to low birth weight infants in December of Year 1, and none do in January or February of Year 2, it may appear that the risk of low birth weight declined from Year 1 to Year 2. In fact, the true underlying risk did not change; the rates merely reflect randomness in the timing of the low birth weight births.
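To see how much counts can move when risk stays fixed, the sketch below simulates two years of monthly low birth weight counts for a community like the one described. It assumes, for illustration only, that births of low birth weight infants follow a Poisson process with a constant mean of 1 per month; the function names and seed are hypothetical, and Poisson counts are drawn with Knuth's multiplication method so only the Python standard library is needed.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    # Keep multiplying uniform draws until the product falls to exp(-mean) or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_monthly_counts(mean_per_month: float = 1.0, months: int = 24,
                            seed: int = 1) -> list[int]:
    """Monthly event counts when the true underlying risk never changes (assumed Poisson)."""
    rng = random.Random(seed)
    return [poisson_draw(rng, mean_per_month) for _ in range(months)]


if __name__ == "__main__":
    counts = simulate_monthly_counts()
    year1, year2 = counts[:12], counts[12:]
    print("Year 1 monthly counts:", year1, "total:", sum(year1))
    print("Year 2 monthly counts:", year2, "total:", sum(year2))
    # December of Year 1 vs. January-February of Year 2, the comparison made in the text.
    print("Dec Year 1:", year1[11], "| Jan-Feb Year 2:", year2[0] + year2[1])
```

Rerunning with different seeds typically shows months with 0, 2, or 3 events and year-to-year totals that differ, even though the simulated risk is identical in every month.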

Rates that fluctuate over time in the absence of changes in underlying risk are considered "unreliable" or "unstable." Because the underlying risk typically changes very slowly, the term "unstable" is used for any observed rates that fluctuate widely without a change in the true underlying risk.

The terms "reliability," "precision," and "stability" are used to refer to the amount of random error that is likely to be included in an observed measure. Fortunately, we can use statistical techniques to assess the stability of a given rate. The confidence interval is a common statistical measure that conveys the reliability of an estimate. It may be thought of as the range of probable true values for a statistic. A wide confidence interval (wide in relation to the rate) indicates that the rate is likely to include a lot of random error.

## Relative Standard Error (RSE)

A related measure is the relative standard error (RSE). The RSE is the ratio of the standard error to the point estimate (e.g., a rate or an average) and is commonly expressed as a percentage.

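The RSE is a continuous measure that starts at 0; larger values indicate less stable estimates, and values above 1 are possible when the standard error exceeds the estimate. Cut-off points of 0.30 and 0.50 are used as conventions for interpreting the RSE. A rate with an RSE of 0.30 or more (the standard error is at least 30% as large as the estimate) is considered by most public health epidemiologists too unstable to report.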
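The sketch below computes a rate, its standard error, and its RSE, then applies the 0.30 cut-off described above. It assumes the event count is Poisson-distributed, so the standard error of the count is its square root and the RSE reduces to 1/sqrt(events); the function names and the example counts are hypothetical.

```python
import math


def rate_se_rse(events: int, population: int,
                per: float = 1000.0) -> tuple[float, float, float]:
    """Rate, standard error, and RSE for a count-based rate.

    Assumes the event count is Poisson-distributed, so SE(count) = sqrt(count).
    """
    if events <= 0 or population <= 0:
        raise ValueError("RSE is undefined for zero events or an empty population")
    rate = events / population * per
    se = math.sqrt(events) / population * per
    rse = se / rate  # simplifies to 1 / sqrt(events) under the Poisson assumption
    return rate, se, rse


def is_unstable(rse: float, cutoff: float = 0.30) -> bool:
    """Flag a rate whose RSE meets or exceeds the cut-off."""
    return rse >= cutoff


if __name__ == "__main__":
    # Hypothetical counts: low birth weight births among all births in a small area.
    for events, births in [(12, 150), (9, 120), (100, 1250)]:
        rate, se, rse = rate_se_rse(events, births)
        flag = "unstable" if is_unstable(rse) else "ok"
        print(f"{events}/{births}: rate={rate:.1f} per 1,000, SE={se:.1f}, "
              f"RSE={rse:.0%} ({flag})")
```

Under this assumption the RSE depends only on the number of events, not on the population size, which is why rare events in small populations so often cross the cut-off.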

Measures of statistical stability are related to one another. The confidence interval is based on the standard error, and the standard error is based on the variance of the statistic and the population or sample size.

For large samples, the 95% confidence interval is calculated as the point estimate plus or minus 1.96 times its standard error.
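A minimal sketch of the large-sample interval, estimate plus or minus 1.96 times the standard error. The helper name `normal_ci` and the example counts are hypothetical, and the standard error again assumes a Poisson-distributed count. For very small counts this normal approximation can even produce a negative lower limit, which is one reason exact Poisson intervals are often preferred in that setting.

```python
import math


def normal_ci(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Large-sample (normal approximation) confidence interval: estimate +/- z * SE."""
    margin = z * se
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    # Hypothetical example: 12 low birth weight births among 150 births,
    # with a Poisson-based SE for the rate per 1,000 births.
    events, births = 12, 150
    rate = events / births * 1000
    se = math.sqrt(events) / births * 1000
    low, high = normal_ci(rate, se)
    print(f"rate = {rate:.1f} per 1,000; 95% CI {low:.1f} to {high:.1f}")
    # A width that is large relative to the rate signals a lot of random error.
    print(f"CI width relative to the rate: {(high - low) / rate:.2f}")
```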

# Validity

Validity is a property of a measurement that refers to its accuracy, or the degree to which observations reflect the true value of a phenomenon. It is possible to have a measure that is very reliable but not at all valid.

In public health, we are fortunate that most of our measures have quite good validity. "Cause of death" on death certificates is certified by a physician. Survey measures have been tested to maximize validity. Birth weight is measured and reported at the birth hospital. There are some measures we question, for instance self-reported body weight, but on the whole, the measures we use have a high degree of validity.

# The Bull's-eye Analogy

In the three figures below, the bull's-eye of the target represents the true underlying risk of disease in a population, and the holes in the target represent multiple observed measurements of that risk. In the first figure, the measure is reliable: it yields nearly the same value each time. But the measure in Figure 1 is not valid: the average of the measurements is not close to the true underlying risk. In the second figure, the measurements are not very reliable, since there is a lot of variability among them, but they center on the true risk value, so they are valid (at least on average). In the third figure, the measure is both reliable and valid.

The term "accuracy" is often used in relation to validity, while the term "precision" is used to describe reliability.
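The analogy can be mimicked numerically. The sketch below models each measurement as the true value plus a fixed systematic bias plus normally distributed random error; the true risk, bias, and spread values are hypothetical choices meant to resemble the three figures, not data. The average error reflects validity, and the standard deviation reflects reliability.

```python
import random
import statistics


def simulate_measurements(true_value: float, bias: float, spread: float,
                          n: int, seed: int) -> list[float]:
    """Repeated measurements = true value + systematic bias + normal random error."""
    rng = random.Random(seed)
    return [rng.gauss(true_value + bias, spread) for _ in range(n)]


def summarize(measurements: list[float], true_value: float) -> tuple[float, float]:
    """Return (average error, standard deviation).

    Average error reflects validity (closeness to the bull's-eye on average);
    standard deviation reflects reliability (how tightly the holes cluster).
    """
    return (statistics.mean(measurements) - true_value,
            statistics.stdev(measurements))


if __name__ == "__main__":
    TRUE_RISK = 50.0  # hypothetical true rate per 1,000
    scenarios = {  # label: (bias, spread), all hypothetical
        "Figure 1: reliable, not valid": (15.0, 1.0),
        "Figure 2: valid on average, not reliable": (0.0, 10.0),
        "Figure 3: reliable and valid": (0.0, 1.0),
    }
    for label, (bias, spread) in scenarios.items():
        values = simulate_measurements(TRUE_RISK, bias, spread, n=20, seed=42)
        avg_error, sd = summarize(values, TRUE_RISK)
        print(f"{label}: average error = {avg_error:+.1f}, SD = {sd:.1f}")
```

Separating the two quantities this way makes the point of Figure 1 concrete: a small standard deviation says nothing about whether the measurements are centered on the truth.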