A unit in a statistical analysis is one member of a set of entities being studied. It is the material source for the mathematical abstraction of a "random variable". Common examples of a unit are a single person, animal, plant, or manufactured item that belongs to a larger collection of such entities being studied.
Units are often referred to as being either experimental units, sampling units or, more generally, units of observation:
- An "experimental unit" is typically thought of as one member of a set of objects that are initially equivalent, with each object then subjected to one of several experimental treatments.
- A "sampling unit" is typically thought of as an object that has been sampled from a statistical population. This term is commonly used in opinion polling and survey sampling.
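The two roles can be illustrated with a minimal sketch. The function names, the plot and voter labels, and the balanced round-robin assignment are assumptions made for illustration only; real designs often use blocking or stratification instead.

```python
import random


def assign_treatments(units, treatments, seed=0):
    """Randomly assign each experimental unit to one treatment, in balanced groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def draw_sampling_units(population, n, seed=0):
    """Draw n sampling units from a population, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


# Hypothetical experimental units: twelve initially equivalent field plots.
plots = [f"plot_{i}" for i in range(12)]
assignment = assign_treatments(plots, ["control", "fertilizer_A", "fertilizer_B"])

# Hypothetical sampling units: 100 respondents drawn from 10,000 voters.
voters = range(10_000)
respondents = draw_sampling_units(voters, 100)
```

In the first case the units exist before the study and the randomness lies in which treatment each receives; in the second the randomness lies in which units are observed at all.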
In most statistical studies, the goal is to generalize from the observed units to a larger set consisting of all comparable units that exist but are not directly observed. For example, if we randomly sample 100 people and ask them which candidate they intend to vote for in an election, our main interest is in the voting behavior of all eligible voters, not exclusively in the 100 observed units.
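The polling example can be sketched as follows. This is only an illustration: the population of 10,000 voters, the 55% support level, and the function name are invented, and the interval uses the usual normal approximation for a simple random sample, which is an assumption rather than a general rule.

```python
import math
import random


def estimate_proportion(responses):
    """Return the sample proportion and an approximate 95% confidence interval."""
    n = len(responses)
    p_hat = sum(responses) / n
    # Standard error of a proportion under simple random sampling.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.96 * se
    return p_hat, (max(0.0, p_hat - margin), min(1.0, p_hat + margin))


# Hypothetical population: 1 = intends to vote for candidate A, 0 = otherwise.
population = [1] * 5_500 + [0] * 4_500

rng = random.Random(42)
sample = rng.sample(population, 100)  # the 100 observed units
p_hat, (low, high) = estimate_proportion(sample)
```

The interval is a statement about the unobserved population of voters, not about the 100 respondents, whose answers are known exactly.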
In some cases, the observed units may not form a sample from any meaningful population, but rather constitute a convenience sample, or may represent the entire population of interest. In this situation, we may study the units descriptively, or we may study their dynamics over time, but it typically does not make sense to talk about generalizing to a larger population of such units. Studies involving countries or business firms are often of this type. Clinical trials also typically use convenience samples. However, the aim there is often to make inferences about the efficacy of treatments in other patients, and because of the inclusion and exclusion criteria of some clinical trials, the sample may not be representative of the majority of patients with the condition or disease.
In simple data sets, the units are in one-to-one correspondence with the data values. In more complex data sets, multiple measurements are made for each unit. For example, if blood pressure measurements are made daily for a week on each subject in a study, there would be seven data values for each statistical unit. Multiple measurements taken on the same individual are not independent: they tend to be more alike than measurements taken on different individuals. Ignoring this dependence in the analysis treats the repeated measurements as if they were independent units, which inflates the apparent sample size, a problem known as pseudoreplication.
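A rough sketch of the blood pressure example shows two common ways to respect the unit structure. The readings, the number of subjects, and the intraclass correlation of 0.8 are invented for illustration, and the design-effect formula 1 + (m - 1) * ICC is a standard approximation that assumes the same number of measurements per unit and a common within-unit correlation.

```python
from statistics import mean


def unit_means(measurements):
    """Collapse repeated measurements to one summary value per unit."""
    return {unit: mean(values) for unit, values in measurements.items()}


def effective_sample_size(n_units, m_per_unit, icc):
    """Approximate number of independent observations among n_units * m_per_unit values."""
    design_effect = 1 + (m_per_unit - 1) * icc
    return n_units * m_per_unit / design_effect


# Hypothetical systolic readings (mmHg), one per day for a week.
readings = {
    "subject_1": [120, 122, 119, 121, 123, 120, 122],
    "subject_2": [135, 137, 134, 136, 138, 135, 137],
}
per_subject = unit_means(readings)  # one value per statistical unit

# 20 subjects with 7 strongly correlated readings each (assumed ICC of 0.8).
n_eff = effective_sample_size(n_units=20, m_per_unit=7, icc=0.8)
```

Under these assumptions the 140 data values carry roughly the information of 24 independent observations, much closer to the 20 subjects than to 140. Analysing per-subject summaries, or fitting a model with a subject-level random effect, keeps the analysis at the level of the unit.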
While a unit is often the lowest level at which observations are made, in some cases, a unit can be further decomposed as a statistical assembly.