Balance Works
Evidence Explainer

Stereotypes & Statistical Distributions

Why group averages cannot justify individual treatment — the statistical case against stereotyping, with interactive distribution examples.

01 — Definitions

What Is a Stereotype?

A stereotype is a generalisation from a group average to an individual member — an inference about what a specific person is like based on a characteristic of the group they belong to. This page explains why that inference is usually statistically unjustified and legally problematic.

Two types of generalisation

Not all generalisations are equally problematic. The distinction that matters is between a statistical fact about a group and a prediction about an individual.

Statistical fact (valid)

"On average, men in England are 13.5 cm taller than women."

This is a verifiable empirical statement about group averages. It is relevant for decisions genuinely calibrated to a group average — for example, setting the default height of wash basins, mirrors, and worktops in sex-segregated facilities, or defining population health reference ranges for clinical screening.

Individual prediction (invalid)

"This person is a woman, therefore she is shorter than the average man."

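This is an inference about an individual based on group membership. Because the height distributions overlap, it will be wrong for some women (on the survey figures used in Example 1, about 1.7% of women are taller than the average man), and group membership alone cannot identify which.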

Stereotyping — working definition

Treating an individual as if they possess a characteristic of the average member of a group to which they belong, without evidence about the individual themselves. Stereotyping conflates descriptive statistics (what the group looks like on average) with predictive facts about a particular person.

Where stereotypes come from

Stereotypes are not always invented from nothing. Many are grounded in real, measurable group differences — differences in average height, in average time spent on unpaid care, in representation across occupational grades. The problem is not that the group-level data is false. The problem is what happens when that data is applied to individuals:

The base rate error: Most group-level differences are small relative to the variation within each group. Applying a group average to an individual will be correct for some people and wrong for many others — but you cannot know which without looking at the individual.
The distribution error: Even when a group difference is large (by social science standards), the distributions of the two groups overlap considerably. A substantial proportion of Group B will score higher than many members of Group A on the very dimension where Group A has the higher average.
The ecological fallacy: A pattern that holds at the group level does not necessarily hold at the individual level. Group membership is a weak predictor of individual characteristics whenever within-group variation is large — which is almost always.

The central statistical argument

For the vast majority of traits relevant to employment, service delivery, and social policy, within-group variation is substantially larger than between-group differences. This means group membership carries very little predictive power for any individual — even when a real group difference exists.
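To put rough numbers on this, the sketch below computes how much of the total variation in a trait is attributable to group membership. It assumes two equally sized groups with the same spread, whose averages differ by d within-group standard deviations (the effect size explained in the next section). The function name and these simplifying assumptions are illustrative and are not taken from any source cited on this page.

```python
def between_group_share(d: float) -> float:
    """Share of total variance that lies between two groups.

    Assumes equal group sizes and equal within-group SD, with the
    group means d within-group SDs apart (illustrative simplification).
    """
    # Each group mean sits d/2 SDs from the overall mean, so the
    # between-group variance is (d/2)^2 in units of within-group variance.
    between = (d / 2) ** 2
    within = 1.0
    return between / (between + within)


for d in (0.2, 0.5, 0.8, 2.0):
    share = between_group_share(d)
    print(f"d = {d}: {share:.1%} between groups, {1 - share:.1%} within groups")
```

Under these assumptions, a gap of d = 0.8, conventionally called large, still leaves about 86% of the variation inside the groups rather than between them.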

02 — Interactive

The Overlap Principle

The interactive charts below show how even large group differences produce substantially overlapping distributions. Use the sliders to explore how the degree of overlap changes as the gap between the two group averages increases or decreases.

Reading the charts: overlap and effect size explained

What "overlap" means here: The darker shaded region where the two curves meet marks the range of values shared by both groups — the zone where you cannot reliably tell from this characteristic alone which group someone belongs to. Anyone whose value falls in that zone could plausibly belong to either group. The overlap percentage shown in the statistics below each chart is the share of the combined distribution area that sits in that shared zone. An overlap of 60% means that for 60% of the joint area, group membership gives you no reliable information about the individual's value.
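One common way to make this concrete, assumed here, is the overlap coefficient: the area that lies under both curves at once. The sketch below estimates it by numerical integration for two normal curves, using the height figures quoted in Example 1, and cross-checks the result against the NormalDist.overlap method in Python's standard library. The helper name and the integration settings are illustrative choices, not a description of how the charts on this page are built.

```python
from statistics import NormalDist


def overlap_by_integration(a: NormalDist, b: NormalDist, steps: int = 20_000) -> float:
    """Approximate the area under both curves: the integral of min(pdf_a, pdf_b)."""
    # Integrate over a range wide enough to contain essentially all of both curves.
    lo = min(a.mean - 8 * a.stdev, b.mean - 8 * b.stdev)
    hi = max(a.mean + 8 * a.stdev, b.mean + 8 * b.stdev)
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint of each slice
        total += min(a.pdf(x), b.pdf(x))
    return total * width


# Means and SDs as quoted in Example 1 (heights in cm).
men = NormalDist(mu=175.3, sigma=7.1)
women = NormalDist(mu=161.8, sigma=6.4)

numeric = overlap_by_integration(men, women)
closed_form = men.overlap(women)  # standard-library cross-check
print(f"overlap: {numeric:.3f} (integration), {closed_form:.3f} (NormalDist.overlap)")
```

Both approaches give an overlap of roughly 0.32 for these inputs, in line with the 32% reported for height.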

What "effect size (d)" means: Effect size measures how far apart the two group averages are, expressed as a multiple of the typical spread of individual scores within each group. If most people in each group vary by about 7 cm from their group's average (mean), and the two averages are 14 cm apart, the effect size is 2.0. As a rough guide: below 0.2 is negligible; 0.2 is small; 0.5 is medium; 0.8 is large. Height — at around 2.0 — is one of the largest differences ever reliably measured between social groups. For most characteristics relevant to employment and service decisions, the effect size is considerably smaller and the overlap considerably greater.
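The calculation described above can be sketched in a few lines. This version pools the two standard deviations on the assumption that the groups are the same size; the function name is illustrative.

```python
from math import sqrt


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Gap between two group means in units of the pooled standard deviation."""
    # Pooled SD for two groups of equal size (an assumption of this sketch).
    pooled_sd = sqrt((sd_a**2 + sd_b**2) / 2)
    return (mean_a - mean_b) / pooled_sd


# The worked example from the text: averages 14 cm apart, spread of 7 cm in each group.
print(f"{cohens_d(14, 7, 0, 7):.2f}")

# The height figures quoted in Example 1 (cm).
print(f"{cohens_d(175.3, 7.1, 161.8, 6.4):.2f}")
```

Both calls return a value of about 2.0.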

Example 1 — Height and Sex
Adult heights in England (NHS Health Survey for England). One of the largest reliable sex differences in any socially relevant trait. Even so, the distributions overlap considerably, and for most workplace and service decisions the overlap is the point.
Men: average (mean) 175.3 cm
Women: average (mean) 161.8 cm
Distribution overlap (shared range): 32%
Cohen's d (effect size): 2.0
Women above male average: 1.7%
Men below female average: 2.9%
Default values from the NHS Health Survey for England (2019): men mean 175.3 cm (SD 7.1 cm), women mean 161.8 cm (SD 6.4 cm). Height is one of the largest sex differences in any human trait — Cohen's d ≈ 2.0. Even at this scale, the distributions share ~32% of their area. For most characteristics relevant to employment or service decisions, d is much smaller and overlap is much greater.
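The two tail percentages for Example 1 follow directly from the normal approximation. As a minimal sketch, assuming heights in each group are normally distributed with the means and SDs quoted above:

```python
from statistics import NormalDist

# Means and SDs as quoted for Example 1 (heights in cm).
men = NormalDist(mu=175.3, sigma=7.1)
women = NormalDist(mu=161.8, sigma=6.4)

# Share of women taller than the average man.
women_above_male_mean = 1 - women.cdf(men.mean)

# Share of men shorter than the average woman.
men_below_female_mean = men.cdf(women.mean)

print(f"Women above male average: {women_above_male_mean:.1%}")
print(f"Men below female average: {men_below_female_mean:.1%}")
```

This reproduces the 1.7% and 2.9% shown for the default values.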
Example 2 — Unpaid Care Hours and Sex
Daily unpaid care and domestic work (ONS Time Use Survey, UK). Women spend on average more time on unpaid care than men — a real and well-documented difference. But the distributions overlap substantially, meaning many men carry more caring responsibilities than many women.
Men: average (mean) 1.9 hrs/day
Women: average (mean) 3.5 hrs/day
Distribution overlap (shared range): 62%
Cohen's d (effect size): 0.9
Men above female average: 18%
Women below male average: 24%
Based on ONS Time Use Survey (2015) data on unpaid care and domestic work: men ≈ 1.9 hrs/day (SD ≈ 1.7), women ≈ 3.5 hrs/day (SD ≈ 2.0). The distributions are right-skewed in reality (many people report zero or near-zero); the normal approximation is used here for illustration.

The employment implication: assuming that a female employee has significant caring responsibilities, or that a male employee does not, will be wrong for a substantial proportion of people. A policy that treats all employees as if they fit the group average (for example, assuming women will need career breaks or flexibility, or that men will not) is built on a stereotype and will be wrong for the many individuals who do not match it. This is exactly the mechanism that can give rise to an indirect sex discrimination claim under s.19 EA 2010; see Section 04: Indirect Discrimination for the legal framework.
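The statement that many men carry more caring responsibilities than many women can be given a rough size. The sketch below estimates the chance that a randomly chosen man does more unpaid care than a randomly chosen woman. It assumes the normal approximation and the means and SDs quoted for Example 2, and it assumes the two people are drawn independently. Because the real distributions are right-skewed, the result is indicative only.

```python
from statistics import NormalDist

# Means and SDs as quoted for Example 2 (hours per day), normal approximation.
men = NormalDist(mu=1.9, sigma=1.7)
women = NormalDist(mu=3.5, sigma=2.0)

# For independent normal variables the difference is also normal;
# NormalDist supports this subtraction directly.
difference = men - women

# Probability that the man's hours exceed the woman's (difference above zero).
p_man_exceeds_woman = 1 - difference.cdf(0)

print(f"Chance a random man does more unpaid care than a random woman: {p_man_exceeds_woman:.0%}")
```

On these assumptions the answer is a little over one in four, which is why an assumption based on sex alone misfires so often.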

The key insight from both examples

Even for height — one of the largest and most reliable sex differences in any human trait — around 32% of the combined distribution area overlaps. For characteristics more relevant to employment decisions (commitment, performance, availability, ambition), effect sizes are typically much smaller and overlap much greater.

When you make a decision about an individual based on a group average, you are working with weak information at best. The larger the within-group variance relative to the between-group difference, the weaker that information is — and individual-level evidence about the actual person will almost always be more reliable.
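The relationship between effect size and overlap can be tabulated directly. For two normal curves with the same spread, the overlap coefficient has the closed form 2Φ(−d/2), where Φ is the standard normal cumulative distribution function. The sketch below applies that formula to the conventional benchmarks and to the height example; the equal-spread assumption and the function name are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def overlap_from_d(d: float) -> float:
    """Overlap of two equal-variance normal curves whose means are d SDs apart."""
    return 2 * STANDARD_NORMAL.cdf(-abs(d) / 2)


benchmarks = [("small", 0.2), ("medium", 0.5), ("large", 0.8), ("height and sex", 2.0)]
for label, d in benchmarks:
    print(f"{label:>15}: d = {d:.1f}, overlap = {overlap_from_d(d):.0%}")
```

On this formula a conventionally large effect of d = 0.8 still leaves the two groups sharing about 69% of their area, and a small effect of d = 0.2 leaves about 92% shared.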

03 — Applied Example

From Statistics to Discrimination

When a stereotype (a group-level generalisation) is used to make a decision about an individual, it discriminates against every member of that group who does not conform to the stereotype. The interactive chart below uses job performance and age as a case study.

What the research actually shows: age and job performance

Age-related stereotypes in the workplace are among the most persistent and consequential. The stereotype: older workers are less productive, less adaptable, and less committed. The evidence tells a very different story.

Ng & Feldman's (2008) meta-analysis, drawing on 380 studies covering over 400,000 workers, found that age has a near-zero correlation with core task performance (r = −0.01). The correlations with overall performance, organisational citizenship behaviour, and creative performance were similarly negligible. Older workers showed lower rates of counterproductive behaviour and absenteeism than younger workers.

Age × task performance (r): −0.01
Age × citizenship behaviour (r): +0.04
Age × counterproductive behaviour (r): −0.06
Studies in meta-analysis: 380

A correlation of −0.01 is, for practical purposes, zero. Individual variation in job performance within any age group dwarfs any age-related average difference.
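One way to see how small these correlations are is to square them: r² is the share of variation in the outcome that a straight-line relationship with age would account for. The sketch below applies this to the three correlations quoted above; the function name is illustrative.

```python
def variance_explained(r: float) -> float:
    """Share of outcome variance accounted for by a linear relationship with strength r."""
    return r**2


# Correlations with age as quoted from the meta-analysis above.
correlations = {
    "task performance": -0.01,
    "citizenship behaviour": 0.04,
    "counterproductive behaviour": -0.06,
}

for outcome, r in correlations.items():
    print(f"{outcome}: r = {r:+.2f}, variance explained = {variance_explained(r):.2%}")
```

For task performance the share is 0.01%, leaving 99.99% of the variation between individuals unrelated to age.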

Example 3 — Job Performance and Age
Performance distributions (normalised units) for four age bands. Default values reflect the meta-analytic finding (near-zero age effect). Adjust the slider to see how even a pessimistic assumed decline leaves distributions overwhelmingly overlapping.
Age bands shown: 25–34, 35–44, 45–54, 55–64
Overlap, 25–34 vs 55–64: 98%
Cohen's d, 25–34 vs 55–64: 0.06
Assumed decline per decade (SD): 0.02
Workers aged 55–64 above the 25–34 average: 47%
Based on Ng & Feldman (2008), "The Relationship of Age to Ten Dimensions of Job Performance", Journal of Applied Psychology, 93(2). At the meta-analytic estimate of r = −0.01 (default setting), the age effect over three decades is approximately 0.06 SD, producing distributions that are 98% overlapping. Even at the maximum pessimistic setting (0.5 SD/decade, a gap of 1.5 SD over three decades), equal-variance normal curves for the oldest and youngest groups still overlap by around 45%. In no realistic scenario does age function as a meaningful discriminator of individual job performance.
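The default figures for Example 3 can be approximated with a simple model. The sketch below assumes performance in every age band is normally distributed with the same spread, and that the average falls by a fixed fraction of an SD per decade. This is an assumed simplification for illustration, not necessarily how the chart itself is implemented, and the function name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def oldest_vs_youngest(decline_per_decade_sd: float, decades: int = 3):
    """Compare the oldest and youngest bands under a linear decline in SD units."""
    d = decline_per_decade_sd * decades  # gap between the two band averages
    overlap = 2 * STANDARD_NORMAL.cdf(-d / 2)  # equal-variance overlap coefficient
    older_above_younger_mean = STANDARD_NORMAL.cdf(-d)  # share of older band above younger average
    return d, overlap, older_above_younger_mean


# Default setting quoted above: 0.02 SD per decade across three decades.
d, overlap, share = oldest_vs_youngest(0.02)
print(f"d = {d:.2f}, overlap = {overlap:.1%}, older above younger average = {share:.1%}")
```

At the default setting this gives d = 0.06, an overlap of about 97.6% and roughly 47.6% of the oldest band above the youngest band's average, in line with the values shown for the chart.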

The discrimination mechanism

When an organisation uses age as a proxy for capability or productivity, it treats every individual in the older age group as if they are the (stereotyped) average — excluding people who are above average and would have performed well. This is not a hypothetical: if 47% of workers aged 55–64 outperform the average 25–34 year old, any age-based exclusion is wrong about nearly half the people it affects.

The precision problem

The statistical argument against stereotyping is ultimately about the precision of inference. Group membership is a low-precision predictor for almost all individual outcomes. Using it as the basis for decisions — rather than individual-level information — introduces systematic error.

Better predictors almost always exist: past performance records, skills assessments, structured interviews, work samples. When individual-level evidence is available, falling back on group membership as a shortcut is not just statistically indefensible — it is the definition of prejudice.

05 — Distinction

Positive Action vs Stereotyping

Positive action and stereotyping are sometimes confused — both involve taking protected characteristics into account. They differ fundamentally in their basis, their target, and their legality.

Stereotyping
Assumes an individual has the average characteristics of a group based solely on group membership
Applies group-level generalisations to individuals without evidence
Based on presumed individual characteristics
Typically disadvantages members of the target group
Not time-limited — persists indefinitely
Unlawful where it causes particular disadvantage (indirect discrimination) or treats an individual less favourably (direct discrimination)

Positive Action (ss.158–159 EA 2010)
Responds to evidenced group-level disadvantage or under-representation, not to assumptions about individuals
Addresses documented systemic barriers, not presumed individual characteristics
Based on group-level data about outcomes, not individual-level stereotypes
Must not involve treating individuals with the protected characteristic more favourably than others in a way that amounts to a quota (s.159)
Should be proportionate and regularly reviewed — it is a remedy for a documented disparity, not a permanent arrangement
Lawful where the conditions in ss.158–159 are met

The three conditions for lawful positive action (s.158)

Positive action is permitted where the organisation reasonably thinks that:

(a) Disadvantage: Persons sharing the protected characteristic suffer a disadvantage connected to that characteristic; or

(b) Different needs or under-representation: Persons sharing the characteristic have needs different from those of persons who do not share it, or participation in an activity by persons sharing the characteristic is disproportionately low; and

(c) Proportionality: The positive action taken is a proportionate means of achieving the aim of enabling or encouraging persons sharing the characteristic to overcome or minimise the disadvantage, of meeting those needs, or of enabling or encouraging them to participate in the activity.

The key difference in statistical terms: Positive action is grounded in observed group-level outcome data — actual evidence of under-representation or disadvantage. It does not assume individuals have particular characteristics; it addresses documented barriers. Stereotyping infers individual characteristics from group averages. One is evidence-based; the other is a shortcut that the evidence does not support.
Section 158 is not limited to employment. It applies broadly to any activity, including service provision, education, and community engagement. An organisation can take positive action under s.158 to address disadvantage or under-representation among the people it serves, not only its workforce. Section 159, by contrast, applies specifically to employment decisions about recruitment and promotion: it permits treating a candidate with a protected characteristic more favourably than an equally qualified candidate where the conditions are met, but only in that employment context. It does not extend to service delivery.

Practical illustration: recruitment

Unlawful stereotyping

"We will not shortlist candidates over 50 because older workers are less adaptable to our technology." This assumes an individual characteristic (low adaptability) from group membership (age), without evidence. It causes direct age discrimination and relies on a stereotype unsupported by the research on age and performance.

Lawful positive action

"Our workforce data shows that workers over 50 are under-represented at senior grades relative to the workforce as a whole. We will include targeted outreach in our senior recruitment campaign to encourage applications from older workers." This responds to documented evidence of under-representation, does not impose quotas, and addresses the barrier rather than assuming individual characteristics.

06 — Summary

Key Takeaways

What practitioners need to know when evaluating whether a policy, rule, or practice relies on stereotyping — and when group-level data can legitimately inform action.

Within-group variation almost always exceeds between-group differences

For virtually every characteristic relevant to employment and service delivery, individuals within any group vary more than the groups differ from each other. Group membership is therefore a weak predictor of individual characteristics — even when a real group difference exists.

Even large effects produce substantial overlap

Height — one of the largest sex differences in any human trait — produces distributions that overlap by around 32%. For job performance, commitment, adaptability, and most other employment-relevant traits, effect sizes are far smaller and overlap is far greater.

Group-level statistics cannot justify individual treatment

Applying a group average to an individual is statistically unjustified whenever within-group variance is large. Where individual-level evidence is available, it will almost always be a more reliable basis for decisions than group membership.

Indirect discrimination does not require intent

A policy built on a stereotype can be indirectly discriminatory even if no individual act of prejudice was intended. The test is effect: does the provision, criterion or practice (PCP) put a protected group at a particular disadvantage? Policies justified by group stereotypes often fail the proportionality test.

The justification defence requires evidence

To justify a PCP that causes group disadvantage, organisations must show the aim is legitimate and the means are proportionate. Where the stereotyped characteristic is a poor predictor of the operational need, less discriminatory alternatives will usually exist — and will be required by the proportionality test.

Positive action and stereotyping are opposites

Positive action is grounded in evidence of actual group disadvantage and addresses systemic barriers. Stereotyping infers individual characteristics from group averages. One responds to real data; the other substitutes assumed characteristics for individual assessment.

References

Ng, T. W. H., & Feldman, D. C. (2008). The relationship of age to ten dimensions of job performance. Journal of Applied Psychology, 93(2), 392–423.

Office for National Statistics (2016). Changes in the value and division of unpaid care work in the UK: 2000 to 2015. ONS.

NHS Digital (2020). Health Survey for England 2019. NHS.

Equality Act 2010, ss. 19, 158, 159. legislation.gov.uk.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.

Distribution overlap coefficients computed by numerical integration of overlapping normal PDFs. Cohen's d computed using pooled standard deviation. Statistical illustrations use normal approximations; real-world distributions are often right-skewed.