Stereotypes & Statistical Distributions

01 — Definitions

What Is a Stereotype?

A stereotype is a generalisation from a group average to an individual member — an inference about what a specific person is like based on a characteristic of the group they belong to. This page explains why that inference is usually statistically unjustified and legally problematic.

Two types of generalisation

Not all generalisations are equally problematic. The distinction that matters is between a statistical fact about a group and a prediction about an individual.

Statistical fact (valid)

"On average, men in England are 13.5 cm taller than women."

This is a verifiable empirical statement about group averages. It is relevant for decisions genuinely calibrated to a group average — for example, setting the default height of wash basins, mirrors, and worktops in sex-segregated facilities, or defining population health reference ranges for clinical screening.

Individual prediction (invalid)

"This person is a woman, therefore she is shorter than the average man."

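This is an inference about an individual based on group membership. Because the height distributions overlap, it will be wrong for some individuals: on the figures in Example 1, about 1.7% of women are taller than the average man. For traits with smaller group differences, the same kind of inference fails far more often.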

Stereotyping — working definition

Treating an individual as if they possess a characteristic of the average member of a group to which they belong, without evidence about the individual themselves. Stereotyping conflates descriptive statistics (what the group looks like on average) with predictive facts about a particular person.

Where stereotypes come from

Stereotypes are not always invented from nothing. Many are grounded in real, measurable group differences — differences in average height, in average time spent on unpaid care, in representation across occupational grades. The problem is not that the group-level data is false. The problem is what happens when that data is applied to individuals:

The base rate error: Most group-level differences are small relative to the variation within each group. Applying a group average to an individual will be correct for some people and wrong for many others — but you cannot know which without looking at the individual.

The distribution error: Even when a group difference is large (by social science standards), the distributions of the two groups overlap considerably. A substantial proportion of Group B will score higher than many members of Group A on the very dimension where Group A has the higher average.

The ecological fallacy: A pattern that holds at the group level does not necessarily hold at the individual level. Group membership is a weak predictor of individual characteristics whenever within-group variation is large — which is almost always.

The central statistical argument

For the vast majority of traits relevant to employment, service delivery, and social policy, within-group variation is substantially larger than between-group differences. This means group membership carries very little predictive power for any individual — even when a real group difference exists.
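One way to make this argument concrete is to ask what share of the total variation in a trait is accounted for by group membership. The sketch below is illustrative only. It assumes two equally sized groups with the same within-group standard deviation (SD), and it expresses the gap between the group means as a multiple of that SD. The function name and the example gaps are hypothetical and are not taken from any dataset.

```python
def variance_share_between_groups(gap_in_sd: float) -> float:
    """Share of total variance attributable to group membership.

    Assumes two equally sized groups with the same within-group SD;
    gap_in_sd is the difference between group means divided by that SD.
    """
    # Each group mean sits gap/2 from the overall mean, so the
    # between-group variance is (gap/2)**2 in within-group SD units.
    between = (gap_in_sd / 2) ** 2
    within = 1.0  # within-group variance is 1 by choice of units
    return between / (between + within)


for gap in (0.2, 0.5, 0.8, 2.0):  # illustrative gaps, not measured values
    share = variance_share_between_groups(gap)
    print(f"gap of {gap:.1f} SD: {share:.1%} of variance lies between groups")
```

On these assumptions, a gap of 0.8 SD leaves about 86% of the variation inside the groups, and only at a gap of 2 SD does the split reach half and half.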

02 — Interactive

The Overlap Principle

The interactive charts below show how even large group differences produce substantially overlapping distributions. Use the sliders to explore how the degree of overlap changes as the gap between the two group averages increases or decreases.

Reading the charts: overlap and effect size explained

What "overlap" means here: The darker shaded region where the two curves meet marks the range of values shared by both groups: the zone where you cannot reliably tell from this characteristic alone which group someone belongs to. Anyone whose value falls in that zone could plausibly belong to either group. The overlap percentage shown in the statistics below each chart is the area lying under both curves at once, expressed as a share of the area under either single curve. An overlap of 60% means that 60% of each group's distribution is matched by the other group's; only the remaining 40% is distinctive to one group.

What "effect size (d)" means: Effect size measures how far apart the two group averages are, expressed as a multiple of the typical spread of individual scores within each group. If most people in each group vary by about 7 cm from their group's average (mean), and the two averages are 14 cm apart, the effect size is 2.0. As a rough guide: below 0.2 is negligible; 0.2 is small; 0.5 is medium; 0.8 is large. Height — at around 2.0 — is one of the largest differences ever reliably measured between social groups. For most characteristics relevant to employment and service decisions, the effect size is considerably smaller and the overlap considerably greater.
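The effect-size arithmetic described above can be written out as a short sketch. It assumes the simple pooled standard deviation for two groups of equal size; the function name is illustrative, and the inputs are the rounded figures from the worked example (a spread of about 7 cm in each group, means 14 cm apart).

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Difference in means divided by the pooled standard deviation.

    Assumes equal group sizes, so the pooled SD is
    sqrt((sd_a**2 + sd_b**2) / 2).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Worked example from the text: spread of about 7 cm, means 14 cm apart.
print(cohens_d(14.0, 7.0, 0.0, 7.0))  # 2.0
```

Only the difference between the means matters, so the second group's mean is set to zero here for convenience.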

Example 1 — Height and Sex

Adult heights in England (NHS Health Survey for England). One of the largest reliable sex differences in any socially relevant trait. Even so, the distributions overlap considerably, and for most workplace and service decisions the overlap is the point.

Chart legend: Men (mean 175.3 cm); Women (mean 161.8 cm); shaded area: shared range (overlap)

Statistic	Value
Distribution overlap	32%
Cohen's d (effect size)	2.0
Women above male average	1.7%
Men below female average	2.9%
Gap between group averages (means)	13.5 cm (real data)

Default values from the NHS Health Survey for England (2019): men mean 175.3 cm (SD 7.1 cm), women mean 161.8 cm (SD 6.4 cm). Height is one of the largest sex differences in any human trait — Cohen's d ≈ 2.0. Even at this scale, the distributions share ~32% of their area. For most characteristics relevant to employment or service decisions, d is much smaller and overlap is much greater.
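The statistics quoted for Example 1 can be approximately reproduced from the stated means and standard deviations. The following is a minimal sketch, assuming both height distributions are normal and integrating the lower of the two curves with a simple midpoint rule. The function names, the integration range (8 SD either side) and the step count are choices made for this illustration, not details of the page's own implementation.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def overlap_coefficient(mean_a, sd_a, mean_b, sd_b, steps=20_000):
    """Area under the lower of the two curves (midpoint-rule integration)."""
    lo = min(mean_a - 8 * sd_a, mean_b - 8 * sd_b)
    hi = max(mean_a + 8 * sd_a, mean_b + 8 * sd_b)
    width = (hi - lo) / steps
    area = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint of the i-th slice
        area += min(normal_pdf(x, mean_a, sd_a), normal_pdf(x, mean_b, sd_b))
    return area * width


MEN = (175.3, 7.1)    # mean, SD in cm (Health Survey for England 2019)
WOMEN = (161.8, 6.4)

overlap = overlap_coefficient(*MEN, *WOMEN)
women_above_male_mean = 1 - normal_cdf(MEN[0], *WOMEN)
men_below_female_mean = normal_cdf(WOMEN[0], *MEN)

print(f"overlap: {overlap:.1%}")
print(f"women above male mean: {women_above_male_mean:.1%}")
print(f"men below female mean: {men_below_female_mean:.1%}")
```

Under the normal approximation these come out close to the 32%, 1.7% and 2.9% shown above.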

Example 2 — Unpaid Care Hours and Sex

Daily unpaid care and domestic work (ONS Time Use Survey, UK). Women spend on average more time on unpaid care than men — a real and well-documented difference. But the distributions overlap substantially, meaning many men carry more caring responsibilities than many women.

Chart legend: Men (mean 1.9 hrs/day); Women (mean 3.5 hrs/day); shaded area: shared range (overlap)

Statistic	Value
Distribution overlap	62%
Cohen's d (effect size)	0.9
Men above female average	18%
Women below male average	24%
Gap between group averages (means)	1.6 hrs/day (real data)

Based on ONS Time Use Survey (2015) data on unpaid care and domestic work: men ≈ 1.9 hrs/day (SD ≈ 1.7), women ≈ 3.5 hrs/day (SD ≈ 2.0). The distributions are right-skewed in reality (many people report zero or near-zero); the normal approximation is used here for illustration.

The employment implication: assuming that a female employee has significant caring responsibilities, or that a male employee does not, will be wrong for a substantial proportion of people. A policy that treats all employees as if they fit the group average (for example, by assuming that women will need career breaks or flexibility, or that men will not) is built on a stereotype and will be wrong for the many individuals who do not match it. This is exactly the mechanism that can give rise to an indirect sex discrimination claim under s.19 EA 2010; see Section 04: Indirect Discrimination for the legal framework.

The key insight from both examples

Even for height — one of the largest and most reliable sex differences in any human trait — around 32% of the combined distribution area overlaps. For characteristics more relevant to employment decisions (commitment, performance, availability, ambition), effect sizes are typically much smaller and overlap much greater.

When you make a decision about an individual based on a group average, you are working with weak information at best. The larger the within-group variance relative to the between-group difference, the weaker that information is — and individual-level evidence about the actual person will almost always be more reliable.
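A complementary way to see how weak group information is for an individual comparison is to ask how often a randomly chosen member of one group would exceed a randomly chosen member of the other. The sketch below assumes independent normal distributions. The function name is illustrative, the height parameters are those quoted for Example 1, and the second call uses a purely hypothetical small gap of 0.2 SD.

```python
import math


def prob_a_exceeds_b(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """P(random member of A scores higher than random member of B).

    Assumes independent normal distributions, so the difference A - B is
    normal with mean (mean_a - mean_b) and variance sd_a**2 + sd_b**2.
    """
    z = (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Height, using the Example 1 parameters (cm).
p_height = prob_a_exceeds_b(175.3, 7.1, 161.8, 6.4)

# A hypothetical trait with a small gap of 0.2 SD between group means.
p_small_gap = prob_a_exceeds_b(0.2, 1.0, 0.0, 1.0)

print(f"random man taller than random woman: {p_height:.1%}")
print(f"higher-mean group member ahead, 0.2 SD gap: {p_small_gap:.1%}")
```

On these assumptions the height comparison comes out at about 92%, while a 0.2 SD gap gives about 56%, which is only a little better than a coin toss.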

03 — Applied Example

From Statistics to Discrimination

When a stereotype (a group-level generalisation) is used to make a decision about an individual, it discriminates against every member of that group who does not conform to the stereotype, and those members are frequently a large share of the group. The interactive chart below uses job performance and age as a case study.

What the research actually shows: age and job performance

Age-related stereotypes in the workplace are among the most persistent and consequential. The stereotype: older workers are less productive, less adaptable, and less committed. The evidence tells a very different story.

Ng & Feldman's (2008) meta-analysis — drawing on 380 studies covering over 400,000 workers — found that age correlates near-zero with core task performance (r = −0.01). The correlations with overall performance, organisational citizenship behaviour, and creative performance were similarly negligible. Older workers showed lower rates of counterproductive behaviour and absenteeism than younger workers.

Measure	Value
Age × task performance (r)	−0.01
Age × citizenship behaviour (r)	+0.04
Age × counterproductive behaviour (r)	−0.06
Studies in meta-analysis	380

A correlation of −0.01 is, for practical purposes, zero. Individual variation in job performance within any age group dwarfs any age-related average difference.

Example 3 — Job Performance and Age

Performance distributions (normalised units) for four age bands. Default values reflect the meta-analytic finding (near-zero age effect). Adjust the slider to see how even a pessimistic assumed decline leaves distributions overwhelmingly overlapping.

Chart legend: Age 25–34; Age 35–44; Age 45–54; Age 55–64

Statistic	Value
Overlap, 25–34 vs 55–64	98%
Cohen's d, 25–34 vs 55–64	0.06
Assumed decline per decade (SD)	0.02 (meta-analytic estimate)
Workers aged 55–64 above the 25–34 average	47%

Based on Ng & Feldman (2008), "The Relationship of Age to Ten Dimensions of Job Performance", Journal of Applied Psychology, 93(2). At the meta-analytic estimate of r = −0.01 (default setting), the age effect over three decades is approximately 0.06 SD — producing distributions that are 98% overlapping. Even at the maximum pessimistic setting (0.5 SD/decade), the oldest and youngest groups still overlap by around 57%. In no realistic scenario does age function as a meaningful discriminator of individual job performance.
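The default-setting figures can be sketched with a closed-form calculation. This is an illustration under stated assumptions, not the page's own code: every age band is taken to be normal with the same standard deviation, and the gap between the youngest and oldest bands is the assumed decline per decade multiplied by three decades. The function names are hypothetical, and results may differ slightly from the displayed figures if the chart uses a different model.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def compare_age_bands(decline_per_decade_sd: float, decades: int = 3):
    """Compare the youngest and oldest bands under an assumed decline.

    Assumption: both bands are normal with the same SD, so the gap between
    them in SD units (Cohen's d) is simply decline * decades.
    """
    d = decline_per_decade_sd * decades
    overlap = 2 * phi(-d / 2)              # closed form for equal-SD normals
    older_above_younger_mean = 1 - phi(d)  # share of older band above younger mean
    return d, overlap, older_above_younger_mean


d, overlap, share = compare_age_bands(0.02)
print(f"d = {d:.2f}, overlap = {overlap:.1%}, older above younger mean = {share:.1%}")
```

At the default decline of 0.02 SD per decade this gives d = 0.06, with an overlap and an above-average share close to the 98% and 47% shown for the default setting.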

The discrimination mechanism

When an organisation uses age as a proxy for capability or productivity, it treats every individual in the older age group as if they are the (stereotyped) average — excluding people who are above average and would have performed well. This is not a hypothetical: if 47% of workers aged 55–64 outperform the average 25–34 year old, any age-based exclusion is wrong about nearly half the people it affects.

The precision problem

The statistical argument against stereotyping is ultimately about the precision of inference. Group membership is a low-precision predictor for almost all individual outcomes. Using it as the basis for decisions — rather than individual-level information — introduces systematic error.

Better predictors almost always exist: past performance records, skills assessments, structured interviews, work samples. When individual-level evidence is available, falling back on group membership as a shortcut is not just statistically indefensible — it is the definition of prejudice.

04 — Legal Framework

Indirect Discrimination

A stereotype embedded in a policy, rule, or practice, rather than a one-off individual act, can give rise to a claim of indirect discrimination. There are two broad patterns: a neutral rule that has unequal effects on a protected group, and a rule that explicitly uses a protected characteristic (or a close proxy for it) as its criterion. In both cases, the organisation may be able to avoid liability if it can show the PCP is a proportionate means of achieving a legitimate aim. This is the justification defence, explored under "The two-part justification test" below.

Indirect Discrimination — Equality Act 2010, section 19

A person (A) discriminates against another (B) if A applies to B a provision, criterion or practice (PCP) which is applied equally to persons not sharing B's protected characteristic but which puts or would put persons with that characteristic at a particular disadvantage compared with persons not sharing it — unless A can show it is a proportionate means of achieving a legitimate aim.

Neutral rules with unequal effects

The most common form of indirect discrimination involves a rule or requirement that says nothing about any protected characteristic on its face, but whose practical effect falls unequally across groups. The organisation did not set out to disadvantage anyone — but because the rule's implicit assumptions do not fit all groups equally, some groups are put at a particular disadvantage.

The distribution data from Example 2 illustrates the mechanism directly: because women on average carry more unpaid care than men, any requirement that clashes with significant caring commitments will exclude women at higher rates — not by intent, but as a consequence of the underlying difference in averages.

Policy or rule (PCP)	Protected characteristic	Why it causes unequal effects
Requires full-time hours	Sex	Women carry the majority of unpaid care in aggregate (ONS data). A blanket full-time requirement excludes those who need part-time work at higher rates — even though the rule itself makes no reference to sex.
Requires a continuous employment history with no gaps	Sex / Pregnancy & maternity	Career gaps caused by maternity leave or childcare fall disproportionately on women. A requirement for an unbroken work history cannot be met by anyone who has taken such a break, so it penalises the reason for the gap.
Mandatory early-morning or Friday-evening shifts	Religion or belief	Shift patterns that conflict with Sabbath or other religious observance put certain groups at a particular disadvantage — even when the scheduling rule applies to all staff equally.

These rules are facially neutral. The indirect discrimination arises from their application to a population that does not conform uniformly to the rule's implicit assumptions.
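The way a facially neutral rule falls unequally on two groups can be sketched numerically. The example below reuses the normal approximation and the means and standard deviations given for Example 2. The threshold of 4 hours of unpaid care per day is a hypothetical stand-in for "caring commitments heavy enough to clash with the rule"; it does not come from the ONS data or from any legal test.

```python
import math


def share_above(threshold: float, mean: float, sd: float) -> float:
    """Share of a normally distributed group lying above the threshold."""
    z = (threshold - mean) / sd
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))


CARE_THRESHOLD_HOURS = 4.0  # hypothetical cut-off, chosen for illustration

# Example 2 parameters (hrs/day), normal approximation.
men_affected = share_above(CARE_THRESHOLD_HOURS, mean=1.9, sd=1.7)
women_affected = share_above(CARE_THRESHOLD_HOURS, mean=3.5, sd=2.0)

print(f"men affected:   {men_affected:.1%}")
print(f"women affected: {women_affected:.1%}")
```

With these inputs roughly 40% of women and 11% of men fall above the threshold: the rule mentions neither sex, yet its burden is unequal, and a sizeable minority of men is caught by it too.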

When the characteristic itself becomes the criterion

A more direct form arises when the rule explicitly uses a protected characteristic — or a characteristic so closely correlated with a protected group that it functions as a stand-in. Here the link to stereotyping from Section 3 is most visible: the policy treats every member of a group as if they share the group average, when many do not.

Policy or rule (PCP)	Protected characteristic	Why justification is harder
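Maximum recruitment age (e.g. no applications accepted from anyone over 50)	Age	Age is the explicit criterion, so the claim is ordinarily one of direct age discrimination (s.13 EA 2010); age is the one characteristic for which direct discrimination can be justified on the same legitimate-aim and proportionality test (s.13(2)). As Example 3 shows, age correlates near-zero with job performance, so using age as a cut-off excludes many people who would have performed well. The justification defence is very difficult to sustain.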
Minimum height requirement above genuine operational need	Sex	Height is strongly sex-linked (d ≈ 2.0, Example 1). A blanket height threshold functions as a near-proxy for sex. Unless the threshold is genuinely operationally necessary, it amounts to indirect sex discrimination.
Physical fitness standard set above the genuine operational requirement	Sex / Age / Disability	Women on average have lower aerobic capacity than men; older workers and disabled people also perform differently on physical tests. If the standard exceeds what the role genuinely requires, the surplus requirement cannot be justified. See the justification defence below for when a fitness standard is lawfully justifiable.
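To see how a threshold on a strongly sex-linked trait can act as a near-proxy for sex, the sketch below works out who would pass a height requirement. It uses the normal approximation and the Example 1 parameters. The 170 cm threshold is hypothetical, and the calculation assumes equal numbers of male and female applicants.

```python
import math


def share_at_or_above(threshold: float, mean: float, sd: float) -> float:
    """Share of a normally distributed group at or above the threshold."""
    z = (threshold - mean) / sd
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))


MIN_HEIGHT_CM = 170.0  # hypothetical requirement, for illustration only

men_pass = share_at_or_above(MIN_HEIGHT_CM, mean=175.3, sd=7.1)
women_pass = share_at_or_above(MIN_HEIGHT_CM, mean=161.8, sd=6.4)

# Assumes equal numbers of men and women in the applicant pool.
male_share_of_passers = men_pass / (men_pass + women_pass)

print(f"men passing:   {men_pass:.1%}")
print(f"women passing: {women_pass:.1%}")
print(f"share of passers who are men: {male_share_of_passers:.1%}")
```

On these assumptions about 77% of men and 10% of women clear the bar, so nearly nine in ten of those who pass are men, even though the rule never mentions sex.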

The two-part justification test

A PCP that causes group disadvantage is not automatically unlawful. The organisation can justify it by satisfying both elements:

1. Legitimate aim: The PCP pursues a real, concrete organisational need — not one that is itself discriminatory. Cost saving alone is generally insufficient as a justification. Genuine operational effectiveness, health and safety, and service quality are typically acceptable aims.

2. Proportionate means: The PCP is no more discriminatory than necessary to achieve that aim. If a less discriminatory alternative achieves the same result, the more discriminatory option cannot be proportionate. This is where the statistical argument from Sections 2 and 3 is most directly applicable: where a characteristic is a poor predictor of the operational need, an individual-level assessment will almost always be a less discriminatory alternative — and proportionality requires it.

The link to distributions: The proportionality test asks whether a group-level rule is genuinely doing the job attributed to it. Where within-group variation is large, a group-level cut-off will exclude many individuals who would have met the genuine operational need. The larger the overlap between distributions, the weaker the case that a group-based criterion is proportionate — and the stronger the argument that individual assessment is feasible and required.

The Justification Defence — when disparate impact is lawful

Not every policy with a disparate impact is unlawful. Where a requirement reflects a genuine operational necessity that cannot be met by less discriminatory means, the justification defence can succeed. The following examples from policing illustrate this.

The police Job-Related Fitness Test (JRFT). The JRFT (a 15-metre shuttle run to level 5:4 for most response roles) is designed to represent the minimum aerobic capacity needed for safe operational deployment. A fitness standard set at this level is more likely to survive a proportionality challenge than one set above it, because public safety and officer safety are legitimate aims and physical capacity is genuinely relevant to the role. The standard must be set at the minimum operationally necessary level and cannot be calibrated to exclude a demographic group. Each officer who does not meet the standard should be assessed individually, with reasonable adjustments considered for those with a disability.

Vision standards for police recruitment. The College of Policing medical standards for initial police recruitment require corrected visual acuity of at least 6/12 in both eyes and uncorrected acuity no worse than 6/36 in the better eye (College of Policing, Statutory guidance on vetting and associated medical standards guidance, current edition — practitioners should verify against the current published standards at college.police.uk). A candidate whose vision falls below these thresholds would not ordinarily meet the standard for recruitment as a police officer. This is likely justifiable as a proportionate means of ensuring officers can operate safely and protect the public. Important notes: these are recruitment standards; specific roles carry different requirements — firearms officers, for example, face more stringent visual acuity standards. Standards must be applied through individual assessment rather than assumed from a diagnosis or category of impairment.

The principle in both cases is the same: the requirement must be genuinely necessary for the role, set at the minimum level needed, and applied to the individual through assessment — not used as a shortcut to exclude groups whose members are, on average, less likely to meet it.

Showing particular disadvantage

Indirect discrimination requires showing that the PCP puts the protected group at a "particular disadvantage" — not just that a single individual is affected. In practice this is established by statistics, or where statistics are unavailable, by evidence that the PCP is "inherently likely" to have a disproportionate effect.

The overlap distributions in Section 2 show how this disadvantage arises: a rule built on a group average will catch many members of the disadvantaged group who do not conform to that average. The overlap percentage is not itself the rule's error rate, but the two move together: the larger the overlap region, the more individuals a group-based rule will misjudge.
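The statistical comparison behind "particular disadvantage" can be sketched as a simple comparison of proportions. The counts below are invented for illustration and the class and field names are hypothetical. The sketch computes the comparison only; it does not suggest what size of difference would be legally sufficient.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Counts of people in one group who can or cannot comply with a PCP."""
    can_comply: int
    cannot_comply: int

    @property
    def disadvantaged_share(self) -> float:
        total = self.can_comply + self.cannot_comply
        return self.cannot_comply / total


# Invented pool figures, for illustration only.
women = GroupCounts(can_comply=120, cannot_comply=80)
men = GroupCounts(can_comply=170, cannot_comply=30)

gap = women.disadvantaged_share - men.disadvantaged_share
ratio = women.disadvantaged_share / men.disadvantaged_share

print(f"women unable to comply: {women.disadvantaged_share:.0%}")
print(f"men unable to comply:   {men.disadvantaged_share:.0%}")
print(f"gap: {gap:.0%}, ratio: {ratio:.2f}")
```

Reporting both the absolute gap and the ratio is a design choice: a ratio can look dramatic when both proportions are tiny, while a gap alone hides how much more likely one group is to be affected.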

05 — Distinction

Positive Action vs Stereotyping

Positive action and stereotyping are sometimes confused — both involve taking protected characteristics into account. They differ fundamentally in their basis, their target, and their legality.

Stereotyping

Assumes an individual has the average characteristics of a group based solely on group membership

Applies group-level generalisations to individuals without evidence

Based on presumed individual characteristics

Typically disadvantages members of the target group

Not time-limited — persists indefinitely

Unlawful where it causes particular disadvantage (indirect discrimination) or treats an individual less favourably (direct discrimination)

Positive Action (ss.158–159 EA2010)

Responds to evidenced group-level disadvantage or under-representation, not to assumptions about individuals

Addresses documented systemic barriers, not presumed individual characteristics

Based on group-level data about outcomes, not individual-level stereotypes

Must not involve treating individuals with the protected characteristic more favourably than others in a way that amounts to a quota (s.159)

Should be proportionate and regularly reviewed — it is a remedy for a documented disparity, not a permanent arrangement

Lawful where the conditions in ss.158–159 are met

The three conditions for lawful positive action (s.158)

Positive action is permitted where the organisation reasonably thinks that:

(a) Disadvantage: Persons sharing the protected characteristic suffer a disadvantage connected to that characteristic; or

(b) Different needs or under-representation: Persons sharing the characteristic have needs different from those of persons who do not share it, or participation in an activity by persons sharing the characteristic is disproportionately low; and

(c) Proportionality: The positive action taken is a proportionate means of achieving the aim of enabling or encouraging persons sharing the characteristic to overcome or minimise the disadvantage, of meeting those needs, or of enabling or encouraging them to participate in the activity.

The key difference in statistical terms: Positive action is grounded in observed group-level outcome data — actual evidence of under-representation or disadvantage. It does not assume individuals have particular characteristics; it addresses documented barriers. Stereotyping infers individual characteristics from group averages. One is evidence-based; the other is a shortcut that the evidence does not support.
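The kind of group-level outcome evidence that positive action relies on can be as simple as comparing a group's share of a grade with its share of the workforce. The sketch below uses invented headcounts, and the function name is hypothetical. A ratio below 1 indicates under-representation in the descriptive sense only; whether participation is "disproportionately low" for the purposes of s.158 remains a judgment for the organisation.

```python
def representation_ratio(
    group_in_grade: int,
    total_in_grade: int,
    group_in_workforce: int,
    total_workforce: int,
) -> float:
    """Group's share of a grade divided by its share of the whole workforce."""
    share_in_grade = group_in_grade / total_in_grade
    share_in_workforce = group_in_workforce / total_workforce
    return share_in_grade / share_in_workforce


# Invented figures: 12 of 100 senior staff and 900 of 3,000 staff overall
# belong to the group in question.
ratio = representation_ratio(12, 100, 900, 3000)
print(f"representation ratio at senior grade: {ratio:.2f}")
```

Note that the calculation describes an outcome for the group as a whole. It says nothing about any individual member, which is the distinction this section draws.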

Section 158 is not limited to employment. It applies broadly to any activity — including service provision, education, and community engagement. An organisation can take positive action under s.158 to address disadvantage or under-representation among the people it serves, not only its workforce. Section 159, by contrast, applies specifically to employment decisions about recruitment and promotion — it permits treating a candidate with a protected characteristic more favourably than an equally-qualified candidate where the conditions are met, but only in that employment context. It does not extend to service delivery.

Practical illustration: recruitment

Unlawful stereotyping

"We will not shortlist candidates over 50 because older workers are less adaptable to our technology." This assumes an individual characteristic (low adaptability) from group membership (age), without evidence. It causes direct age discrimination and relies on a stereotype unsupported by the research on age and performance.

Lawful positive action

"Our workforce data shows that workers over 50 are under-represented at senior grades relative to the workforce as a whole. We will include targeted outreach in our senior recruitment campaign to encourage applications from older workers." This responds to documented evidence of under-representation, does not impose quotas, and addresses the barrier rather than assuming individual characteristics.

06 — Summary

Key Takeaways

What practitioners need to know when evaluating whether a policy, rule, or practice relies on stereotyping — and when group-level data can legitimately inform action.

Within-group variation almost always exceeds between-group differences

For virtually every characteristic relevant to employment and service delivery, individuals within any group vary more than the groups differ from each other. Group membership is therefore a weak predictor of individual characteristics — even when a real group difference exists.

Even large effects produce substantial overlap

Height — one of the largest sex differences in any human trait — produces distributions that overlap by around 32%. For job performance, commitment, adaptability, and most other employment-relevant traits, effect sizes are far smaller and overlap is far greater.

Group-level statistics cannot justify individual treatment

Applying a group average to an individual is statistically unjustified whenever within-group variance is large. Where individual-level evidence is available, it will almost always be a more reliable basis for decisions than group membership.

Indirect discrimination does not require intent

A policy built on a stereotype can be indirectly discriminatory even if no individual act of prejudice was intended. The test is effect: does the PCP put a protected group at a particular disadvantage? Policies justified by group stereotypes often fail the proportionality test.

The justification defence requires evidence

To justify a PCP that causes group disadvantage, organisations must show the aim is legitimate and the means are proportionate. Where the stereotyped characteristic is a poor predictor of the operational need, less discriminatory alternatives will usually exist — and will be required by the proportionality test.

Positive action and stereotyping are opposites

Positive action is grounded in evidence of actual group disadvantage and addresses systemic barriers. Stereotyping infers individual characteristics from group averages. One responds to real data; the other substitutes assumed characteristics for individual assessment.

References

Ng, T. W. H., & Feldman, D. C. (2008). The relationship of age to ten dimensions of job performance. Journal of Applied Psychology, 93(2), 392–423.

Office for National Statistics (2016). Changes in the value and division of unpaid care work in the UK: 2000 to 2015. ONS.

NHS Digital (2020). Health Survey for England 2019. NHS.

Equality Act 2010, ss. 19, 158, 159. legislation.gov.uk.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.

Distribution overlap coefficients computed by numerical integration of overlapping normal PDFs. Cohen's d computed using pooled standard deviation. Statistical illustrations use normal approximations; real-world distributions are often right-skewed.