Aggregation Effects
Most urban data you’ll encounter is aggregated: for example, individual records grouped into neighbourhoods, postcodes, or census tracts. Aggregation protects privacy and makes data manageable, but it also means you’re working with averages and totals rather than individual observations.
How data is grouped affects what conclusions you can draw from it:
- Loss of detail: Individual-level information disappears when records are combined into area averages. A postcode's average age does not tell you the age of any individual resident, or how widely ages vary within that postcode.
- Scale effects: Smaller units (streets, postcodes) preserve more local variation than larger units (cities, countries). As the units get larger, context-specific patterns get smoothed out.
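The smoothing described above can be sketched in a few lines of Python. This is only an illustration: the postcode values are made up, the `aggregate` helper is hypothetical, and every unit is assumed to hold the same number of people (otherwise the averages would need population weights).

```python
from statistics import mean, pvariance

# Made-up values (e.g. median age) for 8 equally populated postcodes
# laid out along one street. These are illustrative, not real data.
postcode_values = [22, 48, 25, 51, 30, 55, 28, 61]

def aggregate(values, group_size):
    """Average consecutive runs of `group_size` units into one larger unit."""
    return [mean(values[i:i + group_size])
            for i in range(0, len(values), group_size)]

neighbourhood_values = aggregate(postcode_values, 2)  # 4 larger units
district_values = aggregate(postcode_values, 4)       # 2 even larger units

for label, units in [("postcodes", postcode_values),
                     ("neighbourhoods", neighbourhood_values),
                     ("districts", district_values)]:
    # The overall mean is unchanged, but the spread between units shrinks.
    print(f"{label:15s} n={len(units)} "
          f"mean={mean(units):.1f} variance={pvariance(units):.1f}")
```

The overall mean stays the same at every level, while the variance between units drops sharply: the contrast between neighbouring postcodes is averaged away.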
The way data is aggregated can change the outcomes of analysis and mapping. Two well-known effects are the Modifiable Areal Unit Problem (MAUP) and the ecological fallacy. Keep these in mind when interpreting results in the next section, where you’ll aggregate census data into neighbourhoods.
MAUP
The Modifiable Areal Unit Problem (MAUP) is the observation that analytical results can change when you change the units of aggregation, even if the underlying individual data stays the same.
It has two common components:
- Scale effect: aggregating to larger units (e.g., from blocks → neighbourhoods → districts) changes averages, variances, and correlations.
- Zoning effect: keeping the same number and size of units but drawing the boundaries differently regroups the underlying data, which can change the resulting counts, averages, and correlations.
Gerrymandering is a well-known real-world illustration of the zoning effect: changing electoral district boundaries (while keeping the same voters) can change aggregated counts and therefore the outcome of an election.
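A toy version of the zoning effect, assuming a made-up electorate of nine voters and two hypothetical ways of splitting them into three equal districts. The `seats_won` helper and both zonings are invented for illustration and use simple majority within each district.

```python
from collections import Counter

# Made-up electorate: 4 voters for party A, 5 for party B.
voters = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

# Two hypothetical zonings. Each assigns the same 9 voters (by index)
# to 3 districts of 3 voters; only the boundaries differ.
zoning_1 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
zoning_2 = [[0, 1, 4], [2, 3, 5], [6, 7, 8]]

def seats_won(voters, zoning):
    """Count districts won by each party under simple majority."""
    seats = Counter()
    for district in zoning:
        votes = Counter(voters[i] for i in district)
        winner, _ = votes.most_common(1)[0]  # no ties: 3 voters, 2 parties
        seats[winner] += 1
    return dict(seats)

print(seats_won(voters, zoning_1))  # {'A': 1, 'B': 2}
print(seats_won(voters, zoning_2))  # {'A': 2, 'B': 1}
```

Party B holds the majority of individual votes in both cases, yet the second zoning hands party A two of the three districts. Nothing about the voters changed, only how they were grouped.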
Ecological Fallacy
The ecological fallacy is an inference error: you observe a relationship in group-level (aggregate) data and then assume the same relationship holds for individuals within those groups. Because aggregation hides within-group variation, an ecological correlation can differ from (or even contradict) the individual-level relationship.
This doesn’t mean aggregate patterns are “wrong”; it means they answer a different question. A correlation between neighbourhood averages describes neighbourhoods, not the people who live in them.
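The gap between the two levels can be made concrete with a small, deliberately extreme sketch. The data below are invented: three hypothetical neighbourhoods with three residents each, and two unnamed measurements `x` and `y` per resident. The `pearson` helper is written out by hand so the example relies only on the standard library.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented (x, y) pairs for three residents in each neighbourhood.
residents_by_area = {
    "N1": [(1, 8), (4, 5), (7, 2)],
    "N2": [(2, 9), (5, 6), (8, 3)],
    "N3": [(3, 10), (6, 7), (9, 4)],
}

# Individual-level correlation: pool every resident.
people = [p for residents in residents_by_area.values() for p in residents]
r_individual = pearson([x for x, _ in people], [y for _, y in people])

# Ecological correlation: one point per neighbourhood (its averages).
area_mean_x = [mean([x for x, _ in r]) for r in residents_by_area.values()]
area_mean_y = [mean([y for _, y in r]) for r in residents_by_area.values()]
r_ecological = pearson(area_mean_x, area_mean_y)

print(f"individual-level r = {r_individual:.2f}")  # -0.80
print(f"ecological r       = {r_ecological:.2f}")  # 1.00
```

Neighbourhoods with a higher average `x` also have a higher average `y`, yet among individuals a higher `x` goes with a lower `y`. Reading the ecological correlation as a statement about residents would get the direction wrong.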
A related (but distinct) issue is Simpson’s paradox, where an overall association reverses after you stratify/adjust for a confounder (often because groups have different compositions). The Whickham smoking/survival example below is best described as Simpson’s paradox due to confounding by age.
As a general rule, if you care about individuals, use individual-level (unaggregated) data where possible. Where confounding is the concern, stratify by or adjust for the key confounders rather than relying on crude totals or area averages.
- The Ecological Fallacy (YouTube)
Check Your Understanding
A 20-year follow-up study in Whickham, England found that smokers had higher survival rates than non-smokers. Does this mean smoking is protective?
No. The result is misleading because it ignores a confounder: age.
In the original survey, smokers were younger on average than non-smokers. Over a 20‑year period, age is a strong predictor of survival. If you compare survival without accounting for age, smokers can appear to do better simply because the smoker group starts out younger.
When you compare people within the same age group (or statistically adjust for age), the pattern reverses: in most age groups, smokers have lower survival than non-smokers of the same age. This reversal is a classic example of Simpson’s paradox (Appleton, French, and Vanderpump, 1996).
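The mechanics of the reversal can be reproduced with a minimal sketch. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the Whickham figures. The two age bands and the `survival_rate` helper are likewise assumptions made for illustration.

```python
# Hypothetical counts, NOT the Whickham data: (survived, total) per group.
counts = {
    ("smoker", "younger"): (72, 80),
    ("smoker", "older"): (8, 20),
    ("non-smoker", "younger"): (19, 20),
    ("non-smoker", "older"): (40, 80),
}

def survival_rate(status, age=None):
    """Survival rate for a smoking status, optionally within one age group."""
    rows = [value for (s, a), value in counts.items()
            if s == status and (age is None or a == age)]
    survived = sum(r[0] for r in rows)
    total = sum(r[1] for r in rows)
    return survived / total

# Crude comparison: ignores age entirely.
print("all ages:",
      f"smokers {survival_rate('smoker'):.0%},",
      f"non-smokers {survival_rate('non-smoker'):.0%}")

# Stratified comparison: like with like.
for age in ("younger", "older"):
    print(f"{age}:",
          f"smokers {survival_rate('smoker', age):.0%},",
          f"non-smokers {survival_rate('non-smoker', age):.0%}")
```

In this toy table the smokers are mostly younger, so their crude survival rate is higher, even though within each age group smokers survive less often than non-smokers. The crude comparison mixes the effect of smoking with the effect of age.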