Weights in Population Analysis
This guide explains how to use observation weights in population_margins, how weighted averaging and delta‑method standard errors are computed, and how weights interact with groups and scenarios.
Scope and Policy
- Weights are supported in population_margins via the weights keyword.
- profile_margins does not accept weights: profiles evaluate scenarios at reference points without averaging over the sample.
- Statistical correctness: weighted quantities are normalized by the total weight, and delta‑method SEs use the averaged gradient with the model's full covariance matrix Σ.
Supported Forms
- weights = nothing (default): Unweighted analysis.
- weights = :colname (Symbol): Column in data with weights (sampling or frequency).
- weights = vector::AbstractVector{<:Real}: Vector of weights with length == nrow(data).
Weighted Computation
Let w_i ≥ 0 be weights for observation i in the current context (after grouping filters). Then:
- Weighted mean effect: AME = (∑ w_i · Δ_i) / (∑ w_i)
- Weighted averaged gradient: ḡ = (∑ w_i · g_i) / (∑ w_i)
- Standard error (delta method): se = sqrt(ḡ' · Σ · ḡ)
Where Δ_i is the per‑row effect (continuous derivative or categorical contrast) and g_i is the corresponding per‑row parameter gradient; Σ is the model covariance matrix.
These formulas are used consistently in:
- Ungrouped population effects and predictions
- Grouped analyses (applied within each subgroup)
- Scenario analyses (applied within each scenario × subgroup context)
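The three formulas above can be sketched on plain arrays. This is a minimal illustration, not the package's internal code: the helper name weighted_ame, its argument layout (one gradient row per observation), and the toy numbers are all assumptions made for the example.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Margins.jl).
# Δ : per-row effects, length n
# G : per-row parameter gradients, n × p (row i is g_i)
# w : non-negative weights, length n
# Σ : parameter covariance matrix, p × p
function weighted_ame(Δ::AbstractVector, G::AbstractMatrix,
                      w::AbstractVector, Σ::AbstractMatrix)
    W = sum(w)                        # total weight in the current context
    ame = dot(w, Δ) / W               # (∑ w_i · Δ_i) / (∑ w_i)
    gbar = (G' * w) ./ W              # (∑ w_i · g_i) / (∑ w_i), length p
    se = sqrt(dot(gbar, Σ * gbar))    # sqrt(ḡ' · Σ · ḡ)
    return (ame = ame, se = se, gradient = gbar)
end

# Toy inputs, made up for illustration
Δ = [1.0, 2.0, 4.0]
G = [1.0 0.0; 0.0 1.0; 1.0 1.0]
w = [1.0, 1.0, 2.0]
Σ = [0.04 0.0; 0.0 0.09]

res = weighted_ame(Δ, G, w, Σ)
println(res.ame)        # 2.75
println(res.gradient)   # [0.75, 0.75]
```

Because both the effect and the gradient are divided by the same total weight, calling weighted_ame with 10 .* w in this sketch returns the same estimate and SE.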
Examples
using Random
using DataFrames, CategoricalArrays, GLM, Margins
Random.seed!(123)
n = 200
df = DataFrame(
y = randn(n),
x = randn(n),
z = randn(n),
group = categorical(rand(["A","B"], n)),
samp_w = rand(0.5:0.1:2.0, n), # sampling weights
freq_w = rand([1,2,3,4], n) # frequency weights
)
model = lm(@formula(y ~ x + z + group), df)
# 1) Unweighted population AME
ame_unw = population_margins(model, df; type=:effects, vars=[:x, :z])
# 2) Sampling weights via column name
ame_samp = population_margins(model, df; type=:effects, vars=[:x, :z], weights=:samp_w)
# 3) Frequency weights via column name
ame_freq = population_margins(model, df; type=:effects, vars=[:x, :z], weights=:freq_w)
# 4) Explicit weight vector
wvec = Float64.(df.samp_w)
ame_vec = population_margins(model, df; type=:effects, vars=[:x, :z], weights=wvec)
# 5) Grouped weighted analysis
grp_samp = population_margins(model, df; type=:effects, vars=[:x], groups=:group, weights=:samp_w)
# 6) Scenarios with weights (counterfactual z values)
scen_w = population_margins(model, df; type=:effects, vars=[:x], scenarios=(z=[-1.0, 0.0, 1.0],), weights=:samp_w)
All results use weighted averaging, normalized by the total weight in each context, and delta‑method SEs computed from the averaged gradient and the full covariance Σ.
Best Practices
- Provide non‑negative weights; zero weights effectively drop observations.
- For grouped analyses, ensure the weight column/vector aligns with the original data (the implementation indexes weights by original row indices).
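- Confirm units/interpretation: sampling and frequency weights imply different empirical distributions, so they can yield different estimates. Under the formulas above, every quantity is normalized by ∑ w_i, so multiplying all weights by a constant changes neither the estimate nor the SE; only the relative weights matter.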
- Use stable data types (Float64 for weight vectors) to avoid implicit conversions.
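Two of the points above, zero weights and alignment with original row indices, can be illustrated with a small sketch. The function grouped_weighted_mean and its inputs are hypothetical and only mimic the idea of indexing a full-length weight vector by each subgroup's original rows; the package's own grouping code may differ.

```julia
# Hypothetical sketch (not Margins.jl internals): weighted mean of per-row
# effects within each subgroup, indexing the full-length weight vector by
# the subgroup's original row indices.
function grouped_weighted_mean(Δ::AbstractVector, w::AbstractVector,
                               labels::AbstractVector)
    length(Δ) == length(w) == length(labels) ||
        throw(DimensionMismatch("Δ, w, and labels must have the same length"))
    out = Dict{eltype(labels), Float64}()
    for k in unique(labels)
        idx = findall(==(k), labels)          # original row indices of group k
        out[k] = sum(w[idx] .* Δ[idx]) / sum(w[idx])
    end
    return out
end

# Toy inputs, made up for illustration
Δ = [1.0, 3.0, 2.0, 6.0]
w = [1.0, 3.0, 0.0, 2.0]      # row 3 has zero weight
labels = ["A", "A", "B", "B"]

res = grouped_weighted_mean(Δ, w, labels)
println(res["A"])   # 2.5
println(res["B"])   # 6.0
```

Group B's result equals the effect of row 4 alone, since the zero weight on row 3 removes it from both the numerator and the denominator.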
Error Handling
- Length mismatch for weights::Vector vs nrow(data) → error.
- Invalid weight column name → error.
- Using a variable as both a weight and a simultaneous effect variable or grouping key may error if it creates an internal contradiction; prefer distinct columns.
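If it helps to catch these problems before calling population_margins, a pre-check along the following lines is one option. It is a sketch under assumptions: resolve_weights is a hypothetical helper, the exception types are chosen for the example rather than taken from the package, and a NamedTuple of column vectors stands in for the data table so the block runs without extra packages.

```julia
# Hypothetical pre-check (not the package's validation). `data` is a
# NamedTuple of equal-length column vectors standing in for a table.
function resolve_weights(weights, data::NamedTuple)
    n = length(first(data))                   # number of rows
    w = if weights === nothing
        ones(n)                               # unweighted: each row counts once
    elseif weights isa Symbol
        haskey(data, weights) ||
            throw(ArgumentError("weight column :$weights not found"))
        Float64.(data[weights])
    else
        length(weights) == n ||
            throw(DimensionMismatch("weights has length $(length(weights)), data has $n rows"))
        Float64.(weights)
    end
    all(>=(0), w) || throw(ArgumentError("weights must be non-negative"))
    return w
end

# Toy table, made up for illustration
data = (x = [0.1, 0.2, 0.3], samp_w = [1.0, 0.5, 2.0])

w_default = resolve_weights(nothing, data)    # all ones
w_column  = resolve_weights(:samp_w, data)    # copy of the samp_w column
```

Passing a two-element vector or an unknown column name to this sketch throws instead of returning weights, mirroring the error cases listed above.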