Weights in Population Analysis

This guide explains how to use observation weights in population_margins, how weighted averaging and delta‑method standard errors are computed, and how weights interact with groups and scenarios.

Scope and Policy

Weights are supported in population_margins via the weights keyword.
profile_margins does not accept weights — profiles evaluate scenarios at reference points without averaging over the sample.
Statistical correctness: Weighted quantities use proper normalization and delta‑method SEs use the averaged gradient with the model’s full covariance matrix Σ.

Supported Forms

weights = nothing (default): Unweighted analysis.
weights = :colname (Symbol): Column in data with weights (sampling or frequency).
weights = vector::AbstractVector{<:Real}: Vector of weights with length == nrow(data).

Weighted Computation

Let w_i ≥ 0 be weights for observation i in the current context (after grouping filters). Then:

Weighted mean effect: AME = (∑ w_i · Δ_i) / (∑ w_i)
Weighted averaged gradient: ḡ = (∑ w_i · g_i) / (∑ w_i)
Standard error (delta method): se = sqrt(ḡ' · Σ · ḡ)

Where Δ_i is the per‑row effect (continuous derivative or categorical contrast) and g_i is the corresponding per‑row parameter gradient; Σ is the model covariance matrix.

These formulas are used consistently in:

Ungrouped population effects and predictions
Grouped analyses (applied within each subgroup)
Scenario analyses (applied within each scenario × subgroup context)

Examples

using Random
using DataFrames, CategoricalArrays, GLM, Margins

Random.seed!(123)
n = 200
df = DataFrame(
    y = randn(n),
    x = randn(n),
    z = randn(n),
    group = categorical(rand(["A","B"], n)),
    samp_w = rand(0.5:0.1:2.0, n),             # sampling weights
    freq_w = rand([1,2,3,4], n)                 # frequency weights
)

model = lm(@formula(y ~ x + z + group), df)

# 1) Unweighted population AME
ame_unw = population_margins(model, df; type=:effects, vars=[:x, :z])

# 2) Sampling weights via column name
ame_samp = population_margins(model, df; type=:effects, vars=[:x, :z], weights=:samp_w)

# 3) Frequency weights via column name
ame_freq = population_margins(model, df; type=:effects, vars=[:x, :z], weights=:freq_w)

# 4) Explicit weight vector
wvec = Float64.(df.samp_w)
ame_vec = population_margins(model, df; type=:effects, vars=[:x, :z], weights=wvec)

# 5) Grouped weighted analysis
grp_samp = population_margins(model, df; type=:effects, vars=[:x], groups=:group, weights=:samp_w)

# 6) Scenarios with weights (counterfactual z values)
scen_w = population_margins(model, df; type=:effects, vars=[:x], scenarios=(z=[-1.0, 0.0, 1.0]), weights=:samp_w)

All results use weighted averaging with proper normalization by the total weight in each context and delta‑method SEs computed from the averaged gradient and full covariance Σ.

Best Practices

Provide non‑negative weights; zero weights effectively drop observations.
For grouped analyses, ensure the weight column/vector aligns with the original data (the implementation indexes weights by original row indices).
Confirm units/interpretation: sampling vs frequency weights may yield different magnitudes depending on the empirical distribution they imply.
Use stable data types (Float64 for weight vectors) to avoid implicit conversions.

Error Handling

Length mismatch for weights::Vector vs nrow(data) → error.
Invalid weight column name → error.
Using a variable as both a weight and a simultaneous effect variable or grouping key may error if it creates an internal contradiction; prefer distinct columns.