Population Grouping Framework
Comprehensive hierarchical analysis for stratified marginal effects
Conceptual Foundation
Margins.jl implements a population-based grouping framework that computes average marginal effects (AME) and average adjusted predictions (AAP) within stratified subgroups of the observed data.
Core Design Principles
Population-Based Analysis
All operations maintain population averaging semantics - computing effects by averaging across actual or modified populations, not evaluating at synthetic representative points.
Orthogonal Parameters
Three independent dimensions combine multiplicatively:
vars: Which variables to compute marginal effects for
groups: How to stratify the analysis (data structure)
scenarios: What counterfactual scenarios to consider (data modification)
Single Fundamental Operation
All grouping reduces to: stratify data into subgroups, compute population margins within each subgroup.
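The operation can be illustrated in plain Julia for a hypothetical logistic model. This is a sketch of the idea, not Margins.jl's implementation: it computes each observation's derivative of the predicted probability and then averages it within each subgroup. The coefficients, the data, and the helper `grouped_ame` are assumptions made up for illustration.

```julia
# Hypothetical sketch, not the Margins.jl implementation: average
# per-observation marginal effects within each subgroup of the data.
# Assumed model: logistic with linear predictor η = β0 + βx*x + βz*z.
logistic(η) = 1 / (1 + exp(-η))

function grouped_ame(x, z, g, β)
    β0, βx, βz = β
    # Derivative of the predicted probability with respect to x: βx * p * (1 - p)
    dpdx = map(x, z) do xi, zi
        p = logistic(β0 + βx * xi + βz * zi)
        βx * p * (1 - p)
    end
    # One pass over the rows: accumulate sums and counts per group label
    sums = Dict{eltype(g), Float64}()
    counts = Dict{eltype(g), Int}()
    for (gi, d) in zip(g, dpdx)
        sums[gi] = get(sums, gi, 0.0) + d
        counts[gi] = get(counts, gi, 0) + 1
    end
    # Average marginal effect within each subgroup
    return Dict(k => sums[k] / counts[k] for k in keys(sums))
end

x = [0.5, 1.0, 1.5, 2.0]
z = [1.0, 0.0, 1.0, 0.0]
g = ["HS", "HS", "College", "College"]
ames = grouped_ame(x, z, g, (0.0, 0.8, -0.3))
```

Because every row is visited once while accumulating, the averaging step is linear in the number of observations.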
Basic Grouping Patterns
Simple Categorical Grouping
Compute effects separately within each category of a grouping variable:
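using Margins, DataFrames, GLM
# Assumes `model` is a fitted model (e.g. from GLM.jl) and `data` is the DataFrame it was fitted on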
# Effects by education level
education_effects = population_margins(model, data;
type=:effects,
groups=:education)
# Results: separate effects for each education category
DataFrame(education_effects)
Cross-Tabulated Grouping
Analyze effects across combinations of multiple categorical variables:
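Conceptually, a cross-tabulated grouping assigns each row to the cell named by the tuple of its values on the grouping variables. A minimal Base-Julia sketch, with made-up column vectors (this illustrates the idea and is not Margins.jl code):

```julia
# Illustrative sketch: bucket row indices by the (education, gender) tuple.
education = ["HS", "College", "HS", "College", "HS"]
gender    = ["Male", "Male", "Female", "Female", "Male"]

cells = Dict{Tuple{String,String}, Vector{Int}}()
for (row, key) in enumerate(zip(education, gender))
    # get! creates an empty index vector the first time a cell is seen
    push!(get!(cells, key, Int[]), row)
end

cells[("HS", "Male")]   # rows 1 and 5
```

In this sketch only combinations that actually occur in the data produce a cell.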
# Effects by education × gender combinations
demographic_effects = population_margins(model, data;
type=:effects,
groups=[:education, :gender])
# Results: effects for (HS,Male), (HS,Female), (College,Male), (College,Female), etc.
Advanced Hierarchical Grouping
Nested Grouping with => Operator
The => operator creates hierarchical nesting where the right side is computed within each level of the left side:
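One way to picture `outer => inner` is a two-stage split: first by the outer variable, then by the inner variable inside each outer subset. The helper `nested_groups` below is hypothetical and only illustrates that reading; it is not how Margins.jl is implemented.

```julia
# Illustrative sketch of `outer => inner` nesting (not Margins.jl internals):
# split rows by the outer variable, then split each subset by the inner one.
function nested_groups(outer::AbstractVector, inner::AbstractVector)
    cells = Tuple{eltype(outer), eltype(inner), Vector{Int}}[]
    for o in unique(outer)
        rows_o = findall(==(o), outer)          # rows in this outer level
        for i in unique(inner[rows_o])          # inner levels present here
            push!(cells, (o, i, filter(r -> inner[r] == i, rows_o)))
        end
    end
    return cells
end

region    = ["North", "North", "South", "South", "North"]
education = ["HS", "College", "HS", "HS", "HS"]
nested_groups(region, education)
# [("North", "HS", [1, 5]), ("North", "College", [2]), ("South", "HS", [3, 4])]
```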
# Region first, then education within each region
nested_effects = population_margins(model, data;
type=:effects,
groups=:region => :education)
# Results: (North,HS), (North,College), (South,HS), (South,College)
Deep Hierarchical Nesting
Multiple levels of nesting support complex organizational structures:
# Three-level hierarchy: country → region → education
deep_hierarchy = population_margins(model, data;
type=:effects,
groups=:country => (:region => :education))
# Four-level hierarchy: sector → company → department → position
organizational = population_margins(model, data;
type=:effects,
groups=:sector => (:company => (:department => :position)))
Parallel Grouping Within Hierarchy
Complex patterns combining hierarchical and cross-tabulated structures:
# Region first, then education×gender cross-tab within each region
parallel_nested = population_margins(model, data;
type=:effects,
groups=(:region => [:education, :gender]))
# Region first, then an education × income-quartile cross-tab within each region
mixed_parallel = population_margins(model, data;
type=:effects,
groups=(:region => [:education, (:income, 4)]))
Continuous Variable Binning
Quantile-Based Binning
Automatic binning at sample quantiles, with conventional quantile labels:
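The binning rule can be sketched with `Statistics.quantile`. The helper `quantile_bins` is hypothetical, and its boundary handling (cutpoints at the 1/n, ..., (n-1)/n sample quantiles, with values equal to a cutpoint placed in the upper bin) is an assumption; Margins.jl's exact rule may differ.

```julia
using Statistics

# Hypothetical helper: assign each value to one of n quantile bins.
# Assumption: interior cutpoints at quantiles 1/n, ..., (n-1)/n, and a value
# equal to a cutpoint goes to the upper bin.
function quantile_bins(v::AbstractVector{<:Real}, n::Integer; prefix::AbstractString = "Q")
    cuts = quantile(v, (1:n-1) ./ n)
    return [string(prefix, 1 + count(c -> x >= c, cuts)) for x in v]
end

income = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000, 70_000, 80_000]
quantile_bins(income, 4)
# ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"]
```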
# Quartile analysis (Q1, Q2, Q3, Q4)
income_quartiles = population_margins(model, data;
type=:effects,
groups=(:income, 4))
# Tertile analysis (T1, T2, T3)
score_tertiles = population_margins(model, data;
type=:effects,
groups=(:test_score, 3))
# Quintile analysis (P1, P2, P3, P4, P5)
wealth_quintiles = population_margins(model, data;
type=:effects,
groups=(:wealth, 5))
Custom Threshold Binning
Bins defined by user-supplied, policy-relevant cutpoints; k cutpoints produce k + 1 bins, labeled in half-open interval notation:
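The labeling scheme shown in the results below can be reproduced with a small hypothetical helper, assuming sorted cutpoints and left-closed intervals (this mirrors the printed labels; it is not Margins.jl source):

```julia
# Hypothetical helper: k sorted cutpoints give k + 1 half-open bins,
# labeled "< c1", "[c1, c2)", ..., ">= ck".
function threshold_label(x::Real, cuts::AbstractVector{<:Real})
    x < first(cuts) && return "< $(first(cuts))"
    x >= last(cuts) && return ">= $(last(cuts))"
    i = searchsortedlast(cuts, x)   # cuts[i] <= x < cuts[i+1]
    return "[$(cuts[i]), $(cuts[i+1]))"
end

cuts = [25000, 50000, 75000]
threshold_label.([10000, 25000, 60000, 90000], Ref(cuts))
# ["< 25000", "[25000, 50000)", "[50000, 75000)", ">= 75000"]
```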
# Income brackets for tax policy analysis
tax_brackets = population_margins(model, data;
type=:effects,
groups=(:income, [25000, 50000, 75000]))
# Results: ["< 25000", "[25000, 50000)", "[50000, 75000)", ">= 75000"]
# Poverty line analysis (2021 HHS poverty guideline, one-person household)
federal_poverty_line = 12880
poverty_analysis = population_margins(model, data;
type=:effects,
groups=(:income, [federal_poverty_line]))
# Results: ["< 12880", ">= 12880"]
Mixed Categorical and Continuous Grouping
Combine categorical variables with binned continuous variables:
# Education levels × income quartiles
education_income = population_margins(model, data;
type=:effects,
groups=[:education, (:income, 4)])
# Results: (HS,Q1), (HS,Q2), (HS,Q3), (HS,Q4), (College,Q1), etc.
# Geographic region × age quintiles × gender
complex_demographics = population_margins(model, data;
type=:effects,
groups=[:region, (:age, 5), :gender])
Counterfactual Scenario Analysis
See Population Scenarios for detailed semantics and implementation notes on scenarios in population analysis.
Policy Scenario Framework
The scenarios parameter modifies variable values for the entire population, creating counterfactual analyses:
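What a scenario does to the data can be sketched outside the package: every observation receives the scenario value, predictions are averaged over the otherwise unchanged sample, and the resulting population averages can be compared. The logistic model, its coefficients, and `predict_prob` are hypothetical. The sketch shows averaged predictions (the `type=:predictions` case) and is not Margins.jl's implementation.

```julia
# Hypothetical sketch of a population scenario: override one variable for every
# observation, predict, and average. The model and coefficients are made up.
logistic(η) = 1 / (1 + exp(-η))
predict_prob(treatment, age) = logistic(-1.0 + 0.5 * treatment + 0.02 * age)

age = [25.0, 40.0, 55.0, 70.0]          # observed covariate, left unchanged
scenario_means = Dict(t => sum(predict_prob.(t, age)) / length(age) for t in (0, 1))

# Population-averaged contrast between the two counterfactual populations
contrast = scenario_means[1] - scenario_means[0]
```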
# Binary treatment analysis
treatment_effects = population_margins(model, data;
type=:effects,
scenarios=(treatment=[0, 1],))
# Multi-level policy scenarios
policy_scenarios = population_margins(model, data;
type=:effects,
scenarios=(policy_level=["none", "moderate", "aggressive"],))
Multi-Variable Scenarios
Cartesian product expansion for complex policy analysis:
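The expansion itself is an ordinary Cartesian product. A plain-Julia sketch using `Iterators.product`, with scenario values written as a NamedTuple of candidate values (an illustration of the counting, not Margins.jl internals):

```julia
# Sketch of the Cartesian expansion in plain Julia (not Margins.jl internals).
# Scenario values as a NamedTuple: variable name => candidate values.
scenarios = (treatment = [0, 1],
             funding = [0.8, 1.0, 1.2],
             regulation = ["light", "standard", "strict"])

scenario_vars = keys(scenarios)
combos = vec([NamedTuple{scenario_vars}(vals) for vals in Iterators.product(values(scenarios)...)])

length(combos)   # 18 = 2 × 3 × 3
combos[1]        # (treatment = 0, funding = 0.8, regulation = "light")
```

`Iterators.product` varies its first argument fastest, so `combos[2]` is the same scenario with `treatment = 1`.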
# Treatment × policy combinations
comprehensive_policy = population_margins(model, data;
type=:effects,
scenarios=(treatment=[0, 1],
policy=["current", "reform"]))
# Results: 4 scenarios (2×2 combinations)
# Three-dimensional policy space
complex_scenarios = population_margins(model, data;
type=:effects,
scenarios=(treatment=[0, 1],
funding=[0.8, 1.0, 1.2],
regulation=["light", "standard", "strict"]))
# Results: 18 scenarios (2×3×3 combinations)
Combined Groups and Scenarios
Comprehensive Policy Analysis
Groups and scenarios combine multiplicatively for complete analytical coverage:
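Because vars, groups, and scenarios combine multiplicatively, the size of a result can be estimated before running it. A back-of-the-envelope sketch; all level counts are hypothetical, and it assumes each variable in vars contributes one estimate per cell (as for a continuous predictor):

```julia
# Back-of-the-envelope count of result rows; all level counts are hypothetical.
n_education = 4           # education categories (assumed)
n_region    = 5           # regions (assumed)
n_scenarios = 2           # treatment = [0, 1]
n_vars      = 3           # variables in `vars`, one estimate each (assumed)

n_groups = n_education * n_region           # cross-tab cells: 20
n_rows   = n_groups * n_scenarios * n_vars  # estimates per var, cell, scenario: 120
```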
# Demographics × policy scenarios
full_analysis = population_margins(model, data;
type=:effects,
groups=[:education, :region],
scenarios=(treatment=[0, 1],))
# Results: Each education×region combination under both treatment scenarios
Advanced Applications
# Healthcare policy evaluation
healthcare_comprehensive = population_margins(health_model, health_data;
type=:effects,
groups=(:state => (:urban_rural => [:insurance_type, (:income, 3)])),
scenarios=(aca_expansion=[0, 1], medicaid_funding=[0.8, 1.2])
)
# Results: State × Urban/Rural × (Insurance×Income-Tertiles) × ACA×Medicaid scenarios
# Total combinations: 4 states × 2 urban/rural × 12 insurance×income (4 types × 3 tertiles) × 4 policy scenarios (2 × 2) = 384 results
Important: Skip Rule for Statistical Validity
Critical Constraint: For population analysis, computing the effect of a variable while simultaneously holding it fixed (via scenarios) or using it to define subgroups (via groups) is contradictory and statistically meaningless.
The Skip Rule
To preserve statistical correctness and interpretability, population_margins() automatically skips variables that appear in vars if they also appear in groups or scenarios.
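The rule amounts to a set difference on variable names. The helper `effective_vars` below is hypothetical and only mirrors the described behavior; it is not taken from the Margins.jl source. For a binned grouping such as `(:income, 4)`, the grouping variable name is `:income`.

```julia
# Hypothetical helper mirroring the skip rule (not Margins.jl source):
# drop any requested variable that is also a grouping or scenario variable.
function effective_vars(vars, group_vars, scenario_vars)
    excluded = union(Set{Symbol}(group_vars), Set{Symbol}(scenario_vars))
    return [v for v in vars if v ∉ excluded]
end

scenarios = (x = [0, 1],)
effective_vars([:x, :z], Symbol[], keys(scenarios))   # [:z]
effective_vars([:education, :z], [:education], ())    # [:z]
```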
# Example: x appears in both vars and scenarios
result = population_margins(model, data;
type=:effects,
vars=[:x, :z], # Request effects for x and z
scenarios=(x=[0, 1],) # But fix x at specific values
)
# Result: Only the z effect is computed; x is skipped because it appears in scenarios.
# The package handles this silently to avoid statistically meaningless output.
Why This Rule Exists
Conceptual Problem:
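- A marginal effect asks: "How does the predicted outcome change when x changes, averaged over the population?"
- A scenario or group says: "Hold x fixed at specific values" or "Restrict attention to one level of x"
- These two requests conflict: the first needs x to vary, the second removes that variation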
Examples of Invalid Requests:
# INVALID: "What's the effect of income while holding income fixed?"
population_margins(model, data;
vars=[:income], # Effect of income changing
scenarios=(income=[30000, 50000],) # But income is fixed
)
# → income is skipped from vars
# INVALID: "What's the effect of education within education groups?"
population_margins(model, data;
vars=[:education], # Effect of education changing
groups=:education # But stratified by education levels
)
# → education is skipped from vars
Practical Alternatives
Alternative 1: Profile Analysis (for Stata users)
If you want the marginal effect of x evaluated at different values of x (the derivative with respect to x, which Stata users often try to obtain with dydx(x) over(x)), use profile analysis:
# Instead of: population_margins(model, data; vars=[:x], groups=:x) # INVALID
# Use profile margins:
result = profile_margins(model, data, cartesian_grid(x=[10, 20, 30, 40]);
type=:effects,
vars=[:x]
)
# Computes marginal effect of x AT each specific value of x
Alternative 2: Effects Within Strata
If you want effects within strata of a variable, compute the effects of other variables within those strata; the stratifying variable itself cannot appear in vars:
# GOOD: Effects of z within education groups
result = population_margins(model, data;
type=:effects,
vars=[:z], # Effect of z (not education)
groups=:education # Stratified by education
)
# GOOD: Effects within income quintiles
result = population_margins(model, data;
type=:effects,
vars=[:treatment], # Effect of treatment (not income)
groups=(:income, 5) # Within income quintiles
)
Alternative 3: Counterfactual Predictions
If you want to see how outcomes change as x varies, use predictions with scenarios:
# Instead of: population_margins(model, data; vars=[:x], scenarios=(x=[...],))
# Use predictions:
result = population_margins(model, data;
type=:predictions, # Not effects!
scenarios=(x=[10, 20, 30, 40],)
)
# Shows the population-averaged predicted outcome at each value of x
User Notification
Current Behavior: The skip rule operates silently - variables are removed from computation without warning.
How to Check: Compare requested vars against result:
result = population_margins(model, data;
type=:effects,
vars=[:x, :z],
scenarios=(x=[0, 1],)
)
df = DataFrame(result)
unique(df.variable) # Will only show "z" (x was skipped)
Performance
Computational Complexity
Population grouping scales linearly with the number of observations: a subgroup with n_g rows costs O(n_g), so all subgroups together cost O(n), and each scenario repeats that pass:
using BenchmarkTools
# Simple grouping: O(n_g) per group, O(n) across all groups
@btime population_margins($model, $data; groups=:education)
# Hierarchical grouping: O(n_g) per final subgroup, still O(n) overall
@btime population_margins($model, $data; groups=(:region => (:education => :gender)))
# With S scenarios: the O(n) pass repeats once per scenario, O(S·n) in total
@btime population_margins($model, $data; groups=:education, scenarios=(treatment=[0, 1],))
Memory Efficiency
The grouping framework avoids data duplication through efficient indexing:
- Subgroup filtering: Uses DataFrame indexing, not data copying
- Scenario modification: Temporary overrides without permanent data changes
- Result aggregation: Minimal memory footprint for result compilation
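The first two points can be illustrated with Base Julia alone, assuming a column table stored as a NamedTuple of vectors (an assumption for illustration; this is not how Margins.jl stores data). A `view` selects a subgroup without copying, and `merge` builds a counterfactual table that replaces one column while sharing the others and leaving the original untouched.

```julia
# Sketch of copy-free subgrouping and non-destructive scenario overrides using
# Base only; the NamedTuple column layout is an assumption, not Margins.jl internals.
data = (age = [25.0, 40.0, 55.0, 70.0], treatment = [0, 1, 0, 1])

# Subgroup via a view: indexes into the original column without copying it
older = view(data.age, findall(>=(50.0), data.age))

# Scenario: replace one column in a new NamedTuple; untouched columns are shared
cf = merge(data, (treatment = fill(1, length(data.age)),))
```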
Large Dataset Considerations
# For datasets >100k observations with many groups
# Consider selective analysis of key variables
key_analysis = population_margins(model, large_data;
type=:effects,
vars=[:key_predictor], # Limit to the predictor(s) of interest
groups=(:income, 4)) # Manageable grouping
# Complex patterns still feasible for large n
complex_large = population_margins(model, large_data;
type=:effects,
groups=(:region => [:education, (:income, 4)]))
Best Practices
When to Use Different Grouping Patterns
Simple Grouping (groups=:var):
- Single dimension analysis
- Clear categorical divisions
- Straightforward interpretation needs
Cross-Tabulation (groups=[:var1, :var2]):
- Interaction effects important
- Policy targets multiple demographics simultaneously
- Comprehensive coverage needed
Hierarchical Grouping (groups=:var1 => :var2):
- Natural organizational structure exists
- Context matters (e.g., regions have different education systems)
- Nested decision-making processes
Continuous Binning (groups=(:var, n)):
- Policy-relevant thresholds exist
- Distribution-based analysis needed
- Quantile-based interpretation valuable
Avoiding Common Pitfalls
Combination Explosion
# Risky: with a handful of levels per variable this can produce thousands of cells
# groups=[:var1, :var2, :var3, (:var4, 10), (:var5, 5)]
# Better: fewer dimensions and coarser bins, nested where the structure is natural
# groups=:var1 => [:var2, (:var4, 4)]
Empty Subgroups
# The framework automatically detects and errors on empty subgroups
# to maintain statistical validity
Skip Rule Reference
See the dedicated section "Important: Skip Rule for Statistical Validity" above for complete documentation on how population_margins() handles variables that appear in both vars and groups/scenarios.
Interpretation Complexity
# For presentation, consider simpler patterns:
presentation_analysis = population_margins(model, data;
groups=:education,
scenarios=(policy=[0, 1],))
# For comprehensive analysis, use full complexity:
research_analysis = population_margins(model, data;
groups=(:region => [:education, (:income, 4)]),
scenarios=(policy=[0, 1], funding=[0.8, 1.2]))
The population grouping framework enables sophisticated econometric analysis while maintaining computational efficiency and statistical rigor. For related details on scenarios and reference grids, see Reference Grids; for performance optimization, see Performance Guide.