Examples

Comprehensive workflow examples and implementation patterns

Conceptual Overview

Example Organization

This guide demonstrates practical implementation of the two-dimensional analytical framework through concrete examples. Examples progress from basic usage patterns to advanced specification techniques, illustrating both population and profile analysis approaches across diverse econometric applications.

Basic Implementation

Fundamental Usage Pattern

using Random
using Margins, DataFrames, GLM

# Generate sample data
n = 1000
Random.seed!(06515)
df = DataFrame(
    y = randn(n),
    x1 = randn(n), 
    x2 = randn(n),
    group = rand(["A", "B", "C"], n)
)

# Fit model
model = lm(@formula(y ~ x1 + x2 + group), df)

# Population average marginal effects (AME)
ame_result = population_margins(model, df; type=:effects)
DataFrame(ame_result)

# Marginal effects at sample means (MEM)  
mem_result = profile_margins(model, df, means_grid(df); type=:effects)
DataFrame(mem_result)

Advanced Implementation Patterns

Profile Specification Methods

Margins.jl provides multiple approaches to specify evaluation profiles for profile_margins(), each optimized for different analytical requirements.

1. Table-Based Reference Grid (Maximum Control)

For exact control over evaluation points, pass a DataFrame directly:

using DataFrames

# Custom reference grid
reference_grid = DataFrame(
    x1 = [-1.0, 0.0, 1.0],
    x2 = [10, 15, 20],
    group = ["A", "B", "A"]  # Row order preserved as specified
)

# Predictions at specific points
predictions = profile_margins(model, df, reference_grid; type=:predictions)

# Effects at specific points  
effects = profile_margins(model, df, reference_grid; type=:effects, vars=[:x1, :x2])

2. Cartesian Product Specification

For systematic scenario construction, use grid builders to specify value combinations:

# Systematic scenario grid (Cartesian product)
scenarios = cartesian_grid(x1=[-1.0, 0.0, 1.0], x2=[10, 20], group=["A", "B"])  
# Creates 3×2×2 = 12 evaluation points

# Effects across all scenarios
scenario_effects = profile_margins(model, df, scenarios; type=:effects, vars=[:x1])
DataFrame(scenario_effects)

3. At Sample Means (Most Common)

For representative case analysis:

# Effects at sample means - most interpretable approach
means_effects = profile_margins(model, df, means_grid(df); type=:effects)

# Predictions at sample means
means_predictions = profile_margins(model, df, means_grid(df); type=:predictions)

4. Explicit Profile Tables

For irregular or custom evaluation points, pass an explicit DataFrame:

custom_profiles = DataFrame(
    x1 = [-1.0, 0.0, 1.0],
    x2 = [10, 15, 20],
    group = ["A", "B", "A"]
)

results = profile_margins(model, df, custom_profiles; type=:effects)

5. Categorical Mixtures for Policy Analysis

For realistic population scenarios using categorical mixtures:

using CategoricalArrays

# Realistic policy scenarios with population composition
mixture_grid = DataFrame(group=[mix("A" => 0.5, "B" => 0.3, "C" => 0.2)])
policy_scenario = profile_margins(model, df, mixture_grid; type=:predictions)

# Multiple policy scenarios
policy_grid = DataFrame(
    x1 = [0, 1],  # Policy intervention levels
    group = [mix("A" => 0.6, "B" => 0.4), mix("A" => 0.6, "B" => 0.4)]
)
policy_effects = profile_margins(model, df, policy_grid; type=:effects)

Economic Analysis Workflow

Wage Determination Analysis

Complete econometric workflow using human capital theory:

using GLM, CategoricalArrays, Random

# Generate realistic econometric dataset
Random.seed!(06515)
n = 2000

data = DataFrame(
    # Demographics
    age = rand(25:65, n),
    female = rand([0, 1], n),
    education = categorical(rand(["HS", "College", "Graduate"], n)),
    
    # Economic variables
    experience = rand(0:40, n),
    urban = rand([0, 1], n),
    unemployment_rate = rand(3.0:0.1:12.0, n)
)

# Generate realistic log wages
education_effects = Dict("HS" => 0.0, "College" => 0.4, "Graduate" => 0.8)
edu_numeric = [education_effects[string(edu)] for edu in data.education]

data.log_wage = 1.5 .+ 
                0.05 .* data.age .+ 
                edu_numeric .+ 
                0.02 .* data.experience .- 
                0.15 .* data.female .+ 
                0.10 .* data.urban .- 
                0.03 .* data.unemployment_rate .+ 
                0.3 .* randn(n)

# Fit wage equation
wage_model = lm(@formula(log_wage ~ age + education + experience + 
                        female + urban + unemployment_rate), data)

Population Analysis

# Population average marginal effects
ame_results = population_margins(wage_model, data; type=:effects)
println("Population Average Marginal Effects:")
println(DataFrame(ame_results))

# Effects by gender subgroups  
gender_effects = population_margins(wage_model, data; 
                                  type=:effects, 
                                  groups=:female)
println("Effects by gender:")
println(DataFrame(gender_effects))

Profile Analysis

# Effects at sample means (representative person)
mem_results = profile_margins(wage_model, data, means_grid(data); type=:effects)
println("Effects for typical person:")
println(DataFrame(mem_results))

# Policy scenarios: education and unemployment effects
policy_grid = cartesian_grid(education=["HS", "College", "Graduate"],
                             unemployment_rate=[3.0, 6.0, 9.0])
policy_analysis = profile_margins(wage_model, data, policy_grid; type=:predictions)
println("Policy scenario predictions:")
println(DataFrame(policy_analysis))

Logistic Regression Example

Binary outcome analysis with proper probability interpretation:

# Generate binary outcome data
data.manager = [rand() < (1/(1+exp(-(-1.0 + 0.03*age + 0.5*edu + 0.02*exp - 0.3*fem)))) ? 1 : 0 
                for (age,edu,exp,fem) in zip(data.age, edu_numeric, data.experience, data.female)]

# Fit logistic model
logit_model = glm(@formula(manager ~ age + education + experience + female), 
                  data, Binomial(), LogitLink())

# Effects on probability scale (most interpretable)
prob_effects = population_margins(logit_model, data; 
                                type=:effects, 
                                scale=:response)
println("Effects on probability of management position:")
println(DataFrame(prob_effects))

# Gender gap analysis across education levels
gender_grid = cartesian_grid(education=["HS", "College", "Graduate"], female=[0, 1])
gender_gap = profile_margins(logit_model, data, gender_grid;
    type=:predictions, scale=:response)
println("Gender gap in management probability by education:")
println(DataFrame(gender_gap))

Elasticity Analysis

Basic Elasticity Computation

# Population average elasticities
elasticities = population_margins(wage_model, data; 
                                type=:effects, 
                                measure=:elasticity,
                                vars=[:age, :experience])
println("Population average elasticities:")
println(DataFrame(elasticities))

# Elasticities at different education levels
edu_grid = cartesian_grid(education=["HS", "College", "Graduate"]) 
edu_elasticities = profile_margins(wage_model, data, edu_grid;
    type=:effects, measure=:elasticity, vars=[:age, :experience])
println("Elasticities by education level:")
println(DataFrame(edu_elasticities))

Semi-Elasticities

# Semi-elasticity: % change in wages per unit change in unemployment
unemployment_semi = population_margins(wage_model, data;
                                     measure=:semielasticity_dyex,
                                     vars=[:unemployment_rate])
println("Unemployment semi-elasticity (% wage change per point):")
println(DataFrame(unemployment_semi))

Advanced Features

Advanced Grouping and Stratification

# Basic categorical grouping
urban_analysis = population_margins(wage_model, data; 
                                  type=:effects, 
                                  groups=:urban)

# Cross-tabulated grouping
education_urban = population_margins(wage_model, data; 
                                   type=:effects, 
                                   groups=[:education, :urban])

# Hierarchical grouping: education → urban within each education level
nested_analysis = population_margins(wage_model, data;
                                   type=:effects,
                                   groups=:education => :urban)

# Continuous binning: age quartiles
age_quartiles = population_margins(wage_model, data;
                                 type=:effects,
                                 groups=(:age, 4))

# Custom thresholds for policy analysis
income_thresholds = population_margins(wage_model, data;
                                     type=:effects,
                                     groups=(:log_wage, [2.0, 2.5, 3.0]))

# Mixed categorical and continuous
complex_groups = population_margins(wage_model, data;
                                  type=:effects,
                                  groups=[:education, (:age, 4)])

Counterfactual Scenario Analysis

Skip Rule Example: dydx(x) across x strata using a derived bin variable

using Statistics
using CategoricalArrays

# Suppose we want dydx(age) across age strata without holding age fixed or using it as the grouping key directly.
# Create an "age_bin" column (quartiles), then group by that derived column:
edges = quantile(data.age, 0:0.25:1.0)
labels = ["Q1", "Q2", "Q3", "Q4"]
data.age_bin = cut(data.age, edges; labels=labels, extend=true)

age_effects_by_bin = population_margins(wage_model, data;
    type=:effects,
    vars=[:age],
    groups=:age_bin)

DataFrame(age_effects_by_bin)
# Policy scenarios: unemployment rate effects
recession_scenarios = population_margins(wage_model, data;
                                       type=:effects,
                                       scenarios=(:unemployment_rate => [3.0, 6.0, 12.0]))

# Combined grouping and scenarios
education_recession = population_margins(wage_model, data;
                                       type=:effects,
                                       groups=:education,
                                       scenarios=(:unemployment_rate => [3.0, 12.0]))

# Multi-variable scenarios
complex_policy = population_margins(wage_model, data;
                                  type=:effects,
                                  scenarios=(:urban => [0, 1], 
                                               :unemployment_rate => [3.0, 9.0]))

Robust Standard Errors

using CovarianceMatrices

# Heteroskedasticity-robust standard errors (HC1)
robust_effects = population_margins(wage_model, data; vcov=HC1(), type=:effects)
println("Robust standard errors:")
println(DataFrame(robust_effects))

Performance Comparison

using BenchmarkTools

# Profile margins: O(1) constant time
println("Profile margins performance (constant time):")
@btime profile_margins($wage_model, $data, means_grid($data); type=:effects)

# Population margins: O(n) scaling  
println("Population margins performance (scales with n):")
@btime population_margins($wage_model, $data; type=:effects)

# Complex scenario analysis (still O(1) for profiles)
complex_scenarios = cartesian_grid(age=[25, 35, 45, 55],
                                   education=["HS", "College", "Graduate"],
                                   urban=[0, 1])
println("Complex scenario performance (24 profiles, still O(1)):")
@btime profile_margins($wage_model, $data, $complex_scenarios; type=:effects)

Stata Migration Examples

Direct equivalency for economists familiar with Stata:

# Stata: margins, dydx(*)
stata_ame = population_margins(wage_model, data; type=:effects)

# Stata: margins, at(means) dydx(*)  
stata_mem = profile_margins(wage_model, data, means_grid(data); type=:effects)

# Stata: margins, at(age=(25 35 45) education=(1 2 3))
stata_grid = cartesian_grid(age=[25, 35, 45], education=["HS", "College", "Graduate"]) 
stata_scenarios = profile_margins(wage_model, data, stata_grid; type=:effects)

# Stata: margins, over(female)
stata_subgroups = population_margins(wage_model, data; 
                                   type=:effects, 
                                   groups=:female)

MixedModels.jl Examples

Minimal linear and generalized linear mixed models with population analysis.

# Illustrative example (not executed in docs CI): MixedModels integration
using Random
using DataFrames, CategoricalArrays, MixedModels, StatsModels, Margins

# Synthetic random-intercept dataset
Random.seed!(42)
n_groups = 20; n_per = 30; n = n_groups * n_per
group = repeat(1:n_groups, inner=n_per)
x = randn(n)
u = randn(n_groups)  # random intercepts
y = 1.0 .+ 0.5 .* x .+ u[group] .+ 0.2 .* randn(n)
df = DataFrame(y=y, x=x, group=categorical(string.(group)))

# Linear mixed model
lmm = fit(MixedModel, @formula(y ~ 1 + x + (1 | group)), df)

# Population AME for x (averaged across sample distribution)
ame_lmm = population_margins(lmm, df; type=:effects, vars=[:x])

# Generalized linear mixed model (binary outcome)
η = -0.5 .+ 1.2 .* x .+ u[group]
p = 1.0 ./ (1 .+ exp.(-η))
ybin = rand.(Bernoulli.(p))
df_bin = DataFrame(y=ybin, x=x, group=df.group)

glmm = GeneralizedLinearMixedModel(@formula(y ~ 1 + x + (1 | group)), df_bin, Binomial()) |> fit!

# Probability-scale effects
prob_effects_glmm = population_margins(glmm, df_bin; type=:effects, vars=[:x], scale=:response)

Best Practices

When to Use Population vs Profile

Choose Population Analysis When:

  • Estimating true average effects across your sample
  • Sample heterogeneity is important for policy
  • External validity to similar populations is the goal
  • Broad policy applications affecting diverse groups

Choose Profile Analysis When:

  • Understanding specific, concrete scenarios
  • Communicating results to non-technical audiences
  • Sample is relatively homogeneous
  • Policy targets specific demographic profiles

Performance Guidelines

# For large datasets (>100k observations)
# Profile margins remain fast regardless of size
large_data_profiles = profile_margins(model, large_data, means_grid(large_data); type=:effects)

# Population margins scale linearly - use selectively for very large data
key_population_effects = population_margins(model, large_data; 
                                          vars=[:key_variable], 
                                          type=:effects)

Error Handling Best Practices

Important: Margins.jl follows an error-first philosophy - when statistical correctness cannot be guaranteed, the package errors explicitly rather than producing potentially invalid results. This ensures users are aware of problems rather than receiving plausible-but-wrong statistical output.

# GOOD: Let errors propagate to inform users of issues
function analyze_margins(model, data, vars)
    # Errors will propagate with clear messages
    result = population_margins(model, data; type=:effects, vars=vars)
    return DataFrame(result)
end

# GOOD: Validate inputs before computation
function safe_margins_analysis(model, data, vars)
    # Check that variables exist in data
    data_vars = Set(Symbol.(names(data)))
    missing_vars = setdiff(Set(vars), data_vars)

    if !isempty(missing_vars)
        error("Variables not found in data: $(collect(missing_vars))")
    end

    # Let computation errors propagate naturally
    return population_margins(model, data; type=:effects, vars=vars)
end

# BAD: Silent fallbacks violate error-first philosophy
# function bad_margins_analysis(model, data)
#     try
#         return population_margins(model, data; backend=:ad)
#     catch e
#         @warn "AD failed, using FD"  # User doesn't know why AD failed!
#         return population_margins(model, data; backend=:fd)  # May produce different results!
#     end
# end
#
# Why this is bad:
# 1. Silently switches backends without user awareness
# 2. May hide underlying data quality issues
# 3. Results from AD vs FD may differ slightly
# 4. Violates principle: "Error out rather than approximate"

Guideline: If you encounter errors during marginal effects computation, investigate and fix the root cause rather than implementing silent fallbacks. Common issues include:

  • Domain errors (use variables that stay positive for log/sqrt)
  • Missing variables (validate inputs before computation)
  • Model convergence issues (check model fit quality)
  • Data quality problems (check for NaN/Inf values)

These examples demonstrate the full range of Margins.jl capabilities. For detailed API documentation, see API Reference. For performance optimization, see Performance Guide.