Advanced Features

Elasticities, robust standard errors, and specialized analysis techniques

Elasticities and Semi-Elasticities

Margins.jl provides comprehensive elasticity support through the measure parameter, following the same Population vs Profile framework as standard marginal effects. For conceptual background on the 2×2 framework, see Mathematical Foundation.

Types of Elasticity Measures

Standard Elasticity

Definition: Percent change in Y per percent change in X
Formula: (∂Y/∂X) × (X/Y)
Interpretation: "A 1% increase in X leads to an ε% change in Y"

# Population average elasticities
population_margins(model, data; type=:effects, measure=:elasticity)

# Elasticities at sample means
profile_margins(model, data, means_grid(data); type=:effects, measure=:elasticity)

Semi-Elasticity with respect to X

Definition: Unit change in Y per percent change in X
Formula: (∂Y/∂X) × X
Interpretation: "A 1% increase in X leads to an (ε/100)-unit change in Y"

# Population average semi-elasticities (X)
population_margins(model, data; type=:effects, measure=:semielasticity_dyex)

# Semi-elasticities at specific scenarios
profile_margins(model, data, cartesian_grid(x1=[0,1,2]); type=:effects, measure=:semielasticity_dyex)

Semi-Elasticity with respect to Y

Definition: Percent change in Y per unit change in X
Formula: (∂Y/∂X) × (1/Y)
Interpretation: "A 1-unit increase in X leads to a (100×ε)% change in Y"

# Population average semi-elasticities (Y)
population_margins(model, data; type=:effects, measure=:semielasticity_eydx)

Elasticity Framework Application

Elasticities follow the same Population vs Profile distinction as marginal effects (see Mathematical Foundation for detailed framework explanation):

Measure             | Population Approach                        | Profile Approach
Elasticity          | Average of (∂Y/∂X) × (Xᵢ/Yᵢ) across sample | (∂Y/∂X) × (X̄/Ȳ) at representative values
Semi-elasticity (X) | Average of (∂Y/∂X) × Xᵢ across sample      | (∂Y/∂X) × X̄ at representative values
Semi-elasticity (Y) | Average of (∂Y/∂X) × (1/Yᵢ) across sample  | (∂Y/∂X) × (1/Ȳ) at representative values
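These formulas are plain arithmetic, so the population/profile gap can be seen with a handful of numbers (a toy linear predictor computed by hand, not Margins.jl output):

```julia
using Statistics

# Toy linear relationship: y = 1 + 2x, evaluated at four sample points
b = 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = 1.0 .+ b .* x                     # fitted values: 3, 5, 7, 9

# Population approach: average the per-observation elasticities (∂Y/∂X)·(Xᵢ/Yᵢ)
pop_elast = mean(b .* x ./ y)         # mean of ratios

# Profile approach: one elasticity evaluated at the sample means (∂Y/∂X)·(X̄/Ȳ)
prof_elast = b * mean(x) / mean(y)    # ratio of means

# The two differ even in a linear model, because mean(Xᵢ/Yᵢ) ≠ X̄/Ȳ
```

The gap here comes purely from averaging ratios versus taking a ratio of averages; in nonlinear models the derivative itself varies across observations as well.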

Practical Example: Wage Elasticities

using Margins, DataFrames, GLM

# Economic data: wages, education, experience
df = DataFrame(
    log_wage = randn(1000) .+ 2.5, # Log wages
    education = rand(12:20, 1000), # Years of education  
    experience = rand(0:30, 1000), # Years of experience
    age = rand(25:55, 1000)
)

model = lm(@formula(log_wage ~ education + experience + age), df)

# Education elasticity of wages (population average)
edu_elasticity = population_margins(model, df; 
                                   vars=[:education], 
                                   measure=:elasticity)
println("Population average education elasticity: ", DataFrame(edu_elasticity))

# Education elasticity at different experience levels (profile analysis)  
exp_scenarios = profile_margins(
    model, df,
    cartesian_grid(experience=[0, 10, 20, 30]);
    vars = [:education],
    measure = :elasticity
)
println("Education elasticity by experience level:")
println(DataFrame(exp_scenarios))

# Semi-elasticity: percent wage change per year of education
edu_semielast = population_margins(
    model, df;
    vars = [:education],
    measure = :semielasticity_eydx
)
println("Education semi-elasticity: ", DataFrame(edu_semielast))

When Profile ≠ Population for Elasticities

In GLMs with non-identity links, population and profile elasticities can differ substantially:

# Logistic model example: add a binary employment indicator for illustration
df.employed = rand(1000) .< 0.8
logit_model = glm(@formula(employed ~ education + experience), df, Binomial(), LogitLink())

# Population average employment elasticity w.r.t. education
pop_elastic = population_margins(logit_model, df; vars=[:education], measure=:elasticity)

# Employment elasticity at sample means
prof_elastic = profile_margins(logit_model, df, means_grid(df); vars=[:education], measure=:elasticity)

# These will differ because the logistic link is nonlinear
println("Population elasticity: ", DataFrame(pop_elastic).estimate[1])
println("Profile elasticity: ", DataFrame(prof_elastic).estimate[1])
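A back-of-the-envelope sketch of why they differ, using hypothetical logit coefficients and computing the elasticities by hand rather than via population_margins:

```julia
using Statistics

# Hypothetical logit index: η = -1 + 0.5x (made-up coefficients, not fitted)
a, b = -1.0, 0.5
x = [0.0, 2.0, 4.0, 6.0]

p(v) = 1 / (1 + exp(-(a + b * v)))     # predicted probability
# Elasticity of p w.r.t. x: (∂p/∂x)·(x/p) = b·p·(1-p)·(x/p) = b·(1-p)·x
elas(v) = b * (1 - p(v)) * v

pop_elast  = mean(elas.(x))   # average of observation-level elasticities
prof_elast = elas(mean(x))    # single elasticity evaluated at x̄
```

Because (1 − p)·x is a nonlinear function of x, averaging it over the sample and evaluating it at x̄ give visibly different answers.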

Robust Standard Errors

Margins.jl integrates with CovarianceMatrices.jl for robust standard error computation.

Basic Robust Standard Errors

Heteroskedasticity-Robust (White/Huber-White)

using CovarianceMatrices

# Apply robust covariance via vcov parameter
robust_effects = population_margins(
    model, data; vcov=HC1(), type=:effects
)

Available Robust Estimators

# Different heteroskedasticity-robust variants
HC0()  # Basic White estimator
HC1()  # Degrees-of-freedom adjusted (most common)
HC2()  # Leverage-adjusted  
HC3()  # Jackknife-type
HC4()  # High-leverage robust
HC5()  # Outlier-robust

# Example with HC3
result = population_margins(model, data; vcov=HC3())

Clustered Standard Errors

Single-Level Clustering

# Cluster by firm ID
clustered_effects = population_margins(model, data;
    vcov=Clustered(:firm_id), type=:effects)

Multi-Level Clustering

# Two-way clustering (firm and year)
result = population_margins(model, data; vcov=Clustered([:firm_id, :year]))

HAC (Heteroskedasticity and Autocorrelation Consistent) Standard Errors

# Newey-West HAC estimator
effects_hac = population_margins(model, data;
    vcov=HAC(Bartlett()), type=:effects)

Custom Covariance Providers

Function-Based Covariance

# Custom covariance function (must return an AbstractMatrix)
function my_robust_vcov(model)
    # ... compute covariance from model ...
    return Σ::AbstractMatrix
end

# Use custom function directly
result = population_margins(model, data; vcov=my_robust_vcov)
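As an illustration of what such a function might compute, here is a hand-rolled HC0 (White) sandwich for OLS, built from raw arrays rather than a fitted model object; a real implementation would extract the design matrix and residuals from the model:

```julia
using LinearAlgebra

# Sketch: HC0 sandwich Σ = (X'X)⁻¹ X' diag(e²) X (X'X)⁻¹, computed by hand
X = [ones(6) collect(1.0:6.0)]    # design matrix: intercept + one regressor
y = [1.0, 2.2, 2.8, 4.5, 5.0, 6.1]
β̂ = X \ y                         # OLS coefficients
e = y .- X * β̂                    # residuals

bread = inv(X' * X)
meat  = X' * Diagonal(e .^ 2) * X
Σ_hc0 = bread * meat * bread      # a 2×2 AbstractMatrix, as the interface requires
```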

Robust Standard Errors with Elasticities

Robust standard errors work seamlessly with all elasticity measures:

# Robust elasticity estimates
robust_elasticities = population_margins(model, data;
    vcov=HC1(),
    measure=:elasticity, type=:effects)

# Profile elasticities with clustered SEs
profile_elasticities = profile_margins(model, data,
    means_grid(data); vcov = Clustered(:cluster_var),
    measure = :elasticity)

Standard Errors for Elasticities: Delta Method vs Bootstrap

Conditional vs Unconditional Inference

Standard errors for elasticity measures computed via the delta method (default) represent conditional inference; they assume the observed data (X, Y) are fixed and only account for uncertainty in parameter estimates β̂. Bootstrap standard errors represent unconditional inference and may be larger because they also account for sampling variation in the data.

Understanding the Difference

For elasticity measures like ε = (x̄/ȳ) × (∂y/∂x), the transformation involves sample moments (x̄, ȳ) that are treated differently by different inference methods:

Delta Method (Default):

  • Assumption: Observed data X and Y are fixed constants
  • Uncertainty source: Only parameter estimates β̂
  • Variance: Var[ε(β̂) | X, Y]
  • Advantages: Fast, analytically exact (given the conditional assumption)
  • Implementation: Uses the quotient rule to account for mean(ŷ) depending on β

Bootstrap:

  • Assumption: Data are sampled from a population
  • Uncertainty sources: Both β̂ and the sample moments x̄, ȳ
  • Variance: Var[ε(β̂, x̄, ȳ)]
  • Advantages: Accounts for full sampling variation
  • Trade-off: Computationally intensive (requires refitting model many times)

Why They Differ

The key distinction is that elasticity formulas involve ratios of sample statistics:

# For elasticity: ε = (x̄ / mean(ŷ)) × AME
# Where:
#   x̄ = sample mean of predictor (varies across bootstrap samples)
#   mean(ŷ) = sample mean of predictions (varies with both β and resampled data)
#   AME = average marginal effect (varies with β)

When you bootstrap:

  1. Different observations are sampled → x̄ changes
  2. Model is refit → β̂ changes → AME changes
  3. Predictions are recomputed → mean(ŷ) changes

The delta method only captures variation from (2), treating (1) and parts of (3) as fixed.
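Point (1) is easy to see in isolation: resampling rows changes x̄ itself. A self-contained sketch with made-up data:

```julia
using Random, Statistics

Random.seed!(42)
x = collect(1.0:100.0)                    # a fixed "sample" of a predictor
n = length(x)

# Bootstrap the sample mean: resample rows with replacement, recompute x̄
xbars = [mean(x[rand(1:n, n)]) for _ in 1:500]

# x̄ varies across resamples; its spread is close to std(x)/√n,
# sampling variation the delta method conditions away
spread = std(xbars)
```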

This is Not a Bug

This behavior matches other statistical software:

  • R's marginaleffects: Documentation states "For nonlinear models, the delta method is only an approximation" and recommends bootstrap for transformations
  • Stata's margins: Documents that delta method "assumes that the values at which the covariates are evaluated are fixed"

Example: Comparing Methods

using Margins, GLM, DataFrames, Random, Statistics

# Fit model
model = lm(@formula(y ~ x + z), data)

# Delta method SEs (default - fast, conditional)
delta_result = population_margins(model, data;
    vars=[:x],
    measure=:elasticity)
println("Delta method SE: ", DataFrame(delta_result).se[1])

# Bootstrap SEs (slower, unconditional)
# Note: Built-in bootstrap support coming soon
# For now, use a manual bootstrap loop:
function boot_elasticity_se(data; B=1000)
    n = nrow(data)
    estimates = Vector{Float64}(undef, B)
    for b in 1:B
        boot_data = data[rand(1:n, n), :]   # resample rows with replacement
        boot_model = lm(@formula(y ~ x + z), boot_data)
        boot_result = population_margins(boot_model, boot_data;
            vars=[:x], measure=:elasticity)
        estimates[b] = DataFrame(boot_result).estimate[1]
    end
    return std(estimates)
end

boot_se = boot_elasticity_se(data)
println("Bootstrap SE: ", boot_se)

# Bootstrap SE will typically be larger, especially in small samples

Technical Details

Margins.jl implements the full quotient rule for elasticity derivatives:

\[\frac{\partial \varepsilon}{\partial \beta} = \frac{\bar{x}}{\bar{y}} \frac{\partial \text{AME}}{\partial \beta} - \frac{\varepsilon}{\bar{y}} \frac{\partial \bar{y}}{\partial \beta}\]

Here, the second term accounts for mean(ŷ) depending on β.
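The quotient rule can be verified numerically in a toy linear setting, where AME = β₁ and mean(ŷ) = β₀ + β₁x̄ (hypothetical numbers, independent of the Margins.jl implementation):

```julia
# Toy linear model: ŷ = β₀ + β₁x, so AME = β₁ and mean(ŷ) = β₀ + β₁·x̄
x̄ = 2.5
ε(β) = (x̄ / (β[1] + β[2] * x̄)) * β[2]    # elasticity as a function of β

β = [1.0, 2.0]
ȳ = β[1] + β[2] * x̄                       # 6.0
ε₀ = x̄ * β[2] / ȳ

# Quotient rule: ∂ε/∂β = (x̄/ȳ)·∂AME/∂β − (ε/ȳ)·∂ȳ/∂β
# Here ∂AME/∂β = (0, 1) and ∂ȳ/∂β = (1, x̄)
grad_analytic = [-(ε₀ / ȳ) * 1, (x̄ / ȳ) * 1 - (ε₀ / ȳ) * x̄]

# Central finite differences agree with the analytic gradient
h = 1e-6
grad_fd = [(ε([β[1] + h, β[2]]) - ε([β[1] - h, β[2]])) / 2h,
           (ε([β[1], β[2] + h]) - ε([β[1], β[2] - h])) / 2h]
```

Dropping the second term (the ∂ȳ/∂β piece) would zero out grad_analytic[1] and overstate grad_analytic[2], which is why the full quotient rule matters.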

References
  • Krinsky, I., & Robb, A. L. (1986). "On Approximating the Statistical Properties of Elasticities." Review of Economics and Statistics, 68(4), 715-719.
  • Arel-Bundock, V. (2023). "marginaleffects: Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests." R package, documentation on inference.
  • Greene, W. H. (2018). Econometric Analysis (8th ed.), Section 3.6 on the Delta Method.

Standardized Predictors

Margins.jl seamlessly integrates with StandardizedPredictors.jl for models fit with standardized, centered, or scaled variables. All three transformations are supported: ZScore(), Center(), and Scale(). Marginal effects are automatically reported on the original (raw) scale, requiring no manual back-transformation.

Why Standardize, Center, or Scale Predictors?

StandardizedPredictors.jl provides three transformations:

# ZScore: mean=0, std=1 — full standardization
x_zscore = (x - mean(x)) / std(x)

# Center: mean=0, preserves original scale — useful for interpretable intercepts
x_centered = x - mean(x)

# Scale: std=1, preserves original location — useful for comparable coefficient magnitudes
x_scaled = x / std(x)

Integration with Margins.jl

using Margins, GLM, StandardizedPredictors, DataFrames

# Fit model with mixed transformations
df = DataFrame(
    sales = randn(1000) .* 10000 .+ 50000,
    income = randn(1000) .* 20000 .+ 50000,  # mean ≈ $50k, std ≈ $20k
    age = randn(1000) .* 10 .+ 40,
    experience = rand(0:30, 1000)
)

model = lm(@formula(sales ~ income + age + experience), df,
           contrasts = Dict(
               :income => ZScore(),      # Full standardization
               :age => Center(),         # Center only
               :experience => Scale()    # Scale only
           ))

# Marginal effects are automatically on ORIGINAL scale regardless of transformation
result = population_margins(model, df; type=:effects, vars=[:income, :age, :experience])
DataFrame(result)

# income effect: change in sales per $1 increase in income (not per SD!)
# age effect: change in sales per 1-year increase in age (not per SD!)
# experience effect: change in sales per 1-year increase in experience

How Automatic Back-Transformation Works

When computing marginal effects, Margins.jl uses FormulaCompiler.jl's derivative system, which automatically applies the chain rule through the standardization transformation:

Mathematical detail for each transformation:

ZScore (x_std = (x - μ) / σ):

  • Derivative: ∂η/∂x_raw = β × (1/σ) — the 1/σ factor converts from standardized to raw scale

Center (x_ctr = x − μ; no rescaling, so the scaling factor is 1):

  • Derivative: ∂η/∂x_raw = β × 1 = β — centering does not change the derivative

Scale (x_scl = x / σ; no centering):

  • Derivative: ∂η/∂x_raw = β × (1/σ) — same scaling factor as ZScore

Both finite differences (FD) and automatic differentiation (AD) backends handle all transformations automatically:

  • FD: Perturbs raw values → transformation applied during evaluation → chain rule automatic
  • AD: Dual arithmetic propagates through the transformation → chain rule automatic
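The chain-rule factor is easy to check by hand (hypothetical μ, σ, and β; η denotes the linear predictor):

```julia
# A predictor standardized as x_std = (x - μ)/σ, entering linearly as η = β·x_std
μ, σ, β = 50.0, 20.0, 3.0
η(x) = β * (x - μ) / σ

# Analytic raw-scale derivative from the chain rule: ∂η/∂x = β/σ
analytic = β / σ

# Finite-difference check at an arbitrary raw value (what the FD backend does)
h = 1e-6
fd = (η(60.0 + h) - η(60.0 - h)) / 2h
```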

Comparison: Raw vs Transformed Models

# Fit raw, standardized, centered, and scaled models
model_raw = lm(@formula(sales ~ income), df)
model_std = lm(@formula(sales ~ income), df, contrasts=Dict(:income => ZScore()))
model_ctr = lm(@formula(sales ~ income), df, contrasts=Dict(:income => Center()))
model_scl = lm(@formula(sales ~ income), df, contrasts=Dict(:income => Scale()))

# Marginal effects are IDENTICAL across all four (all on original scale)
me_raw = population_margins(model_raw, df; vars=[:income])
me_std = population_margins(model_std, df; vars=[:income])
me_ctr = population_margins(model_ctr, df; vars=[:income])
me_scl = population_margins(model_scl, df; vars=[:income])

# All give same result: effect per dollar of income
@assert DataFrame(me_raw).estimate ≈ DataFrame(me_std).estimate
@assert DataFrame(me_raw).estimate ≈ DataFrame(me_ctr).estimate
@assert DataFrame(me_raw).estimate ≈ DataFrame(me_scl).estimate

Why they match:

  • Raw model: ∂sales/∂income = β₁
  • ZScore model: ∂sales/∂income = β₁_std / σ_income (chain rule provides 1/σ)
  • Center model: ∂sales/∂income = β₁_ctr (coefficients identical to raw model)
  • Scale model: ∂sales/∂income = β₁_scl / σ_income (chain rule provides 1/σ)

Note: For linear models, centered models have identical coefficients to raw models (centering only shifts the intercept). ZScore and Scale models have coefficients on the transformed scale, but the chain rule corrects this automatically.

Elasticities with Standardized Predictors

Elasticities are invariant to standardization because they are scale-free:

# Elasticity with standardized predictors
model = lm(@formula(sales ~ income + age), df,
           contrasts = Dict(:income => ZScore()))

# Elasticity uses raw values of X and Y
result = population_margins(model, df; vars=[:income], measure=:elasticity)

# Interpretation: % change in sales per % change in income
# (same whether predictors are standardized or not)
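A minimal sketch of the invariance, with hypothetical coefficients: for a linear model, the standardized coefficient is β_std = β_raw × σ, and the chain rule's 1/σ cancels it exactly in the elasticity:

```julia
using Statistics

# Hypothetical raw-scale linear model: y = 2 + 0.003·income
β_raw = 0.003
income = [40_000.0, 50_000.0, 60_000.0]
y = 2 .+ β_raw .* income

# Same fit expressed on a standardized predictor: β_std = β_raw·σ
σ = std(income)
β_std = β_raw * σ

# Elasticity at the means; the raw-scale effect is recovered as β_std/σ
ε_raw = β_raw * mean(income) / mean(y)
ε_std = (β_std / σ) * mean(income) / mean(y)   # identical by construction
```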

Profile Analysis with Standardized Predictors

Reference grids work directly with raw values:

# Specify scenarios in original units
using Margins: cartesian_grid

grid = cartesian_grid(
    income = [40000, 60000, 80000],  # Raw dollar amounts
    age = [30, 40, 50]                # Raw years
)

result = profile_margins(model, df, grid; type=:effects)

# Effects are per dollar of income, per year of age
# Standardization is handled automatically during evaluation

Technical Notes

  1. Model coefficients (coef(model)) are on the standardized scale

    • β₁ represents effect per SD change in x
  2. Jacobian from FormulaCompiler is on the raw scale

    • Includes 1/σ factor from chain rule automatically
  3. Marginal effects: g = J' × β

    • The 1/σ in J combines with standardized β to give raw-scale effects

This behavior is validated with tests that compare the raw and standardized models, ensuring that both produce identical marginal effects on the original measurement scales.

Categorical Mixtures for Policy Analysis

Margins.jl supports categorical mixtures for scenario analysis: instead of fixing a categorical variable at a single observed level, you can specify a population composition (e.g., 70% treated, 30% untreated).

Motivation: Realistic Population Scenarios

Profile analyses often fix categorical variables at arbitrary levels (e.g., "set all observations to treatment=1"). Categorical mixtures let you specify realistic population compositions instead:

using CategoricalArrays, Margins

# Instead of: "All treated" (unrealistic)
unrealistic = profile_margins(
    model, data, cartesian_grid(treatment = [1]); type = :predictions
)

# Use: Realistic treatment rate  
realistic = profile_margins(
    model, data, 
    DataFrame(treatment=[mix(0 => 0.3, 1 => 0.7)])
) # 70% treatment rate

Frequency-Weighted Categorical Defaults

When categorical variables are unspecified in profiles, Margins.jl uses actual sample frequencies rather than arbitrary first levels:

# Data composition:
#     education = 40% HS, 45% College, 15% Graduate
#     region = 60% Urban, 40% Rural

# Effects "at means" uses realistic composition
result = profile_margins(model, data, means_grid(data); type = :effects)
# → Continuous vars: sample means
# → education: mix("HS" => 0.40, "College" => 0.45, "Graduate" => 0.15)  
# → region: mix("Urban" => 0.60, "Rural" => 0.40)
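For a linear predictor, evaluating at a mixture amounts to a frequency-weighted combination of the level-specific contributions (the level effects below are hypothetical; with a nonlinear link, the weighting happens on the linear-predictor scale before the link is applied):

```julia
# Hypothetical contrasts: contribution of each education level to the linear predictor
level_effect = Dict("HS" => 0.0, "College" => 1.2, "Graduate" => 2.0)

# Sample frequencies, as means_grid(data) would use for an unspecified categorical
weights = Dict("HS" => 0.40, "College" => 0.45, "Graduate" => 0.15)

# mix("HS" => 0.40, "College" => 0.45, "Graduate" => 0.15) contributes the
# weighted combination of level effects to the linear predictor
mixed = sum(weights[k] * level_effect[k] for k in keys(weights))
```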

Scenario Analysis

Demographic Transition Scenarios

# Current population composition
current_scenario = profile_margins(model, data,
    DataFrame(education=[mix("HS" => 0.40, "College" => 0.45, "Graduate" => 0.15)]);
    type=:predictions)

# Future scenario: increased college graduation
future_scenario = profile_margins(model, data,
    DataFrame(education=[mix("HS" => 0.25, "College" => 0.60, "Graduate" => 0.15)]);
    type=:predictions)

# Policy impact
impact = DataFrame(future_scenario).estimate[1] - DataFrame(current_scenario).estimate[1]

Treatment Effect Heterogeneity

# Treatment effects across population compositions
treatment_scenarios = DataFrame([
    (treatment = 0, education = mix("HS" => 0.5, "College" => 0.5)),
    (treatment = 1, education = mix("HS" => 0.5, "College" => 0.5)),
    (treatment = 0, education = mix("HS" => 0.2, "College" => 0.8)),  
    (treatment = 1, education = mix("HS" => 0.2, "College" => 0.8))
])

results = profile_margins(
    model, data, treatment_scenarios; type = :predictions
)
treatment_effects_df = DataFrame(results)

Advanced Grouping and Stratification

Margins.jl provides a comprehensive grouping framework for population-based marginal effects analysis, supporting cross-tabulated, nested, and binned stratification patterns.

Hierarchical Grouping Framework

Basic Grouping Patterns

# Simple categorical grouping
demographic_effects = population_margins(
    model, data;
    type = :effects, vars = [:income], groups = :education
)

# Cross-tabulated grouping
cross_effects = population_margins(
    model, data;
    type = :effects,
    vars = [:income], 
    groups = [:education, :region]
)

Nested Hierarchical Grouping

# Hierarchical nesting: region → education within each region
nested_effects = population_margins(
    model, data;
    type = :effects,
    vars = [:income],
    groups = :region => :education
)

# Deep nesting: region → urban → education
deep_nested = population_margins(
    model, data;
    type = :effects,
    groups = :region => (:urban => :education)
)

Continuous Variable Binning

# Quartile analysis
income_quartiles = population_margins(
    model, data;
    type = :effects,
    groups = (:income, 4) # Q1, Q2, Q3, Q4
)

# Custom policy-relevant thresholds
policy_thresholds = population_margins(
    model, data;
    type = :effects,
    groups = (:income, [25000, 50000, 75000])
)

# Mixed categorical and continuous
mixed_groups = population_margins(
    model, data;
    type = :effects,
    groups = [:education, (:income, 4)]
)
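The binning itself is ordinary threshold/quantile arithmetic. This sketch (plain Julia, not the Margins.jl internals) shows how both specs partition observations; the values are chosen to avoid ties at the cut points, where bin assignment is a convention:

```julia
using Statistics

income = [18_000.0, 32_000.0, 47_000.0, 55_000.0, 68_000.0, 90_000.0]

# groups=(:income, [25000, 50000, 75000]) → 4 bins split at the thresholds
thresholds = [25_000.0, 50_000.0, 75_000.0]
bin(v) = 1 + count(t -> v > t, thresholds)
threshold_bins = bin.(income)            # e.g. 18k → bin 1, 90k → bin 4

# groups=(:income, 4) → quartile cut points taken from the sample itself
cuts = quantile(income, [0.25, 0.5, 0.75])
quartile_bins = [1 + count(t -> v > t, cuts) for v in income]
```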

Counterfactual Scenario Analysis

Policy Scenarios with Population Override

# Binary policy scenarios
policy_analysis = population_margins(
    model, data;
    type = :effects,
    vars = [:outcome],
    scenarios = (:policy => [0, 1])
)

# Multi-variable scenarios
complex_scenarios = population_margins(
    model, data;
    type = :effects,
    scenarios = (:treatment => [0, 1], 
    :policy => ["current", "reform"])
)

Combined Grouping and Scenarios

# Comprehensive policy analysis: demographics × policy scenarios
full_analysis = population_margins(
    model, data;
    type = :effects,
    vars = [:outcome],
    groups = [:education, :region],
    scenarios = (:treatment => [0, 1])
)

Complex Nested Patterns

Parallel Grouping Within Hierarchy

# Region → (education levels + income quartiles separately)
parallel_groups = population_margins(
    model, data;
    type = :effects,
    groups = :region => [:education, (:income, 4)]
)

Advanced Policy Applications

# Healthcare policy analysis
healthcare_analysis = population_margins(
    health_model, health_data;
    type = :effects,
    groups = :state => (:urban => [:insurance_type, (:income, 3)]),
    scenarios = (:policy_reform => [0, 1], :funding_level => [0.8, 1.2])
)

# Results: State × Urban/Rural × (Insurance×Income-Tertiles) × Policy×Funding scenarios

Second Differences (Interaction Effects)

For comprehensive coverage of second differences—interaction effects on the predicted outcome scale—see the dedicated Second Differences guide. Second differences quantify how marginal effects vary across moderator levels, extending the Margins.jl framework to address effect heterogeneity questions.

Quick reference:

# Compute AMEs across modifier levels
ames = population_margins(
    model, data; scenarios = (treated=[0,1],), type = :effects
)

# Calculate second differences
sd = second_differences(ames, :age, :treated, vcov(model))

Available functions: second_differences(), second_difference(), second_differences_pairwise(), second_differences_all_contrasts().
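The arithmetic behind a single second difference is a difference of AMEs with a delta-method SE; the numbers below are hypothetical, taken from neither a model nor the package:

```julia
# Hypothetical: AME of age at treated=0 and treated=1, with their sampling covariance
ame0, ame1 = 0.50, 0.80
se0, se1   = 0.10, 0.12
cov01      = 0.002                        # covariance between the two AMEs

# Second difference: how the age effect changes with treatment status
sd = ame1 - ame0                          # 0.30

# Delta-method SE of a difference: √(Var₁ + Var₀ − 2·Cov₀₁)
sd_se = sqrt(se1^2 + se0^2 - 2 * cov01)

# z-statistic for testing effect heterogeneity
z = sd / sd_se
```

Ignoring the covariance term overstates (or understates) the SE whenever the two AMEs share estimation uncertainty, which is why second_differences takes the full vcov.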

Error Handling and Diagnostics

Robust Error Detection

# Check for statistical validity issues
function validate_margins_result(result::MarginsResult)
    df = DataFrame(result)
    
    # Check for excessive standard errors (potential issues)
    large_se = df[df.se .> 10 * abs.(df.estimate), :]
    if nrow(large_se) > 0
        @warn "Large standard errors detected - potential statistical issues"
        println(large_se)
    end
    
    # Check for missing values
    missing_results = df[ismissing.(df.estimate) .| ismissing.(df.se), :]
    if nrow(missing_results) > 0
        @warn "Missing values in results - check model specification"
    end
    
    return df
end

# Usage
result = population_margins(model, data)
validated_df = validate_margins_result(result)

Covariance Matrix Diagnostics

# Check covariance matrix properties (uses eigvals/cond from LinearAlgebra)
using LinearAlgebra

function diagnose_vcov(model)
    Σ = vcov(model)
    
    # Check positive definiteness
    eigenvals = eigvals(Σ)
    if any(eigenvals .< 1e-12)
        @warn "Covariance matrix near-singular - standard errors may be unreliable"
    end
    
    # Check condition number
    cond_num = cond(Σ)
    if cond_num > 1e12
        @warn "Poorly conditioned covariance matrix - numerical issues possible"
    end
    
    return (eigenvals=eigenvals, condition_number=cond_num)
end