FormulaCompiler.jl

FormulaCompiler implements a type-stable counterfactual vector system that substitutes variable values with O(1) memory overhead, avoiding any duplication of the underlying data. This is particularly useful for policy analysis and treatment-effect evaluation.

Key Features

  • Memory efficiency: Per-row evaluation with zero allocations
  • Computational performance: Improvements over traditional modelmatrix() approaches for single-row evaluations
  • Comprehensive compatibility: Supports all valid StatsModels.jl formulas, including complex interactions and mathematical functions
  • Categorical mixtures: Compile-time support for weighted categorical specifications for marginal effects
  • Scenario analysis: Memory-efficient variable override system for counterfactual analysis
  • Unified architecture: Single compilation pipeline accommodates diverse formula structures
  • Ecosystem integration: Compatible with GLM.jl, MixedModels.jl, and StandardizedPredictors.jl
  • Dual-backend derivatives: Memory-efficient finite differences and ForwardDiff automatic differentiation options (ForwardDiff is the strongly preferred default)

Installation

using Pkg
Pkg.add(url = "https://github.com/emfeltham/FormulaCompiler.jl")

Quick Start

Workflow

Figure: Basic FormulaCompiler.jl workflow

using FormulaCompiler, GLM, DataFrames, Tables, CategoricalArrays

# Fit your model normally
df = DataFrame(
    y = randn(1000),
    x = randn(1000),
    z = abs.(randn(1000)) .+ 0.1,
    group = categorical(rand(["A", "B", "C"], 1000))
)

model = lm(@formula(y ~ x * group + log(z)), df)

# Compile once for efficient repeated evaluation  
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))

# Memory-efficient evaluation suitable for repeated calls
compiled(row_vec, data, 1)  # Zero allocations after warmup
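
Because each call reuses the preallocated buffer, the same compiled object can evaluate every row in a tight loop. A minimal sketch, using only the API shown above:

```julia
# Evaluate the model row for every observation, reusing one buffer.
# Assumes `compiled`, `data`, and `row_vec` from the Quick Start above.
n = length(first(data))          # number of rows in the column table
total = 0.0
for i in 1:n
    compiled(row_vec, data, i)   # fills row_vec in place; zero allocations
    total += row_vec[1]          # e.g., accumulate the intercept column
end
```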

Performance Comparison

The benchmarks below contrast the traditional full-matrix approach with compiled single-row evaluation; the pattern holds across all tested formula types:

using BenchmarkTools

# Traditional approach (creates full model matrix)
@benchmark modelmatrix(model)[1, :]
# Traditional approach with allocation overhead

# FormulaCompiler (zero-allocation single row)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))

@benchmark compiled(row_vec, data, 1)
# FormulaCompiler approach with zero allocations

Zero Allocations (verified in test suite):

  • Core row evaluation (compiled(row,data,i))
  • Scenario evaluation (CounterfactualVector)
  • FD Jacobian
  • AD Jacobian
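
These guarantees can be checked directly with Base's @allocated macro after a warmup call. A minimal sketch (wrapping the call in a function avoids spurious allocations from non-const globals at the REPL):

```julia
# Measure allocations on the hot path after warmup.
check(compiled, row_vec, data) = @allocated compiled(row_vec, data, 1)

compiled(row_vec, data, 1)          # warmup: triggers specialization
check(compiled, row_vec, data)      # warmup for `check` itself
@assert check(compiled, row_vec, data) == 0
```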

Allocation Characteristics

FormulaCompiler.jl provides different allocation guarantees depending on the operation:

Core Model Evaluation

  • Zero allocations: modelrow!() and direct compiled() calls are 0 bytes after warmup
  • Performance: Fast per-row evaluation across all formula complexities
  • Validated: Test cases confirm zero-allocation performance
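
A sketch of the modelrow!() entry point mentioned above, assuming it shares the (output, compiled, data, row) argument order; check the package docs for the exact signature:

```julia
# Hypothetical argument order; see FormulaCompiler.jl docs for the signature.
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1)   # fills row_vec in place; 0 bytes after warmup
```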

Derivative Operations

FormulaCompiler.jl provides computational primitives for derivatives with dual backend support:

  Backend                    Type         Allocations  Performance  Accuracy           Recommendation
  -------------------------  -----------  -----------  -----------  -----------------  --------------------------
  Automatic Differentiation  ADEvaluator  0 bytes      Fast         Machine precision  Strongly preferred default
  Finite Differences         FDEvaluator  0 bytes      Fast         ~1e-8              -

# Build evaluator with automatic differentiation (strongly recommended)
de = derivativeevaluator(:ad, compiled, data, vars)  # Automatic differentiation

# Compute Jacobian with zero allocations
J = Matrix{Float64}(undef, length(compiled), length(vars))
derivative_modelrow!(J, de, row)  # 0 bytes, machine precision

# For marginal effects, use Margins.jl
using Margins
g = Vector{Float64}(undef, length(vars))
marginal_effects_eta!(g, de, beta, row)  # Marginal effects on η

Use Cases

  • Monte Carlo simulations with large data and many model evaluations
  • Bootstrap resampling: Repeated matrix construction
  • Marginal effects: cf. Margins.jl which is built on FormulaCompiler.jl
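
For example, a bootstrap loop can reuse one compiled evaluator instead of rebuilding the model matrix on every resample. A minimal sketch, using only the API shown above:

```julia
using Random

# Bootstrap sketch: resample row indices and evaluate each sampled row
# through the compiled formula, with no matrix construction per replicate.
n_obs = length(first(data))
n_boot = 200
for b in 1:n_boot
    idx = rand(1:n_obs, n_obs)       # resampled row indices, with replacement
    for i in idx
        compiled(row_vec, data, i)   # zero-allocation row evaluation
        # ... accumulate replicate statistics from row_vec here ...
    end
end
```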

Citation

@misc{feltham_formulacompilerjl_2026,
  title = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl: {{Efficient Marginal Effects}} in {{Julia}}},
  shorttitle = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl},
  author = {Feltham, Eric},
  year = {2026},
  month = jan,
  number = {arXiv:2601.07065},
  eprint = {2601.07065},
  primaryclass = {stat},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2601.07065},
  urldate = {2026-01-13},
  abstract = {Marginal effects analysis is fundamental to interpreting statistical models, yet existing implementations face computational constraints that limit analysis at scale. We introduce two Julia packages that address this gap. Margins.jl provides a clean two-function API organizing analysis around a 2-by-2 framework: evaluation context (population vs profile) by analytical target (effects vs predictions). The package supports interaction analysis through second differences, elasticity measures, categorical mixtures for representative profiles, and robust standard errors. FormulaCompiler.jl provides the computational foundation, transforming statistical formulas into zero-allocation, type-specialized evaluators that enable O(p) per-row computation independent of dataset size. Together, these packages achieve 622x average speedup and 460x memory reduction compared to R's marginaleffects package, with successful computation of average marginal effects and delta-method standard errors on 500,000 observations where R fails due to memory exhaustion, providing the first comprehensive and efficient marginal effects implementation for Julia's statistical ecosystem.},
  archiveprefix = {arXiv},
  keywords = {Statistics - Computation},
}