FormulaCompiler.jl
FormulaCompiler implements a type-stable counterfactual vector system providing variable substitution with O(1) memory overhead without data duplication. This is particularly useful for policy analysis and treatment effect evaluation.
Key Features
- Memory efficiency: Per-row evaluation with zero allocations
- Computational performance: Improvements over traditional
modelmatrix()approaches for single-row evaluations - Comprehensive compatibility: Supports all valid StatsModels.jl formulas, including complex interactions and mathematical functions
- Categorical mixtures: Compile-time support for weighted categorical specifications for marginal effects
- Scenario analysis: Memory-efficient variable override system for counterfactual analysis
- Unified architecture: Single compilation pipeline accommodates diverse formula structures
- Ecosystem integration: Compatible with GLM.jl, MixedModels.jl, and StandardizedPredictors.jl
- Dual-backend derivatives: Memory-efficient finite differences and ForwardDiff automatic differentiation options (ForwarDiff is the strongly preferred default option)
Installation
using Pkg
Pkg.add(url = "https://github.com/emfeltham/FormulaCompiler.jl")Quick Start
Figure: Basic FormulaCompiler.jl workflow
using FormulaCompiler, GLM, DataFrames, Tables
# Fit your model normally
df = DataFrame(
y = randn(1000),
x = randn(1000),
z = abs.(randn(1000)) .+ 0.1,
group = categorical(rand(["A", "B", "C"], 1000))
)
model = lm(@formula(y ~ x * group + log(z)), df)
# Compile once for efficient repeated evaluation
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
# Memory-efficient evaluation suitable for repeated calls
compiled(row_vec, data, 1) # Zero allocations after warmupPerformance Comparison
Performance results across all tested formula types:
using BenchmarkTools
# Traditional approach (creates full model matrix)
@benchmark modelmatrix(model)[1, :]
# Traditional approach with allocation overhead
# FormulaCompiler (zero-allocation single row)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
@benchmark compiled(row_vec, data, 1)
# FormulaCompiler approach with zero allocations
Zero Allocations (verified in test suite):
- Core row evaluation (
compiled(row,data,i)) - Scenario evaluation (CounterfactualVector)
- FD Jacobian
- AD Jacobian
Allocation Characteristics
FormulaCompiler.jl provides different allocation guarantees depending on the operation:
Core Model Evaluation
- Zero allocations:
modelrow!()and directcompiled()calls are 0 bytes after warmup - Performance: Fast per-row evaluation across all formula complexities
- Validated: Test cases confirm zero-allocation performance
Derivative Operations
FormulaCompiler.jl provides computational primitives for derivatives with dual backend support:
| Backend | Type | Allocations | Performance | Accuracy | Recommendation |
|---|---|---|---|---|---|
| Automatic Differentiation | ADEvaluator | 0 bytes | Fast | Machine precision | Strongly preferred default |
| Finite Differences | FDEvaluator | 0 bytes | Fast | ~1e-8 | - |
# Build evaluator with automatic differentiation (strongly recommended)
de = derivativeevaluator(:ad, compiled, data, vars) # Automatic differentiation
# Compute Jacobian with zero allocations
J = Matrix{Float64}(undef, length(compiled), length(vars))
derivative_modelrow!(J, de, row) # 0 bytes, machine precision
# For marginal effects, use Margins.jl
using Margins
g = Vector{Float64}(undef, length(vars))
marginal_effects_eta!(g, de, beta, row) # Marginal effects on ηUse Cases
- Monte Carlo simulations with large data and many model evaluations
- Bootstrap resampling: Repeated matrix construction
- Marginal effects: cf. Margins.jl which is built on FormulaCompiler.jl
Next Steps
- Read the Getting Started guide for a detailed walkthrough
- Explore Advanced Features for scenario analysis and memory optimization
- Learn about Categorical Mixtures for marginal effects computation
- See StandardizedPredictors Integration for comprehensive z-score standardization workflows
- Check out Examples for real-world use cases
- Review the Mathematical Foundation for comprehensive theory and implementation details
- Review the API Reference for complete function documentation
- Reproduce results with the Benchmark Protocol
Citation
@misc{feltham_formulacompilerjl_2026,
title = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl: {{Efficient Marginal Effects}} in {{Julia}}},
shorttitle = {{{FormulaCompiler}}.Jl and {{Margins}}.Jl},
author = {Feltham, Eric},
year = {2026},
month = jan,
number = {arXiv:2601.07065},
eprint = {2601.07065},
primaryclass = {stat},
publisher = {arXiv},
doi = {10.48550/arXiv.2601.07065},
urldate = {2026-01-13},
abstract = {Marginal effects analysis is fundamental to interpreting statistical models, yet existing implementations face computational constraints that limit analysis at scale. We introduce two Julia packages that address this gap. Margins.jl provides a clean two-function API organizing analysis around a 2-by-2 framework: evaluation context (population vs profile) by analytical target (effects vs predictions). The package supports interaction analysis through second differences, elasticity measures, categorical mixtures for representative profiles, and robust standard errors. FormulaCompiler.jl provides the computational foundation, transforming statistical formulas into zero-allocation, type-specialized evaluators that enable O(p) per-row computation independent of dataset size. Together, these packages achieve 622x average speedup and 460x memory reduction compared to R's marginaleffects package, with successful computation of average marginal effects and delta-method standard errors on 500,000 observations where R fails due to memory exhaustion, providing the first comprehensive and efficient marginal effects implementation for Julia's statistical ecosystem.},
archiveprefix = {arXiv},
keywords = {Statistics - Computation},
}