API Reference
Complete API reference for FormulaCompiler.jl functions and types.
Core Compilation Functions
FormulaCompiler.compile_formula — Function
compile_formula(model, data_example::NamedTuple) -> UnifiedCompiled
Primary API for compiling statistical models into high-performance evaluators.
Position Mapping System
This function implements a position mapping system that converts statistical formulas into zero-allocation execution plans. The system works in three phases:
Phase 1: Formula Decomposition
- Extracts the schema-applied formula from the fitted model
- Converts StatsModels terms into typed operations (LoadOp, ConstantOp, etc.)
- Assigns unique scratch positions to intermediate values and output positions to final results
Phase 2: Position Allocation
- Uses CompilationContext.position_map to track term → position mappings
- Allocates consecutive scratch positions starting from 1
- Maps each model matrix column to a specific output position
Phase 3: Type Specialization
- Embeds all positions as compile-time type parameters
- Creates operations like LoadOp{:x, 3}() (load column :x into scratch position 3)
- Enables zero-allocation execution through complete type specialization
Position Mapping Examples
# Simple formula: y ~ 1 + x
# Position mapping:
# scratch[1] = 1.0 (intercept, ConstantOp{1.0, 1})
# scratch[2] = data.x[row] (variable x, LoadOp{:x, 2})
# output[1] = scratch[1] (CopyOp{1, 1})
# output[2] = scratch[2] (CopyOp{2, 2})
# Interaction: y ~ x * z
# Position mapping:
# scratch[1] = data.x[row] (LoadOp{:x, 1})
# scratch[2] = data.z[row] (LoadOp{:z, 2})
# scratch[3] = scratch[1] * scratch[2] (BinaryOp{:*, 1, 2, 3})
# output[1] = scratch[1], output[2] = scratch[2], output[3] = scratch[3]
# Function: y ~ log(x)
# Position mapping:
# scratch[1] = data.x[row] (LoadOp{:x, 1})
# scratch[2] = log(scratch[1]) (UnaryOp{:log, 1, 2})
# output[1] = scratch[2] (CopyOp{2, 1})
Performance Characteristics
- Scratch space: Fixed size allocated once, reused for all rows
- Type stability: All positions known at compile time → zero allocations
- Execution: Pure array indexing with no dynamic dispatch
- Memory: O(max_scratch_positions) + O(output_size) per formula
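The position-mapping idea can be sketched in plain Julia. The types below are hypothetical stand-ins that borrow the names used in the examples above; they are not FormulaCompiler.jl's actual definitions, and `execute_op!` and `run_plan!` are illustrative helper names. The sketch assumes the formula y ~ 1 + log(x).

```julia
# Illustrative stand-ins only; the package's real operation types may differ.
struct LoadOp{Col, Out} end
struct ConstantOp{Value, Out} end
struct UnaryOp{Func, In, Out} end
struct CopyOp{In, OutIdx} end

# Every position is a type parameter, so each method indexes with compile-time constants.
execute_op!(::LoadOp{Col, Out}, scratch, output, data, row) where {Col, Out} =
    (scratch[Out] = Float64(getproperty(data, Col)[row]); nothing)
execute_op!(::ConstantOp{Value, Out}, scratch, output, data, row) where {Value, Out} =
    (scratch[Out] = Value; nothing)
execute_op!(::UnaryOp{:log, In, Out}, scratch, output, data, row) where {In, Out} =
    (scratch[Out] = log(scratch[In]); nothing)
execute_op!(::CopyOp{In, OutIdx}, scratch, output, data, row) where {In, OutIdx} =
    (output[OutIdx] = scratch[In]; nothing)

# Walk the operation tuple recursively so each step sees a concrete operation type.
run_plan!(::Tuple{}, scratch, output, data, row) = output
function run_plan!(ops::Tuple, scratch, output, data, row)
    execute_op!(first(ops), scratch, output, data, row)
    return run_plan!(Base.tail(ops), scratch, output, data, row)
end

# Plan for y ~ 1 + log(x): three scratch positions, two output columns.
ops = (ConstantOp{1.0, 1}(), LoadOp{:x, 2}(), UnaryOp{:log, 2, 3}(),
       CopyOp{1, 1}(), CopyOp{3, 2}())
data = (x = [1.0, 2.0, 4.0],)
scratch = zeros(3)   # allocated once, reused for every row
output = zeros(2)
run_plan!(ops, scratch, output, data, 2)   # output is now [1.0, log(2.0)]
```

Because the scratch and output buffers are created once and only indexed afterwards, evaluating further rows reuses the same memory.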
Arguments
- model: Fitted statistical model (GLM, LMM, etc.) with schema-applied formula
- data_example: NamedTuple with sample data for type inference and schema validation
Returns
UnifiedCompiled{T, OpsTuple, ScratchSize, OutputSize} containing:
- Type-specialized operation tuple
- Pre-allocated scratch buffer
- Position mappings embedded in operation types
compile_formula(formula::StatsModels.FormulaTerm, data_example::NamedTuple) -> UnifiedCompiled
Convenience overload to compile directly from a StatsModels.FormulaTerm and column-table data. This mirrors the model-based entry point but skips get_fixed_effects_formula.
Model Row Evaluation
FormulaCompiler.modelrow — Function
modelrow(model, data, row_idx) -> Vector{Float64}
Evaluate a single row and return a new vector (allocating version). Uses compiled formulas for optimal performance.
Example
row_values = modelrow(model, data, 1) # Returns Vector{Float64}
modelrow(model, data, row_indices) -> Matrix{Float64}
Evaluate multiple rows and return a new matrix (allocating version). Uses compiled formulas for optimal performance.
Example
matrix = modelrow(model, data, [1, 5, 10]) # Returns Matrix{Float64}
modelrow(compiled_formula, data, row_idx) -> Vector{Float64}
Evaluate a single row with a precompiled formula.
Example
compiled = compile_formula(model, data)
row_values = modelrow(compiled, data, 1) # Returns Vector{Float64}
modelrow(compiled_formula, data, row_indices) -> Matrix{Float64}
Evaluate multiple rows with a precompiled formula.
Example
compiled = compile_formula(model, data)
matrix = modelrow(compiled, data, [1, 5, 10]) # Returns Matrix{Float64}
modelrow(model, scenario::DataScenario, row_idx) -> Vector{Float64}
Evaluate a model row using a data scenario (allocating version).
modelrow(compiled::UnifiedCompiled, scenario::DataScenario, row_idx) -> Vector{Float64}
Evaluate a model row using a data scenario with a UnifiedCompiled formula (allocating version).
FormulaCompiler.modelrow! — Function
modelrow!(row_vec, compiled_formula, data, row_idx)
Evaluate a single row of the model matrix in-place (zero-allocation).
Arguments
- row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
- compiled_formula: Compiled formula from compile_formula
- data: Data in Tables.jl format (preferably from Tables.columntable)
- row_idx::Int: Row index to evaluate
Returns
row_vec: The same vector passed in, now containing the evaluated row
Example
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1) # Zero allocations
modelrow!(row_vec, model, data, row_idx; cache=true)
Evaluate a single row of the model matrix in-place with automatic compilation.
Arguments
- row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
- model: Statistical model (GLM, MixedModel, etc.)
- data: Data in Tables.jl format
- row_idx::Int: Row index to evaluate
- cache::Bool: Whether to cache compiled formula (default: true)
Returns
row_vec: The same vector passed in, now containing the evaluated row
Example
model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)
FormulaCompiler.ModelRowEvaluator — Type
ModelRowEvaluator{D, O}
Pre-compiled evaluator using compiled formulas only.
Override and Scenario System
FormulaCompiler.OverrideVector — Type
OverrideVector{T} <: AbstractVector{T}
A lazy vector that returns the same override value for all indices. This avoids allocating full arrays when setting all observations to a representative value.
Example
# Instead of: fill(2.5, 1_000_000) # Allocates 8MB
# Use: OverrideVector(2.5, 1_000_000) # Allocates ~32 bytes
FormulaCompiler.DataScenario — Type
DataScenario
Represents a data scenario with specific variable overrides. Contains the modified data that can be used directly with compiled formulas.
Fields
- name::String: Descriptive name for the scenario
- overrides::Dict{Symbol,Any}: Variable overrides (mutable for iterative development)
- data::NamedTuple: Modified column-table data with OverrideVectors applied
- original_data::NamedTuple: Original unmodified data for reference
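How a scenario can hold modified data without copying columns can be sketched with Base Julia alone. `ConstantVector` and `with_overrides` below are hypothetical names that stand in for the idea behind OverrideVector and the scenario's data field; they are assumptions for illustration, not the package's implementation.

```julia
# Hypothetical lazy vector: stores one value and a length instead of a full array.
struct ConstantVector{T} <: AbstractVector{T}
    value::T
    len::Int
end
Base.size(v::ConstantVector) = (v.len,)
Base.IndexStyle(::Type{<:ConstantVector}) = IndexLinear()
Base.getindex(v::ConstantVector, i::Int) = (@boundscheck checkbounds(v, i); v.value)

# Hypothetical helper: swap selected columns of a column table for constant vectors.
# Columns that are not overridden are shared with the original table, not copied.
function with_overrides(data::NamedTuple; overrides...)
    n = length(first(data))
    replaced = map(v -> ConstantVector(v, n), values(overrides))
    return merge(data, replaced)
end

data = (x = [1.0, 2.0, 3.0], z = [10.0, 20.0, 30.0])
scenario_data = with_overrides(data; x = 2.5)

scenario_data.x[3]          # 2.5 for every row
scenario_data.z === data.z  # true: the untouched column is the same array
```

The original table is left unchanged, which matches the split between data and original_data described in the fields above.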
FormulaCompiler.create_scenario — Function
create_scenario(name, original_data; overrides...)
create_scenario(name, original_data, overrides::Dict)
Create a data scenario with specified variable overrides for counterfactual analysis.
Arguments
- name::String: Descriptive name for this scenario
- original_data::NamedTuple: Original data in column-table format (from Tables.columntable)
- overrides...: Keyword arguments for variable overrides (or Dict in second method)
Returns
DataScenario: Object containing original data, overrides, and modified data with OverrideVectors
Example
data = Tables.columntable(df)
# Override single variable to mean
scenario1 = create_scenario("x_at_mean", data; x = mean(data.x))
# Override multiple variables for policy analysis
scenario2 = create_scenario("policy", data; x = 2.5, group = "A", treatment = true)
# Use dictionary for dynamic overrides
overrides = Dict(:dose => 100.0, :region => "North")
scenario3 = create_scenario("high_dose_north", data, overrides)
# Evaluate with compiled formula (zero-allocation)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
row_idx = 1
compiled(row_vec, scenario1.data, row_idx)
Near-Zero-Allocation Derivatives
FormulaCompiler.jl provides two derivative backends: ForwardDiff-based automatic differentiation and finite differences. Both reuse preallocated buffers to keep per-call allocations at or near zero.
Performance Characteristics
- Core evaluation: Exactly 0 allocations
- Finite differences (FD): Exactly 0 allocations (optimized implementation)
- ForwardDiff derivatives: ≤512 bytes per call (ForwardDiff internals)
- Marginal effects: ≤512 bytes per call for AD backend (optimized with preallocated buffers)
- Allocation efficiency: >99.75% compared to naive AD approaches
- Validation: Cross-validated against finite differences (rtol=1e-6, atol=1e-8)
FormulaCompiler.build_derivative_evaluator — Function
build_derivative_evaluator(compiled, data; vars, chunk=:auto) -> DerivativeEvaluator
Build a ForwardDiff-based derivative evaluator for a fixed set of variables.
Arguments:
- compiled::UnifiedCompiled: Result of compile_formula(model, data).
- data::NamedTuple: Column-table data (e.g., Tables.columntable(df)).
- vars::Vector{Symbol}: Variables to differentiate with respect to (typically continuous predictors).
- chunk: ForwardDiff.Chunk{N}() or :auto (uses Chunk{length(vars)}).
Returns:
DerivativeEvaluator: Prebuilt evaluator object reusable across rows.
Notes:
- Compile once per model + variable set; reuse across calls.
- Zero allocations in steady state after warmup (typed closure + config; no per-call merges).
- Keep vars fixed for best specialization.
FormulaCompiler.derivative_modelrow! — Function
derivative_modelrow!(J, deval, row) -> AbstractMatrix{Float64}
Fill J with the Jacobian of one model row with respect to deval.vars.
Arguments:
- J::AbstractMatrix{Float64}: Preallocated buffer of size (n_terms, n_vars).
- deval::DerivativeEvaluator: Built by build_derivative_evaluator.
- row::Int: Row index (1-based).
Returns:
- The same J buffer, with J[i, j] = ∂X[i]/∂vars[j] for the given row.
Notes:
- Orientation is (n_terms, n_vars); n_terms == length(compiled).
- Small allocations (~368 bytes) due to ForwardDiff internals. For strict zero-allocation requirements, use derivative_modelrow_fd! instead.
FormulaCompiler.derivative_modelrow — Function
derivative_modelrow(deval, row) -> Matrix{Float64}
Allocating convenience wrapper that returns the Jacobian for one row.
FormulaCompiler.derivative_modelrow_fd! — Function
derivative_modelrow_fd!(J, compiled, data, row; vars, step=:auto)
Finite-difference Jacobian for a single row using central differences (standalone).
Arguments:
- J::AbstractMatrix{Float64}: Preallocated (n_terms, n_vars) buffer.
- compiled::UnifiedCompiled: Result of compile_formula.
- data::NamedTuple: Column-table data.
- row::Int: Row index.
- vars::Vector{Symbol}: Variables to differentiate with respect to.
- step: Numeric step size or :auto (eps()^(1/3) * max(1, |x|)).
Notes:
- Two evaluations per variable; useful as a robust fallback and for cross-checks.
- This standalone path allocates per call (builds per-call overrides and small temporaries). For zero allocations after warmup, prefer the evaluator FD path (derivative_modelrow_fd_pos!).
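The central-difference rule and the documented :auto step can be sketched for a single variable in plain Julia. `central_diff` and the hand-written `row` function are hypothetical; they illustrate the arithmetic only and do not call the package.

```julia
# Central difference for a row-valued function of one scalar input.
# The step follows the documented :auto rule: eps()^(1/3) * max(1, |x|).
function central_diff(f, x::Float64)
    h = cbrt(eps(Float64)) * max(1.0, abs(x))
    return (f(x + h) .- f(x - h)) ./ (2h)   # two evaluations per variable
end

# Hand-written model row for y ~ 1 + x + log(x), for illustration only.
row(x) = [1.0, x, log(x)]

approx = central_diff(row, 2.0)
exact = [0.0, 1.0, 0.5]   # analytic derivatives: d(1)/dx, d(x)/dx, d(log x)/dx at x = 2

# Same tolerances as the cross-validation quoted in the performance list.
isapprox(approx, exact; rtol = 1e-6, atol = 1e-8)
```

The cube-root step balances truncation error against floating-point round-off for a second-order scheme, which is why it is a common default for central differences.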
FormulaCompiler.contrast_modelrow! — Function
contrast_modelrow!(Δ, compiled, data, row; var, from, to)
Compute a discrete contrast at one row for a single variable: Δ = X(to) − X(from).
Arguments:
- Δ::AbstractVector{Float64}: Preallocated buffer of length n_terms.
- compiled::UnifiedCompiled: Result of compile_formula.
- data::NamedTuple: Column-table data.
- row::Int: Row index.
- var::Symbol: Variable to change (e.g., :group3).
- from, to: Values to contrast (level names or CategoricalValue for categorical; numbers for discrete).
Notes:
- Uses a row-local override; for categorical columns, values are normalized to the column's levels.
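What Δ = X(to) − X(from) means for a categorical variable can be shown with a hand-written model row. The sketch assumes a formula y ~ 1 + x + group with treatment (dummy) coding and levels "A" (baseline), "B", "C"; `model_row` and `contrast` are hypothetical helpers, and the coefficients are made up for illustration.

```julia
# Hand-written model row under the stated coding assumptions.
model_row(x, group) = [1.0, x, group == "B" ? 1.0 : 0.0, group == "C" ? 1.0 : 0.0]

# Discrete contrast: only `group` changes, x stays at the row's observed value.
contrast(x, from, to) = model_row(x, to) .- model_row(x, from)

Δ = contrast(3.2, "A", "C")      # [0.0, 0.0, 0.0, 1.0]

# Effect of the change on the linear predictor, for illustrative coefficients β.
β = [0.5, 1.0, -0.3, 0.8]
effect_on_eta = sum(β .* Δ)      # 0.8
```

Columns that do not depend on the changed variable cancel to zero, so the contrast isolates the terms that involve it.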
FormulaCompiler.continuous_variables — Function
continuous_variables(compiled, data) -> Vector{Symbol}
Return a list of continuous variable symbols present in the compiled ops, excluding categoricals detected via ContrastOps. Filters by eltype(data[sym]) <: Real.
FormulaCompiler.marginal_effects_eta! — Function
marginal_effects_eta!(g, de, beta, row; backend=:ad)
Fill g with marginal effects of η = Xβ w.r.t. de.vars at row. Implements: g = J' * β, where J = ∂X/∂vars.
Arguments:
backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)
Backends and allocations:
- :ad: Uses ForwardDiff automatic differentiation. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
- :fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
- Allocating convenience (marginal_effects_eta) allocates the result vector by design.
Recommendations:
- Use :fd backend for strict zero-allocation requirements
- Use :ad backend for speed and numerical accuracy (default)
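The identity g = J' * β can be checked by hand on a small case. The sketch below assumes a model row X = [1, x, z, x*z] and made-up coefficients; it uses only the LinearAlgebra standard library and does not call the package.

```julia
using LinearAlgebra

# Jacobian of X = [1, x, z, x*z] with respect to vars = [x, z], evaluated at one row.
# Orientation is (n_terms, n_vars), matching the convention documented above.
x, z = 2.0, 3.0
J = [0.0 0.0;
     1.0 0.0;
     0.0 1.0;
     z   x]

β = [0.5, 1.0, -2.0, 0.25]          # illustrative coefficients
g = Vector{Float64}(undef, 2)       # preallocated result buffer
mul!(g, transpose(J), β)            # in-place g = J'β

# Analytic check: η = 0.5 + 1.0x - 2.0z + 0.25xz
# ∂η/∂x = 1.0 + 0.25z = 1.75 and ∂η/∂z = -2.0 + 0.25x = -1.5
```

Writing the product with `mul!` into a preallocated `g` mirrors the in-place style of the API, although the package's internal buffer handling may differ.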
FormulaCompiler.marginal_effects_mu! — Function
marginal_effects_mu!(g, de, beta, row; link, backend=:ad)
Compute marginal effects of μ = g⁻¹(η) at row via chain rule: dμ/dx = (dμ/dη) * (dη/dx).
Arguments:
- link: Link function (e.g., IdentityLink(), LogLink(), LogitLink())
- backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)
Backends and allocations:
- :ad: Uses ForwardDiff via η path. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
- :fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
- Allocating convenience (marginal_effects_mu) allocates the result vector by design.
Recommendations:
- Use :fd backend for strict zero-allocation requirements
- Use :ad backend for speed and numerical accuracy (default)
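The chain rule for the mean can be sketched for the logit link, where dμ/dη = μ(1 − μ). `logistic` and `dmu_deta_logit` are hypothetical helper names, and the η value and η-scale effects are made-up inputs; the sketch does not use GLM.jl or the package.

```julia
# Inverse logit link: μ = g⁻¹(η).
logistic(η) = 1 / (1 + exp(-η))

# Derivative of the inverse link with respect to η for the logit case.
function dmu_deta_logit(η)
    μ = logistic(η)
    return μ * (1 - μ)
end

η = 0.4                  # linear predictor at the row of interest (illustrative)
g_eta = [1.75, -1.5]     # marginal effects on η for two variables (illustrative)

# Chain rule: dμ/dx = (dμ/dη) * (dη/dx), applied to every variable at once.
g_mu = dmu_deta_logit(η) .* g_eta
```

Other links change only the scalar factor; for a log link, for instance, dμ/dη = exp(η).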
Function Details
compile_formula(model, data) -> UnifiedCompiled
Compile a fitted model’s formula into a position-mapped, zero-allocation evaluator.
Arguments:
- model: Fitted statistical model (GLM, MixedModel, etc.)
- data: Tables.jl-compatible data (prefer a column table via Tables.columntable)
Returns:
UnifiedCompiled: Type-specialized evaluator with embedded position mappings
Example:
model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
modelrow(model, data, row_index) -> Vector{Float64}
Evaluate model matrix row (allocating version).
Arguments:
- model: Fitted statistical model or compiled formula
- data: Data in Tables.jl format
- row_index: Row index to evaluate (Int) or indices (Vector{Int}/AbstractVector)
Returns:
Vector{Float64} or Matrix{Float64}: Model matrix row(s)
Example:
row_vec = modelrow(model, data, 1)
multiple_rows = modelrow(model, data, [1, 5, 10])
modelrow!(output, compiled, data, row_indices)
In-place model matrix row evaluation (zero-allocation).
Arguments:
- output: Pre-allocated output array (Vector or Matrix)
- compiled: Compiled formula object
- data: Data in Tables.jl format
- row_indices: Row index (Int) or indices (AbstractVector)
Example:
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1) # Zero allocations
# Multiple rows
matrix = Matrix{Float64}(undef, 10, length(compiled))
modelrow!(matrix, compiled, data, 1:10)
ModelRowEvaluator(model, data)
Create a reusable model row evaluator object.
Arguments:
- model: Fitted statistical model
- data: Data in DataFrame or Tables.jl format
Methods:
- evaluator(row_index): Returns new vector (allocating)
- evaluator(output, row_index): In-place evaluation (non-allocating)
Example:
evaluator = ModelRowEvaluator(model, df)
result = evaluator(1) # Allocating
evaluator(row_vec, 1) # Non-allocating
create_scenario(name, data; overrides...)
Create a data scenario with variable overrides.
Arguments:
- name: Scenario name (String)
- data: Base data in Tables.jl format
- overrides...: Keyword arguments specifying variable overrides
Returns:
DataScenario: Scenario object with override data
Example:
scenario = create_scenario("treatment", data;
treatment = true,
dose = 100.0
)
create_scenario_grid(name, data, parameter_dict; verbose=false)
Create all combinations of scenario parameters.
Arguments:
- name: Base name for scenarios
- data: Base data
- parameter_dict: Dict mapping variables to vectors of values
- verbose: Whether to print creation progress (default: false)
Returns:
Vector{DataScenario}: Vector of all parameter combinations
Example:
grid = create_scenario_grid("policy", data, Dict(
:treatment => [false, true],
:dose => [50, 100, 150]
); verbose=true) # Creates 6 scenarios, prints progress
OverrideVector(value, length)
Create a memory-efficient constant vector.
Arguments:
- value: Constant value to return
- length: Vector length
Returns:
OverrideVector: Memory-efficient constant vector
Example:
# Traditional: 8MB for 1M elements
traditional = fill(42.0, 1_000_000)
# OverrideVector: ~32 bytes
efficient = OverrideVector(42.0, 1_000_000)
# Same interface
@assert traditional[500_000] == efficient[500_000]
Scenario Management Functions
set_override!(scenario, variable, value)
Add or update a variable override in a scenario.
remove_override!(scenario, variable)
Remove a variable override from a scenario.
update_scenario!(scenario; overrides...)
Bulk update multiple overrides in a scenario.
get_overrides(scenario)
Get dictionary of current overrides in a scenario.
Example:
scenario = create_scenario("dynamic", data)
set_override!(scenario, :x, 1.0)
update_scenario!(scenario; y = 2.0, z = 3.0)
overrides = get_overrides(scenario) # Dict(:x => 1.0, :y => 2.0, :z => 3.0)
remove_override!(scenario, :z)
Integration Functions
fixed_effects_form(mixed_model)
Extract fixed effects formula from a MixedModel.
Arguments:
mixed_model: Fitted MixedModel
Returns:
FormulaTerm: Fixed effects portion of the formula
Example:
mixed = fit(MixedModel, @formula(y ~ x + (1|group)), df)
fixed_form = fixed_effects_form(mixed) # Returns: y ~ x
Utility Functions
length(compiled_formula)
Get the number of terms in compiled formula (model matrix columns).
Example:
compiled = compile_formula(model, data)
n_terms = length(compiled) # e.g., 4
Type System
Core Types
- UnifiedCompiled: Position-mapped, zero-allocation compiled evaluator
- DataScenario: Scenario with variable overrides
- ScenarioCollection: Collection of related scenarios
- OverrideVector{T}: Memory-efficient constant vector
- ModelRowEvaluator: Reusable evaluator object
Internal Types
Operation types used by the unified compiler:
- LoadOp{Column, OutPos}: Load a data column into a scratch position
- ConstantOp{Value, OutPos}: Place a compile-time constant into scratch
- UnaryOp{Func, InPos, OutPos}: Apply a unary function
- BinaryOp{Func, InPos1, InPos2, OutPos}: Apply a binary operation
- ContrastOp{Column, OutPositions}: Expand a categorical column via contrasts
- CopyOp{InPos, OutIdx}: Copy from scratch to final output index
build_derivative_evaluator(compiled, data; vars, chunk=:auto)
Build a reusable ForwardDiff-based derivative evaluator for computing Jacobians and marginal effects.
Arguments:
- compiled: Compiled formula from compile_formula
- data: Tables.jl-compatible data (column table preferred)
- vars: Vector of symbols for variables to differentiate with respect to
- chunk: ForwardDiff chunk size (:auto uses length(vars))
Returns:
DerivativeEvaluator: Reusable evaluator with preallocated buffers
Performance:
- One-time construction cost, then ≤512 bytes per derivative call (AD backend)
- Contains preallocated Jacobian matrices and gradient vectors
Example:
compiled = compile_formula(model, data)
vars = continuous_variables(compiled, data) # or [:x, :z]
de = build_derivative_evaluator(compiled, data; vars=vars)
derivative_modelrow!(J, evaluator, row)
Fill Jacobian matrix with derivatives of model row with respect to selected variables.
Arguments:
- J: Pre-allocated matrix of size (length(compiled), length(vars))
- evaluator: DerivativeEvaluator from build_derivative_evaluator
- row: Row index to evaluate
Performance:
- ≤512 bytes allocated per call (ForwardDiff internals)
- Uses preallocated buffers for near-optimal efficiency
Example:
J = Matrix{Float64}(undef, length(compiled), length(de.vars))
derivative_modelrow!(J, de, 1) # Fill J with derivatives
marginal_effects_eta!(g, evaluator, beta, row)
Compute marginal effects on linear predictor η = Xβ using chain rule.
Arguments:
- g: Pre-allocated gradient vector of length length(vars)
- evaluator: DerivativeEvaluator
- beta: Model coefficients vector
- row: Row index
Implementation:
- Computes g = J' * β where J is the Jacobian matrix
- Uses preallocated internal Jacobian buffer
Performance:
- ≤512 bytes per call with preallocated buffers (AD backend)
Example:
β = coef(model)
g = Vector{Float64}(undef, length(de.vars))
marginal_effects_eta!(g, de, β, 1)
marginal_effects_mu!(g, evaluator, beta, row; link)
Compute marginal effects on mean μ via chain rule: dμ/dx = (dμ/dη) × (dη/dx).
Arguments:
- g: Pre-allocated gradient vector
- evaluator: DerivativeEvaluator
- beta: Model coefficients
- row: Row index
- link: GLM link function (e.g., LogitLink(), LogLink())
Supported Links:
- IdentityLink(), LogLink(), LogitLink(), ProbitLink()
- CloglogLink(), CauchitLink(), InverseLink(), SqrtLink()
- InverseSquareLink() (when available)
Performance:
- ≤512 bytes per call with preallocated internal buffers (AD backend)
Example:
using GLM
marginal_effects_mu!(g, de, β, 1; link=LogitLink())
continuous_variables(compiled, data)
Extract continuous variable names from compiled operations, excluding categoricals.
Arguments:
- compiled: Compiled formula
- data: Data used in compilation
Returns:
Vector{Symbol}: Sorted list of continuous variable symbols
Example:
vars = continuous_variables(compiled, data) # e.g., [:x, :z, :age]
de = build_derivative_evaluator(compiled, data; vars=vars)
Performance Notes
- Core functions (modelrow!, compiled(row_vec, data, row)) achieve exactly 0 bytes allocated
- Derivative functions achieve ≤512 bytes per call (ForwardDiff internals)
- Marginal effects use preallocated buffers to minimize allocations (≤512 bytes)
- compile_formula has one-time compilation cost but enables many fast evaluations
- Use Tables.columntable format for best performance
- Pre-allocate output vectors/matrices and reuse them across evaluations
- Build derivative evaluators once and reuse across many calls