API Reference

Complete API reference for FormulaCompiler.jl functions and types.

Core Compilation Functions

FormulaCompiler.compile_formula - Function
compile_formula(model, data_example::NamedTuple) -> UnifiedCompiled

Primary API for compiling statistical models into high-performance evaluators.

Position Mapping System

This function implements a position mapping system that converts statistical formulas into zero-allocation execution plans. The system works in three phases:

Phase 1: Formula Decomposition

  • Extracts the schema-applied formula from the fitted model
  • Converts StatsModels terms into typed operations (LoadOp, ConstantOp, etc.)
  • Assigns unique scratch positions to intermediate values and output positions to final results

Phase 2: Position Allocation

  • Uses CompilationContext.position_map to track term → position mappings
  • Allocates consecutive scratch positions starting from 1
  • Maps each model matrix column to a specific output position

Phase 3: Type Specialization

  • Embeds all positions as compile-time type parameters
  • Creates operations like LoadOp{:x, 3}() (load column :x into scratch position 3)
  • Enables zero-allocation execution through complete type specialization

Position Mapping Examples

# Simple formula: y ~ 1 + x
# Position mapping:
# scratch[1] = 1.0          (intercept, ConstantOp{1.0, 1})
# scratch[2] = data.x[row]  (variable x, LoadOp{:x, 2})  
# output[1] = scratch[1]    (CopyOp{1, 1})
# output[2] = scratch[2]    (CopyOp{2, 2})

# Interaction: y ~ x * z  
# Position mapping:
# scratch[1] = data.x[row]     (LoadOp{:x, 1})
# scratch[2] = data.z[row]     (LoadOp{:z, 2}) 
# scratch[3] = scratch[1] * scratch[2]  (BinaryOp{:*, 1, 2, 3})
# output[1] = scratch[1], output[2] = scratch[2], output[3] = scratch[3]

# Function: y ~ log(x)
# Position mapping:
# scratch[1] = data.x[row]     (LoadOp{:x, 1})
# scratch[2] = log(scratch[1]) (UnaryOp{:log, 1, 2})
# output[1] = scratch[2]       (CopyOp{2, 1})
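To make the mapping concrete, here is a minimal, self-contained sketch of how operations that carry their positions as type parameters could execute the y ~ log(x) example above. The struct definitions, `execute!`, and `run_ops!` are hypothetical stand-ins written for illustration. They reuse the operation names from this page but are not FormulaCompiler's actual implementation.

```julia
# Hypothetical stand-ins: positions live in the type parameters, so each
# operation is a zero-size singleton and dispatch is resolved statically.
struct LoadOp{Col, Out} end
struct UnaryOp{F, In, Out} end
struct CopyOp{In, Out} end

function execute!(::LoadOp{Col, Out}, scratch, output, data, row) where {Col, Out}
    scratch[Out] = getproperty(data, Col)[row]   # load column value into scratch
    return nothing
end

function execute!(::UnaryOp{:log, In, Out}, scratch, output, data, row) where {In, Out}
    scratch[Out] = log(scratch[In])              # apply log to a scratch slot
    return nothing
end

function execute!(::CopyOp{In, Out}, scratch, output, data, row) where {In, Out}
    output[Out] = scratch[In]                    # copy scratch slot to output
    return nothing
end

# Recursive traversal of the operation tuple, one operation at a time.
run_ops!(::Tuple{}, scratch, output, data, row) = nothing
function run_ops!(ops::Tuple, scratch, output, data, row)
    execute!(first(ops), scratch, output, data, row)
    return run_ops!(Base.tail(ops), scratch, output, data, row)
end

# y ~ log(x): load x into scratch[1], log into scratch[2], copy to output[1]
ops = (LoadOp{:x, 1}(), UnaryOp{:log, 1, 2}(), CopyOp{2, 1}())
data = (x = [1.0, 2.0, 4.0],)
scratch = zeros(2)
output = zeros(1)
run_ops!(ops, scratch, output, data, 2)
```

After the call, `output[1]` holds `log(data.x[2])`. The scratch and output buffers are allocated once and can be reused for every row.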

Performance Characteristics

  • Scratch space: Fixed size allocated once, reused for all rows
  • Type stability: All positions known at compile time → zero allocations
  • Execution: Pure array indexing with no dynamic dispatch
  • Memory: O(max_scratch_positions) + O(output_size) per formula

Arguments

  • model: Fitted statistical model (GLM, LMM, etc.) with schema-applied formula
  • data_example: NamedTuple with sample data for type inference and schema validation

Returns

UnifiedCompiled{T, OpsTuple, ScratchSize, OutputSize} containing:

  • Type-specialized operation tuple
  • Pre-allocated scratch buffer
  • Position mappings embedded in operation types

compile_formula(formula::StatsModels.FormulaTerm, data_example::NamedTuple) -> UnifiedCompiled

Convenience overload to compile directly from a StatsModels.FormulaTerm and column-table data. This mirrors the model-based entry point but skips get_fixed_effects_formula.


Model Row Evaluation

FormulaCompiler.modelrow - Function
modelrow(model, data, row_idx) -> Vector{Float64}

Evaluate a single row and return a new vector (allocating version). Uses compiled formulas for optimal performance.

Example

row_values = modelrow(model, data, 1)  # Returns Vector{Float64}

modelrow(model, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows and return a new matrix (allocating version). Uses compiled formulas for optimal performance.

Example

matrix = modelrow(model, data, [1, 5, 10])  # Returns Matrix{Float64}

modelrow(compiled_formula, data, row_idx) -> Vector{Float64}

Evaluate a single row with a pre-compiled formula.

Example

compiled = compile_formula(model, data)
row_values = modelrow(compiled, data, 1)  # Returns Vector{Float64}

modelrow(compiled_formula, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows with a pre-compiled formula.

Example

compiled = compile_formula(model, data)
matrix = modelrow(compiled, data, [1, 5, 10])  # Returns Matrix{Float64}

modelrow(model, scenario::DataScenario, row_idx) -> Vector{Float64}

Evaluate model row using a data scenario (allocating version).

modelrow(compiled::UnifiedCompiled, scenario::DataScenario, row_idx) -> Vector{Float64}

Evaluate model row using a data scenario with UnifiedCompiled (allocating version).

FormulaCompiler.modelrow! - Function
modelrow!(row_vec, compiled_formula, data, row_idx)

Evaluate a single row of the model matrix in-place (zero-allocation).

Arguments

  • row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
  • compiled_formula: Compiled formula from compile_formula
  • data: Data in Tables.jl format (preferably from Tables.columntable)
  • row_idx::Int: Row index to evaluate

Returns

  • row_vec: The same vector passed in, now containing the evaluated row

Example

compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1)  # Zero allocations

modelrow!(row_vec, model, data, row_idx; cache=true)

Evaluate a single row of the model matrix in-place with automatic compilation.

Arguments

  • row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
  • model: Statistical model (GLM, MixedModel, etc.)
  • data: Data in Tables.jl format
  • row_idx::Int: Row index to evaluate
  • cache::Bool: Whether to cache compiled formula (default: true)

Returns

  • row_vec: The same vector passed in, now containing the evaluated row

Example

model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)

Note

First call compiles the formula. Subsequent calls reuse cached version when cache=true.
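The compile-once, reuse-afterwards behaviour described in this note follows a generic memoization pattern, sketched below. `slow_compile`, `cached_compile`, and the cache dictionary are hypothetical names invented for this example. How FormulaCompiler keys and stores its cache is not documented here, so treat this as an illustration of the idea only.

```julia
# Generic memoization sketch; `slow_compile` is a hypothetical stand-in for an
# expensive compilation step, not a FormulaCompiler function.
const COMPILE_CALLS = Ref(0)
const COMPILED_CACHE = Dict{Any, Any}()

function slow_compile(formula_key)
    COMPILE_CALLS[] += 1                  # count how often we really compile
    return row -> (1.0, Float64(row))     # dummy "evaluator"
end

function cached_compile(formula_key; cache::Bool = true)
    cache || return slow_compile(formula_key)
    # get! runs the closure only when the key is not yet in the dictionary
    return get!(() -> slow_compile(formula_key), COMPILED_CACHE, formula_key)
end

f1 = cached_compile("y ~ x")              # first call compiles
f2 = cached_compile("y ~ x")              # second call hits the cache
println("compilations so far: ", COMPILE_CALLS[])
```

Passing `cache = false` in this sketch bypasses the dictionary and compiles on every call, which is useful only when the result will not be reused.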


Override and Scenario System

FormulaCompiler.OverrideVector - Type
OverrideVector{T} <: AbstractVector{T}

A lazy vector that returns the same override value for all indices. This avoids allocating full arrays when setting all observations to a representative value.

Example

# Instead of: fill(2.5, 1_000_000)  # Allocates 8MB
# Use: OverrideVector(2.5, 1_000_000)  # Allocates ~32 bytes

FormulaCompiler.DataScenario - Type
DataScenario

Represents a data scenario with specific variable overrides. Contains the modified data that can be used directly with compiled formulas.

Fields

  • name::String: Descriptive name for the scenario
  • overrides::Dict{Symbol,Any}: Variable overrides (mutable for iterative development)
  • data::NamedTuple: Modified column-table data with OverrideVectors applied
  • original_data::NamedTuple: Original unmodified data for reference

FormulaCompiler.create_scenario - Function
create_scenario(name, original_data; overrides...)
create_scenario(name, original_data, overrides::Dict)

Create a data scenario with specified variable overrides for counterfactual analysis.

Arguments

  • name::String: Descriptive name for this scenario
  • original_data::NamedTuple: Original data in column-table format (from Tables.columntable)
  • overrides...: Keyword arguments for variable overrides (or Dict in second method)

Returns

  • DataScenario: Object containing original data, overrides, and modified data with OverrideVectors

Example

data = Tables.columntable(df)

# Override single variable to mean
scenario1 = create_scenario("x_at_mean", data; x = mean(data.x))

# Override multiple variables for policy analysis
scenario2 = create_scenario("policy", data; x = 2.5, group = "A", treatment = true)

# Use dictionary for dynamic overrides
overrides = Dict(:dose => 100.0, :region => "North")
scenario3 = create_scenario("high_dose_north", data, overrides)

# Evaluate with compiled formula (zero-allocation)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
row_idx = 1  # row to evaluate
compiled(row_vec, scenario1.data, row_idx)

Note

Uses memory-efficient OverrideVector to avoid data duplication. Each override creates a lazy vector returning the same value for all rows.


Near-Zero-Allocation Derivatives

FormulaCompiler.jl computes derivatives of model rows with either ForwardDiff-based automatic differentiation (AD) or finite differences (FD), using preallocated buffers to keep per-call allocations at or near zero.

Performance Characteristics

  • Core evaluation: Exactly 0 allocations
  • Finite differences (FD): Exactly 0 allocations (optimized implementation)
  • ForwardDiff derivatives: ≤512 bytes per call (ForwardDiff internals)
  • Marginal effects: ≤512 bytes per call for AD backend (optimized with preallocated buffers)
  • Allocation efficiency: >99.75% compared to naive AD approaches
  • Validation: Cross-validated against finite differences (rtol=1e-6, atol=1e-8)

FormulaCompiler.build_derivative_evaluator - Function
build_derivative_evaluator(compiled, data; vars, chunk=:auto) -> DerivativeEvaluator

Build a ForwardDiff-based derivative evaluator for a fixed set of variables.

Arguments:

  • compiled::UnifiedCompiled: Result of compile_formula(model, data).
  • data::NamedTuple: Column-table data (e.g., Tables.columntable(df)).
  • vars::Vector{Symbol}: Variables to differentiate with respect to (typically continuous predictors).
  • chunk: ForwardDiff.Chunk{N}() or :auto (uses Chunk{length(vars)}).

Returns:

  • DerivativeEvaluator: Prebuilt evaluator object reusable across rows.

Notes:

  • Compile once per model + variable set; reuse across calls.
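  • No per-call setup cost in steady state after warmup (typed closure + config; no per-call merges). Derivative calls on the AD path can still allocate a few hundred bytes due to ForwardDiff internals.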
  • Keep vars fixed for best specialization.

FormulaCompiler.derivative_modelrow! - Function
derivative_modelrow!(J, deval, row) -> AbstractMatrix{Float64}

Fill J with the Jacobian of one model row with respect to deval.vars.

Arguments:

  • J::AbstractMatrix{Float64}: Preallocated buffer of size (n_terms, n_vars).
  • deval::DerivativeEvaluator: Built by build_derivative_evaluator.
  • row::Int: Row index (1-based).

Returns:

  • The same J buffer, with J[i, j] = ∂X[i]/∂vars[j] for the given row.

Notes:

  • Orientation is (n_terms, n_vars); n_terms == length(compiled).
  • Small allocations (~368 bytes) due to ForwardDiff internals. For strict zero-allocation requirements, use derivative_modelrow_fd! instead.

FormulaCompiler.derivative_modelrow_fd! - Function
derivative_modelrow_fd!(J, compiled, data, row; vars, step=:auto)

Finite-difference Jacobian for a single row using central differences (standalone).

Arguments:

  • J::AbstractMatrix{Float64}: Preallocated (n_terms, n_vars) buffer.
  • compiled::UnifiedCompiled: Result of compile_formula.
  • data::NamedTuple: Column-table data.
  • row::Int: Row index.
  • vars::Vector{Symbol}: Variables to differentiate with respect to.
  • step: Numeric step size or :auto (eps()^(1/3) * max(1, |x|)).

Notes:

  • Two evaluations per variable; useful as a robust fallback and for cross-checks.
  • This standalone path allocates per call (builds per-call overrides and small temporaries). For zero allocations after warmup, prefer the evaluator FD path (derivative_modelrow_fd_pos!).
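As a rough illustration of the central-difference scheme and the `:auto` step rule quoted above, the sketch below differentiates a scalar function in plain Julia. `central_difference` is a hypothetical helper written for this example and does not reproduce the package's internals.

```julia
# Central difference with the step rule h = eps()^(1/3) * max(1, |x|).
function central_difference(f, x::Float64)
    h = cbrt(eps(Float64)) * max(1.0, abs(x))
    return (f(x + h) - f(x - h)) / (2h)    # two evaluations of f
end

approx = central_difference(log, 2.0)
exact = 1 / 2.0                             # d/dx log(x) = 1/x
println(isapprox(approx, exact; rtol = 1e-6))
```

The cube-root scaling balances truncation error, which shrinks with the step, against floating-point rounding error, which grows as the step shrinks. That is why this rule is a common default for central differences.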

FormulaCompiler.contrast_modelrow! - Function
contrast_modelrow!(Δ, compiled, data, row; var, from, to)

Compute a discrete contrast at one row for a single variable: Δ = X(to) − X(from).

Arguments:

  • Δ::AbstractVector{Float64}: Preallocated buffer of length n_terms.
  • compiled::UnifiedCompiled: Result of compile_formula.
  • data::NamedTuple: Column-table data.
  • row::Int: Row index.
  • var::Symbol: Variable to change (e.g., :group3).
  • from, to: Values to contrast (level names or CategoricalValue for categorical; numbers for discrete).

Notes:

  • Uses a row-local override; for categorical columns, values are normalized to the column's levels.
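The definition Δ = X(to) − X(from) can be checked by hand on a small case. The sketch below assumes a hypothetical model row for y ~ 1 + x + group with dummy coding and reference level "A". The `model_row` function is written out manually for illustration and is not produced by the package.

```julia
# Hypothetical hand-written model row: intercept, x, and indicators for
# group levels "B" and "C" (level "A" is the reference).
model_row(x, group) = [1.0, x, Float64(group == "B"), Float64(group == "C")]

x = 2.5
Δ = model_row(x, "C") .- model_row(x, "A")   # contrast from "A" to "C"
println(Δ)
```

Only the column tied to the changed level differs between the two rows. The intercept and the continuous column cancel.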

FormulaCompiler.continuous_variables - Function
continuous_variables(compiled, data) -> Vector{Symbol}

Return a list of continuous variable symbols present in the compiled ops, excluding categoricals detected via ContrastOps. Filters by eltype(data[sym]) <: Real.

FormulaCompiler.marginal_effects_eta! - Function
marginal_effects_eta!(g, de, beta, row; backend=:ad)

Fill g with marginal effects of η = Xβ w.r.t. de.vars at row. Implements: g = J' * β, where J = ∂X/∂vars.

Arguments:

  • backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)

Backends and allocations:

  • :ad: Uses ForwardDiff automatic differentiation. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
  • :fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
  • Allocating convenience (marginal_effects_eta) allocates the result vector by design.

Recommendations:

  • Use :fd backend for strict zero-allocation requirements
  • Use :ad backend for speed and numerical accuracy (default)

FormulaCompiler.marginal_effects_mu! - Function
marginal_effects_mu!(g, de, beta, row; link, backend=:ad)

Compute marginal effects of μ = g⁻¹(η) at row via chain rule: dμ/dx = (dμ/dη) * (dη/dx).

Arguments:

  • link: Link function (e.g., IdentityLink(), LogLink(), LogitLink())
  • backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)

Backends and allocations:

  • :ad: Uses ForwardDiff via η path. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
  • :fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
  • Allocating convenience (marginal_effects_mu) allocates the result vector by design.

Recommendations:

  • Use :fd backend for strict zero-allocation requirements
  • Use :ad backend for speed and numerical accuracy (default)


Function Details

compile_formula(model, data) -> UnifiedCompiled

Compile a fitted model’s formula into a position-mapped, zero-allocation evaluator.

Arguments:

  • model: Fitted statistical model (GLM, MixedModel, etc.)
  • data: Tables.jl-compatible data (prefer a column table via Tables.columntable)

Returns:

  • UnifiedCompiled: Type-specialized evaluator with embedded position mappings

Example:

model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)

modelrow(model, data, row_index) -> Vector{Float64}

Evaluate model matrix row (allocating version).

Arguments:

  • model: Fitted statistical model or compiled formula
  • data: Data in Tables.jl format
  • row_index: Row index to evaluate (Int) or indices (Vector{Int}/AbstractVector)

Returns:

  • Vector{Float64} or Matrix{Float64}: Model matrix row(s)

Example:

row_vec = modelrow(model, data, 1)
multiple_rows = modelrow(model, data, [1, 5, 10])

modelrow!(output, compiled, data, row_indices)

In-place model matrix row evaluation (zero-allocation).

Arguments:

  • output: Pre-allocated output array (Vector or Matrix)
  • compiled: Compiled formula object
  • data: Data in Tables.jl format
  • row_indices: Row index (Int) or indices (AbstractVector)

Example:

compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1)  # Zero allocations

# Multiple rows
matrix = Matrix{Float64}(undef, 10, length(compiled))
modelrow!(matrix, compiled, data, 1:10)

ModelRowEvaluator(model, data)

Create a reusable model row evaluator object.

Arguments:

  • model: Fitted statistical model
  • data: Data in DataFrame or Tables.jl format

Methods:

  • evaluator(row_index): Returns new vector (allocating)
  • evaluator(output, row_index): In-place evaluation (non-allocating)

Example:

evaluator = ModelRowEvaluator(model, df)
result = evaluator(1)      # Allocating
row_vec = similar(result)  # Pre-allocated buffer of matching length
evaluator(row_vec, 1)      # Non-allocating

create_scenario(name, data; overrides...)

Create a data scenario with variable overrides.

Arguments:

  • name: Scenario name (String)
  • data: Base data in Tables.jl format
  • overrides...: Keyword arguments specifying variable overrides

Returns:

  • DataScenario: Scenario object with override data

Example:

scenario = create_scenario("treatment", data; 
    treatment = true,
    dose = 100.0
)

create_scenario_grid(name, data, parameter_dict; verbose=false)

Create all combinations of scenario parameters.

Arguments:

  • name: Base name for scenarios
  • data: Base data
  • parameter_dict: Dict mapping variables to vectors of values
  • verbose: Whether to print creation progress (default: false)

Returns:

  • Vector{DataScenario}: Vector of all parameter combinations

Example:

grid = create_scenario_grid("policy", data, Dict(
    :treatment => [false, true],
    :dose => [50, 100, 150]
); verbose=true)  # Creates 6 scenarios, prints progress
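The count of six comes from the Cartesian product of the value vectors (2 × 3). The sketch below enumerates the same combinations with `Iterators.product` from Base. It only illustrates how such a grid can be generated and makes no claim about how create_scenario_grid builds or names its scenarios.

```julia
params = Dict(:treatment => [false, true], :dose => [50, 100, 150])

# Fix a variable order so the result is reproducible (Dict order is not).
vars = sort(collect(keys(params)))                  # [:dose, :treatment]

# One NamedTuple of overrides per combination of values.
combos = vec([NamedTuple{Tuple(vars)}(vals)
              for vals in Iterators.product((params[v] for v in vars)...)])

println(length(combos))
println(first(combos))
```

Each element of `combos` could then be passed as keyword overrides to a scenario constructor, for example by splatting it with `; combo...`.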

OverrideVector(value, length)

Create a memory-efficient constant vector.

Arguments:

  • value: Constant value to return
  • length: Vector length

Returns:

  • OverrideVector: Memory-efficient constant vector

Example:

# Traditional: 8MB for 1M elements
traditional = fill(42.0, 1_000_000)

# OverrideVector: ~32 bytes
efficient = OverrideVector(42.0, 1_000_000)

# Same interface
@assert traditional[500_000] == efficient[500_000]
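For readers curious how a constant vector can satisfy the AbstractVector interface without storing its elements, here is a minimal sketch. `ConstantVector` is a hypothetical type defined only for this example. It mirrors the idea behind OverrideVector but is not the package's implementation.

```julia
# Hypothetical lazy constant vector: stores one value and a length.
struct ConstantVector{T} <: AbstractVector{T}
    value::T
    len::Int
end

Base.size(v::ConstantVector) = (v.len,)
Base.IndexStyle(::Type{<:ConstantVector}) = IndexLinear()
function Base.getindex(v::ConstantVector, i::Int)
    @boundscheck checkbounds(v, i)
    return v.value                      # same value at every index
end

data = (x = [1.0, 2.0, 3.0], z = [10.0, 20.0, 30.0])

# Swap in the lazy column; merge builds a new NamedTuple that shares z.
scenario_data = merge(data, (x = ConstantVector(2.5, length(data.x)),))
println(scenario_data.x[3])
```

Because `merge` on NamedTuples reuses the untouched columns, the scenario costs only the small struct for each overridden variable rather than a copy of the data.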

Scenario Management Functions

set_override!(scenario, variable, value)

Add or update a variable override in a scenario.

remove_override!(scenario, variable)

Remove a variable override from a scenario.

update_scenario!(scenario; overrides...)

Bulk update multiple overrides in a scenario.

get_overrides(scenario)

Get dictionary of current overrides in a scenario.

Example:

scenario = create_scenario("dynamic", data)
set_override!(scenario, :x, 1.0)
update_scenario!(scenario; y = 2.0, z = 3.0)
overrides = get_overrides(scenario)  # Dict(:x => 1.0, :y => 2.0, :z => 3.0)
remove_override!(scenario, :z)

Integration Functions

fixed_effects_form(mixed_model)

Extract fixed effects formula from a MixedModel.

Arguments:

  • mixed_model: Fitted MixedModel

Returns:

  • FormulaTerm: Fixed effects portion of the formula

Example:

mixed = fit(MixedModel, @formula(y ~ x + (1|group)), df)
fixed_form = fixed_effects_form(mixed)  # Returns: y ~ x

Utility Functions

length(compiled_formula)

Get the number of terms in compiled formula (model matrix columns).

Example:

compiled = compile_formula(model, data)
n_terms = length(compiled)           # e.g., 4

Type System

Core Types

  • UnifiedCompiled: Position-mapped, zero-allocation compiled evaluator
  • DataScenario: Scenario with variable overrides
  • ScenarioCollection: Collection of related scenarios
  • OverrideVector{T}: Memory-efficient constant vector
  • ModelRowEvaluator: Reusable evaluator object

Internal Types

Operation types used by the unified compiler:

  • LoadOp{Column, OutPos}: Load a data column into a scratch position
  • ConstantOp{Value, OutPos}: Place a compile-time constant into scratch
  • UnaryOp{Func, InPos, OutPos}: Apply a unary function
  • BinaryOp{Func, InPos1, InPos2, OutPos}: Apply a binary operation
  • ContrastOp{Column, OutPositions}: Expand a categorical column via contrasts
  • CopyOp{InPos, OutIdx}: Copy from scratch to final output index

Derivative Functions

build_derivative_evaluator(compiled, data; vars, chunk=:auto)

Build a reusable ForwardDiff-based derivative evaluator for computing Jacobians and marginal effects.

Arguments:

  • compiled: Compiled formula from compile_formula
  • data: Tables.jl-compatible data (column table preferred)
  • vars: Vector of symbols for variables to differentiate with respect to
  • chunk: ForwardDiff chunk size (:auto uses length(vars))

Returns:

  • DerivativeEvaluator: Reusable evaluator with preallocated buffers

Performance:

  • One-time construction cost, then ≤512 bytes per derivative call (AD backend)
  • Contains preallocated Jacobian matrices and gradient vectors

Example:

compiled = compile_formula(model, data)
vars = continuous_variables(compiled, data)  # or [:x, :z]
de = build_derivative_evaluator(compiled, data; vars=vars)

derivative_modelrow!(J, evaluator, row)

Fill Jacobian matrix with derivatives of model row with respect to selected variables.

Arguments:

  • J: Pre-allocated matrix of size (length(compiled), length(vars))
  • evaluator: DerivativeEvaluator from build_derivative_evaluator
  • row: Row index to evaluate

Performance:

  • ≤512 bytes allocated per call (ForwardDiff internals)
  • Uses preallocated buffers for near-optimal efficiency

Example:

J = Matrix{Float64}(undef, length(compiled), length(de.vars))
derivative_modelrow!(J, de, 1)  # Fill J with derivatives

marginal_effects_eta!(g, evaluator, beta, row)

Compute marginal effects on linear predictor η = Xβ using chain rule.

Arguments:

  • g: Pre-allocated gradient vector of length length(vars)
  • evaluator: DerivativeEvaluator
  • beta: Model coefficients vector
  • row: Row index

Implementation:

  • Computes g = J' * β where J is the Jacobian matrix
  • Uses preallocated internal Jacobian buffer

Performance:

  • ≤512 bytes per call with preallocated buffers (AD backend)

Example:

β = coef(model)
g = Vector{Float64}(undef, length(de.vars))
marginal_effects_eta!(g, de, β, 1)
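The relation g = J' * β can be verified on a small case with a Jacobian written out by hand. The sketch assumes a hypothetical model row X = [1, x, z, x*z] (that is, y ~ x * z) and arbitrary coefficients. None of the values come from a fitted model.

```julia
# Jacobian of X = [1, x, z, x*z] with respect to (x, z), evaluated at one row.
# Orientation is (n_terms, n_vars): rows are model terms, columns are variables.
x, z = 2.0, 3.0
J = [0.0 0.0;
     1.0 0.0;
     0.0 1.0;
     z   x]

β = [0.5, 1.0, -2.0, 0.25]     # arbitrary illustrative coefficients
g = J' * β                     # [∂η/∂x, ∂η/∂z]
println(g)
```

Written out, ∂η/∂x = β₂ + β₄·z and ∂η/∂z = β₃ + β₄·x, which matches the product above term by term.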

marginal_effects_mu!(g, evaluator, beta, row; link)

Compute marginal effects on mean μ via chain rule: dμ/dx = (dμ/dη) × (dη/dx).

Arguments:

  • g: Pre-allocated gradient vector
  • evaluator: DerivativeEvaluator
  • beta: Model coefficients
  • row: Row index
  • link: GLM link function (e.g., LogitLink(), LogLink())

Supported Links:

  • IdentityLink(), LogLink(), LogitLink(), ProbitLink()
  • CloglogLink(), CauchitLink(), InverseLink(), SqrtLink()
  • InverseSquareLink() (when available)

Performance:

  • ≤512 bytes per call with preallocated internal buffers (AD backend)

Example:

using GLM
marginal_effects_mu!(g, de, β, 1; link=LogitLink())
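The chain rule dμ/dx = (dμ/dη) × (dη/dx) can be worked through by hand for a logit link. The sketch below uses a hypothetical model row X = [1, x, z, x*z] and arbitrary coefficients, then cross-checks the result against a central difference. `model_row` and `inv_logit` are helper names introduced for this example only.

```julia
# Hand-written model row and inverse logit link.
model_row(x, z) = [1.0, x, z, x * z]
inv_logit(η) = 1 / (1 + exp(-η))

β = [0.5, 1.0, -2.0, 0.25]     # arbitrary illustrative coefficients
x, z = 2.0, 3.0

η = sum(model_row(x, z) .* β)  # linear predictor
μ = inv_logit(η)
dμ_dη = μ * (1 - μ)            # derivative of the inverse logit
dη_dx = β[2] + β[4] * z        # ∂η/∂x for this model row
dμ_dx = dμ_dη * dη_dx          # chain rule

# Cross-check with a central difference on μ as a function of x.
h = 1e-6
fd = (inv_logit(sum(model_row(x + h, z) .* β)) -
      inv_logit(sum(model_row(x - h, z) .* β))) / (2h)
println(isapprox(dμ_dx, fd; rtol = 1e-6))
```

For other links only the `dμ_dη` line changes. For example, a log link has dμ/dη = μ.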

continuous_variables(compiled, data)

Extract continuous variable names from compiled operations, excluding categoricals.

Arguments:

  • compiled: Compiled formula
  • data: Data used in compilation

Returns:

  • Vector{Symbol}: Sorted list of continuous variable symbols

Example:

vars = continuous_variables(compiled, data)  # e.g., [:x, :z, :age]
de = build_derivative_evaluator(compiled, data; vars=vars)

Performance Notes

  • Core functions (modelrow!, compiled(row_vec, data, row)) achieve exactly 0 bytes allocated
  • Derivative functions achieve ≤512 bytes per call (ForwardDiff internals)
  • Marginal effects use preallocated buffers to minimize allocations (≤512 bytes)
  • compile_formula has one-time compilation cost but enables many fast evaluations
  • Use Tables.columntable format for best performance
  • Pre-allocate output vectors/matrices and reuse them across evaluations
  • Build derivative evaluators once and reuse across many calls
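Allocation claims like the ones above are easiest to check with `@allocated` from Base, measured inside a function after a warmup call. The sketch below applies that idiom to a stand-in kernel. `fill_row!` and `measure_allocations` are hypothetical helpers, and the same pattern should carry over to `modelrow!` or a compiled formula call once the package is loaded.

```julia
# Stand-in kernel (hypothetical): writes the row [1, x, x*z] into buf.
function fill_row!(buf::Vector{Float64}, data, row::Int)
    buf[1] = 1.0
    buf[2] = data.x[row]
    buf[3] = data.x[row] * data.z[row]
    return buf
end

# Measure inside a function so global-scope overhead is not counted.
function measure_allocations(buf, data, row)
    fill_row!(buf, data, row)                    # warmup: triggers compilation
    return @allocated fill_row!(buf, data, row)  # bytes allocated by one call
end

data = (x = [1.0, 2.0], z = [3.0, 4.0])
buf = Vector{Float64}(undef, 3)                  # pre-allocated once, reused
bytes = measure_allocations(buf, data, 2)
println("bytes allocated: ", bytes)
```

Measuring at global scope or without a warmup call tends to report compilation and boxing overhead that has nothing to do with the kernel itself.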