API Reference

Complete API reference for FormulaCompiler.jl functions and types.

Core Compilation Functions

FormulaCompiler.compile_formula — Function

compile_formula(model, data_example::NamedTuple) -> UnifiedCompiled

Primary API for compiling statistical models into high-performance evaluators.

Position Mapping System

This function implements a position mapping system that converts statistical formulas into zero-allocation execution plans. The system works in three phases:

Phase 1: Formula Decomposition

Extracts the schema-applied formula from the fitted model
Converts StatsModels terms into typed operations (LoadOp, ConstantOp, etc.)
Assigns unique scratch positions to intermediate values and output positions to final results

Phase 2: Position Allocation

Uses CompilationContext.position_map to track term → position mappings
Allocates consecutive scratch positions starting from 1
Maps each model matrix column to a specific output position

Phase 3: Type Specialization

Embeds all positions as compile-time type parameters
Creates operations like LoadOp{:x, 3}() (load column :x into scratch position 3)
Enables zero-allocation execution through complete type specialization

Position Mapping Examples

# Simple formula: y ~ 1 + x
# Position mapping:
# scratch[1] = 1.0          (intercept, ConstantOp{1.0, 1})
# scratch[2] = data.x[row]  (variable x, LoadOp{:x, 2})  
# output[1] = scratch[1]    (CopyOp{1, 1})
# output[2] = scratch[2]    (CopyOp{2, 2})

# Interaction: y ~ x * z (intercept omitted for brevity)
# Position mapping:
# scratch[1] = data.x[row]     (LoadOp{:x, 1})
# scratch[2] = data.z[row]     (LoadOp{:z, 2}) 
# scratch[3] = scratch[1] * scratch[2]  (BinaryOp{:*, 1, 2, 3})
# output[1] = scratch[1], output[2] = scratch[2], output[3] = scratch[3]

# Function: y ~ log(x) (intercept omitted for brevity)
# Position mapping:
# scratch[1] = data.x[row]     (LoadOp{:x, 1})
# scratch[2] = log(scratch[1]) (UnaryOp{:log, 1, 2})
# output[1] = scratch[2]       (CopyOp{2, 1})
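
To make the mapping concrete, here is a minimal toy sketch of how operations with positions embedded in their type parameters could be executed. It is not FormulaCompiler's actual implementation: `execute!` and `run_ops!` are hypothetical names introduced for illustration, and only the operation names (`LoadOp`, `ConstantOp`, `UnaryOp`, `CopyOp`) are borrowed from the examples above.

```julia
# Toy operation types (illustrative only): positions are type parameters,
# so each op is a zero-size singleton and dispatch is resolved statically.
struct ConstantOp{Value, Out} end
struct LoadOp{Col, Out} end
struct UnaryOp{Func, In, Out} end
struct CopyOp{In, Out} end

# Hypothetical executor: one method per operation kind.
execute!(::ConstantOp{V, O}, scratch, out, data, row) where {V, O} =
    (scratch[O] = V; nothing)
execute!(::LoadOp{C, O}, scratch, out, data, row) where {C, O} =
    (scratch[O] = Float64(getproperty(data, C)[row]); nothing)
execute!(::UnaryOp{:log, I, O}, scratch, out, data, row) where {I, O} =
    (scratch[O] = log(scratch[I]); nothing)
execute!(::CopyOp{I, O}, scratch, out, data, row) where {I, O} =
    (out[O] = scratch[I]; nothing)

# Run every op in order against a reusable scratch buffer.
function run_ops!(out, scratch, ops::Tuple, data, row)
    foreach(op -> execute!(op, scratch, out, data, row), ops)
    return out
end

# Plan for y ~ 1 + log(x)
ops = (ConstantOp{1.0, 1}(), LoadOp{:x, 2}(), UnaryOp{:log, 2, 3}(),
       CopyOp{1, 1}(), CopyOp{3, 2}())

data = (x = [1.0, exp(1.0), 10.0],)
scratch = zeros(3)
out = zeros(2)
run_ops!(out, scratch, ops, data, 2)   # out ≈ [1.0, 1.0]
```

Because every position is part of an operation's type, the scratch and output indices are constants from the compiler's point of view, which is the property the three phases above rely on.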

Performance Characteristics

Scratch space: Fixed size allocated once, reused for all rows
Type stability: All positions known at compile time → zero allocations
Execution: Pure array indexing with no dynamic dispatch
Memory: O(max_scratch_positions) + O(output_size) per formula

Arguments

model: Fitted statistical model (GLM, LMM, etc.) with schema-applied formula
data_example: NamedTuple with sample data for type inference and schema validation

Returns

UnifiedCompiled{T, OpsTuple, ScratchSize, OutputSize} containing:

Type-specialized operation tuple
Pre-allocated scratch buffer
Position mappings embedded in operation types

compile_formula(formula::StatsModels.FormulaTerm, data_example::NamedTuple) -> UnifiedCompiled

Convenience overload to compile directly from a StatsModels.FormulaTerm and column-table data. This mirrors the model-based entry point but skips get_fixed_effects_formula.

Model Row Evaluation

FormulaCompiler.modelrow — Function

modelrow(model, data, row_idx) -> Vector{Float64}

Evaluate a single row and return a new vector (allocating version). Uses compiled formulas for optimal performance.

Example

row_values = modelrow(model, data, 1)  # Returns Vector{Float64}

modelrow(model, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows and return a new matrix (allocating version). Uses compiled formulas for optimal performance.

Example

matrix = modelrow(model, data, [1, 5, 10])  # Returns Matrix{Float64}

modelrow(compiled_formula, data, row_idx) -> Vector{Float64}

Evaluate a single row with a precompiled formula.

Example

compiled = compile_formula(model, data)
row_values = modelrow(compiled, data, 1)  # Returns Vector{Float64}

modelrow(compiled_formula, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows with a precompiled formula.

Example

compiled = compile_formula(model, data)
matrix = modelrow(compiled, data, [1, 5, 10])  # Returns Matrix{Float64}

modelrow(model, scenario::DataScenario, row_idx) -> Vector{Float64}

Evaluate model row using a data scenario (allocating version).

modelrow(compiled::UnifiedCompiled, scenario::DataScenario, row_idx) -> Vector{Float64}

Evaluate model row using a data scenario with UnifiedCompiled (allocating version).

FormulaCompiler.modelrow! — Function

modelrow!(row_vec, compiled_formula, data, row_idx)

Evaluate a single row of the model matrix in-place (zero-allocation).

Arguments

row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
compiled_formula: Compiled formula from compile_formula
data: Data in Tables.jl format (preferably from Tables.columntable)
row_idx::Int: Row index to evaluate

Returns

row_vec: The same vector passed in, now containing the evaluated row

Example

compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1)  # Zero allocations

modelrow!(row_vec, model, data, row_idx; cache=true)

Evaluate a single row of the model matrix in-place with automatic compilation.

Arguments

row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
model: Statistical model (GLM, MixedModel, etc.)
data: Data in Tables.jl format
row_idx::Int: Row index to evaluate
cache::Bool: Whether to cache compiled formula (default: true)

Returns

row_vec: The same vector passed in, now containing the evaluated row

Example

model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)

Note

First call compiles the formula. Subsequent calls reuse cached version when cache=true.

FormulaCompiler.ModelRowEvaluator — Type

ModelRowEvaluator{D, O}

Pre-compiled evaluator using compiled formulas only.

Override and Scenario System

FormulaCompiler.OverrideVector — Type

OverrideVector{T} <: AbstractVector{T}

A lazy vector that returns the same override value for all indices. This avoids allocating full arrays when setting all observations to a representative value.

Example

# Instead of: fill(2.5, 1_000_000)  # Allocates 8MB
# Use: OverrideVector(2.5, 1_000_000)  # Allocates ~32 bytes
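
The idea of a lazy constant vector can be sketched in a few lines of plain Julia. The `ConstVec` type below is a hypothetical stand-in written for illustration; it is not the package's `OverrideVector` and its field layout is an assumption.

```julia
# Hypothetical stand-in for a lazy constant vector (not the library's type).
struct ConstVec{T} <: AbstractVector{T}
    value::T
    len::Int
end

# Minimal AbstractVector interface: size and scalar indexing.
Base.size(v::ConstVec) = (v.len,)
function Base.getindex(v::ConstVec, i::Int)
    @boundscheck checkbounds(v, i)
    return v.value
end

v = ConstVec(2.5, 1_000_000)
v[500_000]          # 2.5
length(v)           # 1000000
sum(v)              # 2.5e6, via the generic AbstractArray fallback
sizeof(v)           # 16 bytes on a 64-bit system: one Float64 plus one Int
```

Since the type subtypes `AbstractVector`, generic code that only indexes or iterates a column can use it in place of a fully materialized array.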

FormulaCompiler.DataScenario — Type

DataScenario

Represents a data scenario with specific variable overrides. Contains the modified data that can be used directly with compiled formulas.

Fields

name::String: Descriptive name for the scenario
overrides::Dict{Symbol,Any}: Variable overrides (mutable for iterative development)
data::NamedTuple: Modified column-table data with OverrideVectors applied
original_data::NamedTuple: Original unmodified data for reference

FormulaCompiler.create_scenario — Function

create_scenario(name, original_data; overrides...)
create_scenario(name, original_data, overrides::Dict)

Create a data scenario with specified variable overrides for counterfactual analysis.

Arguments

name::String: Descriptive name for this scenario
original_data::NamedTuple: Original data in column-table format (from Tables.columntable)
overrides...: Keyword arguments for variable overrides (or Dict in second method)

Returns

DataScenario: Object containing original data, overrides, and modified data with OverrideVectors

Example

data = Tables.columntable(df)

# Override single variable to mean
scenario1 = create_scenario("x_at_mean", data; x = mean(data.x))

# Override multiple variables for policy analysis
scenario2 = create_scenario("policy", data; x = 2.5, group = "A", treatment = true)

# Use dictionary for dynamic overrides
overrides = Dict(:dose => 100.0, :region => "North")
scenario3 = create_scenario("high_dose_north", data, overrides)

# Evaluate with compiled formula (zero-allocation)
compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
compiled(row_vec, scenario1.data, 1)  # evaluate row 1 under the scenario

Note

Uses memory-efficient OverrideVector to avoid data duplication. Each override creates a lazy vector returning the same value for all rows.

Near-Zero-Allocation Derivatives

FormulaCompiler.jl computes derivatives through two backends, ForwardDiff-based automatic differentiation (AD) and finite differences (FD), both built on preallocated buffers to keep per-call allocations at or near zero.

Performance Characteristics

Core evaluation: Exactly 0 allocations
Finite differences (FD): 0 allocations after warmup on the evaluator path; the standalone derivative_modelrow_fd! allocates per call
ForwardDiff derivatives: ≤512 bytes per call (ForwardDiff internals)
Marginal effects: ≤512 bytes per call for AD backend (optimized with preallocated buffers)
Allocation efficiency: >99.75% compared to naive AD approaches
Validation: Cross-validated against finite differences (rtol=1e-6, atol=1e-8)

FormulaCompiler.build_derivative_evaluator — Function

build_derivative_evaluator(compiled, data; vars, chunk=:auto) -> DerivativeEvaluator

Build a ForwardDiff-based derivative evaluator for a fixed set of variables.

Arguments:

compiled::UnifiedCompiled: Result of compile_formula(model, data).
data::NamedTuple: Column-table data (e.g., Tables.columntable(df)).
vars::Vector{Symbol}: Variables to differentiate with respect to (typically continuous predictors).
chunk: ForwardDiff.Chunk{N}() or :auto (uses Chunk{length(vars)}).

Returns:

DerivativeEvaluator: Prebuilt evaluator object reusable across rows.

Notes:

Compile once per model + variable set; reuse across calls.
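Designed for an allocation-free steady state after warmup (typed closure + config; no per-call merges); the ForwardDiff path can still allocate a few hundred bytes per call (see derivative_modelrow!).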
Keep vars fixed for best specialization.

FormulaCompiler.derivative_modelrow! — Function

derivative_modelrow!(J, deval, row) -> AbstractMatrix{Float64}

Fill J with the Jacobian of one model row with respect to deval.vars.

Arguments:

J::AbstractMatrix{Float64}: Preallocated buffer of size (n_terms, n_vars).
deval::DerivativeEvaluator: Built by build_derivative_evaluator.
row::Int: Row index (1-based).

Returns:

The same J buffer, with J[i, j] = ∂X[i]/∂vars[j] for the given row.

Notes:

Orientation is (n_terms, n_vars); n_terms == length(compiled).
Small allocations (~368 bytes) due to ForwardDiff internals. For strict zero-allocation requirements, use derivative_modelrow_fd! instead.
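
As a worked illustration of this orientation (an example constructed here, not taken from the package), consider a model row with terms 1, x, z, x·z and vars = [:x, :z]. Each row of J corresponds to a model term and each column to a variable:

```latex
X(x, z) = \begin{bmatrix} 1 \\ x \\ z \\ xz \end{bmatrix},
\qquad
J = \frac{\partial X}{\partial (x, z)} =
\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \\ z & x \end{bmatrix}
\in \mathbb{R}^{4 \times 2}
```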

FormulaCompiler.derivative_modelrow — Function

derivative_modelrow(deval, row) -> Matrix{Float64}

Allocating convenience wrapper that returns the Jacobian for one row.

FormulaCompiler.derivative_modelrow_fd! — Function

derivative_modelrow_fd!(J, compiled, data, row; vars, step=:auto)

Finite-difference Jacobian for a single row using central differences (standalone).

Arguments:

J::AbstractMatrix{Float64}: Preallocated (n_terms, n_vars) buffer.
compiled::UnifiedCompiled: Result of compile_formula.
data::NamedTuple: Column-table data.
row::Int: Row index.
vars::Vector{Symbol}: Variables to differentiate with respect to.
step: Numeric step size or :auto (eps()^(1/3) * max(1, |x|)).

Notes:

Two evaluations per variable; useful as a robust fallback and for cross-checks.
This standalone path allocates per call (builds per-call overrides and small temporaries). For zero allocations after warmup, prefer the evaluator FD path (derivative_modelrow_fd_pos!).
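
The central-difference scheme and the :auto step rule described above can be sketched for a scalar function as follows. This is a generic illustration of the technique, not the package's code; `auto_step` and `central_diff` are hypothetical helper names.

```julia
# Step-size rule quoted in the arguments above: eps()^(1/3) * max(1, |x|).
auto_step(x::Float64) = cbrt(eps(Float64)) * max(1.0, abs(x))

# Central difference: two function evaluations per variable.
function central_diff(f, x::Float64; h::Float64 = auto_step(x))
    return (f(x + h) - f(x - h)) / (2h)
end

central_diff(log, 2.0)   # ≈ 0.5, the exact derivative 1/x at x = 2
```

Scaling the step by max(1, |x|) keeps the perturbation proportionate for large inputs while avoiding a vanishing step near zero.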

FormulaCompiler.contrast_modelrow! — Function

contrast_modelrow!(Δ, compiled, data, row; var, from, to)

Compute a discrete contrast at one row for a single variable: Δ = X(to) − X(from).

Arguments:

Δ::AbstractVector{Float64}: Preallocated buffer of length n_terms.
compiled::UnifiedCompiled: Result of compile_formula.
data::NamedTuple: Column-table data.
row::Int: Row index.
var::Symbol: Variable to change (e.g., :group3).
from, to: Values to contrast (level names or CategoricalValue for categorical; numbers for discrete).

Notes:

Uses a row-local override; for categorical columns, values are normalized to the column's levels.
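
The contrast Δ = X(to) − X(from) can be illustrated with a toy model row. The sketch below assumes a formula y ~ 1 + x + group with dummy coding and levels A, B, C; `model_row` and `contrast` are hypothetical helpers, not package functions.

```julia
# Toy model row for y ~ 1 + x + group with dummy coding and levels A, B, C
# (baseline A). Illustrative only; not the package's evaluator.
model_row(x, group) = [1.0, x, group == "B" ? 1.0 : 0.0, group == "C" ? 1.0 : 0.0]

# Discrete contrast: change one variable, hold everything else fixed.
contrast(x, from, to) = model_row(x, to) .- model_row(x, from)

contrast(2.0, "A", "C")   # [0.0, 0.0, 0.0, 1.0]
contrast(2.0, "B", "C")   # [0.0, 0.0, -1.0, 1.0]
```

Only the columns that depend on the changed variable are nonzero in the result; the intercept and x columns cancel.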

FormulaCompiler.continuous_variables — Function

continuous_variables(compiled, data) -> Vector{Symbol}

Return a list of continuous variable symbols present in the compiled ops, excluding categoricals detected via ContrastOps. Filters by eltype(data[sym]) <: Real.

FormulaCompiler.marginal_effects_eta! — Function

marginal_effects_eta!(g, de, beta, row; backend=:ad)

Fill g with marginal effects of η = Xβ w.r.t. de.vars at row. Implements: g = J' * β, where J = ∂X/∂vars.

Arguments:

backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)

Backends and allocations:

:ad: Uses ForwardDiff automatic differentiation. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
:fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
Allocating convenience (marginal_effects_eta) allocates the result vector by design.

Recommendations:

Use :fd backend for strict zero-allocation requirements
Use :ad backend for speed and numerical accuracy (default)

FormulaCompiler.marginal_effects_mu! — Function

marginal_effects_mu!(g, de, beta, row; link, backend=:ad)

Compute marginal effects of μ = g⁻¹(η) at row via the chain rule: dμ/dx = (dμ/dη) * (dη/dx). Here g⁻¹ denotes the inverse link function, not the output buffer g.

Arguments:

link: Link function (e.g., IdentityLink(), LogLink(), LogitLink())
backend::Symbol: :ad (ForwardDiff) or :fd (finite differences)

Backends and allocations:

:ad: Uses ForwardDiff via η path. Small allocations (~368 bytes) due to AD internals, but faster and more accurate.
:fd: Uses zero-allocation finite differences. Strict 0 bytes after warmup, but slightly slower due to multiple function evaluations.
Allocating convenience (marginal_effects_mu) allocates the result vector by design.

Recommendations:

Use :fd backend for strict zero-allocation requirements
Use :ad backend for speed and numerical accuracy (default)

Function Details

`compile_formula(model, data) -> UnifiedCompiled`

Compile a fitted model’s formula into a position-mapped, zero-allocation evaluator.

Arguments:

model: Fitted statistical model (GLM, MixedModel, etc.)
data: Tables.jl-compatible data (prefer a column table via Tables.columntable)

Returns:

UnifiedCompiled: Type-specialized evaluator with embedded position mappings

Example:

model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)

`modelrow(model, data, row_index) -> Vector{Float64}`

Evaluate model matrix row (allocating version).

Arguments:

model: Fitted statistical model or compiled formula
data: Data in Tables.jl format
row_index: Row index to evaluate (Int) or indices (Vector{Int}/AbstractVector)

Returns:

Vector{Float64} or Matrix{Float64}: Model matrix row(s)

Example:

row_vec = modelrow(model, data, 1)
multiple_rows = modelrow(model, data, [1, 5, 10])

`modelrow!(output, compiled, data, row_indices)`

In-place model matrix row evaluation (zero-allocation).

Arguments:

output: Pre-allocated output array (Vector or Matrix)
compiled: Compiled formula object
data: Data in Tables.jl format
row_indices: Row index (Int) or indices (AbstractVector)

Example:

compiled = compile_formula(model, data)
row_vec = Vector{Float64}(undef, length(compiled))
modelrow!(row_vec, compiled, data, 1)  # Zero allocations

# Multiple rows
matrix = Matrix{Float64}(undef, 10, length(compiled))
modelrow!(matrix, compiled, data, 1:10)

`ModelRowEvaluator(model, data)`

Create a reusable model row evaluator object.

Arguments:

model: Fitted statistical model
data: Data in DataFrame or Tables.jl format

Methods:

evaluator(row_index): Returns new vector (allocating)
evaluator(output, row_index): In-place evaluation (non-allocating)

Example:

evaluator = ModelRowEvaluator(model, df)
result = evaluator(1)  # Allocating
evaluator(row_vec, 1)  # Non-allocating

`create_scenario(name, data; overrides...)`

Create a data scenario with variable overrides.

Arguments:

name: Scenario name (String)
data: Base data in Tables.jl format
overrides...: Keyword arguments specifying variable overrides

Returns:

DataScenario: Scenario object with override data

Example:

scenario = create_scenario("treatment", data; 
    treatment = true,
    dose = 100.0
)

`create_scenario_grid(name, data, parameter_dict; verbose=false)`

Create all combinations of scenario parameters.

Arguments:

name: Base name for scenarios
data: Base data
parameter_dict: Dict mapping variables to vectors of values
verbose: Whether to print creation progress (default: false)

Returns:

Vector{DataScenario}: Vector of all parameter combinations

Example:

grid = create_scenario_grid("policy", data, Dict(
    :treatment => [false, true],
    :dose => [50, 100, 150]
); verbose=true)  # Creates 6 scenarios, prints progress
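
The six combinations in this example are the Cartesian product of the two value vectors. How create_scenario_grid enumerates them internally is not documented here, so the following is only a sketch of the combinatorics in plain Julia, with hypothetical variable names.

```julia
# Parameter grid from the example above.
params = Dict(:treatment => [false, true], :dose => [50, 100, 150])

# Fix a key order, then take the Cartesian product of the value vectors.
vars = collect(keys(params))
combos = vec([Dict(zip(vars, vals))
              for vals in Iterators.product((params[k] for k in vars)...)])

length(combos)   # 6 = 2 treatments × 3 doses
```

The grid size is the product of the vector lengths, so it grows quickly as more variables are added.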

`OverrideVector(value, length)`

Create a memory-efficient constant vector.

Arguments:

value: Constant value to return
length: Vector length

Returns:

OverrideVector: Memory-efficient constant vector

Example:

# Traditional: 8MB for 1M elements
traditional = fill(42.0, 1_000_000)

# OverrideVector: ~32 bytes
efficient = OverrideVector(42.0, 1_000_000)

# Same interface
@assert traditional[500_000] == efficient[500_000]

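Scenario Management Functions

`set_override!(scenario, variable, value)`
`remove_override!(scenario, variable)`
`update_scenario!(scenario; overrides...)`
`get_overrides(scenario)`

Add, change, remove, and inspect overrides on an existing scenario.

Example:

scenario = create_scenario("dynamic", data)
set_override!(scenario, :x, 1.0)
update_scenario!(scenario; y = 2.0, z = 3.0)
overrides = get_overrides(scenario)  # Dict(:x => 1.0, :y => 2.0, :z => 3.0)
remove_override!(scenario, :z)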

Integration Functions

`fixed_effects_form(mixed_model)`

Extract fixed effects formula from a MixedModel.

Arguments:

mixed_model: Fitted MixedModel

Returns:

FormulaTerm: Fixed effects portion of the formula

Example:

mixed = fit(MixedModel, @formula(y ~ x + (1|group)), df)
fixed_form = fixed_effects_form(mixed)  # Returns: y ~ x

Utility Functions

`length(compiled_formula)`

Get the number of terms in compiled formula (model matrix columns).

Example:

compiled = compile_formula(model, data)
n_terms = length(compiled)  # e.g., 4

Type System

Core Types

UnifiedCompiled: Position-mapped, zero-allocation compiled evaluator
DataScenario: Scenario with variable overrides
ScenarioCollection: Collection of related scenarios
OverrideVector{T}: Memory-efficient constant vector
ModelRowEvaluator: Reusable evaluator object

Internal Types

Operation types used by the unified compiler:

LoadOp{Column, OutPos}: Load a data column into a scratch position
ConstantOp{Value, OutPos}: Place a compile-time constant into scratch
UnaryOp{Func, InPos, OutPos}: Apply a unary function
BinaryOp{Func, InPos1, InPos2, OutPos}: Apply a binary operation
ContrastOp{Column, OutPositions}: Expand a categorical column via contrasts
CopyOp{InPos, OutIdx}: Copy from scratch to final output index

`build_derivative_evaluator(compiled, data; vars, chunk=:auto)`

Build a reusable ForwardDiff-based derivative evaluator for computing Jacobians and marginal effects.

Arguments:

compiled: Compiled formula from compile_formula
data: Tables.jl-compatible data (column table preferred)
vars: Vector of symbols for variables to differentiate with respect to
chunk: ForwardDiff chunk size (:auto uses length(vars))

Returns:

DerivativeEvaluator: Reusable evaluator with preallocated buffers

Performance:

One-time construction cost, then ≤512 bytes per derivative call (AD backend)
Contains preallocated Jacobian matrices and gradient vectors

Example:

compiled = compile_formula(model, data)
vars = continuous_variables(compiled, data)  # or [:x, :z]
de = build_derivative_evaluator(compiled, data; vars=vars)

`derivative_modelrow!(J, evaluator, row)`

Fill Jacobian matrix with derivatives of model row with respect to selected variables.

Arguments:

J: Pre-allocated matrix of size (length(compiled), length(vars))
evaluator: DerivativeEvaluator from build_derivative_evaluator
row: Row index to evaluate

Performance:

≤512 bytes allocated per call (ForwardDiff internals)
Uses preallocated buffers for near-optimal efficiency

Example:

J = Matrix{Float64}(undef, length(compiled), length(de.vars))
derivative_modelrow!(J, de, 1)  # Fill J with derivatives

`marginal_effects_eta!(g, evaluator, beta, row)`

Compute marginal effects on the linear predictor η = Xβ with respect to the evaluator's variables.

Arguments:

g: Pre-allocated gradient vector of length length(vars)
evaluator: DerivativeEvaluator
beta: Model coefficients vector
row: Row index

Implementation:

Computes g = J' * β where J is the Jacobian matrix
Uses preallocated internal Jacobian buffer

Performance:

≤512 bytes per call with preallocated buffers (AD backend)

Example:

β = coef(model)
g = Vector{Float64}(undef, length(de.vars))
marginal_effects_eta!(g, de, β, 1)
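
The product g = J' * β can be checked by hand on a small case. The sketch below uses a hand-written Jacobian for a row with terms 1, x, z, x·z and made-up coefficients; it only illustrates the linear algebra and does not call the package.

```julia
using LinearAlgebra

# Hand-written Jacobian of the row [1, x, z, x*z] with respect to (x, z).
jacobian(x, z) = [0.0 0.0; 1.0 0.0; 0.0 1.0; z x]

β = [0.5, 2.0, -1.0, 0.25]     # made-up coefficients
J = jacobian(3.0, 4.0)         # 4×2, orientation (n_terms, n_vars)

g = Vector{Float64}(undef, 2)
mul!(g, J', β)                 # g = J' * β without allocating a new vector

g   # [3.0, -0.25]: ∂η/∂x = β₂ + β₄z, ∂η/∂z = β₃ + β₄x
```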

`marginal_effects_mu!(g, evaluator, beta, row; link)`

Compute marginal effects on the mean μ via the chain rule: dμ/dx = (dμ/dη) × (dη/dx).

Arguments:

g: Pre-allocated gradient vector
evaluator: DerivativeEvaluator
beta: Model coefficients
row: Row index
link: GLM link function (e.g., LogitLink(), LogLink())

Supported Links:

IdentityLink(), LogLink(), LogitLink(), ProbitLink()
CloglogLink(), CauchitLink(), InverseLink(), SqrtLink()
InverseSquareLink() (when available)

Performance:

≤512 bytes per call with preallocated internal buffers (AD backend)

Example:

using GLM
marginal_effects_mu!(g, de, β, 1; link=LogitLink())
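
For the logit link the chain-rule factor has the closed form dμ/dη = μ(1 − μ). The sketch below applies it to made-up values of η and of the η-scale effects; `inv_logit` and `dmu_deta` are hypothetical helpers written for this illustration, not package or GLM.jl functions.

```julia
# Inverse logit link and its derivative dμ/dη = μ(1 - μ).
inv_logit(η) = 1 / (1 + exp(-η))
dmu_deta(η) = (μ = inv_logit(η); μ * (1 - μ))

η = 0.0                      # linear predictor at some row (made up)
g_eta = [3.0, -0.25]         # marginal effects on η (made up)

# Chain rule: scale the η-effects by dμ/dη.
g_mu = dmu_deta(η) .* g_eta  # [0.75, -0.0625], since dμ/dη = 0.25 at η = 0
```

The scaling factor depends on η, so effects on μ vary by row even when the effects on η are constant.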

`continuous_variables(compiled, data)`

Extract continuous variable names from compiled operations, excluding categoricals.

Arguments:

compiled: Compiled formula
data: Data used in compilation

Returns:

Vector{Symbol}: Sorted list of continuous variable symbols

Example:

vars = continuous_variables(compiled, data)  # e.g., [:x, :z, :age]
de = build_derivative_evaluator(compiled, data; vars=vars)

Performance Notes

Core functions (modelrow!, compiled(row_vec, data, row)) achieve exactly 0 bytes allocated
Derivative functions achieve ≤512 bytes per call (ForwardDiff internals)
Marginal effects use preallocated buffers to minimize allocations (≤512 bytes)
compile_formula has one-time compilation cost but enables many fast evaluations
Use Tables.columntable format for best performance
Pre-allocate output vectors/matrices and reuse them across evaluations
Build derivative evaluators once and reuse across many calls
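
One way to check allocation claims like these is to measure with `@allocated` inside a function after a warmup call. The sketch below uses a hypothetical stand-in `fill_row!` instead of a compiled formula, so it runs without the package; the same pattern would apply to an in-place evaluator.

```julia
# Stand-in for an in-place row evaluator (hypothetical; not the package's code).
function fill_row!(out::Vector{Float64}, x::Float64, z::Float64)
    out[1] = 1.0
    out[2] = x
    out[3] = x * z
    return out
end

# Measure inside a function, after a warmup call, so that compilation
# and global-variable overhead are not counted.
function allocated_bytes(out)
    fill_row!(out, 2.0, 3.0)                     # warmup
    return @allocated fill_row!(out, 2.0, 3.0)   # steady-state bytes
end

buf = Vector{Float64}(undef, 3)   # allocate once, reuse across calls
allocated_bytes(buf)              # expected to be 0 for this toy function
```

Measuring at global scope or on the first call tends to overstate allocations, which is why the warmup and the wrapper function matter.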
