API Reference
API reference for FormulaCompiler.jl functions and types.
Core Compilation Functions
FormulaCompiler.compile_formula — Function
compile_formula(model, data) -> UnifiedCompiled
Compile a fitted statistical model into a zero-allocation, type-specialized evaluator.
Transforms statistical formulas into optimized computational engines using position mapping that achieves ~50ns per row evaluation with zero allocations. The resulting evaluator provides constant-time row access regardless of dataset size.
Arguments
- model: Fitted statistical model (GLM.LinearModel, GLM.GeneralizedLinearModel, MixedModels.LinearMixedModel, etc.)
- data: Data in Tables.jl format (preferably Tables.columntable(df) for optimal performance)
Returns
UnifiedCompiled{T,Ops,S,O}: Callable evaluator with embedded position mappings
- Call as compiled(output_vector, data, row_index) for zero-allocation evaluation
- length(compiled) returns the number of model matrix columns
Performance Characteristics
- Compilation: One-time cost for complex formulas
- Evaluation: Zero bytes allocated after warmup
- Memory: O(output_size) scratch space, reused across all evaluations
- Scaling: Evaluation time independent of dataset size
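A minimal check of the zero-allocation claim using Base's @allocated macro (this sketch assumes the compiled, output, and data objects from the Example below):
compiled(output, data, 1)             # warmup: first call triggers specialization
@allocated compiled(output, data, 1)  # expected to report 0 bytes after warmup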
Supported Models
- Linear models: GLM.lm(@formula(y ~ x + group), df)
- Generalized linear models: GLM.glm(@formula(success ~ x), df, Binomial(), LogitLink())
- Mixed models: MixedModels.fit(MixedModel, @formula(y ~ x + (1|group)), df) (fixed effects only)
- Custom contrasts: Models with DummyCoding(), EffectsCoding(), HelmertCoding(), etc.
- Standardized predictors: Models with ZScore() standardization
Formula Features
- Basic terms: x, log(z), x^2, (x > 0), integer and float variables
- Categorical variables: Must use CategoricalArrays.jl format - raw strings are not supported
- Interactions: x * group, x * y * z, log(x) * group
- Functions: log, exp, sqrt, sin, cos, abs, ^ (integer and fractional powers)
- Boolean conditions: (x > 0), (z >= mean(z)), (group == "A")
- Complex formulas: x * log(abs(z)) * group + sqrt(y) + (w > threshold)
Data Requirements
- Categorical variables: Must use categorical(column) before model fitting
- Missing values: Not supported - remove with dropmissing() or impute before compilation
- Table format: Use Tables.columntable(df) for optimal performance
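A minimal data-preparation sketch reflecting these requirements (df and the group column are illustrative):
df = dropmissing(df)              # remove rows with missing values before compilation
df.group = categorical(df.group)  # convert raw string columns with CategoricalArrays.jl
data = Tables.columntable(df)     # column-table format for optimal performance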
Example
using FormulaCompiler, GLM, DataFrames, Tables, CategoricalArrays
# Fit model
df = DataFrame(
y = randn(1000),
x = randn(1000),
group = categorical(rand(["A", "B"], 1000)) # Required: use categorical()
)
model = lm(@formula(y ~ x * group + log(abs(x) + 1)), df)
# Compile once
data = Tables.columntable(df) # Convert for optimal performance
compiled = compile_formula(model, data)
# Use many times (zero allocations)
output = Vector{Float64}(undef, length(compiled))
compiled(output, data, 1) # Zero allocations
compiled(output, data, 500) # Zero allocations
# Substantial speedup compared to modelmatrix(model)[row, :]
Mixed Models Example
using MixedModels
mixed = fit(MixedModel, @formula(y ~ x + treatment + (1|subject)), df)
compiled = compile_formula(mixed, data) # Compiles fixed effects: y ~ x + treatment
See also: modelrow!, ModelRowEvaluator
compile_formula(formula::StatsModels.FormulaTerm, data) -> UnifiedCompiled
Compile a formula directly without a fitted model for zero-allocation evaluation.
This overload enables compilation from raw formulas, bypassing model fitting when only the computational structure is needed. Useful for custom model implementations or direct formula evaluation workflows.
Arguments
- formula::StatsModels.FormulaTerm: Formula specification (e.g., from @formula(y ~ x + group))
- data: Data in Tables.jl format (preferably Tables.columntable(df))
Returns
UnifiedCompiled{T,Ops,S,O}: Zero-allocation evaluator, same interface as model-based compilation
Performance
- Compilation: Fast for complex formulas
- Evaluation: Zero bytes allocated
- Memory: Identical performance to model-based compilation
Example
using StatsModels, FormulaCompiler, Tables
# Direct formula compilation
formula = @formula(y ~ x * group + log(z))
data = Tables.columntable(df)
compiled = compile_formula(formula, data)
# Zero-allocation evaluation
output = Vector{Float64}(undef, length(compiled))
compiled(output, data, 1) # Zero allocations
Use Cases
- Custom model implementations requiring direct formula evaluation
- Performance-critical applications avoiding model fitting overhead
- Exploratory analysis with formula variations
- Integration with external statistical frameworks
See also: compile_formula(model, data) for model-based compilation
FormulaCompiler.get_or_compile_formula — Function
get_or_compile_formula(model, data)
Get a cached compiled formula or compile a new one, with semantic type-aware caching.
Cache Key Strategy
Creates cache key based on:
- Model object (coefficients, structure)
- Column names (formula structure)
- Semantic type categories (compilation behavior)
Type Category Benefits
- Better cache hits: Vector{Int} and Vector{Float64} share cache entry
- Correct mixture handling: CategoricalArray vs CategoricalMixture distinguished
- Future-proof: New types can be added to category system
Examples
# These share a cache entry (both :numeric):
data1 = (x = Float64[1.0, 2.0], y = ...)
data2 = (x = Int[1, 2], y = ...) # Cache HIT ✓
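# Usage sketch (illustrative; assumes model is a fitted model and data1/data2 are fully specified):
compiled1 = get_or_compile_formula(model, data1) # compiles and caches
compiled2 = get_or_compile_formula(model, data2) # expected cache hit: Int and Float64 share :numeric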
# These get separate entries (different compilation):
data3 = (edu = categorical(["HS"]), ...) # :categorical
data4 = (edu = mix("HS" => 0.5, "C" => 0.5), ...) # :mixture - Cache MISS ✓
Model Row Evaluation
FormulaCompiler.modelrow — Function
modelrow(model, data, row_idx) -> Vector{Float64}
Evaluate a single model matrix row, returning a new vector (allocating version).
Convenient interface for when pre-allocation is not practical. Uses internal formula compilation and caching for performance optimization, though the non-allocating modelrow! interface is preferred for performance-critical code.
Arguments
- model: Fitted statistical model (GLM, MixedModel, etc.)
- data: Data in Tables.jl format
- row_idx::Int: Row index to evaluate (1-based)
Returns
Vector{Float64}: New vector containing model matrix row values
Performance
- First call: Includes one-time compilation cost
- Subsequent calls: Fast evaluation plus allocation cost for vector creation
- Memory: Allocates new vector each call
- Caching: Automatically caches compiled formula for reuse
Example
using FormulaCompiler, GLM, Tables
model = lm(@formula(y ~ x * group + log(z)), df)
data = Tables.columntable(df)
# Convenient single-row evaluation
row_1 = modelrow(model, data, 1) # First call (includes compilation)
row_2 = modelrow(model, data, 2) # Subsequent calls (uses cached compilation)
row_100 = modelrow(model, data, 100) # Fast (uses cached compilation)
When to Use
- Prototyping: Quick analysis and exploration
- Small datasets: When allocation overhead is negligible
- Convenience: When code simplicity outweighs performance requirements
Performance Alternative
For zero-allocation performance in loops, use modelrow!:
compiled = compile_formula(model, data)
output = Vector{Float64}(undef, length(compiled))
for i in 1:n_iterations
modelrow!(output, compiled, data, i) # Zero allocations each iteration
end
See also: modelrow!, ModelRowEvaluator, compile_formula
modelrow(model, data, row_indices) -> Matrix{Float64}
Evaluate multiple rows and return a new matrix (allocating version). Uses compiled formulas for optimal performance.
Example
matrix = modelrow(model, data, [1, 5, 10]) # Returns Matrix{Float64}
modelrow(compiled_formula, data, row_idx) -> Vector{Float64}
Evaluate a single row with a pre-compiled formula.
Example
compiled = compile_formula(model, data)
row_values = modelrow(compiled, data, 1) # Returns Vector{Float64}
modelrow(compiled_formula, data, row_indices) -> Matrix{Float64}
Evaluate multiple rows with a pre-compiled formula.
Example
compiled = compile_formula(model, data)
matrix = modelrow(compiled, data, [1, 5, 10]) # Returns Matrix{Float64}
FormulaCompiler.modelrow! — Function
modelrow!(output, compiled, data, row_idx) -> output
Evaluate a single model matrix row in-place with zero allocations.
The primary interface for high-performance row evaluation. This function provides zero-allocation evaluation, making it suitable for tight computational loops and performance-critical applications.
Arguments
- output::AbstractVector{Float64}: Pre-allocated output vector (modified in-place); must have length ≥ length(compiled); contents are overwritten with the model matrix row values
- compiled: Compiled formula from compile_formula(model, data)
- data: Data in Tables.jl format (preferably Tables.columntable(df) for best performance)
- row_idx::Int: Row index to evaluate (1-based indexing)
Returns
output: The same vector passed in, now containing the evaluated model matrix row
Performance
- Memory: Zero bytes allocated after warmup
- Scaling: Constant time regardless of dataset size or formula complexity
- Validation: Tested across 2000+ diverse formula configurations
Example
using FormulaCompiler, GLM, DataFrames, Tables
# Setup (one-time cost)
model = lm(@formula(y ~ x * group + log(z)), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
output = Vector{Float64}(undef, length(compiled))
# High-performance evaluation (repeated many times)
modelrow!(output, compiled, data, 1) # Zero allocations
modelrow!(output, compiled, data, 100) # Zero allocations
# Monte Carlo simulation example
for i in 1:1_000_000
row_idx = rand(1:nrow(df))
modelrow!(output, compiled, data, row_idx) # Zero allocations each call
# Process output...
end
Error Handling
- BoundsError: If row_idx exceeds data size
- DimensionMismatch: If output vector is too small
- Validates arguments in debug builds
See also: modelrow for allocating version, compile_formula, ModelRowEvaluator
modelrow!(row_vec, model, data, row_idx; cache=true)
Evaluate a single row of the model matrix in-place with automatic compilation.
Arguments
- row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
- model: Statistical model (GLM, MixedModel, etc.)
- data: Data in Tables.jl format
- row_idx::Int: Row index to evaluate
- cache::Bool: Whether to cache the compiled formula (default: true)
Returns
row_vec: The same vector passed in, now containing the evaluated row
Example
model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)
FormulaCompiler.ModelRowEvaluator — Type
ModelRowEvaluator{T,Ops,S,O}
Object-oriented interface for reusable, pre-compiled model evaluation.
Combines compiled formula, data, and output buffer into a single object that can be called repeatedly for both allocating and non-allocating row evaluation. Useful when the same model and data will be evaluated many times.
Type Parameters
- T: Element type (typically Float64)
- Ops: Compiled operations tuple type
- S: Scratch buffer size
- O: Output vector size
Fields
- compiled::UnifiedCompiled: Pre-compiled formula
- data::NamedTuple: Data in column-table format
- row_vec::Vector{Float64}: Internal buffer for non-allocating calls
Constructors
ModelRowEvaluator(model, df::DataFrame) # Converts DataFrame to column table
ModelRowEvaluator(model, data::NamedTuple) # Uses data directly
Interface
# Allocating interface - returns new vector
result = evaluator(row_idx)
# Non-allocating interface - uses provided vector
evaluator(output_vector, row_idx)
Performance
- Construction: One-time compilation cost
- Allocating calls: Fast evaluation plus allocation cost
- Non-allocating calls: Zero bytes allocated
- Memory: Minimal overhead beyond compiled formula and data reference
Example
using FormulaCompiler, GLM
# Create evaluator (one-time setup)
model = lm(@formula(y ~ x * group + log(z)), df)
evaluator = ModelRowEvaluator(model, df)
# Allocating interface (convenient)
row_1 = evaluator(1) # Returns Vector{Float64}
row_2 = evaluator(100) # Returns Vector{Float64}
# Non-allocating interface (fast)
output = Vector{Float64}(undef, length(evaluator))
evaluator(output, 1) # Zero allocations
evaluator(output, 100) # Zero allocations
# Batch processing
results = Matrix{Float64}(undef, 1000, length(evaluator))
for i in 1:1000
evaluator(view(results, i, :), i) # Zero allocations
end
When to Use
- Repeated evaluation: Same model and data used many times
- Object-oriented style: Prefer objects over function calls
- Mixed interfaces: Need both allocating and non-allocating evaluation
- Clean encapsulation: Bundle model, data, and buffer management
See also: modelrow!, modelrow, compile_formula
Derivatives
FormulaCompiler provides computational primitives for computing derivatives of model matrix rows with respect to continuous variables. These functions enable zero-allocation Jacobian computation using either automatic differentiation (ForwardDiff) or finite differences.
For marginal effects, standard errors, and complete statistical workflows, see Margins.jl.
Evaluator Construction
Recommended: Use the unified dispatcher for user-facing code:
# Automatic differentiation (preferred)
de = derivativeevaluator(:ad, compiled, data, [:x, :z])
# Finite differences
de = derivativeevaluator(:fd, compiled, data, [:x, :z])
Advanced: Direct constructor functions (primarily for internal use):
FormulaCompiler.derivativeevaluator_fd — Function
derivativeevaluator_fd(compiled, data, vars) -> FDEvaluator
Create a finite differences specialized FDEvaluator using Float64 counterfactual vectors.
Returns a concrete FDEvaluator with only FD infrastructure, no field pollution from AD. Uses NumericCounterfactualVector{Float64} for type-stable counterfactual operations.
FormulaCompiler.derivativeevaluator_ad — Function
derivativeevaluator_ad(compiled, data, vars) -> ADEvaluator
Create an automatic differentiation specialized ADEvaluator using Dual counterfactual vectors.
Returns a concrete ADEvaluator with only AD infrastructure, no field pollution from FD. Uses NumericCounterfactualVector{Dual{...}} for type-stable dual number operations.
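A brief construction sketch for the specialized evaluators (assuming compiled and data come from compile_formula, with continuous variables :x and :z):
de_fd = derivativeevaluator_fd(compiled, data, [:x, :z]) # finite differences backend
de_ad = derivativeevaluator_ad(compiled, data, [:x, :z]) # ForwardDiff (dual number) backend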
Jacobian Computation
FormulaCompiler.derivative_modelrow! — Function
derivative_modelrow!(J, de::ADEvaluator, row) -> J
Primary automatic differentiation API - zero allocations via ForwardDiff.jacobian!.
Uses a cached ForwardDiff configuration for zero allocations, replacing manual dual construction with ForwardDiff's optimized jacobian! routine.
Arguments
- J::AbstractMatrix{Float64}: Pre-allocated Jacobian buffer of size (n_terms, n_vars)
- de::ADEvaluator: AD evaluator built by derivativeevaluator(:ad, compiled, data, vars)
- row::Int: Row index to evaluate (1-based indexing)
Returns
J: The same matrix passed in, now containing J[i,j] = ∂X[i]/∂vars[j] for the specified row
Performance Characteristics
- Memory: 0 bytes allocated (cached buffers and ForwardDiff config)
- Speed: Target ~60ns with ForwardDiff.jacobian! optimization
- Accuracy: Machine precision derivatives via ForwardDiff dual arithmetic
Example
using FormulaCompiler, GLM, Tables
# Setup model
model = lm(@formula(y ~ x + z), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
# Build AD evaluator
de = derivativeevaluator(:ad, compiled, data, [:x, :z])
# Zero-allocation Jacobian computation
J = Matrix{Float64}(undef, length(compiled), length(de.vars))
derivative_modelrow!(J, de, 1) # 0 bytes allocated
derivative_modelrow!(J, de::FDEvaluator, row) -> J
Primary finite differences API - zero allocations, concrete type dispatch.
Computes full Jacobian matrix ∂X[i]/∂vars[j] using central differences with adaptive step sizing. Matches automatic_diff.jl signature for seamless backend switching.
Performance Characteristics
- Memory: 0 bytes allocated (uses pre-allocated FDEvaluator buffers)
- Speed: ~65ns per variable with mathematical optimizations
- Accuracy: Adaptive step sizing balances truncation/roundoff error
Mathematical Method
Central differences: ∂f/∂x ≈ [f(x+h) - f(x-h)] / (2h)
Step sizing: h = ε^(1/3) * max(1, |x|) for numerical stability
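As a concrete illustration of the step rule (a sketch; ε is Float64 machine epsilon):
h = cbrt(eps(Float64)) * max(1.0, abs(x)) # ≈ 6.06e-6 * max(1, |x|)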
Arguments
- J::AbstractMatrix{Float64}: Pre-allocated Jacobian buffer of size (n_terms, n_vars)
- de::FDEvaluator: Pre-built evaluator from derivativeevaluator_fd(compiled, data, vars)
- row::Int: Row index to evaluate (1-based indexing)
Returns
J: The same matrix passed in, containing J[i,j] = ∂X[i]/∂vars[j]
Example
using FormulaCompiler, GLM, Tables
# Setup model and data
model = lm(@formula(y ~ x * group + log(abs(z) + 1)), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
# Build FD evaluator
de_fd = derivativeevaluator_fd(compiled, data, [:x, :z])
# Zero-allocation finite differences
J = Matrix{Float64}(undef, length(compiled), length(de_fd.vars))
derivative_modelrow!(J, de_fd, 1) # 0 bytes allocated
See also: derivativeevaluator_fd
Variable Identification
FormulaCompiler.continuous_variables — Function
continuous_variables(compiled, data) -> Vector{Symbol}
Identify continuous variables suitable for derivative computation from a compiled formula.
Analyzes compiled operations to distinguish between continuous variables (suitable for differentiation) and categorical variables (requiring discrete analysis). Essential for determining valid variable sets for derivative evaluators and marginal effects computation.
Arguments
- compiled::UnifiedCompiled: Compiled formula from compile_formula(model, data)
- data::NamedTuple: Data in column-table format (from Tables.columntable(df))
Returns
Vector{Symbol}: Sorted list of continuous variable names
- Includes: Float64, Int64, Int32, Int variables used in LoadOp operations
- Excludes: Variables appearing only in ContrastOp operations (categorical contrasts)
- Excludes: Boolean variables (treated as categorical regardless of numeric type)
Classification Algorithm
- Operation analysis: Scan compiled operations for LoadOp vs ContrastOp usage
- Type filtering: Verify variables have Real element types in data
- Boolean exclusion: Remove Bool variables (categorical by convention)
- Categorical exclusion: Remove variables only appearing in contrast operations
Example
using FormulaCompiler, GLM, DataFrames, Tables, CategoricalArrays
# Mixed variable types
df = DataFrame(
y = randn(1000),
price = randn(1000), # Float64 - continuous
quantity = rand(1:100, 1000), # Int64 - continuous
available = rand(Bool, 1000), # Bool - categorical
category = categorical(rand(["A", "B", "C"], 1000)) # Categorical - categorical
)
model = lm(@formula(y ~ price + quantity + available + category), df)
compiled = compile_formula(model, Tables.columntable(df))
# Identify continuous variables
continuous_vars = continuous_variables(compiled, Tables.columntable(df))
# Returns: [:price, :quantity]
# Use for derivative evaluator construction
de_fd = derivativeevaluator_fd(compiled, Tables.columntable(df), continuous_vars)
de_ad = derivativeevaluator_ad(compiled, Tables.columntable(df), continuous_vars)
Use Cases
- Pre-validation: Check variable suitability before building derivative evaluators
- Automatic selection: Programmatically identify all differentiable variables
- Error prevention: Avoid attempting derivatives on categorical variables
- Model introspection: Understand variable roles in compiled formulas
Implementation Details
- Scans LoadOp operations for direct variable usage (continuous indicators)
- Identifies ContrastOp operations for categorical variable detection
- Applies type checking to ensure Real element types in the actual data
- Returns sorted list for consistent ordering across calls
See also: derivativeevaluator_fd, derivativeevaluator_ad, derivative_modelrow!
Link Function Derivatives
Computational primitives for GLM link function derivatives (used by Margins.jl for computing marginal effects on the mean response).
FormulaCompiler.supported_link_functions — Function
supported_link_functions() -> Vector{String}
Return list of GLM link functions with implemented dmudeta methods.
Note: Link function support is now determined by Julia's method dispatch. Any link function with a dmudeta method will work automatically. This function provides a convenience list of commonly tested functions.
Example
links = supported_link_functions()
println("Common GLM links: ", join(links, ", "))
Categorical Contrasts
FormulaCompiler.ContrastEvaluator — Type
ContrastEvaluator{T, Ops, S, O, NTMerged, CounterfactualTuple}
Zero-allocation evaluator for categorical and binary variable contrasts.
Provides efficient discrete marginal effects computation by pre-allocating all buffers and pre-computing categorical level mappings. Eliminates the ~2KB allocation overhead of the basic contrast_modelrow! function for batch contrast operations.
Uses typed counterfactual vectors for type-stable, zero-allocation performance.
Fields
- compiled: Base compiled formula evaluator
- vars: Variables available for contrast computation
- data_counterfactual: Counterfactual data structure for variable substitution
- counterfactuals: Tuple of typed CounterfactualVector{T} subtypes for each variable
- y_from_buf: Pre-allocated buffer for "from" level evaluation
- y_to_buf: Pre-allocated buffer for "to" level evaluation
- row: Current row being processed
Performance
- Zero allocations after construction for all contrast operations
- Type stability via typed counterfactual vectors
- Buffer reuse across multiple contrasts and rows
- Type specialization for compiled formula operations
Usage
# One-time setup
evaluator = contrastevaluator(compiled, data, [:treatment, :education])
contrast_buf = Vector{Float64}(undef, length(compiled))
# Fast repeated contrasts (zero allocations)
for row in 1:n_rows
contrast_modelrow!(contrast_buf, evaluator, row, :treatment, "Control", "Drug")
# Process contrast_buf...
end
FormulaCompiler.contrastevaluator — Function
contrastevaluator(compiled, data, vars) -> ContrastEvaluator
Construct a ContrastEvaluator for efficient categorical and binary contrast computation.
Pre-allocates all necessary buffers and pre-computes categorical level mappings to eliminate allocations during contrast evaluation.
Arguments
- compiled: Result from compile_formula(model, data)
- data: Column-table data as a NamedTuple
- vars: Vector of variable symbols available for contrasts
Returns
ContrastEvaluator configured for zero-allocation contrast computation.
Performance Notes
- One-time cost: Setup involves building override structures and categorical mappings
- Categorical optimization: Level mappings computed once, reused for all contrasts
- Memory efficiency: Buffers sized exactly for the compiled formula
Example
# Setup for categorical contrasts
evaluator = contrastevaluator(compiled, data, [:group, :region, :binary_var])
# Zero-allocation usage
contrast_buf = Vector{Float64}(undef, length(compiled))
contrast_modelrow!(contrast_buf, evaluator, 1, :group, "Control", "Treatment")
FormulaCompiler.CategoricalLevelMap — Type
CategoricalLevelMap{Var, LevelTuple}
Stores pre-computed level mappings for a categorical variable in contrast evaluators.
Similar to ContrastOp, this struct uses type parameters for compile-time specialization while storing runtime level data as a field.
Type Parameters
- Var::Symbol: Variable name (e.g., :group, :treatment)
- LevelTuple: Type of the levels tuple (e.g., NTuple{3, Tuple{String, CategoricalValue{UInt32}}})
Fields
levels: Tuple of (level, CategoricalValue) pairs preserving natural level types
Example
# String categorical with 3 levels
CategoricalLevelMap{:group, NTuple{3, Tuple{String, CategoricalValue{UInt32}}}}(
(("Control", catval1), ("Treatment", catval2), ("Placebo", catval3))
)
# Integer categorical with 5 levels
CategoricalLevelMap{:age_group, NTuple{5, Tuple{Int64, CategoricalValue{UInt32}}}}(
((1, catval1), (2, catval2), (3, catval3), (4, catval4), (5, catval5))
)Performance
- Zero allocations: All types concrete, fully specialized
- Natural types: No String conversion needed for Int/Symbol levels
- Fast lookup: Linear search through small tuple (2-10 levels typical)
FormulaCompiler.contrast_modelrow! — Function
contrast_modelrow!(Δ, evaluator, row, var, from, to) -> Δ
Compute discrete contrast using pre-allocated ContrastEvaluator (zero allocations).
Evaluates Δ = X(var=to) - X(var=from) using the evaluator's pre-allocated buffers and pre-computed categorical mappings for optimal performance.
Arguments
- Δ::AbstractVector{Float64}: Output contrast vector (modified in-place)
- evaluator::ContrastEvaluator: Pre-configured contrast evaluator
- row::Int: Row index to evaluate
- var::Symbol: Variable to contrast (must be in evaluator.vars)
- from: Reference level (baseline)
- to: Target level (comparison)
Performance
- Zero allocations - uses pre-allocated buffers from evaluator
- Categorical optimization - uses pre-computed level mappings
- Type specialization - compiled formula operations fully optimized
Error Handling
- Validates that var exists in the evaluator's variable list
- Handles both categorical and numeric variable types
- Provides clear error messages for invalid level specifications
Example
evaluator = contrastevaluator(compiled, data, [:treatment])
contrast_buf = Vector{Float64}(undef, length(compiled))
# Zero-allocation contrast computation
contrast_modelrow!(contrast_buf, evaluator, 1, :treatment, "Control", "Drug")
# contrast_buf now contains the discrete effect vector
FormulaCompiler.contrast_gradient! — Function
contrast_gradient!(∇β, evaluator, row, var, from, to, β, [link]) -> ∇β
Compute parameter gradients for discrete effects: ∂(discrete_effect)/∂β - zero allocations.
Computes the gradient of discrete marginal effects with respect to model parameters using the mathematical formula:
- Linear scale (η): ∇β = ΔX = X₁ - X₀ (contrast vector)
- Response scale (μ): ∇β = g'(η₁) × X₁ - g'(η₀) × X₀ (chain rule with link derivatives)
This enables uncertainty quantification via the delta method: SE = √(∇β' Σ ∇β).
Arguments
- ∇β::AbstractVector{Float64}: Output gradient vector (modified in-place)
- evaluator::ContrastEvaluator: Pre-configured contrast evaluator
- row::Int: Row index to evaluate
- var::Symbol: Variable to contrast (must be in evaluator.vars)
- from: Reference level (baseline)
- to: Target level (comparison)
- β::AbstractVector{<:Real}: Model coefficients (used only for response-scale computation)
- link: GLM link function (optional, defaults to linear scale)
Returns
∇β: The same vector passed in, containing parameter gradients ∂(discrete_effect)/∂β
Performance
- Zero allocations - uses pre-allocated buffers from evaluator
- Link function support - handles all GLM links (Identity, Log, Logit, etc.)
- Type flexibility - accepts any Real coefficient type, converts internally
Mathematical Method
Linear Scale (default):
discrete_effect = η₁ - η₀ = (X₁'β) - (X₀'β) = (X₁ - X₀)'β = ΔX'β
∇β = ΔX = X₁ - X₀
Response Scale (with link function):
discrete_effect = μ₁ - μ₀ = g⁻¹(η₁) - g⁻¹(η₀)
∇β = g'(η₁) × X₁ - g'(η₀) × X₀ (chain rule)
Example
evaluator = contrastevaluator(compiled, data, [:treatment])
∇β = Vector{Float64}(undef, length(compiled))
# Linear scale gradients (η = Xβ scale)
contrast_gradient!(∇β, evaluator, 1, :treatment, "Control", "Drug", β)
# Response scale gradients (μ = g⁻¹(η) scale)
link = GLM.LogitLink()
contrast_gradient!(∇β, evaluator, 1, :treatment, "Control", "Drug", β, link)
# Delta method standard error
se = sqrt(∇β' * vcov_matrix * ∇β)
Integration with Delta Method
Parameter gradients enable uncertainty quantification:
# Compute discrete effect + gradient simultaneously
discrete_effect = contrast_modelrow(evaluator, row, var, from, to)
contrast_gradient!(∇β, evaluator, row, var, from, to, β, link)
# Delta method confidence intervals
variance = ∇β' * vcov_matrix * ∇β
se = sqrt(variance)
ci_lower = discrete_effect - 1.96 * se
ci_upper = discrete_effect + 1.96 * se
FormulaCompiler.contrast_gradient — Function
contrast_gradient(evaluator, row, var, from, to, β, [link]) -> Vector{Float64}
Convenience version that allocates and returns the gradient vector.
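A minimal usage sketch of the allocating variant, reusing the evaluator, β, and link from the contrast_gradient! example above:
∇β_eta = contrast_gradient(evaluator, 1, :treatment, "Control", "Drug", β)       # linear (η) scale
∇β_mu = contrast_gradient(evaluator, 1, :treatment, "Control", "Drug", β, link)  # response (μ) scale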
Categorical Mixtures
Utilities for constructing and validating categorical mixtures used in efficient profile-based marginal effects.
FormulaCompiler.mix — Function
mix(pairs...)
Convenient constructor for CategoricalMixture from level => weight pairs. This is the main user-facing function for creating mixture specifications.
Arguments
pairs...: Level => weight pairs (e.g., "A" => 0.3, "B" => 0.7)
Returns
CategoricalMixture: Validated mixture object ready for use with FormulaCompiler
Examples
# Basic categorical mixture
group_mix = mix("Control" => 0.4, "Treatment" => 0.6)
# Educational composition
education_mix = mix("high_school" => 0.4, "college" => 0.4, "graduate" => 0.2)
# Regional distribution using symbols
region_mix = mix(:urban => 0.7, :rural => 0.3)
# Boolean mixture (30% false, 70% true)
treated_mix = mix(false => 0.3, true => 0.7)
# Works with any comparable type
age_group_mix = mix("young" => 0.25, "middle" => 0.50, "old" => 0.25)
Validation
The mix() function automatically validates:
- At least one level => weight pair is provided
- All weights are non-negative
- Weights sum to 1.0 (within numerical tolerance)
- All levels are unique
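A sketch of inputs the validation is expected to reject (the specific error type is an assumption, mirroring the ArgumentError used by validate_mixture_against_data below):
mix("A" => 0.6, "B" => 0.6) # invalid: weights sum to 1.2, expected to throw
mix("A" => 0.7, "A" => 0.3) # invalid: duplicate level, expected to throw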
Integration with FormulaCompiler
CounterfactualVector Pattern for Categorical Mixtures
The unified row-wise architecture provides efficient single-row mixture perturbations:
using FormulaCompiler, GLM, DataFrames, Tables, LinearAlgebra, Statistics
# Prepare data with mixture column
df = DataFrame(
y = randn(1000),
x = randn(1000),
group = fill(mix("A" => 0.4, "B" => 0.6), 1000) # Baseline mixture
)
data = Tables.columntable(df)
# Compile formula
model = lm(@formula(y ~ x * group), df)
compiled = compile_formula(model, data)
# Pattern 1: Single-row mixture perturbation
# Create counterfactual vector for mixture column
cf_mixture = counterfactualvector(data.group, 1) # CategoricalMixtureCounterfactualVector
# Apply different mixture to specific row
new_mixture = mix("A" => 0.8, "B" => 0.2) # Policy counterfactual
update_counterfactual_row!(cf_mixture, 500) # Target row 500
update_counterfactual_replacement!(cf_mixture, new_mixture)
# Evaluate with counterfactual data
data_cf = (data..., group=cf_mixture)
output = Vector{Float64}(undef, length(compiled))
compiled(output, data_cf, 500) # Row 500 uses new mixture, others use baseline
# Pattern 2: Population marginal effects with mixture profiles
function mixture_marginal_effects(model, data, base_mixture, alt_mixture)
compiled = compile_formula(model, data)
cf_mixture = counterfactualvector(data.group, 1)
data_cf = (data..., group=cf_mixture)
n_rows = length(data.x)
β = coef(model)
output = Vector{Float64}(undef, length(compiled))
baseline_effects = Vector{Float64}(undef, n_rows)
alternative_effects = Vector{Float64}(undef, n_rows)
for row in 1:n_rows
update_counterfactual_row!(cf_mixture, row)
# Baseline mixture: linear predictor under the base composition
update_counterfactual_replacement!(cf_mixture, base_mixture)
compiled(output, data_cf, row)
baseline_effects[row] = dot(β, output)
# Alternative mixture: linear predictor under the counterfactual composition
update_counterfactual_replacement!(cf_mixture, alt_mixture)
compiled(output, data_cf, row)
alternative_effects[row] = dot(β, output)
end
return mean(alternative_effects - baseline_effects)
end
# Example: Policy effect of changing group composition
base_mix = mix("A" => 0.4, "B" => 0.6)
policy_mix = mix("A" => 0.7, "B" => 0.3)
effect = mixture_marginal_effects(model, data, base_mix, policy_mix)
Reference Grid Pattern
For systematic marginal effects computation across different mixture profiles:
# Create reference grid with multiple mixture specifications
mixtures = [
mix("A" => 1.0, "B" => 0.0), # Pure A
mix("A" => 0.5, "B" => 0.5), # Balanced
mix("A" => 0.0, "B" => 1.0) # Pure B
]
# Evaluate effects across all mixture profiles
n_rows = length(data.x)
β = coef(model)
output = Vector{Float64}(undef, length(compiled))
effects_by_mixture = Vector{Float64}(undef, length(mixtures))
cf_mixture = counterfactualvector(data.group, 1)
data_cf = (data..., group=cf_mixture)
for (i, mixture_spec) in enumerate(mixtures)
update_counterfactual_replacement!(cf_mixture, mixture_spec)
# Compute average effect across all rows for this mixture
row_effects = Vector{Float64}(undef, n_rows)
for row in 1:n_rows
update_counterfactual_row!(cf_mixture, row)
compiled(output, data_cf, row)
row_effects[row] = dot(β, output)
end
effects_by_mixture[i] = mean(row_effects)
end
Performance
Mixture creation is lightweight and validation happens at construction time. The resulting CategoricalMixture objects are compiled into zero-allocation evaluators by FormulaCompiler's compilation system.
FormulaCompiler.CategoricalMixture — Type
CategoricalMixture{T}
Represents a mixture of categorical levels with associated weights for statistical analysis. Used to specify population composition scenarios and marginal effects computation.
Fields
- levels::Vector{T}: Categorical levels (strings, symbols, booleans, or other types)
- weights::Vector{Float64}: Associated weights (must sum to 1.0)
Example
# Educational composition mixture
edu_mix = CategoricalMixture(["high_school", "college"], [0.6, 0.4])
# Using the convenient mix() constructor
treatment_mix = mix("control" => 0.4, "treatment" => 0.6)
boolean_mix = mix(false => 0.3, true => 0.7)
Validation
- Levels and weights must have the same length
- All weights must be non-negative
- Weights must sum to 1.0 (within tolerance)
- Levels must be unique
Integration with FormulaCompiler
CategoricalMixture objects are automatically detected by FormulaCompiler's compilation system and compiled into efficient zero-allocation evaluators using MixtureContrastOp.
FormulaCompiler.MixtureWithLevels — Type
MixtureWithLevels{T}
Wrapper that includes original categorical levels with the mixture for FormulaCompiler processing. This type provides proper type-safe access to mixture components for the compilation system.
Fields
- mixture::CategoricalMixture{T}: The core mixture specification
- original_levels::Vector{String}: Original levels from the data column
Usage
This type is used internally by FormulaCompiler's scenario system to provide type-safe mixture processing with access to both mixture specifications and original data structure.
# Usually created automatically by FormulaCompiler's scenario system
mixture = mix("A" => 0.3, "B" => 0.7)
original_levels = ["A", "B", "C"] # From the actual data column
wrapper = MixtureWithLevels(mixture, original_levels)
# Direct property access
wrapper.mixture.levels # Access to mixture levels
wrapper.mixture.weights # Access to mixture weights
wrapper.original_levels # Access to original data levels
FormulaCompiler.validate_mixture_against_data — Function
validate_mixture_against_data(mixture::CategoricalMixture, col, var::Symbol)
Validate that all levels in the mixture exist in the actual data column. Throws ArgumentError if any mixture levels are not found in the data.
Arguments
- mixture::CategoricalMixture: The mixture specification to validate
- col: The data column to validate against
- var::Symbol: Variable name for error reporting
Throws
ArgumentError: If mixture contains levels not found in the data
Examples
# Validate mixture against categorical data
data_col = categorical(["A", "B", "C", "A", "B"])
mixture = mix("A" => 0.5, "B" => 0.5)
validate_mixture_against_data(mixture, data_col, :group) # ✓ Valid
# This would throw an error
bad_mixture = mix("A" => 0.5, "X" => 0.5) # "X" not in data
validate_mixture_against_data(bad_mixture, data_col, :group) # ✗ Error
This function is used internally by FormulaCompiler's scenario system to ensure mixture specifications are compatible with the actual data.
FormulaCompiler.mixture_to_scenario_value — Function
mixture_to_scenario_value(mixture::CategoricalMixture, original_col)
Convert a categorical mixture to a representative value for FormulaCompiler scenario creation. Uses weighted average encoding to provide a smooth, continuous representation.
Strategy
- CategoricalArray: Weighted average of level indices
- Bool: Probability of true (equivalent to current fractional Bool support)
- Other: Weighted average of sorted unique level indices
Arguments
- mixture::CategoricalMixture: The mixture to convert
- original_col: The original data column for context
Returns
Float64: Continuous representation of the mixture
Examples
# Boolean mixture -> probability of true
bool_mix = mix(false => 0.3, true => 0.7)
mixture_to_scenario_value(bool_mix, [true, false, true]) # -> 0.7
# Categorical mixture -> weighted average of level indices
cat_mix = mix("A" => 0.6, "B" => 0.4)
cat_col = categorical(["A", "B", "C"])
mixture_to_scenario_value(cat_mix, cat_col) # -> 1.4 (0.6*1 + 0.4*2)
This function is used internally by FormulaCompiler's scenario system to convert mixture specifications into values that can be used with the existing override system.
Utilities
FormulaCompiler.not — Function
not(x)
Logical NOT operation for use in formula specifications.
Arguments
- x::Bool: Returns the logical negation (!x)
- x::Real: Returns 1 - x (useful for probability complements)
Returns
- For Bool: The opposite boolean value
- For Real: The complement (1 - x)
Example
# In a formula
model = lm(@formula(y ~ not(treatment)), df)
# For probabilities
p = 0.3
q = not(p) # 0.7