API Reference

API reference for FormulaCompiler.jl functions and types.

Core Compilation Functions

FormulaCompiler.compile_formula — Function

compile_formula(model, data) -> UnifiedCompiled

Compile a fitted statistical model into a zero-allocation, type-specialized evaluator.

Transforms a statistical formula into a type-specialized evaluator using position mapping, reaching roughly 50 ns per row with zero allocations. Per-row evaluation time is constant regardless of dataset size.

Arguments

model: Fitted statistical model (GLM.LinearModel, GLM.GeneralizedLinearModel, MixedModels.LinearMixedModel, etc.)
data: Data in Tables.jl format (preferably Tables.columntable(df) for optimal performance)

Returns

UnifiedCompiled{T,Ops,S,O}: Callable evaluator with embedded position mappings
- Call as compiled(output_vector, data, row_index) for zero-allocation evaluation
- length(compiled) returns number of model matrix columns

Performance Characteristics

Compilation: One-time cost for complex formulas
Evaluation: Zero bytes allocated after warmup
Memory: O(output_size) scratch space, reused across all evaluations
Scaling: Evaluation time independent of dataset size

Supported Models

Linear models: GLM.lm(@formula(y ~ x + group), df)
Generalized linear models: GLM.glm(@formula(success ~ x), df, Binomial(), LogitLink())
Mixed models: MixedModels.fit(MixedModel, @formula(y ~ x + (1|group)), df) (fixed effects only)
Custom contrasts: Models with DummyCoding(), EffectsCoding(), HelmertCoding(), etc.
Standardized predictors: Models with ZScore() standardization

Formula Features

Basic terms: x, log(z), x^2, (x > 0), integer and float variables
Categorical variables: Must use CategoricalArrays.jl format - raw strings not supported
Interactions: x * group, x * y * z, log(x) * group
Functions: log, exp, sqrt, sin, cos, abs, ^ (integer and fractional powers)
Boolean conditions: (x > 0), (z >= mean(z)), (group == "A")
Complex formulas: x * log(abs(z)) * group + sqrt(y) + (w > threshold)

Data Requirements

Categorical variables: Must use categorical(column) before model fitting
Missing values: Not supported - remove with dropmissing() or impute before compilation
Table format: Use Tables.columntable(df) for optimal performance

Example

using FormulaCompiler, GLM, DataFrames, Tables, CategoricalArrays

# Fit model
df = DataFrame(
    y = randn(1000), 
    x = randn(1000), 
    group = categorical(rand(["A", "B"], 1000))  # Required: use categorical()
)
model = lm(@formula(y ~ x * group + log(abs(x) + 1)), df)

# Compile once
data = Tables.columntable(df)  # Convert for optimal performance
compiled = compile_formula(model, data)

# Use many times (zero allocations)
output = Vector{Float64}(undef, length(compiled))
compiled(output, data, 1)     # Zero allocations
compiled(output, data, 500)   # Zero allocations

# Substantial speedup compared to modelmatrix(model)[row, :]

Mixed Models Example

using MixedModels
mixed = fit(MixedModel, @formula(y ~ x + treatment + (1|subject)), df)
compiled = compile_formula(mixed, data)  # Compiles fixed effects: y ~ x + treatment

compile_formula(formula::StatsModels.FormulaTerm, data) -> UnifiedCompiled

Compile a formula directly without a fitted model for zero-allocation evaluation.

This overload enables compilation from raw formulas, bypassing model fitting when only the computational structure is needed. Useful for custom model implementations or direct formula evaluation workflows.

Arguments

formula::StatsModels.FormulaTerm: Formula specification (e.g., from @formula(y ~ x + group))
data: Data in Tables.jl format (preferably Tables.columntable(df))

Returns

UnifiedCompiled{T,Ops,S,O}: Zero-allocation evaluator, same interface as model-based compilation

Performance

Compilation: Fast for complex formulas
Evaluation: Zero bytes allocated
Memory: Same as model-based compilation

Example

using StatsModels, FormulaCompiler, Tables

# Direct formula compilation
formula = @formula(y ~ x * group + log(z))
data = Tables.columntable(df)
compiled = compile_formula(formula, data)

# Zero-allocation evaluation
output = Vector{Float64}(undef, length(compiled))
compiled(output, data, 1)  # Zero allocations

Use Cases

Custom model implementations requiring direct formula evaluation
Performance-critical applications avoiding model fitting overhead
Exploratory analysis with formula variations
Integration with external statistical frameworks

See also: compile_formula(model, data) for model-based compilation

FormulaCompiler.get_or_compile_formula — Function

get_or_compile_formula(model, data)

Return the cached compiled formula, or compile and cache a new one, using semantic type-aware cache keys.

Cache Key Strategy

Creates cache key based on:

Model object (coefficients, structure)
Column names (formula structure)
Semantic type categories (compilation behavior)

Type Category Benefits

Better cache hits: Vector{Int} and Vector{Float64} share cache entry
Correct mixture handling: CategoricalArray vs CategoricalMixture distinguished
Future-proof: New types can be added to category system

Examples

# These share a cache entry (both :numeric):
data1 = (x = Float64[1.0, 2.0], y = ...)
data2 = (x = Int[1, 2], y = ...)  # Cache HIT ✓

# These get separate entries (different compilation):
data3 = (edu = categorical(["HS"]), ...)      # :categorical
data4 = (edu = mix("HS" => 0.5, "C" => 0.5), ...)  # :mixture - Cache MISS ✓
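
The category idea can be sketched in plain Julia. This is a minimal illustration, not the package's implementation: `type_category` and `cache_key_sketch` are hypothetical names, only two categories are modeled, and the model object that the real key also includes is left out.

```julia
# Hypothetical category function: every Real-valued column maps to :numeric.
type_category(::AbstractVector{<:Real}) = :numeric
type_category(::AbstractVector) = :other

# Build a key from the column names plus each column's category,
# so the concrete element type (Int vs Float64) does not matter.
function cache_key_sketch(data::NamedTuple)
    return (keys(data), map(type_category, values(data)))
end

data1 = (x = Float64[1.0, 2.0], y = [0.1, 0.2])
data2 = (x = Int[1, 2], y = [0.3, 0.4])
data3 = (x = ["a", "b"], y = [0.5, 0.6])

same_key = cache_key_sketch(data1) == cache_key_sketch(data2)       # Int and Float64 share a key
different_key = cache_key_sketch(data1) != cache_key_sketch(data3)  # String column changes the key
```

Keying on categories rather than concrete types trades a slightly coarser key for more cache hits, which matches the benefit listed above.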

Model Row Evaluation

FormulaCompiler.modelrow — Function

modelrow(model, data, row_idx) -> Vector{Float64}

Evaluate a single model matrix row, returning a new vector (allocating version).

Convenient interface for when pre-allocation is not practical. Uses internal formula compilation and caching for performance optimization, though the non-allocating modelrow! interface is preferred for performance-critical code.

Arguments

model: Fitted statistical model (GLM, MixedModel, etc.)
data: Data in Tables.jl format
row_idx::Int: Row index to evaluate (1-based)

Returns

Vector{Float64}: New vector containing model matrix row values

Performance

First call: Includes one-time compilation cost
Subsequent calls: Fast evaluation plus allocation cost for vector creation
Memory: Allocates new vector each call
Caching: Automatically caches compiled formula for reuse

Example

using FormulaCompiler, GLM, Tables

model = lm(@formula(y ~ x * group + log(z)), df)
data = Tables.columntable(df)

# Convenient single-row evaluation
row_1 = modelrow(model, data, 1)      # First call (includes compilation)
row_2 = modelrow(model, data, 2)      # Subsequent calls (uses cached compilation)
row_100 = modelrow(model, data, 100)  # Fast (uses cached compilation)

When to Use

Prototyping: Quick analysis and exploration
Small datasets: When allocation overhead is negligible
Convenience: When code simplicity outweighs performance requirements

Performance Alternative

For zero-allocation performance in loops, use modelrow!:

compiled = compile_formula(model, data)  # Compile once, outside the loop
output = Vector{Float64}(undef, length(compiled))
for i in 1:n_iterations
    modelrow!(output, compiled, data, i)  # Zero allocations each iteration
end

modelrow(model, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows and return a new matrix (allocating version). Uses compiled formulas for optimal performance.

Example

matrix = modelrow(model, data, [1, 5, 10])  # Returns Matrix{Float64}

modelrow(compiled_formula, data, row_idx) -> Vector{Float64}

Evaluate a single row with a pre-compiled formula.

Example

compiled = compile_formula(model, data)
row_values = modelrow(compiled, data, 1)  # Returns Vector{Float64}

modelrow(compiled_formula, data, row_indices) -> Matrix{Float64}

Evaluate multiple rows with a pre-compiled formula.

Example

compiled = compile_formula(model, data)
matrix = modelrow(compiled, data, [1, 5, 10])  # Returns Matrix{Float64}

FormulaCompiler.modelrow! — Function

modelrow!(output, compiled, data, row_idx) -> output

Evaluate a single model matrix row in-place with zero allocations.

The primary interface for high-performance row evaluation. This function provides zero-allocation evaluation, making it suitable for tight computational loops and performance-critical applications.

Arguments

output::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
- Must have length ≥ length(compiled)
- Contents will be overwritten with model matrix row values
compiled: Compiled formula from compile_formula(model, data)
data: Data in Tables.jl format (preferably Tables.columntable(df) for best performance)
row_idx::Int: Row index to evaluate (1-based indexing)

Returns

output: The same vector passed in, now containing the evaluated model matrix row

Performance

Memory: Zero bytes allocated after warmup
Scaling: Constant time per row regardless of dataset size
Validation: Tested across 2000+ diverse formula configurations

Example

using FormulaCompiler, GLM, DataFrames, Tables  # DataFrames provides nrow

# Setup (one-time cost)
model = lm(@formula(y ~ x * group + log(z)), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)
output = Vector{Float64}(undef, length(compiled))

# High-performance evaluation (repeated many times)
modelrow!(output, compiled, data, 1)    # Zero allocations
modelrow!(output, compiled, data, 100)  # Zero allocations

# Monte Carlo simulation example
for i in 1:1_000_000
    row_idx = rand(1:nrow(df))
    modelrow!(output, compiled, data, row_idx)  # Zero allocations each call
    # Process output...
end

Error Handling

BoundsError: If row_idx exceeds data size
DimensionMismatch: If output vector is too small
Validates arguments in debug builds

See also: modelrow for allocating version, compile_formula, ModelRowEvaluator

modelrow!(row_vec, model, data, row_idx; cache=true)

Evaluate a single row of the model matrix in-place with automatic compilation.

Arguments

row_vec::AbstractVector{Float64}: Pre-allocated output vector (modified in-place)
model: Statistical model (GLM, MixedModel, etc.)
data: Data in Tables.jl format
row_idx::Int: Row index to evaluate
cache::Bool: Whether to cache compiled formula (default: true)

Returns

row_vec: The same vector passed in, now containing the evaluated row

Example

model = lm(@formula(y ~ x + group), df)
data = Tables.columntable(df)
row_vec = Vector{Float64}(undef, size(modelmatrix(model), 2))
modelrow!(row_vec, model, data, 1)

Note

First call compiles the formula. Subsequent calls reuse cached version when cache=true.

FormulaCompiler.ModelRowEvaluator — Type

ModelRowEvaluator{T,Ops,S,O}

Object-oriented interface for reusable, pre-compiled model evaluation.

Combines compiled formula, data, and output buffer into a single object that can be called repeatedly for both allocating and non-allocating row evaluation. Useful when the same model and data will be evaluated many times.

Type Parameters

T: Element type (typically Float64)
Ops: Compiled operations tuple type
S: Scratch buffer size
O: Output vector size

Fields

compiled::UnifiedCompiled: Pre-compiled formula
data::NamedTuple: Data in column-table format
row_vec::Vector{Float64}: Internal buffer for non-allocating calls

Constructors

ModelRowEvaluator(model, df::DataFrame)      # Converts DataFrame to column table
ModelRowEvaluator(model, data::NamedTuple)   # Uses data directly

Interface

# Allocating interface - returns new vector
result = evaluator(row_idx)

# Non-allocating interface - uses provided vector  
evaluator(output_vector, row_idx)
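
The two call forms can be pictured with a small callable struct. The sketch below is a stand-in written in plain Julia, not the package's type: `RowEvaluatorSketch`, its `rowfn` field, and `toy_row!` are hypothetical, and `toy_row!` plays the role of a compiled formula.

```julia
# Hypothetical miniature of the pattern: function + data + buffer in one object.
struct RowEvaluatorSketch{F,D}
    rowfn::F                  # fills an output vector for one row
    data::D                   # column table (NamedTuple of vectors)
    row_vec::Vector{Float64}  # internal buffer
end

Base.length(e::RowEvaluatorSketch) = length(e.row_vec)

# Non-allocating call: write into the caller's vector and return it.
(e::RowEvaluatorSketch)(out::AbstractVector{Float64}, row::Int) = e.rowfn(out, e.data, row)

# Allocating call: fill the internal buffer, then return a copy of it.
(e::RowEvaluatorSketch)(row::Int) = copy(e(e.row_vec, row))

# Toy row function for the design [1, x, x^2].
function toy_row!(out, data, row)
    x = data.x[row]
    out[1] = 1.0
    out[2] = x
    out[3] = x^2
    return out
end

ev = RowEvaluatorSketch(toy_row!, (x = [0.5, 2.0],), zeros(3))
row2 = ev(2)   # allocating form
buf = zeros(3)
ev(buf, 1)     # in-place form
```

Whether the real allocating call copies an internal buffer or allocates directly is not stated here; the copy is just one simple way to keep returned rows independent of later calls.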

Performance

Construction: One-time compilation cost
Allocating calls: Fast evaluation plus allocation cost
Non-allocating calls: Zero bytes allocated
Memory: Minimal overhead beyond compiled formula and data reference

Example

using FormulaCompiler, GLM

# Create evaluator (one-time setup)
model = lm(@formula(y ~ x * group + log(z)), df)
evaluator = ModelRowEvaluator(model, df)

# Allocating interface (convenient)
row_1 = evaluator(1)      # Returns Vector{Float64}
row_2 = evaluator(100)    # Returns Vector{Float64}

# Non-allocating interface (fast)
output = Vector{Float64}(undef, length(evaluator))
evaluator(output, 1)      # Zero allocations
evaluator(output, 100)    # Zero allocations

# Batch processing
results = Matrix{Float64}(undef, 1000, length(evaluator))
for i in 1:1000
    evaluator(view(results, i, :), i)  # Zero allocations
end

When to Use

Repeated evaluation: Same model and data used many times
Object-oriented style: Prefer objects over function calls
Mixed interfaces: Need both allocating and non-allocating evaluation
Clean encapsulation: Bundle model, data, and buffer management

Derivatives

FormulaCompiler provides computational primitives for computing derivatives of model matrix rows with respect to continuous variables. These functions enable zero-allocation Jacobian computation using either automatic differentiation (ForwardDiff) or finite differences.

For marginal effects, standard errors, and complete statistical workflows, see Margins.jl.

Evaluator Construction

Recommended: Use the unified dispatcher for user-facing code:

# Automatic differentiation (preferred)
de = derivativeevaluator(:ad, compiled, data, [:x, :z])

# Finite differences
de = derivativeevaluator(:fd, compiled, data, [:x, :z])

Advanced: Direct constructor functions (primarily for internal use):

FormulaCompiler.derivativeevaluator_fd — Function

derivativeevaluator_fd(compiled, data, vars) -> FDEvaluator

Create a finite differences specialized FDEvaluator using Float64 counterfactual vectors.

Returns a concrete FDEvaluator that carries only FD infrastructure, with no AD-specific fields. Uses NumericCounterfactualVector{Float64} for type-stable counterfactual operations.

FormulaCompiler.derivativeevaluator_ad — Function

derivativeevaluator_ad(compiled, data, vars) -> ADEvaluator

Create an automatic differentiation specialized ADEvaluator using Dual counterfactual vectors.

Returns a concrete ADEvaluator that carries only AD infrastructure, with no FD-specific fields. Uses NumericCounterfactualVector{Dual{...}} for type-stable dual number operations.

Jacobian Computation

FormulaCompiler.derivative_modelrow! — Function

derivative_modelrow!(J, de::ADEvaluator, row) -> J

Primary automatic differentiation API - zero allocations via ForwardDiff.jacobian!.

Uses a cached ForwardDiff configuration to avoid allocations, and relies on ForwardDiff's optimized jacobian! routine instead of manual dual construction.

Arguments

J::AbstractMatrix{Float64}: Preallocated Jacobian buffer of size (n_terms, n_vars)
de::ADEvaluator: AD evaluator built by derivativeevaluator(:ad, compiled, data, vars)
row::Int: Row index to evaluate (1-based indexing)

Returns

J: The same matrix passed in, now containing J[i,j] = ∂X[i]/∂vars[j] for the specified row

Performance Characteristics

Memory: 0 bytes allocated (cached buffers and ForwardDiff config)
Speed: Target ~60ns with ForwardDiff.jacobian! optimization
Accuracy: Machine precision derivatives via ForwardDiff dual arithmetic

Example

using FormulaCompiler, GLM, Tables

# Setup model
model = lm(@formula(y ~ x + z), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)

# Build AD evaluator
de = derivativeevaluator(:ad, compiled, data, [:x, :z])

# Zero-allocation Jacobian computation
J = Matrix{Float64}(undef, length(compiled), length(de.vars))
derivative_modelrow!(J, de, 1)  # 0 bytes allocated

derivative_modelrow!(J, de::FDEvaluator, row) -> J

Primary finite differences API - zero allocations, concrete type dispatch.

Computes full Jacobian matrix ∂X[i]/∂vars[j] using central differences with adaptive step sizing. Matches automatic_diff.jl signature for seamless backend switching.

Performance Characteristics

Memory: 0 bytes allocated (uses pre-allocated FDEvaluator buffers)
Speed: ~65ns per variable with mathematical optimizations
Accuracy: Adaptive step sizing balances truncation/roundoff error

Mathematical Method

Central differences: ∂f/∂x ≈ [f(x+h) - f(x-h)] / (2h)
Step sizing: h = ε^(1/3) * max(1, |x|) for numerical stability
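
As a scalar illustration of the formula and step rule above, the following plain-Julia sketch differentiates a single function. `central_diff` is a hypothetical helper, not part of FormulaCompiler, and it assumes ε is the Float64 machine epsilon.

```julia
# Central-difference derivative of a scalar function at x.
# Hypothetical helper; ε is taken to be eps(Float64).
function central_diff(f, x::Float64)
    h = cbrt(eps(Float64)) * max(1.0, abs(x))  # h = ε^(1/3) * max(1, |x|)
    return (f(x + h) - f(x - h)) / (2h)
end

approx = central_diff(sin, 1.0)
exact = cos(1.0)
err = abs(approx - exact)  # truncation error shrinks like h^2, roundoff grows like ε/h
```

The cube-root step is the usual compromise for central differences: a smaller h reduces truncation error but amplifies floating-point roundoff.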

Arguments

J::AbstractMatrix{Float64}: Pre-allocated Jacobian buffer of size (n_terms, n_vars)
de::FDEvaluator: Pre-built evaluator from derivativeevaluator_fd(compiled, data, vars)
row::Int: Row index to evaluate (1-based indexing)

Returns

J: The same matrix passed in, containing J[i,j] = ∂X[i]/∂vars[j]

Example

using FormulaCompiler, GLM, Tables

# Setup model and data
model = lm(@formula(y ~ x * group + log(abs(z) + 1)), df)
data = Tables.columntable(df)
compiled = compile_formula(model, data)

# Build FD evaluator
de_fd = derivativeevaluator_fd(compiled, data, [:x, :z])

# Zero-allocation finite differences
J = Matrix{Float64}(undef, length(compiled), length(de_fd.vars))
derivative_modelrow!(J, de_fd, 1)  # 0 bytes allocated

Variable Identification

FormulaCompiler.continuous_variables — Function

continuous_variables(compiled, data) -> Vector{Symbol}

Identify continuous variables suitable for derivative computation from a compiled formula.

Analyzes compiled operations to distinguish between continuous variables (suitable for differentiation) and categorical variables (requiring discrete analysis). Essential for determining valid variable sets for derivative evaluators and marginal effects computation.

Arguments

compiled::UnifiedCompiled: Compiled formula from compile_formula(model, data)
data::NamedTuple: Data in column-table format (from Tables.columntable(df))

Returns

Vector{Symbol}: Sorted list of continuous variable names
- Includes: Float64, Int64, Int32, Int variables used in LoadOp operations
- Excludes: Variables appearing only in ContrastOp operations (categorical contrasts)
- Excludes: Boolean variables (treated as categorical regardless of numeric type)

Classification Algorithm

Operation analysis: Scan compiled operations for LoadOp vs ContrastOp usage
Type filtering: Verify variables have Real element types in data
Boolean exclusion: Remove Bool variables (categorical by convention)
Categorical exclusion: Remove variables only appearing in contrast operations
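
The type-based part of this classification (type filtering and Boolean exclusion) can be sketched without a compiled formula. `real_nonbool_columns` is a hypothetical helper; the LoadOp/ContrastOp scan is not reproduced, so this sketch looks only at element types.

```julia
# Keep columns whose element type is Real but not Bool, sorted by name.
# Hypothetical helper covering only the type checks, not the operation scan.
function real_nonbool_columns(data::NamedTuple)
    selected = Symbol[]
    for (name, col) in pairs(data)
        T = eltype(col)
        if T <: Real && !(T <: Bool)
            push!(selected, name)
        end
    end
    return sort!(selected)
end

data = (
    price = [1.5, 2.0],          # Float64: kept
    quantity = [3, 4],           # Int: kept
    available = [true, false],   # Bool: excluded
    category = ["A", "B"],       # String: excluded (not Real)
)
vars = real_nonbool_columns(data)
```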

Example

using FormulaCompiler, GLM, DataFrames, Tables, CategoricalArrays

# Mixed variable types
df = DataFrame(
    y = randn(1000),
    price = randn(1000),          # Float64 - continuous
    quantity = rand(1:100, 1000), # Int64 - continuous
    available = rand(Bool, 1000), # Bool - categorical
    category = categorical(rand(["A", "B", "C"], 1000))  # CategoricalArray - categorical
)

model = lm(@formula(y ~ price + quantity + available + category), df)
compiled = compile_formula(model, Tables.columntable(df))

# Identify continuous variables
continuous_vars = continuous_variables(compiled, Tables.columntable(df))
# Returns: [:price, :quantity]

# Use for derivative evaluator construction
de_fd = derivativeevaluator_fd(compiled, Tables.columntable(df), continuous_vars)
de_ad = derivativeevaluator_ad(compiled, Tables.columntable(df), continuous_vars)

Use Cases

Pre-validation: Check variable suitability before building derivative evaluators
Automatic selection: Programmatically identify all differentiable variables
Error prevention: Avoid attempting derivatives on categorical variables
Model introspection: Understand variable roles in compiled formulas

Implementation Details

Scans LoadOp operations for direct variable usage (continuous indicators)
Identifies ContrastOp operations for categorical variable detection
Applies type checking to ensure Real element types in the actual data
Returns sorted list for consistent ordering across calls

Link Function Derivatives

Computational primitives for GLM link function derivatives (used by Margins.jl for computing marginal effects on the mean response).

FormulaCompiler.supported_link_functions — Function

supported_link_functions() -> Vector{String}

Return list of GLM link functions with implemented _dmu_deta methods.

Note: Link function support is now determined by Julia's method dispatch. Any link function with a _dmu_deta method will work automatically. This function provides a convenience list of commonly tested functions.

Example

links = supported_link_functions()
println("Common GLM links: ", join(links, ", "))
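
Dispatch-based support can be illustrated with stand-in types. Everything in this sketch is hypothetical (`IdentityLinkSketch`, `LogLinkSketch`, `UnsupportedLinkSketch`, `dmu_deta_sketch`, `is_supported`); it mirrors the idea that a link works once a derivative method exists for it, without using GLM or FormulaCompiler names.

```julia
# Hypothetical link types, standing in for GLM link objects.
struct IdentityLinkSketch end
struct LogLinkSketch end
struct UnsupportedLinkSketch end

# dμ/dη for each link, selected by method dispatch.
dmu_deta_sketch(::IdentityLinkSketch, η::Real) = one(η)  # μ = η
dmu_deta_sketch(::LogLinkSketch, η::Real) = exp(η)       # μ = exp(η)

# A link counts as supported exactly when a matching method exists.
is_supported(link) = hasmethod(dmu_deta_sketch, Tuple{typeof(link), Float64})

log_ok = is_supported(LogLinkSketch())
other_ok = is_supported(UnsupportedLinkSketch())
```

Adding support for a new link in this scheme means defining one more method; no central registry has to be updated.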

Categorical Contrasts

FormulaCompiler.ContrastEvaluator — Type

ContrastEvaluator{T, Ops, S, O, NTMerged, CounterfactualTuple}

Zero-allocation evaluator for categorical and binary variable contrasts.

Provides efficient discrete marginal effects computation by pre-allocating all buffers and pre-computing categorical level mappings. Eliminates the ~2KB allocation overhead of the basic contrast_modelrow! function for batch contrast operations.

Uses typed counterfactual vectors for type-stable, zero-allocation performance.

Fields

compiled: Base compiled formula evaluator
vars: Variables available for contrast computation
data_counterfactual: Counterfactual data structure for variable substitution
counterfactuals: Tuple of typed CounterfactualVector{T} subtypes for each variable
y_from_buf: Pre-allocated buffer for "from" level evaluation
y_to_buf: Pre-allocated buffer for "to" level evaluation
row: Current row being processed

Performance

Zero allocations after construction for all contrast operations
Type stability via typed counterfactual vectors
Buffer reuse across multiple contrasts and rows
Type specialization for compiled formula operations

Usage

# One-time setup
evaluator = contrastevaluator(compiled, data, [:treatment, :education])
contrast_buf = Vector{Float64}(undef, length(compiled))

# Fast repeated contrasts (zero allocations)
for row in 1:n_rows
    contrast_modelrow!(contrast_buf, evaluator, row, :treatment, "Control", "Drug")
    # Process contrast_buf...
end

FormulaCompiler.contrastevaluator — Function

contrastevaluator(compiled, data, vars) -> ContrastEvaluator

Construct a ContrastEvaluator for efficient categorical and binary contrast computation.

Pre-allocates all necessary buffers and pre-computes categorical level mappings to eliminate allocations during contrast evaluation.

Arguments

compiled: Result from compile_formula(model, data)
data: Column-table data as NamedTuple
vars: Vector of variable symbols available for contrasts

Returns

ContrastEvaluator configured for zero-allocation contrast computation.

Performance Notes

One-time cost: Setup involves building override structures and categorical mappings
Categorical optimization: Level mappings computed once, reused for all contrasts
Memory efficiency: Buffers sized exactly for the compiled formula

Example

# Setup for categorical contrasts
evaluator = contrastevaluator(compiled, data, [:group, :region, :binary_var])

# Zero-allocation usage
contrast_buf = Vector{Float64}(undef, length(compiled))
contrast_modelrow!(contrast_buf, evaluator, 1, :group, "Control", "Treatment")

FormulaCompiler.CategoricalLevelMap — Type

CategoricalLevelMap{Var, LevelTuple}

Stores pre-computed level mappings for a categorical variable in contrast evaluators.

Similar to ContrastOp, this struct uses type parameters for compile-time specialization while storing runtime level data as a field.

Type Parameters

Var::Symbol: Variable name (e.g., :group, :treatment)
LevelTuple: Type of the levels tuple (e.g., NTuple{3, Tuple{String, CategoricalValue{UInt32}}})

Fields

levels: Tuple of (level, CategoricalValue) pairs preserving natural level types

Example

# String categorical with 3 levels
CategoricalLevelMap{:group, NTuple{3, Tuple{String, CategoricalValue{UInt32}}}}(
    (("Control", catval1), ("Treatment", catval2), ("Placebo", catval3))
)

# Integer categorical with 5 levels
CategoricalLevelMap{:age_group, NTuple{5, Tuple{Int64, CategoricalValue{UInt32}}}}(
    ((1, catval1), (2, catval2), (3, catval3), (4, catval4), (5, catval5))
)

Performance

Zero allocations: All types concrete, fully specialized
Natural types: No String conversion needed for Int/Symbol levels
Fast lookup: Linear search through small tuple (2-10 levels typical)
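
The linear search over a small tuple of pairs might look like the sketch below. `find_level_sketch` is a hypothetical function, and the stored values are plain integers here rather than CategoricalValue objects.

```julia
# Scan (level, value) pairs in order and return the value for the first match.
# Hypothetical lookup; integers stand in for CategoricalValue entries.
function find_level_sketch(levels::Tuple, target)
    for (level, value) in levels
        level == target && return value
    end
    throw(ArgumentError("level $(repr(target)) not found"))
end

level_map = (("Control", 1), ("Treatment", 2), ("Placebo", 3))
code = find_level_sketch(level_map, "Treatment")
```

For a handful of levels, a scan over a tuple avoids hashing and heap allocation, which is why a linear search can be the faster choice at this size.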

FormulaCompiler.contrast_modelrow! — Function

contrast_modelrow!(Δ, evaluator, row, var, from, to) -> Δ

Compute discrete contrast using pre-allocated ContrastEvaluator (zero allocations).

Evaluates Δ = X(var=to) - X(var=from) using the evaluator's pre-allocated buffers and pre-computed categorical mappings for optimal performance.

Arguments

Δ::AbstractVector{Float64}: Output contrast vector (modified in-place)
evaluator::ContrastEvaluator: Pre-configured contrast evaluator
row::Int: Row index to evaluate
var::Symbol: Variable to contrast (must be in evaluator.vars)
from: Reference level (baseline)
to: Target level (comparison)

Performance

Zero allocations - uses pre-allocated buffers from evaluator
Categorical optimization - uses pre-computed level mappings
Type specialization - compiled formula operations fully optimized

Error Handling

Validates that var exists in evaluator's variable list
Handles both categorical and numeric variable types
Provides clear error messages for invalid level specifications

Example

evaluator = contrastevaluator(compiled, data, [:treatment])
contrast_buf = Vector{Float64}(undef, length(compiled))

# Zero-allocation contrast computation
contrast_modelrow!(contrast_buf, evaluator, 1, :treatment, "Control", "Drug")
# contrast_buf now contains the discrete effect vector

FormulaCompiler.contrast_gradient! — Function

contrast_gradient!(∇β, evaluator, row, var, from, to, β, [link]) -> ∇β

Compute parameter gradients for discrete effects: ∂(discrete_effect)/∂β - zero allocations.

Computes the gradient of discrete marginal effects with respect to model parameters using the mathematical formula:

Linear scale (η): ∇β = ΔX = X₁ - X₀ (contrast vector)
Response scale (μ): ∇β = (g⁻¹)'(η₁) × X₁ - (g⁻¹)'(η₀) × X₀ (chain rule, where (g⁻¹)'(η) = dμ/dη is the derivative of the inverse link)

This enables uncertainty quantification via the delta method: SE = √(∇β' Σ ∇β).

Arguments

∇β::AbstractVector{Float64}: Output gradient vector (modified in-place)
evaluator::ContrastEvaluator: Pre-configured contrast evaluator
row::Int: Row index to evaluate
var::Symbol: Variable to contrast (must be in evaluator.vars)
from: Reference level (baseline)
to: Target level (comparison)
β::AbstractVector{<:Real}: Model coefficients (used only for response-scale computation)
link: GLM link function (optional, defaults to linear scale)

Returns

∇β: The same vector passed in, containing parameter gradients ∂(discrete_effect)/∂β

Performance

Zero allocations - uses pre-allocated buffers from evaluator
Link function support - handles all GLM links (Identity, Log, Logit, etc.)
Type flexibility - accepts any Real coefficient type, converts internally

Mathematical Method

Linear Scale (default):

discrete_effect = η₁ - η₀ = (X₁'β) - (X₀'β) = (X₁ - X₀)'β = ΔX'β
∇β = ΔX = X₁ - X₀

Response Scale (with link function):

discrete_effect = μ₁ - μ₀ = g⁻¹(η₁) - g⁻¹(η₀)
∇β = (g⁻¹)'(η₁) × X₁ - (g⁻¹)'(η₀) × X₀  (chain rule, (g⁻¹)'(η) = dμ/dη)
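
A worked numeric instance of both scales, using only LinearAlgebra, may help. All inputs are made up for illustration (`X0`, `X1`, `β`, `Σ`), the link is assumed to be logit, and `invlogit` and `dmu_deta_logit` are hypothetical local helpers rather than package functions.

```julia
using LinearAlgebra

# Made-up model rows that differ only in a treatment dummy.
X0 = [1.0, 0.5, 0.0]   # treatment = "Control"
X1 = [1.0, 0.5, 1.0]   # treatment = "Drug"
β = [-0.2, 0.8, 0.5]   # made-up coefficients
Σ = [0.04 0.0 0.0; 0.0 0.01 0.0; 0.0 0.0 0.09]  # made-up coefficient covariance

# Logit link: μ = g⁻¹(η) = 1 / (1 + exp(-η)), so dμ/dη = μ(1 - μ).
invlogit(η) = 1 / (1 + exp(-η))
dmu_deta_logit(η) = invlogit(η) * (1 - invlogit(η))

η0, η1 = dot(X0, β), dot(X1, β)

# Linear scale: the gradient is the contrast vector itself.
grad_linear = X1 .- X0

# Response scale: chain rule through the inverse link at each linear predictor.
grad_response = dmu_deta_logit(η1) .* X1 .- dmu_deta_logit(η0) .* X0

# Delta-method standard error of μ₁ - μ₀.
se = sqrt(dot(grad_response, Σ * grad_response))
```

On the response scale the two rows are weighted by different derivative values, so the gradient is generally nonzero even in coordinates where X₁ and X₀ agree.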

Example

evaluator = contrastevaluator(compiled, data, [:treatment])
∇β = Vector{Float64}(undef, length(compiled))

# Linear scale gradients (η = Xβ scale)
contrast_gradient!(∇β, evaluator, 1, :treatment, "Control", "Drug", β)

# Response scale gradients (μ = g⁻¹(η) scale)
link = GLM.LogitLink()
contrast_gradient!(∇β, evaluator, 1, :treatment, "Control", "Drug", β, link)

# Delta method standard error
se = sqrt(∇β' * vcov_matrix * ∇β)

Integration with Delta Method

Parameter gradients enable uncertainty quantification:

# Compute discrete effect + gradient simultaneously
discrete_effect = contrast_modelrow(evaluator, row, var, from, to)
contrast_gradient!(∇β, evaluator, row, var, from, to, β, link)

# Delta method confidence intervals
variance = ∇β' * vcov_matrix * ∇β
se = sqrt(variance)
ci_lower = discrete_effect - 1.96 * se
ci_upper = discrete_effect + 1.96 * se

FormulaCompiler.contrast_gradient — Function

contrast_gradient(evaluator, row, var, from, to, β, [link]) -> Vector{Float64}

Convenience version that allocates and returns the gradient vector.

Categorical Mixtures

Utilities for constructing and validating categorical mixtures used in efficient profile-based marginal effects.

FormulaCompiler.mix — Function

mix(pairs...)

Convenient constructor for CategoricalMixture from level => weight pairs. This is the main user-facing function for creating mixture specifications.

Arguments

pairs...: Level => weight pairs (e.g., "A" => 0.3, "B" => 0.7)

Returns

CategoricalMixture: Validated mixture object ready for use with FormulaCompiler

Examples

# Basic categorical mixture
group_mix = mix("Control" => 0.4, "Treatment" => 0.6)

# Educational composition
education_mix = mix("high_school" => 0.4, "college" => 0.4, "graduate" => 0.2)

# Regional distribution using symbols
region_mix = mix(:urban => 0.7, :rural => 0.3)

# Boolean mixture (30% false, 70% true)
treated_mix = mix(false => 0.3, true => 0.7)

# Works with any comparable type
age_group_mix = mix("young" => 0.25, "middle" => 0.50, "old" => 0.25)

Validation

The mix() function automatically validates:

At least one level => weight pair is provided
All weights are non-negative
Weights sum to 1.0 (within numerical tolerance)
All levels are unique
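
The four checks above can be mirrored in a few lines of plain Julia. This is a sketch of the listed rules, not the code behind mix(): `validate_mixture_sketch` is a hypothetical name and the tolerance of 1e-10 is an assumed value.

```julia
# Apply the four listed checks to level => weight pairs.
# Hypothetical function; the tolerance is an assumption.
function validate_mixture_sketch(specs::Pair...; atol = 1e-10)
    isempty(specs) && throw(ArgumentError("at least one level => weight pair is required"))
    levels = [first(p) for p in specs]
    weights = [Float64(last(p)) for p in specs]
    all(w -> w >= 0, weights) || throw(ArgumentError("weights must be non-negative"))
    isapprox(sum(weights), 1.0; atol = atol) || throw(ArgumentError("weights must sum to 1.0"))
    allunique(levels) || throw(ArgumentError("levels must be unique"))
    return levels, weights
end

levels, weights = validate_mixture_sketch("A" => 0.3, "B" => 0.7)
```

Checking at construction time means any mixture that reaches evaluation can be assumed well formed, so the evaluation path needs no further checks.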

Integration with FormulaCompiler

CounterfactualVector Pattern for Categorical Mixtures

The unified row-wise architecture provides efficient single-row mixture perturbations:

using FormulaCompiler, GLM, DataFrames, Tables

# Prepare data with mixture column
df = DataFrame(
    y = randn(1000),
    x = randn(1000),
    group = fill(mix("A" => 0.4, "B" => 0.6), 1000)  # Baseline mixture
)
data = Tables.columntable(df)

# Compile formula
model = lm(@formula(y ~ x * group), df)
compiled = compile_formula(model, data)

# Pattern 1: Single-row mixture perturbation
# Create counterfactual vector for mixture column
cf_mixture = counterfactualvector(data.group, 1)  # CategoricalMixtureCounterfactualVector

# Apply different mixture to specific row
new_mixture = mix("A" => 0.8, "B" => 0.2)  # Policy counterfactual
update_counterfactual_row!(cf_mixture, 500)  # Target row 500
update_counterfactual_replacement!(cf_mixture, new_mixture)

# Evaluate with counterfactual data
data_cf = (data..., group=cf_mixture)
output = Vector{Float64}(undef, length(compiled))
compiled(output, data_cf, 500)  # Row 500 uses new mixture, others use baseline

# Pattern 2: Population marginal effects with mixture profiles
# Assumes the quantity of interest is the linear predictor η = x'β
using LinearAlgebra, Statistics

function mixture_marginal_effects(model, data, base_mixture, alt_mixture)
    compiled = compile_formula(model, data)
    cf_mixture = counterfactualvector(data.group, 1)
    data_cf = (data..., group=cf_mixture)
    β = coef(model)

    n_rows = length(data.x)
    xrow = Vector{Float64}(undef, length(compiled))  # Buffer for one model matrix row
    baseline_effects = Vector{Float64}(undef, n_rows)
    alternative_effects = Vector{Float64}(undef, n_rows)

    for row in 1:n_rows
        update_counterfactual_row!(cf_mixture, row)

        # Baseline mixture
        update_counterfactual_replacement!(cf_mixture, base_mixture)
        compiled(xrow, data_cf, row)
        baseline_effects[row] = dot(xrow, β)

        # Alternative mixture
        update_counterfactual_replacement!(cf_mixture, alt_mixture)
        compiled(xrow, data_cf, row)
        alternative_effects[row] = dot(xrow, β)
    end

    return mean(alternative_effects .- baseline_effects)
end

# Example: Policy effect of changing group composition
base_mix = mix("A" => 0.4, "B" => 0.6)
policy_mix = mix("A" => 0.7, "B" => 0.3)
effect = mixture_marginal_effects(model, data, base_mix, policy_mix)

Reference Grid Pattern

For systematic marginal effects computation across different mixture profiles:

# Create reference grid with multiple mixture specifications
mixtures = [
    mix("A" => 1.0, "B" => 0.0),    # Pure A
    mix("A" => 0.5, "B" => 0.5),    # Balanced
    mix("A" => 0.0, "B" => 1.0)     # Pure B
]

# Evaluate effects across all mixture profiles
using LinearAlgebra: dot
using Statistics: mean

β = coef(model)
n_rows = length(data.x)
effects_by_mixture = Vector{Float64}(undef, length(mixtures))
cf_mixture = counterfactualvector(data.group, 1)
data_cf = (data..., group=cf_mixture)

# Buffers are allocated once and reused for every mixture
row_buf = Vector{Float64}(undef, length(compiled))  # one model-matrix row
row_effects = Vector{Float64}(undef, n_rows)

for (i, mixture_spec) in enumerate(mixtures)
    update_counterfactual_replacement!(cf_mixture, mixture_spec)

    # Compute the average linear predictor across all rows for this mixture
    for row in 1:n_rows
        update_counterfactual_row!(cf_mixture, row)
        compiled(row_buf, data_cf, row)
        row_effects[row] = dot(β, row_buf)
    end
    effects_by_mixture[i] = mean(row_effects)
end

Performance

Mixture creation is lightweight, and validation happens once, at construction time. The resulting CategoricalMixture objects are compiled into zero-allocation evaluators by FormulaCompiler's compilation system.

FormulaCompiler.CategoricalMixture — Type

CategoricalMixture{T}

Represents a mixture of categorical levels with associated weights. Used to specify population composition scenarios when computing marginal effects.

Fields

levels::Vector{T}: Categorical levels (strings, symbols, booleans, or other types)
weights::Vector{Float64}: Associated weights (must sum to 1.0)

Example

# Educational composition mixture
edu_mix = CategoricalMixture(["high_school", "college"], [0.6, 0.4])

# Using the convenient mix() constructor
treatment_mix = mix("control" => 0.4, "treatment" => 0.6)
boolean_mix = mix(false => 0.3, true => 0.7)

Validation

Levels and weights must have the same length
All weights must be non-negative
Weights must sum to 1.0 (within tolerance)
Levels must be unique
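
The four rules above can be sketched in plain Julia. This is an illustration only, not the package's implementation: the function name `check_mixture` and the tolerance `atol = 1e-10` are assumptions, since the actual tolerance is not stated here.

```julia
# Hypothetical helper (not part of FormulaCompiler) mirroring the documented rules.
function check_mixture(levels::AbstractVector, weights::AbstractVector{<:Real};
                       atol::Real = 1e-10)  # assumed tolerance
    length(levels) == length(weights) ||
        throw(ArgumentError("levels and weights must have the same length"))
    all(w -> w >= 0, weights) ||
        throw(ArgumentError("all weights must be non-negative"))
    isapprox(sum(weights), 1.0; atol = atol) ||
        throw(ArgumentError("weights must sum to 1.0, got $(sum(weights))"))
    allunique(levels) ||
        throw(ArgumentError("levels must be unique"))
    return true
end

check_mixture(["high_school", "college"], [0.6, 0.4])   # passes all four checks
```

Running the checks at construction time means a malformed mixture fails before any row is evaluated.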

Integration with FormulaCompiler

CategoricalMixture objects are automatically detected by FormulaCompiler's compilation system and compiled into efficient zero-allocation evaluators using MixtureContrastOp.
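
To give a sense of what evaluating a mixture means numerically, the sketch below computes a mixture's contribution to a model-matrix row as the weighted average of per-level contrast rows. It assumes dummy (treatment) coding with "A" as the reference level and does not reproduce the internals of MixtureContrastOp; all variable names are illustrative.

```julia
# Illustrative sketch only: weighted combination of dummy-coded contrast rows.
original_levels = ["A", "B", "C"]   # full level set of the data column
mix_levels  = ["A", "B"]            # levels named in the mixture
mix_weights = [0.3, 0.7]

# Dummy coding with "A" as reference: one row per level,
# one column per non-reference level.
contrast = [0.0 0.0;    # A
            1.0 0.0;    # B
            0.0 1.0]    # C

# Align the mixture weights with the original level order;
# levels absent from the mixture get weight 0.
aligned = zeros(length(original_levels))
for (lvl, w) in zip(mix_levels, mix_weights)
    aligned[findfirst(==(lvl), original_levels)] = w
end

# Weighted average of the contrast rows: columns for B and C.
mixture_row = contrast' * aligned
```

Here `mixture_row` holds 0.7 in the "B" column and 0.0 in the "C" column, so a mixture behaves like a fractional observation spread across levels.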

FormulaCompiler.MixtureWithLevels — Type

MixtureWithLevels{T}

Wrapper that pairs a mixture with the original categorical levels of its data column. It gives the compilation system type-safe access to both the mixture components and the column's level set.

Fields

mixture::CategoricalMixture{T}: The core mixture specification
original_levels::Vector{String}: Original levels from the data column

Usage

This type is used internally by FormulaCompiler's scenario system to provide type-safe mixture processing with access to both mixture specifications and original data structure.

# Usually created automatically by FormulaCompiler's scenario system
mixture = mix("A" => 0.3, "B" => 0.7)
original_levels = ["A", "B", "C"]  # From the actual data column
wrapper = MixtureWithLevels(mixture, original_levels)

# Direct property access
wrapper.mixture.levels     # Access to mixture levels
wrapper.mixture.weights    # Access to mixture weights
wrapper.original_levels    # Access to original data levels

FormulaCompiler.validate_mixture_against_data — Function

validate_mixture_against_data(mixture::CategoricalMixture, col, var::Symbol)

Validate that all levels in the mixture exist in the actual data column. Throws ArgumentError if any mixture levels are not found in the data.

Arguments

mixture::CategoricalMixture: The mixture specification to validate
col: The data column to validate against
var::Symbol: Variable name for error reporting

Throws

ArgumentError: If mixture contains levels not found in the data

Examples

# Validate mixture against categorical data
data_col = categorical(["A", "B", "C", "A", "B"])
mixture = mix("A" => 0.5, "B" => 0.5)
validate_mixture_against_data(mixture, data_col, :group)  # ✓ Valid

# This would throw an error
bad_mixture = mix("A" => 0.5, "X" => 0.5)  # "X" not in data
validate_mixture_against_data(bad_mixture, data_col, :group)  # ✗ Error

This function is used internally by FormulaCompiler's scenario system to ensure mixture specifications are compatible with the actual data.
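
The check itself amounts to a set difference between the mixture's levels and the levels present in the column. The following plain-Julia sketch shows that idea; `check_levels_in_data` is a hypothetical stand-in, and it uses `unique(col)` where the real function may well consult the categorical array's declared levels instead.

```julia
# Hypothetical stand-in for the documented check (not the package's code).
function check_levels_in_data(mix_levels, col, var::Symbol)
    # Levels named in the mixture that never occur in the column
    missing_levels = setdiff(mix_levels, unique(col))
    isempty(missing_levels) ||
        throw(ArgumentError(
            "mixture for :$var contains levels not found in data: $missing_levels"))
    return nothing
end

col = ["A", "B", "C", "A", "B"]
check_levels_in_data(["A", "B"], col, :group)     # returns nothing
# check_levels_in_data(["A", "X"], col, :group)   # would throw ArgumentError
```

Passing the variable name lets the error message say which column the mismatch belongs to.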

FormulaCompiler.mixture_to_scenario_value — Function

mixture_to_scenario_value(mixture::CategoricalMixture, original_col)

Convert a categorical mixture to a representative value for FormulaCompiler scenario creation. Uses weighted average encoding to provide a smooth, continuous representation.

Strategy

CategoricalArray: Weighted average of level indices
Bool: Probability of true (equivalent to current fractional Bool support)
Other: Weighted average of sorted unique level indices

Arguments

mixture::CategoricalMixture: The mixture to convert
original_col: The original data column for context

Returns

Float64: Continuous representation of the mixture

Examples

# Boolean mixture -> probability of true
bool_mix = mix(false => 0.3, true => 0.7)
mixture_to_scenario_value(bool_mix, [true, false, true]) # -> 0.7

# Categorical mixture -> weighted average of level indices
cat_mix = mix("A" => 0.6, "B" => 0.4)
cat_col = categorical(["A", "B", "C"])
mixture_to_scenario_value(cat_mix, cat_col) # -> 1.4 (0.6*1 + 0.4*2)

This function is used internally by FormulaCompiler's scenario system to convert mixture specifications into values that can be used with the existing override system.
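
The weighted-average encoding described under Strategy can be written out in a few lines of plain Julia. This is a sketch of the arithmetic only; `weighted_level_index` is a hypothetical name, and the level order is passed in explicitly rather than read from a CategoricalArray.

```julia
# Hypothetical helper: weighted average of 1-based level positions.
function weighted_level_index(mix_levels, mix_weights, col_levels)
    total = 0.0
    for (lvl, w) in zip(mix_levels, mix_weights)
        idx = findfirst(==(lvl), col_levels)
        idx === nothing && throw(ArgumentError("level $lvl not found in column levels"))
        total += w * idx
    end
    return total
end

# Categorical case: "A" is level 1 and "B" is level 2, so 0.6*1 + 0.4*2
cat_value = weighted_level_index(["A", "B"], [0.6, 0.4], ["A", "B", "C"])

# Bool case: the representative value is the weight placed on `true`
bool_value = sum(w for (lvl, w) in zip([false, true], [0.3, 0.7]) if lvl)
```

Because the result depends on level positions, reordering the column's levels changes the encoded value for the same mixture.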

Utilities

FormulaCompiler.not — Function

not(x)

Logical NOT operation for use in formula specifications.

Arguments

x::Bool: Boolean value to negate
x::Real: Real value, typically a probability in [0, 1]

Returns

For Bool: The opposite boolean value
For Real: The complement (1 - x)

Example

# In a formula
model = lm(@formula(y ~ not(treatment)), df)

# For probabilities
p = 0.3
q = not(p)  # 0.7

Warning

For Real values, x is assumed to lie in [0, 1]. No bounds checking is performed, so values outside that range produce results outside it as well.
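
The two behaviours map naturally onto two methods. The sketch below uses the hypothetical name `my_not` to avoid clashing with FormulaCompiler.not and is not taken from the package source. Since `Bool <: Real` in Julia, dispatch selects the more specific `Bool` method for boolean inputs.

```julia
# Hypothetical re-implementation for illustration only.
my_not(x::Bool) = !x       # logical negation
my_not(x::Real) = 1 - x    # complement; no bounds checking

negated     = my_not(true)   # Bool method is chosen: false
complement  = my_not(0.3)    # Real method: approximately 0.7
out_of_range = my_not(1.5)   # -0.5, showing that inputs outside [0, 1] are not rejected
```

The last call illustrates the warning above: the caller is responsible for keeping real inputs within [0, 1].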
