Backend Selection Guide
Understanding AD and FD backends for marginal effects computation
Recommendation: Use AD (Automatic Differentiation)
TL;DR: Use backend=:ad (the default) for all marginal effects computations. It provides:
- Zero allocation performance after warmup
- Machine precision accuracy
- Domain safety for log(), sqrt(), and other sensitive functions
- 3-5x faster than FD in most cases
- All numeric types supported (Int8, Int16, Float64, etc.)
# Recommended (AD is the default):
result = population_margins(model, data; type=:effects)
# Explicit AD specification (equivalent):
result = population_margins(model, data; type=:effects, backend=:ad)
Why Finite Differences (FD) Exists
The :fd backend exists for:
- Historical compatibility - Legacy code using FD
- Debugging - Comparing AD vs FD results to validate correctness
- Edge cases - Rare situations where FD may be preferred
Important: FD is not recommended for new code. It predates the efficient AD implementation and is now effectively in maintenance mode.
Quick Decision Tree
For all applications:
└── Use backend=:ad (default)
├── Required for: log(x), sqrt(x), 1/x, x^(1/3), etc.
├── Recommended for: all other formulas
└── Domain-safe: no domain-violating function evaluations
Only use backend=:fd if:
├── Maintaining legacy code that explicitly uses FD
└── Debugging/validation (comparing AD vs FD results)
Critical Reliability Differences
Domain-Sensitive Functions: Always Use AD
Functions that require backend=:ad:
# Log transformations - FD can push values below zero
model = lm(@formula(y ~ log(x)), data)
population_margins(model, data; backend=:ad) # Required
# Square root functions - FD can push values negative
model = lm(@formula(y ~ sqrt(x)), data)
population_margins(model, data; backend=:ad) # Required
# Inverse functions near zero - FD can create division issues
model = lm(@formula(y ~ 1/x), data)
population_margins(model, data; backend=:ad) # Recommended
# Fractional powers - Similar domain sensitivity as sqrt
model = lm(@formula(y ~ x^(1/3)), data)
population_margins(model, data; backend=:ad) # Recommended
Why FD fails: A finite-difference quotient such as (f(x+h) - f(x))/h evaluates f at a perturbed point x+h, where h is a small signed step. That perturbation can push arguments outside valid domains:
- log(x+h) where x+h < 0 → DomainError
- sqrt(x+h) where x+h < 0 → DomainError
- 1/(x+h) where x+h ≈ 0 → numerical instability
Why AD succeeds: Automatic differentiation computes exact derivatives without domain-violating function evaluations.
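The contrast can be reproduced outside the package with a few lines of plain Julia. The sketch below is illustrative only: the `Dual` type and the `central_fd` helper are hypothetical names, and the central-difference scheme with step `h=1e-6` is an assumption, not a description of how Margins.jl or FormulaCompiler.jl compute derivatives internally.

```julia
# Minimal forward-mode dual number: a value plus a derivative part (illustration only).
struct Dual
    val::Float64
    der::Float64
end

# Chain rule for log: d/dx log(u) = u'/u. Only log(val) is ever evaluated.
Base.log(d::Dual) = Dual(log(d.val), d.der / d.val)

# Central finite difference: evaluates f at both x - h and x + h.
central_fd(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

x = 1e-7   # valid input for log, but closer to zero than the step h

# Dual-number derivative: evaluated at x only, so it stays inside the domain.
ad_slope = log(Dual(x, 1.0)).der

# Finite difference: x - h is negative, so log throws before a quotient is formed.
fd_outcome = try
    central_fd(log, x)
catch err
    err
end

println("AD slope: ", ad_slope)
println("FD outcome: ", typeof(fd_outcome))
```

The dual-number path never leaves the observed point, while the finite-difference path fails as soon as the step is larger than the distance to the domain boundary.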
Functions Safe for Either Backend
# Linear relationships - both backends equivalent
model = lm(@formula(y ~ x + z), data)
population_margins(model, data; backend=:fd) # Works reliably
population_margins(model, data; backend=:ad) # Equivalent results
# Polynomial functions - both work well
model = lm(@formula(y ~ x + x^2), data)
population_margins(model, data; backend=:fd) # Safe, but no speed advantage over AD
population_margins(model, data; backend=:ad) # Same statistical results
# Simple transformations - no domain issues
model = lm(@formula(y ~ x/10 + z*2), data)
population_margins(model, data; backend=:fd) # Zero allocation
Performance Characteristics
Memory Usage Analysis
Both backends now achieve zero allocation performance:
# FD: Zero allocation after warmup
@allocated population_margins(model, data_100; backend=:fd) # 0 bytes
@allocated population_margins(model, data_1000; backend=:fd) # 0 bytes
@allocated population_margins(model, data_5000; backend=:fd) # 0 bytes
# AD: Zero allocation after warmup
@allocated population_margins(model, data_100; backend=:ad) # 0 bytes
@allocated population_margins(model, data_1000; backend=:ad) # 0 bytes
@allocated population_margins(model, data_5000; backend=:ad) # 0 bytes
Memory Usage Decision:
- All dataset sizes: Both backends achieve zero allocation performance
- Choice based on reliability and accuracy: AD provides superior domain handling
- Construction cost: AD requires slightly more memory during evaluator setup (amortized over many evaluations)
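When checking allocation figures yourself, measure after a warmup call and from inside a function, so that compilation and untyped globals do not inflate the number. The sketch below shows that pattern on a stand-in workload. `sum_squares!` and `measure` are hypothetical names used only for illustration and are not part of Margins.jl.

```julia
# Stand-in workload (hypothetical): writes into a preallocated buffer, then reduces it.
function sum_squares!(buf::Vector{Float64}, xs::Vector{Float64})
    @inbounds for i in eachindex(xs, buf)
        buf[i] = xs[i]^2
    end
    return sum(buf)
end

# Measuring inside a function keeps untyped global variables out of the measurement.
measure(buf, xs) = @allocated sum_squares!(buf, xs)

xs = rand(1_000)
buf = similar(xs)

bytes_first = measure(buf, xs)   # first call: may include one-time overhead
bytes_warm  = measure(buf, xs)   # steady-state figure, the one worth reporting

println("first call: ", bytes_first, " bytes; after warmup: ", bytes_warm, " bytes")
```

The same two-call pattern applies to `population_margins`: discard the first measurement and report the second.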
Speed Performance
Both backends perform well, and AD is typically 3-5x faster than FD:
using BenchmarkTools
# Typical performance ranges (varies by system and model complexity)
# Small problems (n=100-1000)
@btime population_margins($model, $data; backend=:fd) # 0.1-10ms (baseline)
@btime population_margins($model, $data; backend=:ad) # 0.05-5ms (typically faster than FD)
# Large problems (n=10000+)
@btime population_margins($model, $large_data; backend=:fd) # Scales linearly with n
@btime population_margins($model, $large_data; backend=:ad) # Scales linearly, but with better constant factors
Key insight: Now that AD is zero-allocation, it is faster than FD in most cases while retaining its superior numerical properties.
Numerical Accuracy
Both Backends Provide Equivalent Accuracy
For well-conditioned problems, both backends produce statistically equivalent results:
# Linear models - identical to machine precision
fd_result = population_margins(model, data; backend=:fd)
ad_result = population_margins(model, data; backend=:ad)
isapprox(DataFrame(fd_result).estimate, DataFrame(ad_result).estimate; rtol=1e-12) # PASS at rtol=1e-12
# GLM models - equivalent within appropriate tolerances
fd_glm = population_margins(glm_model, data; backend=:fd)
ad_glm = population_margins(glm_model, data; backend=:ad)
isapprox(DataFrame(fd_glm).estimate, DataFrame(ad_glm).estimate; rtol=1e-10) # PASS at rtol=1e-10
AD May Be More Accurate For
- Complex function compositions
- Functions with steep gradients
- Near-boundary evaluations
- Models with numerical conditioning issues
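One reason for this gap is that a finite-difference estimate depends on its step size: too large a step adds truncation error, and too small a step loses digits to floating-point cancellation. The sketch below illustrates this with a plain forward difference on `exp`. The `forward_fd` helper and the three step sizes are assumptions chosen for illustration, not the steps used by the `:fd` backend.

```julia
# Forward-difference quotient with a caller-chosen step h.
forward_fd(f, x, h) = (f(x + h) - f(x)) / h

x = 1.0
exact = exp(x)   # d/dx exp(x) = exp(x), so the exact derivative is known

# A large step, a moderate step, and a step close to machine epsilon.
steps = (1e-2, 1e-8, 1e-14)
errors = [abs(forward_fd(exp, x, h) - exact) for h in steps]

for (h, err) in zip(steps, errors)
    println("h = ", h, "  abs error = ", err)
end
```

The moderate step gives the smallest error of the three. AD has no step size to tune, which is why it is less sensitive to steep gradients and poor conditioning.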
Production Recommendations
Backend Selection Policy
- No :auto mode is provided.
- No implicit backend fallbacks are performed.
- Select backend explicitly. Use :ad by default; use :fd only when explicitly intended and theoretically safe.
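In calling code, the same policy can be mirrored by validating the backend symbol up front and raising an error instead of silently substituting another backend. The sketch below is a minimal example of that idea. `require_backend` is a hypothetical helper written for this guide, not a function exported by Margins.jl.

```julia
# Hypothetical helper (not part of Margins.jl): accept only an explicitly named
# backend and throw instead of falling back to a different one.
function require_backend(backend::Symbol)
    supported = (:ad, :fd)
    backend in supported ||
        throw(ArgumentError("unknown backend :$backend; choose one of $supported"))
    return backend
end

chosen = require_backend(:ad)   # returns :ad unchanged

# An unsupported value such as :auto raises an ArgumentError rather than guessing.
rejected = try
    require_backend(:auto)
catch err
    err
end

println("chosen: ", chosen, "; rejected with: ", typeof(rejected))
```

The validated symbol can then be passed on as the `backend` keyword, so a typo or an unsupported value fails loudly at the call site.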
Backend Selection by Use Case
| Use Case | Backend | Rationale |
|---|---|---|
| Domain-sensitive functions (log, sqrt, 1/x) | :ad | Required - FD fails with DomainError |
| General production workflows | :ad | Zero allocation + reliability + 3-5x faster |
| Large datasets (>10k observations) | :ad | Zero allocation + superior performance |
| Development/testing | :ad | Higher reliability + machine precision |
| High-precision requirements | :ad | Exact derivatives vs numerical approximation |
| Legacy code maintenance | :fd | Only if existing code explicitly uses FD |
| Debugging/validation | Both | Compare results to verify correctness |
Production Configuration Guidance
- Default to backend=:ad for reliability and accuracy (also zero allocation).
- Use backend=:fd only for simple, well-conditioned formulas and when you explicitly want FD.
- For domain-sensitive functions (log, sqrt, 1/x near 0), always use :ad.
Troubleshooting Backend Issues
Common Error Patterns
DomainError with FD Backend
# Error: DomainError with -1.23e-6: log was called with a negative real argument
result = population_margins(model, data; backend=:fd) # Fails
# Solution: Use AD backend for log functions
result = population_margins(model, data; backend=:ad) # Works
~~Memory Pressure with AD Backend~~ (Obsolete)
Note: This troubleshooting section is obsolete as of v2.0. Both AD and FD achieve zero allocation performance, so there is no memory efficiency difference between backends. If you encounter memory issues, they are likely related to dataset size or model complexity, not the backend choice.
Backend Validation Testing
# Test both backends for new functions
using Margins, DataFrames
function test_backend_compatibility(model, data)
    # Run each backend; a failure of any kind is recorded as `nothing`
    try_fd = try
        population_margins(model, data; backend=:fd)
    catch
        nothing
    end
    try_ad = try
        population_margins(model, data; backend=:ad)
    catch
        nothing
    end
    if try_fd === nothing && try_ad === nothing
        @error "Both backends failed"
        return :both_failed
    elseif try_fd === nothing
        @warn "Function requires AD backend - FD failed (for example with a DomainError)"
        return :ad_required
    elseif try_ad === nothing
        @warn "AD backend failed while FD succeeded - investigate"
        return :ad_failed
    end
    # Both backends succeeded: compare estimates for consistency
    fd_est = DataFrame(try_fd).estimate
    ad_est = DataFrame(try_ad).estimate
    if isapprox(fd_est, ad_est; rtol=1e-10)
        @info "Both backends produce consistent results"
        return :either_ok
    else
        @warn "Backends produce different results - investigate numerical issues"
        return :inconsistent
    end
end
# Usage
compatibility = test_backend_compatibility(model, data)
Advanced Topics
FormulaCompiler Integration
Both backends leverage FormulaCompiler.jl's optimized evaluation:
# FD: Uses finite difference approximation with compiled evaluators
# - Zero allocation after warmup
# - Reuses pre-allocated buffers
# - Scalar operations avoid broadcast allocations
# AD: Uses dual number arithmetic with compiled evaluators (OPTIMIZED)
# - Zero allocation after warmup via pre-conversion strategy
# - Exact derivative computation with machine precision
# - 3-5x performance improvement over previous AD implementation
# - Composition via chain rule with type homogeneity
Custom Tolerance Settings
For functions near domain boundaries, you may need custom tolerances:
# Custom finite difference step size (advanced)
# Note: This is a FormulaCompiler.jl setting, not directly exposed in Margins.jl
# Contact maintainers if you need custom FD step sizes for specific functions
Summary Guidelines
Default Strategy (Recommended):
Use backend=:ad for everything. The AD backend is now the recommended default for all use cases, providing:
- Zero allocation performance (equal to FD)
- Superior speed (3-5x faster than FD)
- Domain safety (handles log, sqrt, 1/x correctly)
- Machine precision accuracy
- Statistical validity
When to Use FD:
Only use backend=:fd for:
- Legacy compatibility - Maintaining existing code that explicitly uses FD
- Validation - Comparing AD vs FD results for debugging
- Very rare edge cases - Contact maintainers if you believe you need FD for a new use case
Important: In v2.0+, FD is neither faster nor more memory-efficient than AD, and it is less reliable. There is no performance or memory reason to prefer FD for new code.
Statistical Guarantees:
Both backends maintain statistical correctness when they succeed:
- Same delta-method standard errors (when computed successfully)
- Same marginal effect estimates (when numerically stable)
- Same confidence intervals and hypothesis tests
The reliability difference is in computational robustness, not statistical validity.
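The delta-method standard error depends on the backend only through the gradient of the effect with respect to the coefficients, so two backends that agree on the gradient agree on the standard error. The sketch below shows this with made-up numbers. The covariance matrix, the gradient values, and the size of the perturbation are all hypothetical, and `delta_se` is an illustrative helper rather than a Margins.jl function.

```julia
using LinearAlgebra

# Hypothetical inputs for illustration: coefficient covariance and effect gradient.
Σ = [0.04 0.01;
     0.01 0.09]
g_ad = [1.0, 2.0]               # gradient as an exact method would return it
g_fd = g_ad .+ [1e-9, -1e-9]    # same gradient with a small FD-style error

# Delta method: se = sqrt(g' Σ g)
delta_se(g, Σ) = sqrt(dot(g, Σ * g))

se_ad = delta_se(g_ad, Σ)
se_fd = delta_se(g_fd, Σ)

println("SE from exact gradient:     ", se_ad)
println("SE from perturbed gradient: ", se_fd)
```

A gradient error of this size changes the standard error only far beyond the digits anyone would report, which is the sense in which both backends are statistically equivalent when they both succeed.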
For performance optimization details, see Performance Guide. For mathematical background, see Mathematical Foundation.