Linear Regression
A comprehensive guide to Bayesian linear regression using Fugue. This tutorial demonstrates how to build, analyze, and extend linear models for real-world data analysis, showcasing the power of probabilistic programming for uncertainty quantification and model comparison.
By the end of this tutorial, you will understand:
- Bayesian Linear Regression: Prior specification and posterior inference for regression parameters
- Uncertainty Quantification: How to extract and interpret parameter uncertainty from MCMC samples
- Robust Regression: Using heavy-tailed distributions to handle outliers
- Polynomial Regression: Modeling nonlinear relationships with polynomial basis functions
- Model Selection: Bayesian methods for comparing regression models
- Regularization: Ridge regression through hierarchical priors
- Production Applications: Scalable inference for high-dimensional regression problems
The Linear Regression Framework
Linear regression is the cornerstone of statistical modeling. In the Bayesian framework, we treat regression parameters as random variables with prior distributions, allowing us to quantify uncertainty in our estimates and make probabilistic predictions.
graph TB A["Data: (x₁,y₁), (x₂,y₂), ..., (xₙ,yₙ)"] --> B["Linear Model<br/>y = β₀ + β₁x + ε"] B --> C["Bayesian Framework"] C --> D["Prior: p(β₀, β₁, σ)"] C --> E["Likelihood: p(y|X, β₀, β₁, σ)"] D --> F["Posterior: p(β₀, β₁, σ|y, X)"] E --> F F --> G["MCMC Sampling"] G --> H["Parameter Uncertainty"] G --> I["Predictive Distribution"] G --> J["Model Comparison"]
Mathematical Foundation
Basic Linear Model
The fundamental linear regression model is:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)$$

where:
- $y_i$: Response variable (dependent)
- $x_i$: Predictor variable (independent)
- $\beta_0$: Intercept parameter
- $\beta_1$: Slope parameter
- $\varepsilon_i$: Random error with standard deviation $\sigma$
Bayesian Specification
Prior Distributions:
- $\beta_0 \sim \mathcal{N}(0, 10^2)$ and $\beta_1 \sim \mathcal{N}(0, 10^2)$
- $\sigma \sim \text{Gamma}(\alpha, \beta)$ or another positive-support prior (e.g., half-normal)

Likelihood:

$$p(\mathbf{y} \mid X, \beta_0, \beta_1, \sigma) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid \beta_0 + \beta_1 x_i, \sigma^2)$$

Posterior:

$$p(\beta_0, \beta_1, \sigma \mid \mathbf{y}, X) \propto p(\mathbf{y} \mid X, \beta_0, \beta_1, \sigma)\, p(\beta_0, \beta_1, \sigma)$$
When using conjugate priors (Normal-Inverse-Gamma), the posterior has a closed form. However, MCMC allows us to use more flexible priors and handle complex models without conjugacy restrictions.
Basic Linear Regression
Let's start with the fundamental case: simple linear regression with one predictor variable.
Implementation
use fugue::*;
use fugue::runtime::interpreters::PriorHandler;
use rand::{SeedableRng, rngs::StdRng};
// Basic Bayesian linear regression model
fn basic_linear_regression_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64)> {
prob! {
let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());
// Use a well-behaved prior for sigma (now that MCMC handles positivity constraints)
let sigma <- sample(addr!("sigma"), Gamma::new(1.0, 1.0).unwrap()); // Mean = 1, more concentrated
// Simple observations (limited number for efficiency)
let _obs_0 <- observe(addr!("y", 0), Normal::new(intercept + slope * x_data[0], sigma).unwrap(), y_data[0]);
let _obs_1 <- observe(addr!("y", 1), Normal::new(intercept + slope * x_data[1], sigma).unwrap(), y_data[1]);
let _obs_2 <- observe(addr!("y", 2), Normal::new(intercept + slope * x_data[2], sigma).unwrap(), y_data[2]);
pure((intercept, slope, sigma))
}
}
fn basic_regression_demo() {
println!("=== Basic Linear Regression ===\n");
// Generate synthetic data: y = 2 + 1.5*x + noise (smaller dataset for demo)
let (x_data, y_data) = generate_regression_data(20, 1.5, 2.0, 0.5, 12345);
println!("📊 Generated {} data points", x_data.len());
println!(" - True intercept: 2.0, True slope: 1.5, True sigma: 0.5");
println!(
" - Data range: x ∈ [{:.1}, {:.1}], y ∈ [{:.1}, {:.1}]",
x_data[0],
x_data[x_data.len() - 1],
y_data.iter().fold(f64::INFINITY, |a, &b| a.min(b)),
y_data.iter().fold(f64::NEG_INFINITY, |a, &b| a.max(b))
);
// Create model function that uses the data
let model_fn = move || basic_linear_regression_model(x_data.clone(), y_data.clone());
println!("\n🔬 Running MCMC inference...");
let mut rng = StdRng::seed_from_u64(42);
let samples = adaptive_mcmc_chain(&mut rng, model_fn, 500, 100);
// Extract parameter estimates
let intercepts: Vec<f64> = samples
.iter()
.filter_map(|(_, trace)| trace.get_f64(&addr!("intercept")))
.collect();
let slopes: Vec<f64> = samples
.iter()
.filter_map(|(_, trace)| trace.get_f64(&addr!("slope")))
.collect();
let sigmas: Vec<f64> = samples
.iter()
.filter_map(|(_, trace)| trace.get_f64(&addr!("sigma")))
.collect();
if !intercepts.is_empty() && !slopes.is_empty() && !sigmas.is_empty() {
println!("✅ MCMC completed with {} samples", samples.len());
println!("\n📈 Parameter Estimates:");
let mean_intercept = intercepts.iter().sum::<f64>() / intercepts.len() as f64;
let mean_slope = slopes.iter().sum::<f64>() / slopes.len() as f64;
let mean_sigma = sigmas.iter().sum::<f64>() / sigmas.len() as f64;
println!(" - Intercept: {:.3} (true: 2.0)", mean_intercept);
println!(" - Slope: {:.3} (true: 1.5)", mean_slope);
println!(" - Sigma: {:.3} (true: 0.5)", mean_sigma);
// Show some diagnostics
let valid_traces = samples
.iter()
.filter(|(_, trace)| trace.total_log_weight().is_finite())
.count();
println!(" - Valid traces: {} / {}", valid_traces, samples.len());
} else {
println!("❌ MCMC failed - no valid samples obtained");
}
println!();
}
Key Concepts
- Prior Specification: We use weakly informative priors that allow the data to dominate
- Conditioning on Data: Each data point enters the model through an `observe` statement; the `plate!` macro used in later examples scales this pattern to many observations
- Parameter Recovery: MCMC estimates should recover the true parameter values
- Uncertainty Quantification: Posterior standard deviations provide parameter uncertainty

Prior guidance:
- Use `Normal(0, 10)` for regression coefficients when predictors are standardized (see the sketch below)
- Use `Gamma(2, 0.5)` for the error scale (σ) - it gives reasonable prior mass over positive values
- Adjust prior scales based on your domain knowledge and the scale of your data
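The standardization assumed by that guidance can be done with a small helper before fitting; this is a hypothetical utility, not part of the tutorial's example code:

// Hypothetical helper: rescale a predictor to mean 0 and standard deviation 1,
// so that Normal(0, 10) coefficient priors are genuinely weakly informative.
fn standardize(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let std = (x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    x.iter().map(|v| (v - mean) / std).collect()
}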
Interpretation
The posterior samples provide:
- Point Estimates: Posterior means are Bayesian parameter estimates
- Credible Intervals: Quantiles give uncertainty bounds (e.g., 95% credible intervals)
- Predictive Distribution: For a new $x^*$: $p(y^* \mid x^*, \mathbf{y}) = \int p(y^* \mid x^*, \beta_0, \beta_1, \sigma)\, p(\beta_0, \beta_1, \sigma \mid \mathbf{y}, X)\, d\beta_0\, d\beta_1\, d\sigma$
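As a minimal sketch of how these summaries fall out of the posterior samples, assuming the `slopes` vector extracted in the demo above:

// Sketch: posterior mean and 95% credible interval for the slope.
let mut sorted = slopes.clone();
sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
let mean_slope = sorted.iter().sum::<f64>() / sorted.len() as f64;
let lower = sorted[(0.025 * sorted.len() as f64) as usize];
let upper = sorted[(0.975 * sorted.len() as f64) as usize];
println!("Slope: {:.3}, 95% credible interval: [{:.3}, {:.3}]", mean_slope, lower, upper);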
Robust Regression
Standard linear regression assumes Gaussian errors, making it sensitive to outliers. Robust regression uses heavy-tailed distributions to reduce outlier influence.
Theory
Replace the normal likelihood with a Student-t likelihood:

$$y_i \sim \text{StudentT}(\nu, \mu_i, \sigma)$$

where:
- $\mu_i = \beta_0 + \beta_1 x_i$ (linear predictor)
- $\nu$: Degrees of freedom (lower = heavier tails)
- As $\nu \to \infty$, $\text{StudentT}(\nu, \mu, \sigma) \to \mathcal{N}(\mu, \sigma^2)$ (normal distribution)
Implementation
use fugue::*;
// Robust regression (conceptual): ν is sampled as t-distribution degrees of freedom,
// but the likelihood below still uses a Normal as a stand-in
fn robust_regression_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64, f64)> {
prob! {
let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());
let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
let nu <- sample(addr!("nu"), Gamma::new(2.0, 0.1).unwrap()); // Degrees of freedom for t-dist
// Use plate notation for observations (a fully robust model would use a Student-t likelihood here)
let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(3) => {
let (idx, (x_i, y_i)) = i;
observe(addr!("y", idx), Normal::new(intercept + slope * x_i, sigma).unwrap(), *y_i)
});
pure((intercept, slope, sigma, nu))
}
}
fn robust_regression_demo() {
println!("=== Robust Linear Regression ===\n");
// Generate data with outliers
let (mut x_data, mut y_data) = generate_regression_data(40, 1.2, 3.0, 0.4, 67890);
// Add some outliers
x_data.extend(vec![8.5, 9.2, 7.8]);
y_data.extend(vec![20.0, -5.0, 25.0]); // Clear outliers
println!(
"📊 Generated {} data points (with 3 outliers)",
x_data.len()
);
println!(" - Base relationship: y = 3.0 + 1.2*x + noise");
println!(" - Added outliers at x=[8.5, 9.2, 7.8] with y=[20.0, -5.0, 25.0]");
// Compare standard vs robust regression
let mut rng = StdRng::seed_from_u64(42);
// Standard regression
println!("\n🔬 Standard Linear Regression:");
let standard_model_fn = || basic_linear_regression_model(x_data.clone(), y_data.clone());
let standard_samples = adaptive_mcmc_chain(&mut rng, standard_model_fn, 500, 100);
let std_intercepts: Vec<f64> = standard_samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("intercept")).unwrap())
.collect();
let std_slopes: Vec<f64> = standard_samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("slope")).unwrap())
.collect();
println!(
" - Intercept: {:.3} (true: 3.0)",
std_intercepts.iter().sum::<f64>() / std_intercepts.len() as f64
);
println!(
" - Slope: {:.3} (true: 1.2)",
std_slopes.iter().sum::<f64>() / std_slopes.len() as f64
);
// Robust regression (conceptual - using same likelihood but different prior structure)
println!("\n🛡️ Robust Regression (Conceptual):");
let mut rng2 = StdRng::seed_from_u64(42);
let robust_model_fn = || robust_regression_model(x_data.clone(), y_data.clone());
let robust_samples = adaptive_mcmc_chain(&mut rng2, robust_model_fn, 500, 100);
let rob_intercepts: Vec<f64> = robust_samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("intercept")).unwrap())
.collect();
let rob_slopes: Vec<f64> = robust_samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("slope")).unwrap())
.collect();
let rob_nus: Vec<f64> = robust_samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("nu")).unwrap())
.collect();
println!(
" - Intercept: {:.3} (true: 3.0)",
rob_intercepts.iter().sum::<f64>() / rob_intercepts.len() as f64
);
println!(
" - Slope: {:.3} (true: 1.2)",
rob_slopes.iter().sum::<f64>() / rob_slopes.len() as f64
);
println!(
" - Degrees of freedom (ν): {:.3}",
rob_nus.iter().sum::<f64>() / rob_nus.len() as f64
);
println!("\n💡 Note: Lower ν indicates heavier tails (more robust to outliers)");
println!();
}
Robust vs. Standard Comparison
graph LR A["Data with Outliers"] --> B["Standard Regression"] A --> C["Robust Regression"] B --> D["Biased Parameters<br/>Large Residuals"] C --> E["Stable Parameters<br/>Heavy-tailed Errors"]
Advantages of Robust Regression:
- Outlier Resistance: Heavy tails accommodate extreme values
- Automatic Detection: A low posterior $\nu$ indicates outlier presence
- Flexible: Reduces to normal regression when $\nu$ is large
t-distribution likelihoods are more computationally expensive than normal distributions. For very large datasets, consider preprocessing to remove obvious outliers first.
Polynomial Regression
Linear regression can model nonlinear relationships using polynomial basis functions:

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \varepsilon_i$$
Mathematical Framework
Design Matrix: For polynomial degree $d$:

$$X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^d \end{bmatrix}$$

Hierarchical Prior: Control overfitting with shrinkage priors:

$$\lambda \sim \text{Gamma}(a, b), \qquad \beta_j \mid \lambda \sim \mathcal{N}(0, 1/\lambda)$$
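Before looking at the model, it helps to see the expansion it performs inline when computing each observation's mean; the helper below is a hypothetical illustration, not part of the example code:

// Hypothetical helper: evaluate the polynomial mean β₀ + β₁·x + ... + β_d·x^d
fn polynomial_mean(coefficients: &[f64], x: f64) -> f64 {
    coefficients
        .iter()
        .enumerate()
        .map(|(d, coef)| coef * x.powi(d as i32))
        .sum()
}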
Implementation
use fugue::*;
// Polynomial regression with automatic relevance determination
fn polynomial_regression_model(
x_data: Vec<f64>,
y_data: Vec<f64>,
degree: usize,
) -> Model<Vec<f64>> {
prob! {
// Hierarchical prior for polynomial coefficients
let precision <- sample(addr!("precision"), Gamma::new(2.0, 1.0).unwrap());
// One coefficient per polynomial term (degree + 1 in total)
let coefficients <- plate!(d in 0..=degree => {
sample(addr!("coef", d), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap())
});
// Noise parameter
let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
// Clone coefficients for use in closure
let coefficients_for_observations = coefficients.clone();
let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(3) => {
let (idx, (x_i, y_i)) = i;
let mut mean_i = 0.0;
for (d, coef) in coefficients_for_observations.iter().enumerate() {
mean_i += coef * x_i.powi(d as i32);
}
observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
});
pure(coefficients)
}
}
fn polynomial_regression_demo() {
println!("=== Polynomial Regression ===\n");
// Generate nonlinear data: y = 1 + 2x - 0.5x² + noise
let x_raw: Vec<f64> = (0..30).map(|i| i as f64 / 29.0 * 4.0).collect(); // x from 0 to 4
let y_data: Vec<f64> = x_raw
.iter()
.map(|&x| {
let true_mean = 1.0 + 2.0 * x - 0.5 * x.powi(2);
let mut rng = StdRng::seed_from_u64(((x * 1000.0) as u64) + 555);
true_mean + Normal::new(0.0, 0.3).unwrap().sample(&mut rng)
})
.collect();
println!("📊 Generated nonlinear data: y = 1 + 2x - 0.5x² + noise");
println!(" - {} data points, x ∈ [0, 4]", x_raw.len());
// Fit polynomial models of different degrees
for degree in [1, 2, 3].iter() {
println!("\n🔬 Fitting degree {} polynomial...", degree);
let mut rng = StdRng::seed_from_u64(42 + *degree as u64);
let model_fn = || polynomial_regression_model(x_raw.clone(), y_data.clone(), *degree);
let samples = adaptive_mcmc_chain(&mut rng, model_fn, 400, 80);
println!(" Coefficient estimates:");
for d in 0..=*degree {
let coef_samples: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("coef", d)).unwrap())
.collect();
let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
let true_coef = match d {
0 => 1.0, // intercept
1 => 2.0, // linear term
2 => -0.5, // quadratic term
_ => 0.0, // higher terms should be ~0
};
println!(" x^{}: {:.3} (true: {:.1})", d, mean_coef, true_coef);
}
// Model comparison metric (simplified log marginal likelihood)
let log_likelihoods: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.log_likelihood)
.collect();
let avg_log_likelihood = log_likelihoods.iter().sum::<f64>() / log_likelihoods.len() as f64;
println!(" Average log-likelihood: {:.2}", avg_log_likelihood);
}
println!("\n💡 The degree-2 polynomial should have the highest likelihood!");
println!();
}
Overfitting Prevention
graph TD A["Polynomial Degree"] --> B["Model Complexity"] B --> C{Degree Choice} C -->|Too Low| D["Underfitting<br/>High Bias"] C -->|Just Right| E["Good Fit<br/>Balanced"] C -->|Too High| F["Overfitting<br/>High Variance"] G["Hierarchical Priors"] --> H["Automatic Shrinkage"] H --> E
Shrinkage Benefits:
- Automatic Regularization: Higher-order terms shrink toward zero
- Bias-Variance Tradeoff: Balances model flexibility with stability
- Model Selection: Coefficients near zero indicate irrelevant terms
Bayesian Model Selection
Compare different regression models using marginal likelihood and information criteria.
Model Comparison Framework
For candidate models $M_1, \ldots, M_K$:

Marginal Likelihood:

$$p(\mathbf{y} \mid M_k) = \int p(\mathbf{y} \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k$$

Bayes Factors:

$$BF_{12} = \frac{p(\mathbf{y} \mid M_1)}{p(\mathbf{y} \mid M_2)}$$

Model Posterior Probabilities:

$$p(M_k \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid M_k)\, p(M_k)}{\sum_{j} p(\mathbf{y} \mid M_j)\, p(M_j)}$$
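Given marginal log-likelihood estimates for each model and equal prior model probabilities, the posterior probabilities follow from a log-sum-exp normalization; a minimal sketch in plain Rust:

// Sketch: turn marginal log-likelihoods into posterior model probabilities
// under equal prior model probabilities (log-sum-exp for numerical stability).
fn model_posterior_probs(marginal_log_likelihoods: &[f64]) -> Vec<f64> {
    let max_ll = marginal_log_likelihoods
        .iter()
        .fold(f64::NEG_INFINITY, |a, &b| a.max(b));
    let weights: Vec<f64> = marginal_log_likelihoods
        .iter()
        .map(|ll| (ll - max_ll).exp())
        .collect();
    let total: f64 = weights.iter().sum();
    weights.iter().map(|w| w / total).collect()
}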
Implementation
use fugue::*;
// Bayesian model selection for regression
#[derive(Clone, Copy, Debug)]
enum RegressionModel {
Linear,
Quadratic,
Cubic,
}
fn model_selection_demo() {
println!("=== Bayesian Model Selection ===\n");
// Generate quadratic data
let x_data: Vec<f64> = (0..25).map(|i| (i as f64 - 12.0) / 5.0).collect(); // x from -2.4 to 2.4
let y_data: Vec<f64> = x_data
.iter()
.map(|&x| {
let true_mean = 0.5 + 1.5 * x - 0.8 * x.powi(2);
let mut rng = StdRng::seed_from_u64(((x.abs() * 1000.0) as u64) + 777);
true_mean + Normal::new(0.0, 0.2).unwrap().sample(&mut rng)
})
.collect();
println!("📊 True model: y = 0.5 + 1.5x - 0.8x² + noise");
let models = [
(RegressionModel::Linear, 1),
(RegressionModel::Quadratic, 2),
(RegressionModel::Cubic, 3),
];
let mut model_scores = Vec::new();
for (model_type, degree) in models.iter() {
println!("\n🔬 Evaluating {:?} model...", model_type);
let mut rng = StdRng::seed_from_u64(42 + *degree as u64);
let model_fn = || polynomial_regression_model(x_data.clone(), y_data.clone(), *degree);
let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 60);
// Approximate model score: log-mean-exp of posterior log-likelihoods (a crude stand-in for the marginal likelihood)
let log_likelihoods: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.log_likelihood)
.collect();
let max_ll = log_likelihoods
.iter()
.fold(f64::NEG_INFINITY, |a, &b| a.max(b));
let shifted_lls: Vec<f64> = log_likelihoods.iter().map(|ll| ll - max_ll).collect();
let mean_exp_ll =
shifted_lls.iter().map(|ll| ll.exp()).sum::<f64>() / shifted_lls.len() as f64;
let marginal_log_likelihood = max_ll + mean_exp_ll.ln();
model_scores.push((*model_type, marginal_log_likelihood));
println!(
" - Marginal log-likelihood: {:.2}",
marginal_log_likelihood
);
// Show coefficient estimates
for d in 0..=*degree {
let coef_samples: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("coef", d)).unwrap())
.collect();
let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
println!(" Coefficient x^{}: {:.3}", d, mean_coef);
}
}
// Find best model
model_scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
println!("\n🏆 Model Ranking:");
for (i, (model, score)) in model_scores.iter().enumerate() {
let relative_score = score - model_scores[0].1;
println!(
" {}. {:?}: {:.2} (Δ = {:.2})",
i + 1,
model,
score,
relative_score
);
}
println!("\n💡 The Quadratic model should win (matches true data generating process)!");
println!();
}
Model Selection Criteria
Criterion | Formula | Interpretation |
---|---|---|
Marginal Likelihood | $p(\mathbf{y} \mid M_k)$ | Higher is better |
Bayes Factor | $BF_{12} = p(\mathbf{y} \mid M_1) / p(\mathbf{y} \mid M_2)$ | $> 3$: strong evidence for $M_1$ |
DIC | $\bar{D} + p_D$ | Lower is better |
WAIC | $-2(\text{lppd} - p_{\text{WAIC}})$ | Lower is better |
- Start Simple: Begin with linear models, add complexity as needed
- Cross-Validation: Use holdout data to validate model predictions
- Domain Knowledge: Consider scientific plausibility, not just statistical fit
- Multiple Criteria: Don't rely on a single selection criterion
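The Bayes factor row of the table can be computed directly from two marginal log-likelihood estimates, such as those printed by the demo above; a minimal sketch:

// Sketch: Bayes factor of model 1 over model 2 from marginal log-likelihoods.
// Work on the log scale first, then exponentiate, to avoid overflow.
fn bayes_factor(log_marginal_1: f64, log_marginal_2: f64) -> f64 {
    (log_marginal_1 - log_marginal_2).exp()
}
// A value above ~3 is conventionally read as strong evidence for model 1.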
Regularized Regression
High-dimensional regression requires regularization to prevent overfitting. Ridge regression achieves this through hierarchical priors.
Ridge Regression Theory
Penalty Formulation:

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} (y_i - \mathbf{x}_i^\top \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}$$

Bayesian Equivalent:

$$\beta_j \sim \mathcal{N}(0, 1/\lambda)$$

The regularization parameter $\lambda$ controls shrinkage:
- Large $\lambda$: Strong shrinkage (high bias, low variance)
- Small $\lambda$: Weak shrinkage (low bias, high variance)
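The code below relies on the equivalence above: a ridge penalty $\lambda$ corresponds to a zero-mean normal prior with standard deviation $1/\sqrt{\lambda}$. For a single (standardized, intercept-free) predictor the shrinkage effect has a closed form, sketched here for intuition:

// Sketch: closed-form ridge estimate for one predictor with no intercept,
// showing how larger lambda pulls the estimate toward zero.
fn ridge_estimate_1d(x: &[f64], y: &[f64], lambda: f64) -> f64 {
    let xty: f64 = x.iter().zip(y.iter()).map(|(xi, yi)| xi * yi).sum();
    let xtx: f64 = x.iter().map(|xi| xi * xi).sum();
    xty / (xtx + lambda)
}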
Implementation
use fugue::*;
// Ridge regression (L2 regularization) through hierarchical priors
fn ridge_regression_model(x_data: Vec<Vec<f64>>, y_data: Vec<f64>, lambda: f64) -> Model<Vec<f64>> {
let p = x_data[0].len(); // number of features
prob! {
// Ridge (L2) prior: one coefficient per feature with scale 1/sqrt(lambda)
let coefficients <- plate!(j in 0..p => {
sample(addr!("beta", j), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap())
});
let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
// Clone coefficients for use in closure
let coefficients_for_observations = coefficients.clone();
let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(2) => {
let (idx, (x_i, y_i)) = i;
let mut mean_i = 0.0;
for (j, beta_j) in coefficients_for_observations.iter().enumerate() {
if j < p && j < x_i.len() {
mean_i += beta_j * x_i[j];
}
}
observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
});
pure(coefficients)
}
}
fn regularized_regression_demo() {
println!("=== Regularized Regression (Ridge) ===\n");
// Generate high-dimensional data with few relevant features
let n = 40;
let p = 8; // 8 features, only 3 are relevant
let mut x_data = Vec::new();
let mut y_data = Vec::new();
let true_coefs = [2.0, -1.5, 0.0, 1.2, 0.0, 0.0, 0.0, -0.8]; // Only indices 0,1,3,7 matter
for i in 0..n {
let mut rng = StdRng::seed_from_u64(1000 + i as u64);
let x_i: Vec<f64> = (0..p)
.map(|_| Normal::new(0.0, 1.0).unwrap().sample(&mut rng))
.collect();
let true_mean: f64 = x_i.iter().zip(true_coefs.iter()).map(|(x, c)| x * c).sum();
let y_i = true_mean + Normal::new(0.0, 0.5).unwrap().sample(&mut rng);
x_data.push(x_i);
y_data.push(y_i);
}
println!("📊 High-dimensional regression:");
println!(" - {} observations, {} features", n, p);
println!(" - True coefficients: [2.0, -1.5, 0.0, 1.2, 0.0, 0.0, 0.0, -0.8]");
println!(" - Only 4 out of 8 features are relevant");
// Compare different regularization strengths
let lambdas = [0.1, 1.0, 10.0];
for &lambda in lambdas.iter() {
println!("\n🔬 Ridge regression with λ = {}:", lambda);
let mut rng = StdRng::seed_from_u64(42 + (lambda * 100.0) as u64);
let model_fn = || ridge_regression_model(x_data.clone(), y_data.clone(), lambda);
let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 60);
println!(" Coefficient estimates (true values in parentheses):");
for (j, &true_coef) in true_coefs.iter().enumerate().take(p) {
let coef_samples: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("beta", j)).unwrap())
.collect();
let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
println!(" β{}: {:6.3} ({:5.1})", j, mean_coef, true_coef);
}
// Compute prediction accuracy (simplified)
let predictions: Vec<f64> = x_data
.iter()
.map(|x_i| {
let mut pred = 0.0;
for (j, &x_val) in x_i.iter().enumerate().take(p) {
let coef_samples: Vec<f64> = samples
.iter()
.map(|(_, trace)| trace.get_f64(&addr!("beta", j)).unwrap())
.collect();
let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
pred += mean_coef * x_val;
}
pred
})
.collect();
let mse = y_data
.iter()
.zip(predictions.iter())
.map(|(y, pred)| (y - pred).powi(2))
.sum::<f64>()
/ n as f64;
println!(" - Mean Squared Error: {:.4}", mse);
}
println!("\n💡 Higher λ shrinks coefficients toward zero (regularization effect)");
println!(" Optimal λ balances bias-variance tradeoff!");
println!();
}
Regularization Effects
graph TB A["High-Dimensional Data<br/>p >> n"] --> B["Regularization"] B --> C["λ = 0.1<br/>Weak Shrinkage"] B --> D["λ = 1.0<br/>Moderate Shrinkage"] B --> E["λ = 10.0<br/>Strong Shrinkage"] C --> F["Low Bias<br/>High Variance"] D --> G["Balanced<br/>Optimal MSE"] E --> H["High Bias<br/>Low Variance"]
Advantages of Bayesian Ridge:
- Automatic λ Selection: Through hierarchical priors on precision
- Uncertainty Quantification: Full posterior for all parameters
- Feature Selection: Posteriors concentrated tightly around zero flag irrelevant features
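The automatic λ selection mentioned above can be sketched in Fugue by placing a prior on the coefficient precision rather than fixing it, mirroring the hierarchical prior already used in the polynomial model; this is a conceptual sketch, not part of the demo code:

use fugue::*;

// Sketch: ridge regression with a learned penalty - lambda gets a Gamma prior
// instead of being fixed, so the data determine the amount of shrinkage.
fn adaptive_ridge_model(x_data: Vec<Vec<f64>>, y_data: Vec<f64>) -> Model<Vec<f64>> {
    let p = x_data[0].len();
    prob! {
        let lambda <- sample(addr!("lambda"), Gamma::new(2.0, 1.0).unwrap());
        // One coefficient per feature, shrunk by the shared, learned precision
        let coefficients <- plate!(j in 0..p => {
            sample(addr!("beta", j), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap())
        });
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
        let coefficients_for_obs = coefficients.clone();
        let _observations <- plate!(obs in x_data.iter().zip(y_data.iter()).enumerate() => {
            let (i, (x_i, y_i)) = obs;
            let mean_i: f64 = coefficients_for_obs
                .iter()
                .zip(x_i.iter())
                .map(|(b, x)| b * x)
                .sum();
            observe(addr!("y", i), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });
        pure(coefficients)
    }
}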
Advanced Extensions
Hierarchical Linear Models
use fugue::*;
// Group-level regression with varying intercepts
fn hierarchical_regression_model(
x_data: Vec<f64>,
y_data: Vec<f64>,
group_ids: Vec<usize>
) -> Model<(f64, f64, Vec<f64>)> {
let n_groups = group_ids.iter().max().unwrap() + 1;
    prob! {
        // Global parameters shared across groups
        let global_slope <- sample(addr!("global_slope"), Normal::new(0.0, 5.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(2.0, 0.5).unwrap());
        let sigma_group <- sample(addr!("sigma_group"), Gamma::new(2.0, 1.0).unwrap());
        // Group-specific intercepts (partial pooling with scale sigma_group)
        let group_intercepts <- plate!(g in 0..n_groups => {
            sample(addr!("intercept", g), Normal::new(0.0, sigma_group).unwrap())
        });
        // Likelihood: each observation uses its own group's intercept
        let intercepts_for_obs = group_intercepts.clone();
        let _observations <- plate!(obs in x_data.iter().zip(y_data.iter()).zip(group_ids.iter()).enumerate() => {
            let (i, ((x_i, y_i), group_i)) = obs;
            let mean_i = intercepts_for_obs[*group_i] + global_slope * x_i;
            observe(addr!("y", i), Normal::new(mean_i, sigma_y).unwrap(), *y_i)
        });
        pure((global_slope, sigma_y, group_intercepts))
    }
}
Spline Regression
use fugue::*;
// Bayesian cubic spline regression
fn spline_regression_model(
x_data: Vec<f64>,
y_data: Vec<f64>,
knots: Vec<f64>
) -> Model<Vec<f64>> {
let n_basis = knots.len() + 3; // Cubic splines
    prob! {
        // Smoothness prior: larger precision shrinks coefficients harder
        let precision <- sample(addr!("precision"), Gamma::new(1.0, 0.1).unwrap());
        // Spline coefficients with a shared shrinkage scale
        let coefficients <- plate!(j in 0..n_basis => {
            sample(addr!("coef", j), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap())
        });
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
        // Likelihood with a simple truncated-power basis (a stand-in for proper B-splines)
        let coefficients_for_obs = coefficients.clone();
        let knots_for_obs = knots.clone();
        let _observations <- plate!(obs in x_data.iter().zip(y_data.iter()).enumerate() => {
            let (i, (x_i, y_i)) = obs;
            let mut mean_i = 0.0;
            for (j, coef_j) in coefficients_for_obs.iter().enumerate() {
                let basis_val = if j < knots_for_obs.len() {
                    (x_i - knots_for_obs[j]).max(0.0).powi(3)
                } else {
                    x_i.powi((j - knots_for_obs.len()) as i32)
                };
                mean_i += coef_j * basis_val;
            }
            observe(addr!("y", i), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });
        pure(coefficients)
    }
}
Production Considerations
Scalability
For large datasets:
- Minibatch MCMC: Use data subsets for likelihood computation
- Variational Inference: Approximate posterior for faster computation
- GPU Acceleration: Vectorized operations on GPU
- Sparse Representations: Efficient storage for high-dimensional sparse data
Model Diagnostics
Essential checks for regression models:
use fugue::inference::diagnostics::*;
fn regression_diagnostics(samples: &[(f64, f64, f64)], x_data: &[f64], y_data: &[f64]) {
    // Posterior-mean point estimates of intercept and slope
    let n = samples.len() as f64;
    let mean_intercept = samples.iter().map(|(b0, _, _)| b0).sum::<f64>() / n;
    let mean_slope = samples.iter().map(|(_, b1, _)| b1).sum::<f64>() / n;
    // Fitted values and residuals at the posterior mean
    let predictions: Vec<f64> = x_data.iter().map(|&x| mean_intercept + mean_slope * x).collect();
    let residuals: Vec<f64> = y_data
        .iter()
        .zip(predictions.iter())
        .map(|(y, pred)| y - pred)
        .collect();
    // Check for systematic patterns in the residuals
    let mean_residual = residuals.iter().sum::<f64>() / residuals.len() as f64;
    let residual_std = (residuals.iter().map(|r| (r - mean_residual).powi(2)).sum::<f64>()
        / (residuals.len() - 1) as f64)
        .sqrt();
    println!("Residual diagnostics:");
    println!("  Mean residual: {:.4}", mean_residual);
    println!("  Residual std: {:.4}", residual_std);
}
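A hypothetical usage, converting MCMC traces like the ones produced by `adaptive_mcmc_chain` earlier into the parameter tuples this function expects:

// Hypothetical usage: collect (intercept, slope, sigma) tuples from the traces,
// then run the residual diagnostics against the observed data.
let param_samples: Vec<(f64, f64, f64)> = samples
    .iter()
    .filter_map(|(_, trace)| {
        Some((
            trace.get_f64(&addr!("intercept"))?,
            trace.get_f64(&addr!("slope"))?,
            trace.get_f64(&addr!("sigma"))?,
        ))
    })
    .collect();
regression_diagnostics(&param_samples, &x_data, &y_data);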
Cross-Validation
use fugue::*;
use fugue::runtime::interpreters::PriorHandler;
use rand::{SeedableRng, rngs::StdRng};
fn k_fold_cross_validation<F>(
x_data: Vec<f64>,
y_data: Vec<f64>,
k: usize,
model_fn: F
) -> f64
where F: Fn(Vec<f64>, Vec<f64>) -> Model<(f64, f64, f64)>
{
let n = x_data.len();
let fold_size = n / k;
let mut mse_scores = Vec::new();
for fold in 0..k {
let test_start = fold * fold_size;
let test_end = if fold == k - 1 { n } else { (fold + 1) * fold_size };
// Split data
let mut train_x = Vec::new();
let mut train_y = Vec::new();
let mut test_x = Vec::new();
let mut test_y = Vec::new();
for i in 0..n {
if i >= test_start && i < test_end {
test_x.push(x_data[i]);
test_y.push(y_data[i]);
} else {
train_x.push(x_data[i]);
train_y.push(y_data[i]);
}
}
// Train model (simplified - would run MCMC here)
let mut rng = StdRng::seed_from_u64(fold as u64);
let (params, _) = runtime::handler::run(
PriorHandler { rng: &mut rng, trace: Trace::default() },
model_fn(train_x, train_y)
);
// Predict on test set
let predictions: Vec<f64> = test_x.iter()
.map(|&x| params.0 + params.1 * x)
.collect();
// Compute MSE
let mse = test_y.iter().zip(predictions.iter())
.map(|(y, pred)| (y - pred).powi(2))
.sum::<f64>() / test_y.len() as f64;
mse_scores.push(mse);
}
mse_scores.iter().sum::<f64>() / k as f64
}
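A sketch of how this might be called with the basic model and a dataset from earlier in the tutorial:

// Sketch: 5-fold cross-validation of the basic linear regression model,
// assuming x_data and y_data are available from one of the demos above.
let avg_mse = k_fold_cross_validation(x_data.clone(), y_data.clone(), 5, |x, y| {
    basic_linear_regression_model(x, y)
});
println!("5-fold CV average MSE: {:.4}", avg_mse);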
Real-World Applications
Economic Forecasting
// Example: GDP growth prediction
// (economic_data assumed to hold (inflation, unemployment, interest_rate, gdp_growth) tuples)
let gdp_model = prob! {
    // Macroeconomic predictors
    let beta_inflation <- sample(addr!("beta_inflation"), Normal::new(0.0, 2.0).unwrap());
    let beta_unemployment <- sample(addr!("beta_unemployment"), Normal::new(0.0, 2.0).unwrap());
    let beta_interest_rate <- sample(addr!("beta_interest_rate"), Normal::new(0.0, 2.0).unwrap());
    let intercept <- sample(addr!("intercept"), Normal::new(2.0, 1.0).unwrap()); // Prior: ~2% growth
    let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
    // Quarterly GDP growth observations
    let _observations <- plate!(obs in economic_data.iter().enumerate() => {
        let (i, (inflation, unemployment, interest_rate, gdp_growth)) = obs;
        let expected_growth = intercept
            + beta_inflation * inflation
            + beta_unemployment * unemployment
            + beta_interest_rate * interest_rate;
        observe(addr!("gdp", i), Normal::new(expected_growth, sigma).unwrap(), *gdp_growth)
    });
    pure((intercept, beta_inflation, beta_unemployment, beta_interest_rate))
};
Medical Research
// Example: Drug dose-response modeling
// (dose_response_data assumed to hold (log_dose, response) pairs)
let dose_response_model = prob! {
    // Hill (log-linear dose-response) parameters
    let log_ic50 <- sample(addr!("log_ic50"), Normal::new(0.0, 2.0).unwrap()); // IC50 concentration
    let hill_slope <- sample(addr!("hill_slope"), Normal::new(1.0, 0.5).unwrap()); // Cooperativity
    let baseline <- sample(addr!("baseline"), Normal::new(100.0, 10.0).unwrap()); // No-drug effect
    let max_effect <- sample(addr!("max_effect"), Normal::new(0.0, 10.0).unwrap()); // Maximum inhibition
    let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
    let _observations <- plate!(obs in dose_response_data.iter().enumerate() => {
        let (i, (log_dose, response)) = obs;
        // Hill equation: E = baseline + (max_effect - baseline) / (1 + 10^(hill_slope * (log_ic50 - log_dose)))
        let hill_term = hill_slope * (log_ic50 - log_dose);
        let expected_response = baseline + (max_effect - baseline) / (1.0 + (10.0_f64).powf(hill_term));
        observe(addr!("response", i), Normal::new(expected_response, sigma).unwrap(), *response)
    });
    pure((log_ic50, hill_slope, baseline, max_effect))
};
Testing Your Understanding
Exercise 1: Multiple Regression
Extend basic linear regression to handle multiple predictors:
use fugue::*;
fn multiple_regression_model(
x_data: Vec<Vec<f64>>, // Matrix: n observations × p predictors
y_data: Vec<f64>
) -> Model<Vec<f64>> {
let p = x_data[0].len(); // number of predictors
prob!(
// TODO: Implement multiple regression
// - Create coefficient vector of length p
// - Use matrix multiplication for linear predictor
// - Add appropriate priors for high-dimensional case
pure(vec![0.0; p]) // Placeholder
)
}
Exercise 2: Heteroscedastic Regression
Model non-constant error variance:
use fugue::*;
fn heteroscedastic_model(
x_data: Vec<f64>,
y_data: Vec<f64>
) -> Model<(f64, f64, f64, f64)> {
prob!(
// TODO: Implement regression with non-constant variance
// - Model log(σ²) as linear function of x
// - σ²ᵢ = exp(γ₀ + γ₁ * xᵢ)
// - Use different variance for each observation
pure((0.0, 0.0, 0.0, 0.0)) // Placeholder
)
}
Exercise 3: Bayesian Variable Selection
Implement spike-and-slab priors for variable selection:
use fugue::*;
fn variable_selection_model(
x_data: Vec<Vec<f64>>,
y_data: Vec<f64>,
inclusion_probability: f64
) -> Model<(Vec<f64>, Vec<bool>)> {
let p = x_data[0].len();
prob!(
// TODO: Implement variable selection
// - γⱼ ~ Bernoulli(π) for inclusion indicators
// - βⱼ | γⱼ ~ γⱼ * Normal(0, τ²) + (1-γⱼ) * δ₀
// - Spike-and-slab prior structure
pure((vec![0.0; p], vec![false; p])) // Placeholder
)
}
Key Takeaways
- Bayesian Framework: Uncertainty quantification through posterior distributions
- Model Extensions: Robustness, nonlinearity, and regularization through prior specification
- Model Selection: Principled comparison using marginal likelihood and Bayes factors
- Scalability: Hierarchical models and efficient computation for high-dimensional problems
- Real-World Applications: Flexible framework adaptable to diverse scientific domains
- Production Ready: Cross-validation, diagnostics, and robust inference workflows
Core Techniques:
- ✅ Basic Regression with uncertainty quantification
- ✅ Robust Methods for outlier resistance
- ✅ Polynomial Modeling for nonlinear relationships
- ✅ Bayesian Model Selection for optimal complexity
- ✅ Ridge Regression for high-dimensional problems
- ✅ Hierarchical Extensions for grouped data
- ✅ Production Deployment with diagnostics and validation
Linear regression in Fugue provides a solid foundation for more complex statistical models. The Bayesian approach naturally handles uncertainty, enables model comparison, and scales to modern high-dimensional problems through principled regularization.
Further Reading
- Building Complex Models - Advanced modeling techniques
- Optimizing Performance - Scalable inference strategies
- Classification - Logistic regression and discrete outcomes
- Hierarchical Models - Multi-level modeling
- Gelman et al. "Bayesian Data Analysis" - Comprehensive Bayesian statistics
- McElreath "Statistical Rethinking" - Modern approach to statistical modeling