Linear Regression

A comprehensive guide to Bayesian linear regression using Fugue. This tutorial demonstrates how to build, analyze, and extend linear models for real-world data analysis, showcasing the power of probabilistic programming for uncertainty quantification and model comparison.

Learning Objectives

By the end of this tutorial, you will understand:

  • Bayesian Linear Regression: Prior specification and posterior inference for regression parameters
  • Uncertainty Quantification: How to extract and interpret parameter uncertainty from MCMC samples
  • Robust Regression: Using heavy-tailed distributions to handle outliers
  • Polynomial Regression: Modeling nonlinear relationships with polynomial basis functions
  • Model Selection: Bayesian methods for comparing regression models
  • Regularization: Ridge regression through hierarchical priors
  • Production Applications: Scalable inference for high-dimensional regression problems

The Linear Regression Framework

Linear regression is the cornerstone of statistical modeling. In the Bayesian framework, we treat regression parameters as random variables with prior distributions, allowing us to quantify uncertainty in our estimates and make probabilistic predictions.

graph TB
    A["Data: (x₁,y₁), (x₂,y₂), ..., (xₙ,yₙ)"] --> B["Linear Model<br/>y = β₀ + β₁x + ε"]

    B --> C["Bayesian Framework"]
    C --> D["Prior: p(β₀, β₁, σ)"]
    C --> E["Likelihood: p(y|X, β₀, β₁, σ)"]

    D --> F["Posterior: p(β₀, β₁, σ|y, X)"]
    E --> F

    F --> G["MCMC Sampling"]
    G --> H["Parameter Uncertainty"]
    G --> I["Predictive Distribution"]
    G --> J["Model Comparison"]

Mathematical Foundation

Basic Linear Model

The fundamental linear regression model is:

yᵢ = β₀ + β₁xᵢ + εᵢ,   εᵢ ~ Normal(0, σ²),   i = 1, …, n

where:

  • yᵢ: Response variable (dependent)
  • xᵢ: Predictor variable (independent)
  • β₀: Intercept parameter
  • β₁: Slope parameter
  • εᵢ: Random error

Bayesian Specification

Prior Distributions:

  • β₀ ~ Normal(0, 10²), β₁ ~ Normal(0, 10²)
  • σ ~ Gamma(α, β) or σ ~ Half-Normal(s) (any prior with support on positive values)

Likelihood:

yᵢ | xᵢ, β₀, β₁, σ ~ Normal(β₀ + β₁xᵢ, σ²),   i = 1, …, n

Posterior:

p(β₀, β₁, σ | y, X) ∝ p(y | X, β₀, β₁, σ) · p(β₀) p(β₁) p(σ)

Conjugate Analysis

When using conjugate priors (Normal-Inverse-Gamma), the posterior has a closed form. However, MCMC allows us to use more flexible priors and handle complex models without conjugacy restrictions.
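
For intuition, here is the conjugate special case worked out in plain Rust for a single coefficient with known σ: a minimal sketch (the helper name is illustrative, not part of Fugue).

// Conjugate special case: beta ~ Normal(m0, s0^2), y_i ~ Normal(beta * x_i, sigma^2), sigma known.
// Posterior precision = 1/s0^2 + sum(x_i^2)/sigma^2
// Posterior mean      = (m0/s0^2 + sum(x_i * y_i)/sigma^2) / posterior precision
fn conjugate_slope_posterior(x: &[f64], y: &[f64], sigma: f64, m0: f64, s0: f64) -> (f64, f64) {
    let prior_prec = 1.0 / (s0 * s0);
    let lik_prec = x.iter().map(|xi| xi * xi).sum::<f64>() / (sigma * sigma);
    let data_term = x.iter().zip(y.iter()).map(|(xi, yi)| xi * yi).sum::<f64>() / (sigma * sigma);
    let post_prec = prior_prec + lik_prec;
    let post_mean = (m0 * prior_prec + data_term) / post_prec;
    (post_mean, (1.0 / post_prec).sqrt()) // (posterior mean, posterior standard deviation)
}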

Basic Linear Regression

Let's start with the fundamental case: simple linear regression with one predictor variable.

Implementation

use fugue::*;
use fugue::runtime::interpreters::PriorHandler;
use rand::{SeedableRng, rngs::StdRng};
// Basic Bayesian linear regression model
fn basic_linear_regression_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64)> {
    prob! {
        let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
        let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());

        // Use a well-behaved prior for sigma (now that MCMC handles positivity constraints)
        let sigma <- sample(addr!("sigma"), Gamma::new(1.0, 1.0).unwrap()); // Mean = 1, more concentrated

        // Simple observations (limited number for efficiency)
        let _obs_0 <- observe(addr!("y", 0), Normal::new(intercept + slope * x_data[0], sigma).unwrap(), y_data[0]);
        let _obs_1 <- observe(addr!("y", 1), Normal::new(intercept + slope * x_data[1], sigma).unwrap(), y_data[1]);
        let _obs_2 <- observe(addr!("y", 2), Normal::new(intercept + slope * x_data[2], sigma).unwrap(), y_data[2]);

        pure((intercept, slope, sigma))
    }
}

fn basic_regression_demo() {
    println!("=== Basic Linear Regression ===\n");

    // Generate synthetic data: y = 2 + 1.5*x + noise (smaller dataset for demo)
    let (x_data, y_data) = generate_regression_data(20, 1.5, 2.0, 0.5, 12345);

    println!("📊 Generated {} data points", x_data.len());
    println!("   - True intercept: 2.0, True slope: 1.5, True sigma: 0.5");
    println!(
        "   - Data range: x ∈ [{:.1}, {:.1}], y ∈ [{:.1}, {:.1}]",
        x_data[0],
        x_data[x_data.len() - 1],
        y_data.iter().fold(f64::INFINITY, |a, &b| a.min(b)),
        y_data.iter().fold(f64::NEG_INFINITY, |a, &b| a.max(b))
    );

    // Create model function that uses the data
    let model_fn = move || basic_linear_regression_model(x_data.clone(), y_data.clone());

    println!("\n🔬 Running MCMC inference...");
    let mut rng = StdRng::seed_from_u64(42);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 500, 100);

    // Extract parameter estimates
    let intercepts: Vec<f64> = samples
        .iter()
        .filter_map(|(_, trace)| trace.get_f64(&addr!("intercept")))
        .collect();
    let slopes: Vec<f64> = samples
        .iter()
        .filter_map(|(_, trace)| trace.get_f64(&addr!("slope")))
        .collect();
    let sigmas: Vec<f64> = samples
        .iter()
        .filter_map(|(_, trace)| trace.get_f64(&addr!("sigma")))
        .collect();

    if !intercepts.is_empty() && !slopes.is_empty() && !sigmas.is_empty() {
        println!("✅ MCMC completed with {} samples", samples.len());
        println!("\n📈 Parameter Estimates:");

        let mean_intercept = intercepts.iter().sum::<f64>() / intercepts.len() as f64;
        let mean_slope = slopes.iter().sum::<f64>() / slopes.len() as f64;
        let mean_sigma = sigmas.iter().sum::<f64>() / sigmas.len() as f64;

        println!("   - Intercept: {:.3} (true: 2.0)", mean_intercept);
        println!("   - Slope: {:.3} (true: 1.5)", mean_slope);
        println!("   - Sigma: {:.3} (true: 0.5)", mean_sigma);

        // Show some diagnostics
        let valid_traces = samples
            .iter()
            .filter(|(_, trace)| trace.total_log_weight().is_finite())
            .count();
        println!("   - Valid traces: {} / {}", valid_traces, samples.len());
    } else {
        println!("❌ MCMC failed - no valid samples obtained");
    }
    println!();
}
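
The helper generate_regression_data is not shown in this chapter. A minimal sketch consistent with how it is called above (the argument order n, slope, intercept, sigma, seed is inferred from the printed true values) could look like this:

use fugue::*;
use rand::{SeedableRng, rngs::StdRng};

// Hypothetical data generator matching the call sites above:
// generate_regression_data(n, slope, intercept, sigma, seed) -> (x, y)
fn generate_regression_data(
    n: usize,
    slope: f64,
    intercept: f64,
    sigma: f64,
    seed: u64,
) -> (Vec<f64>, Vec<f64>) {
    let mut rng = StdRng::seed_from_u64(seed);
    let x: Vec<f64> = (0..n).map(|i| i as f64 / (n - 1) as f64 * 10.0).collect();
    let y: Vec<f64> = x
        .iter()
        .map(|&xi| intercept + slope * xi + Normal::new(0.0, sigma).unwrap().sample(&mut rng))
        .collect();
    (x, y)
}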

Key Concepts

  1. Prior Specification: We use weakly informative priors that allow the data to dominate
  2. Vectorized Likelihood: plate! (or an explicit loop) handles many observations; the example above unrolls three observe calls for clarity
  3. Parameter Recovery: MCMC estimates should recover true parameter values
  4. Uncertainty Quantification: Standard deviations provide parameter uncertainty

Prior Selection

  • Use Normal(0, 10) for regression coefficients when predictors are standardized (a standardization helper is sketched below)
  • Use a Gamma prior for the error scale σ (e.g., Gamma(2, 0.5), or the Gamma(1, 1) used above) - it places reasonable prior mass over positive values
  • Adjust prior scale based on your domain knowledge and data scale
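
A small standardization helper in plain Rust (no Fugue-specific API), so that Normal(0, 10) coefficient priors stay weakly informative regardless of the raw data scale:

// Standardize a predictor to mean 0 and standard deviation 1.
fn standardize(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    let sd = (x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / (n - 1.0)).sqrt();
    x.iter().map(|v| (v - mean) / sd).collect()
}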

Interpretation

The posterior samples provide:

  • Point Estimates: Posterior means are Bayesian parameter estimates
  • Credible Intervals: Quantiles give uncertainty bounds (e.g., 95% credible intervals)
  • Predictive Distribution: For a new input x*, draw ỹ ~ Normal(β₀ + β₁x*, σ²) for each posterior sample of (β₀, β₁, σ) - see the sketch below
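
A minimal sketch of both computations from the posterior draws collected in the demo above (the vectors intercepts, slopes, sigmas); the helper names are illustrative:

// 95% credible interval from posterior draws (simple quantile lookup).
fn credible_interval(draws: &[f64], lower_q: f64, upper_q: f64) -> (f64, f64) {
    let mut sorted = draws.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = |q: f64| ((sorted.len() as f64 - 1.0) * q).round() as usize;
    (sorted[idx(lower_q)], sorted[idx(upper_q)])
}

// Posterior predictive mean at a new input x*: average beta0 + beta1 * x* over the draws.
fn posterior_predictive_mean(intercepts: &[f64], slopes: &[f64], x_new: f64) -> f64 {
    intercepts
        .iter()
        .zip(slopes.iter())
        .map(|(b0, b1)| b0 + b1 * x_new)
        .sum::<f64>()
        / intercepts.len() as f64
}

// Usage: let (lo, hi) = credible_interval(&slopes, 0.025, 0.975);
//        let y_hat = posterior_predictive_mean(&intercepts, &slopes, 5.0);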

Robust Regression

Standard linear regression assumes Gaussian errors, making it sensitive to outliers. Robust regression uses heavy-tailed distributions to reduce outlier influence.

Theory

Replace the normal likelihood with a Student-t likelihood:

yᵢ ~ StudentT(ν, μᵢ, σ)

where:

  • μᵢ = β₀ + β₁xᵢ (linear predictor)
  • ν: Degrees of freedom (lower = heavier tails)
  • As ν → ∞, StudentT(ν, μ, σ) → Normal(μ, σ²)

(A scale-mixture version of this likelihood, built only from the distributions already used in this tutorial, is sketched after the implementation below.)

Implementation

use fugue::*;
use rand::{SeedableRng, rngs::StdRng};
// Robust regression using t-distribution for outlier resistance
fn robust_regression_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64, f64)> {
    prob! {
        let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
        let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());
        let nu <- sample(addr!("nu"), Gamma::new(2.0, 0.1).unwrap()); // Degrees of freedom for t-dist

        // Use plate notation for observations.
        // NOTE: this sketch keeps a Normal likelihood and only samples ν; a fully robust
        // version would use a Student-t likelihood (see the scale-mixture sketch below).
        let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(3) => {
            let (idx, (x_i, y_i)) = i;
            observe(addr!("y", idx), Normal::new(intercept + slope * x_i, sigma).unwrap(), *y_i)
        });

        pure((intercept, slope, sigma, nu))
    }
}

fn robust_regression_demo() {
    println!("=== Robust Linear Regression ===\n");

    // Generate data with outliers
    let (mut x_data, mut y_data) = generate_regression_data(40, 1.2, 3.0, 0.4, 67890);

    // Add some outliers
    x_data.extend(vec![8.5, 9.2, 7.8]);
    y_data.extend(vec![20.0, -5.0, 25.0]); // Clear outliers

    println!(
        "📊 Generated {} data points (with 3 outliers)",
        x_data.len()
    );
    println!("   - Base relationship: y = 3.0 + 1.2*x + noise");
    println!("   - Added outliers at x=[8.5, 9.2, 7.8] with y=[20.0, -5.0, 25.0]");

    // Compare standard vs robust regression
    let mut rng = StdRng::seed_from_u64(42);

    // Standard regression
    println!("\n🔬 Standard Linear Regression:");
    let standard_model_fn = || basic_linear_regression_model(x_data.clone(), y_data.clone());
    let standard_samples = adaptive_mcmc_chain(&mut rng, standard_model_fn, 500, 100);

    let std_intercepts: Vec<f64> = standard_samples
        .iter()
        .map(|(_, trace)| trace.get_f64(&addr!("intercept")).unwrap())
        .collect();
    let std_slopes: Vec<f64> = standard_samples
        .iter()
        .map(|(_, trace)| trace.get_f64(&addr!("slope")).unwrap())
        .collect();

    println!(
        "   - Intercept: {:.3} (true: 3.0)",
        std_intercepts.iter().sum::<f64>() / std_intercepts.len() as f64
    );
    println!(
        "   - Slope: {:.3} (true: 1.2)",
        std_slopes.iter().sum::<f64>() / std_slopes.len() as f64
    );

    // Robust regression (conceptual - using same likelihood but different prior structure)
    println!("\n🛡️ Robust Regression (Conceptual):");
    let mut rng2 = StdRng::seed_from_u64(42);
    let robust_model_fn = || robust_regression_model(x_data.clone(), y_data.clone());
    let robust_samples = adaptive_mcmc_chain(&mut rng2, robust_model_fn, 500, 100);

    let rob_intercepts: Vec<f64> = robust_samples
        .iter()
        .map(|(_, trace)| trace.get_f64(&addr!("intercept")).unwrap())
        .collect();
    let rob_slopes: Vec<f64> = robust_samples
        .iter()
        .map(|(_, trace)| trace.get_f64(&addr!("slope")).unwrap())
        .collect();
    let rob_nus: Vec<f64> = robust_samples
        .iter()
        .map(|(_, trace)| trace.get_f64(&addr!("nu")).unwrap())
        .collect();

    println!(
        "   - Intercept: {:.3} (true: 3.0)",
        rob_intercepts.iter().sum::<f64>() / rob_intercepts.len() as f64
    );
    println!(
        "   - Slope: {:.3} (true: 1.2)",
        rob_slopes.iter().sum::<f64>() / rob_slopes.len() as f64
    );
    println!(
        "   - Degrees of freedom (ν): {:.3}",
        rob_nus.iter().sum::<f64>() / rob_nus.len() as f64
    );

    println!("\n💡 Note: Lower ν indicates heavier tails (more robust to outliers)");
    println!();
}
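
The model above keeps a Normal likelihood as a stand-in. If Fugue does not expose a Student-t distribution directly, the t-likelihood can still be written with the distributions already used in this tutorial via the standard scale-mixture representation: wᵢ ~ Gamma(ν/2, ν/2) and yᵢ | wᵢ ~ Normal(μᵢ, σ/√wᵢ) implies yᵢ ~ StudentT(ν, μᵢ, σ). A sketch, assuming Gamma::new takes (shape, rate) and that plate! accepts any iterator expression as used elsewhere in this chapter:

use fugue::*;

// Student-t likelihood via a per-observation scale mixture of Normals.
// Assumption: Gamma::new(shape, rate); if Fugue uses (shape, scale), invert the second argument.
fn scale_mixture_robust_model(x_data: Vec<f64>, y_data: Vec<f64>, nu: f64) -> Model<(f64, f64)> {
    prob! {
        let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
        let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        // Latent per-observation scales: w_i ~ Gamma(nu/2, nu/2)
        let weights <- plate!(idx in 0..x_data.len() => {
            sample(addr!("w", idx), Gamma::new(nu / 2.0, nu / 2.0).unwrap())
        });

        // y_i | w_i ~ Normal(mu_i, sigma / sqrt(w_i)), which marginally gives y_i ~ StudentT(nu, mu_i, sigma)
        let weights_for_obs = weights.clone();
        let _obs <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate() => {
            let (idx, (x_i, y_i)) = i;
            observe(
                addr!("y", idx),
                Normal::new(intercept + slope * x_i, sigma / weights_for_obs[idx].sqrt()).unwrap(),
                *y_i,
            )
        });

        pure((intercept, slope))
    }
}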

Robust vs. Standard Comparison

graph LR
    A["Data with Outliers"] --> B["Standard Regression"]
    A --> C["Robust Regression"]

    B --> D["Biased Parameters<br/>Large Residuals"]
    C --> E["Stable Parameters<br/>Heavy-tailed Errors"]

Advantages of Robust Regression:

  • Outlier Resistance: Heavy tails accommodate extreme values
  • Automatic Detection: A low posterior ν indicates outlier presence
  • Flexible: Reduces to normal regression when ν is large

Computational Complexity

t-distribution likelihoods are more computationally expensive than normal distributions. For very large datasets, consider preprocessing to remove obvious outliers first.

Polynomial Regression

Linear regression can model nonlinear relationships using polynomial basis functions:

y = β₀ + β₁x + β₂x² + … + β_d x^d + ε

Mathematical Framework

Design Matrix: For polynomial degree d, each input xᵢ is expanded into the row [1, xᵢ, xᵢ², …, xᵢ^d].

Hierarchical Prior: Control overfitting with shrinkage priors:

βⱼ ~ Normal(0, τ²) with τ² = 1/precision and precision ~ Gamma(a, b), matching the 1.0 / precision.sqrt() standard deviation used in the code below.
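
A small helper that expands a scalar input into the design-matrix row described above (plain Rust; the function name is illustrative):

// Expand x into the design-matrix row [1, x, x^2, ..., x^degree].
fn polynomial_features(x: f64, degree: usize) -> Vec<f64> {
    (0..=degree).map(|d| x.powi(d as i32)).collect()
}

// Example: polynomial_features(2.0, 3) == vec![1.0, 2.0, 4.0, 8.0]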

Implementation

use fugue::*;
use rand::{SeedableRng, rngs::StdRng};
// Polynomial regression with automatic relevance determination
fn polynomial_regression_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    _degree: usize,
) -> Model<Vec<f64>> {
    prob! {
        // Hierarchical prior for polynomial coefficients
        let precision <- sample(addr!("precision"), Gamma::new(2.0, 1.0).unwrap());

        // Sample polynomial coefficients (fixed degree for simplicity)
        let coef_0 <- sample(addr!("coef", 0), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap());
        let coef_1 <- sample(addr!("coef", 1), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap());
        let coef_2 <- sample(addr!("coef", 2), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap());
        let coefficients = vec![coef_0, coef_1, coef_2];

        // Noise parameter
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        // Clone coefficients for use in closure
        let coefficients_for_observations = coefficients.clone();
        let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(3) => {
            let (idx, (x_i, y_i)) = i;
            let mut mean_i = 0.0;
            for (d, coef) in coefficients_for_observations.iter().enumerate() {
                mean_i += coef * x_i.powi(d as i32);
            }
            observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });

        pure(coefficients)
    }
}

fn polynomial_regression_demo() {
    println!("=== Polynomial Regression ===\n");

    // Generate nonlinear data: y = 1 + 2x - 0.5x² + noise
    let x_raw: Vec<f64> = (0..30).map(|i| i as f64 / 29.0 * 4.0).collect(); // x from 0 to 4
    let y_data: Vec<f64> = x_raw
        .iter()
        .map(|&x| {
            let true_mean = 1.0 + 2.0 * x - 0.5 * x.powi(2);
            let mut rng = StdRng::seed_from_u64(((x * 1000.0) as u64) + 555);
            true_mean + Normal::new(0.0, 0.3).unwrap().sample(&mut rng)
        })
        .collect();

    println!("📊 Generated nonlinear data: y = 1 + 2x - 0.5x² + noise");
    println!("   - {} data points, x ∈ [0, 4]", x_raw.len());

    // Fit polynomial models of different degrees
    for degree in [1, 2, 3].iter() {
        println!("\n🔬 Fitting degree {} polynomial...", degree);

        let mut rng = StdRng::seed_from_u64(42 + *degree as u64);
        let model_fn = || polynomial_regression_model(x_raw.clone(), y_data.clone(), *degree);
        let samples = adaptive_mcmc_chain(&mut rng, model_fn, 400, 80);

        println!("   Coefficient estimates:");
        for d in 0..=*degree {
            let coef_samples: Vec<f64> = samples
                .iter()
                .filter_map(|(_, trace)| trace.get_f64(&addr!("coef", d)))
                .collect();
            if coef_samples.is_empty() {
                // The sketch model above samples only three coefficients,
                // so higher-degree terms have no trace entries.
                continue;
            }
            let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;

            let true_coef = match d {
                0 => 1.0,  // intercept
                1 => 2.0,  // linear term
                2 => -0.5, // quadratic term
                _ => 0.0,  // higher terms should be ~0
            };

            println!("     x^{}: {:.3} (true: {:.1})", d, mean_coef, true_coef);
        }

        // Model comparison metric (average posterior log-likelihood; a rough proxy for fit)
        let log_likelihoods: Vec<f64> = samples
            .iter()
            .map(|(_, trace)| trace.log_likelihood)
            .collect();
        let avg_log_likelihood = log_likelihoods.iter().sum::<f64>() / log_likelihoods.len() as f64;
        println!("     Average log-likelihood: {:.2}", avg_log_likelihood);
    }

    println!("\n💡 The degree-2 polynomial should have the highest likelihood!");
    println!();
}

Overfitting Prevention

graph TD
    A["Polynomial Degree"] --> B["Model Complexity"]
    B --> C{Degree Choice}

    C -->|Too Low| D["Underfitting<br/>High Bias"]
    C -->|Just Right| E["Good Fit<br/>Balanced"]
    C -->|Too High| F["Overfitting<br/>High Variance"]

    G["Hierarchical Priors"] --> H["Automatic Shrinkage"]
    H --> E

Shrinkage Benefits:

  • Automatic Regularization: Higher-order terms shrink toward zero
  • Bias-Variance Tradeoff: Balances model flexibility with stability
  • Model Selection: Coefficients near zero indicate irrelevant terms

Bayesian Model Selection

Compare different regression models using marginal likelihood and information criteria.

Model Comparison Framework

For candidate models M₁, M₂, …, M_K:

Marginal Likelihood:

p(y | Mₖ) = ∫ p(y | θ, Mₖ) p(θ | Mₖ) dθ

Bayes Factors:

BF₁₂ = p(y | M₁) / p(y | M₂)

Model Posterior Probabilities:

p(Mₖ | y) ∝ p(y | Mₖ) p(Mₖ)
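
Once (approximate) marginal log-likelihoods are available, as computed in the demo below, turning them into a Bayes factor and posterior model probabilities is a few lines of plain Rust:

// Log Bayes factor and posterior model probabilities (uniform prior over two models).
fn compare_two_models(log_ml_1: f64, log_ml_2: f64) -> (f64, f64, f64) {
    let log_bf_12 = log_ml_1 - log_ml_2;
    // log-sum-exp trick for numerical stability when exponentiating log marginal likelihoods
    let max = log_ml_1.max(log_ml_2);
    let norm = (log_ml_1 - max).exp() + (log_ml_2 - max).exp();
    let p_m1 = (log_ml_1 - max).exp() / norm;
    let p_m2 = (log_ml_2 - max).exp() / norm;
    (log_bf_12, p_m1, p_m2)
}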

Implementation

use fugue::*;
use rand::{SeedableRng, rngs::StdRng};
// Bayesian model selection for regression
#[derive(Clone, Copy, Debug)]
enum RegressionModel {
    Linear,
    Quadratic,
    Cubic,
}

fn model_selection_demo() {
    println!("=== Bayesian Model Selection ===\n");

    // Generate quadratic data
    let x_data: Vec<f64> = (0..25).map(|i| (i as f64 - 12.0) / 5.0).collect(); // x from -2.4 to 2.4
    let y_data: Vec<f64> = x_data
        .iter()
        .map(|&x| {
            let true_mean = 0.5 + 1.5 * x - 0.8 * x.powi(2);
            let mut rng = StdRng::seed_from_u64(((x.abs() * 1000.0) as u64) + 777);
            true_mean + Normal::new(0.0, 0.2).unwrap().sample(&mut rng)
        })
        .collect();

    println!("📊 True model: y = 0.5 + 1.5x - 0.8x² + noise");

    let models = [
        (RegressionModel::Linear, 1),
        (RegressionModel::Quadratic, 2),
        (RegressionModel::Cubic, 3),
    ];

    let mut model_scores = Vec::new();

    for (model_type, degree) in models.iter() {
        println!("\n🔬 Evaluating {:?} model...", model_type);

        let mut rng = StdRng::seed_from_u64(42 + *degree as u64);
        let model_fn = || polynomial_regression_model(x_data.clone(), y_data.clone(), *degree);
        let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 60);

        // Approximate a marginal-likelihood score as the log of the posterior-averaged likelihood (a crude estimator)
        let log_likelihoods: Vec<f64> = samples
            .iter()
            .map(|(_, trace)| trace.log_likelihood)
            .collect();

        let max_ll = log_likelihoods
            .iter()
            .fold(f64::NEG_INFINITY, |a, &b| a.max(b));
        let shifted_lls: Vec<f64> = log_likelihoods.iter().map(|ll| ll - max_ll).collect();
        let mean_exp_ll =
            shifted_lls.iter().map(|ll| ll.exp()).sum::<f64>() / shifted_lls.len() as f64;
        let marginal_log_likelihood = max_ll + mean_exp_ll.ln();

        model_scores.push((*model_type, marginal_log_likelihood));

        println!(
            "   - Marginal log-likelihood: {:.2}",
            marginal_log_likelihood
        );

        // Show coefficient estimates (the sketch model samples three coefficients,
        // so higher-degree terms are skipped if absent from the trace)
        for d in 0..=*degree {
            let coef_samples: Vec<f64> = samples
                .iter()
                .filter_map(|(_, trace)| trace.get_f64(&addr!("coef", d)))
                .collect();
            if coef_samples.is_empty() {
                continue;
            }
            let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
            println!("     Coefficient x^{}: {:.3}", d, mean_coef);
        }
    }

    // Find best model
    model_scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    println!("\n🏆 Model Ranking:");
    for (i, (model, score)) in model_scores.iter().enumerate() {
        let relative_score = score - model_scores[0].1;
        println!(
            "   {}. {:?}: {:.2} (Δ = {:.2})",
            i + 1,
            model,
            score,
            relative_score
        );
    }

    println!("\n💡 The Quadratic model should win (matches true data generating process)!");
    println!();
}

Model Selection Criteria

Criterion              Formula                              Interpretation
Marginal Likelihood    p(y | Mₖ)                            Higher is better
Bayes Factor           BF₁₂ = p(y | M₁) / p(y | M₂)         BF₁₂ > 3: strong evidence for M₁
DIC                    D(θ̄) + 2·p_D                         Lower is better
WAIC                   −2·(lppd − p_WAIC)                   Lower is better
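
As a concrete example of one criterion, a minimal DIC computation from per-draw total log-likelihoods and the log-likelihood evaluated at the posterior-mean parameters (both obtainable from the traces used in this chapter; the function is a plain-Rust sketch):

// DIC = D(theta_bar) + 2 * p_D, with D = -2 * log-likelihood and
// p_D = 2 * (log_lik_at_posterior_mean - mean log-likelihood) the effective number of parameters.
fn dic(log_likelihoods: &[f64], log_lik_at_posterior_mean: f64) -> f64 {
    let mean_ll = log_likelihoods.iter().sum::<f64>() / log_likelihoods.len() as f64;
    let p_d = 2.0 * (log_lik_at_posterior_mean - mean_ll);
    -2.0 * log_lik_at_posterior_mean + 2.0 * p_d
}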

Model Selection Guidelines

  1. Start Simple: Begin with linear models, add complexity as needed
  2. Cross-Validation: Use holdout data to validate model predictions
  3. Domain Knowledge: Consider scientific plausibility, not just statistical fit
  4. Multiple Criteria: Don't rely on a single selection criterion

Regularized Regression

High-dimensional regression requires regularization to prevent overfitting. Ridge regression achieves this through hierarchical priors.

Ridge Regression Theory

Penalty Formulation:

β̂_ridge = argmin_β { Σᵢ (yᵢ − xᵢᵀβ)² + λ Σⱼ βⱼ² }

Bayesian Equivalent:

βⱼ ~ Normal(0, 1/λ)   (prior variance 1/λ, i.e., standard deviation 1/√λ)

The regularization parameter λ controls shrinkage:

  • Large λ: Strong shrinkage (high bias, low variance)
  • Small λ: Weak shrinkage (low bias, high variance)

Implementation

use fugue::*;
use rand::{SeedableRng, rngs::StdRng};
// Ridge regression (L2 regularization) through hierarchical priors
fn ridge_regression_model(x_data: Vec<Vec<f64>>, y_data: Vec<f64>, lambda: f64) -> Model<Vec<f64>> {
    let p = x_data[0].len(); // number of features

    prob! {
        // Sample coefficients with ridge penalty
        let beta_0 <- sample(addr!("beta", 0), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let beta_1 <- sample(addr!("beta", 1), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let beta_2 <- sample(addr!("beta", 2), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let coefficients = vec![beta_0, beta_1, beta_2];

        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        // Clone coefficients for use in closure
        let coefficients_for_observations = coefficients.clone();
        let _observations <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(2) => {
            let (idx, (x_i, y_i)) = i;
            let mut mean_i = 0.0;
            for (j, beta_j) in coefficients_for_observations.iter().enumerate() {
                if j < p && j < x_i.len() {
                    mean_i += beta_j * x_i[j];
                }
            }
            observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });

        pure(coefficients)
    }
}

fn regularized_regression_demo() {
    println!("=== Regularized Regression (Ridge) ===\n");

    // Generate high-dimensional data with few relevant features
    let n = 40;
    let p = 8; // 8 features, only 3 are relevant

    let mut x_data = Vec::new();
    let mut y_data = Vec::new();

    let true_coefs = [2.0, -1.5, 0.0, 1.2, 0.0, 0.0, 0.0, -0.8]; // Only indices 0,1,3,7 matter

    for i in 0..n {
        let mut rng = StdRng::seed_from_u64(1000 + i as u64);
        let x_i: Vec<f64> = (0..p)
            .map(|_| Normal::new(0.0, 1.0).unwrap().sample(&mut rng))
            .collect();

        let true_mean: f64 = x_i.iter().zip(true_coefs.iter()).map(|(x, c)| x * c).sum();
        let y_i = true_mean + Normal::new(0.0, 0.5).unwrap().sample(&mut rng);

        x_data.push(x_i);
        y_data.push(y_i);
    }

    println!("📊 High-dimensional regression:");
    println!("   - {} observations, {} features", n, p);
    println!("   - True coefficients: [2.0, -1.5, 0.0, 1.2, 0.0, 0.0, 0.0, -0.8]");
    println!("   - Only 4 out of 8 features are relevant");

    // Compare different regularization strengths
    let lambdas = [0.1, 1.0, 10.0];

    for &lambda in lambdas.iter() {
        println!("\n🔬 Ridge regression with λ = {}:", lambda);

        let mut rng = StdRng::seed_from_u64(42 + (lambda * 100.0) as u64);
        let model_fn = || ridge_regression_model(x_data.clone(), y_data.clone(), lambda);
        let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 60);

        println!("   Coefficient estimates (true values in parentheses):");
        for (j, &true_coef) in true_coefs.iter().enumerate().take(p) {
            let coef_samples: Vec<f64> = samples
                .iter()
                .map(|(_, trace)| trace.get_f64(&addr!("beta", j)).unwrap())
                .collect();
            let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
            println!("     β{}: {:6.3} ({:5.1})", j, mean_coef, true_coef);
        }

        // Compute prediction accuracy (simplified)
        let predictions: Vec<f64> = x_data
            .iter()
            .map(|x_i| {
                let mut pred = 0.0;
                for (j, &x_val) in x_i.iter().enumerate().take(p) {
                    let coef_samples: Vec<f64> = samples
                        .iter()
                        .map(|(_, trace)| trace.get_f64(&addr!("beta", j)).unwrap())
                        .collect();
                    let mean_coef = coef_samples.iter().sum::<f64>() / coef_samples.len() as f64;
                    pred += mean_coef * x_val;
                }
                pred
            })
            .collect();

        let mse = y_data
            .iter()
            .zip(predictions.iter())
            .map(|(y, pred)| (y - pred).powi(2))
            .sum::<f64>()
            / n as f64;

        println!("   - Mean Squared Error: {:.4}", mse);
    }

    println!("\n💡 Higher λ shrinks coefficients toward zero (regularization effect)");
    println!("   Optimal λ balances bias-variance tradeoff!");
    println!();
}

Regularization Effects

graph TB
    A["High-Dimensional Data<br/>p >> n"] --> B["Regularization"]

    B --> C["λ = 0.1<br/>Weak Shrinkage"]
    B --> D["λ = 1.0<br/>Moderate Shrinkage"]
    B --> E["λ = 10.0<br/>Strong Shrinkage"]

    C --> F["Low Bias<br/>High Variance"]
    D --> G["Balanced<br/>Optimal MSE"]
    E --> H["High Bias<br/>Low Variance"]

Advantages of Bayesian Ridge:

  • Automatic λ Selection: Through hierarchical priors on precision (sketched after this list)
  • Uncertainty Quantification: Full posterior for all parameters
  • Feature Selection: Coefficients with narrow posteriors around zero
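
A sketch of the first point: rather than fixing λ, place a prior on the coefficient precision so the shrinkage strength is inferred from the data. This mirrors ridge_regression_model above, with λ promoted to a parameter.

use fugue::*;

// Ridge regression with the shrinkage strength learned from the data.
fn hierarchical_ridge_model(x_data: Vec<Vec<f64>>, y_data: Vec<f64>) -> Model<Vec<f64>> {
    prob! {
        // lambda (prior precision of the coefficients) is now itself a random variable
        let lambda <- sample(addr!("lambda"), Gamma::new(2.0, 1.0).unwrap());

        let beta_0 <- sample(addr!("beta", 0), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let beta_1 <- sample(addr!("beta", 1), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let beta_2 <- sample(addr!("beta", 2), Normal::new(0.0, 1.0 / lambda.sqrt()).unwrap());
        let coefficients = vec![beta_0, beta_1, beta_2];

        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        let coefficients_for_observations = coefficients.clone();
        let _obs <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate().take(2) => {
            let (idx, (x_i, y_i)) = i;
            let mean_i: f64 = x_i.iter()
                .zip(coefficients_for_observations.iter())
                .map(|(x, b)| x * b)
                .sum();
            observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });

        pure(coefficients)
    }
}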

Advanced Extensions

Hierarchical Linear Models

use fugue::*;

// Group-level regression with varying intercepts
fn hierarchical_regression_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
) -> Model<(f64, f64, Vec<f64>)> {
    let n_groups = group_ids.iter().max().unwrap() + 1;

    prob! {
        // Global parameters
        let global_slope <- sample(addr!("global_slope"), Normal::new(0.0, 5.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(2.0, 0.5).unwrap());
        let sigma_group <- sample(addr!("sigma_group"), Gamma::new(2.0, 1.0).unwrap());

        // Group-specific intercepts (plate over groups)
        let group_intercepts <- plate!(g in 0..n_groups => {
            sample(addr!("intercept", g), Normal::new(0.0, sigma_group).unwrap())
        });

        // Likelihood (plate over observations)
        let intercepts_for_obs = group_intercepts.clone();
        let _obs <- plate!(i in x_data.iter().zip(y_data.iter()).zip(group_ids.iter()).enumerate() => {
            let (idx, ((x_i, y_i), group_i)) = i;
            let mean_i = intercepts_for_obs[*group_i] + global_slope * x_i;
            observe(addr!("y", idx), Normal::new(mean_i, sigma_y).unwrap(), *y_i)
        });

        pure((global_slope, sigma_y, group_intercepts))
    }
}

Spline Regression

use fugue::*;

// Bayesian cubic spline regression (simplified truncated power basis)
fn spline_regression_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    knots: Vec<f64>,
) -> Model<Vec<f64>> {
    let n_basis = knots.len() + 3; // Cubic splines

    prob! {
        // Smoothness prior
        let precision <- sample(addr!("precision"), Gamma::new(1.0, 0.1).unwrap());

        // Spline coefficients with smoothness penalty (plate over basis functions)
        let coefficients <- plate!(j in 0..n_basis => {
            sample(addr!("coef", j), Normal::new(0.0, 1.0 / precision.sqrt()).unwrap())
        });

        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        // Likelihood: evaluate the (simplified) basis functions at each x_i
        let knots_for_obs = knots.clone();
        let coefs_for_obs = coefficients.clone();
        let _obs <- plate!(i in x_data.iter().zip(y_data.iter()).enumerate() => {
            let (idx, (x_i, y_i)) = i;
            let mut mean_i = 0.0;
            for (j, coef_j) in coefs_for_obs.iter().enumerate() {
                // Truncated power basis: (x - knot)_+^3 for knot terms, polynomial terms otherwise
                let basis_val = if j < knots_for_obs.len() {
                    (x_i - knots_for_obs[j]).max(0.0).powi(3)
                } else {
                    x_i.powi((j - knots_for_obs.len()) as i32)
                };
                mean_i += coef_j * basis_val;
            }
            observe(addr!("y", idx), Normal::new(mean_i, sigma).unwrap(), *y_i)
        });

        pure(coefficients)
    }
}

Production Considerations

Scalability

For large datasets:

  1. Minibatch MCMC: Use data subsets for likelihood computation (see the sketch after this list)
  2. Variational Inference: Approximate posterior for faster computation
  3. GPU Acceleration: Vectorized operations on GPU
  4. Sparse Representations: Efficient storage for high-dimensional sparse data
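
A minimal sketch of the first idea, reusing the plate! pattern from earlier sections: the minibatch indices are drawn outside the model and only those observations are conditioned on. Note that this sketch does not reweight the minibatch log-likelihood by n/m, which a production subsampling scheme would need.

use fugue::*;
use rand::{SeedableRng, rngs::StdRng, seq::SliceRandom};

// Condition only on a random subset of observations.
fn minibatch_regression_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    minibatch: Vec<usize>, // indices drawn outside the model
) -> Model<(f64, f64)> {
    // Gather the minibatch rows up front so the plate! below mirrors earlier examples.
    let batch: Vec<(usize, f64, f64)> = minibatch
        .iter()
        .map(|&i| (i, x_data[i], y_data[i]))
        .collect();

    prob! {
        let intercept <- sample(addr!("intercept"), Normal::new(0.0, 10.0).unwrap());
        let slope <- sample(addr!("slope"), Normal::new(0.0, 10.0).unwrap());
        let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

        let _obs <- plate!(row in batch.into_iter() => {
            let (i, x_i, y_i) = row;
            observe(addr!("y", i), Normal::new(intercept + slope * x_i, sigma).unwrap(), y_i)
        });

        pure((intercept, slope))
    }
}

// Draw a random minibatch of m indices out of n.
fn draw_minibatch(n: usize, m: usize, seed: u64) -> Vec<usize> {
    let mut rng = StdRng::seed_from_u64(seed);
    let mut indices: Vec<usize> = (0..n).collect();
    indices.shuffle(&mut rng);
    indices.truncate(m);
    indices
}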

Model Diagnostics

Essential checks for regression models:

use fugue::inference::diagnostics::*;

fn regression_diagnostics(samples: &[(f64, f64, f64)], x_data: &[f64], y_data: &[f64]) {
    // Posterior-mean parameters
    let n_samples = samples.len() as f64;
    let mean_intercept = samples.iter().map(|(b0, _, _)| b0).sum::<f64>() / n_samples;
    let mean_slope = samples.iter().map(|(_, b1, _)| b1).sum::<f64>() / n_samples;

    // Residual analysis against posterior-mean predictions
    let predictions: Vec<f64> = x_data.iter().map(|&x| mean_intercept + mean_slope * x).collect();
    let residuals: Vec<f64> = y_data.iter().zip(predictions.iter())
        .map(|(y, pred)| y - pred).collect();

    // Check for patterns in residuals
    println!("Residual diagnostics:");
    println!("  Mean residual: {:.4}", residuals.iter().sum::<f64>() / residuals.len() as f64);
    println!("  Residual std: {:.4}", {
        let mean = residuals.iter().sum::<f64>() / residuals.len() as f64;
        (residuals.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / (residuals.len() - 1) as f64).sqrt()
    });
}

Cross-Validation

use fugue::*;
use fugue::runtime::interpreters::PriorHandler;
use rand::{SeedableRng, rngs::StdRng};

fn k_fold_cross_validation<F>(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    k: usize,
    model_fn: F
) -> f64
where F: Fn(Vec<f64>, Vec<f64>) -> Model<(f64, f64, f64)>
{
    let n = x_data.len();
    let fold_size = n / k;
    let mut mse_scores = Vec::new();

    for fold in 0..k {
        let test_start = fold * fold_size;
        let test_end = if fold == k - 1 { n } else { (fold + 1) * fold_size };

        // Split data
        let mut train_x = Vec::new();
        let mut train_y = Vec::new();
        let mut test_x = Vec::new();
        let mut test_y = Vec::new();

        for i in 0..n {
            if i >= test_start && i < test_end {
                test_x.push(x_data[i]);
                test_y.push(y_data[i]);
            } else {
                train_x.push(x_data[i]);
                train_y.push(y_data[i]);
            }
        }

        // Train model (simplified - would run MCMC here)
        let mut rng = StdRng::seed_from_u64(fold as u64);
        let (params, _) = runtime::handler::run(
            PriorHandler { rng: &mut rng, trace: Trace::default() },
            model_fn(train_x, train_y)
        );

        // Predict on test set
        let predictions: Vec<f64> = test_x.iter()
            .map(|&x| params.0 + params.1 * x)
            .collect();

        // Compute MSE
        let mse = test_y.iter().zip(predictions.iter())
            .map(|(y, pred)| (y - pred).powi(2))
            .sum::<f64>() / test_y.len() as f64;

        mse_scores.push(mse);
    }

    mse_scores.iter().sum::<f64>() / k as f64
}
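
Usage sketch, assuming x_data and y_data are in scope (e.g., produced by the data generator used earlier):

let avg_mse = k_fold_cross_validation(
    x_data.clone(),
    y_data.clone(),
    5, // 5-fold cross-validation
    |train_x, train_y| basic_linear_regression_model(train_x, train_y),
);
println!("5-fold CV mean squared error: {:.4}", avg_mse);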

Real-World Applications

Economic Forecasting

// Example: GDP growth prediction
// (economic_data: assumed slice of (inflation, unemployment, interest_rate, gdp_growth) tuples)
let gdp_model = prob! {
    // Macroeconomic predictors
    let beta_inflation <- sample(addr!("beta_inflation"), Normal::new(0.0, 2.0).unwrap());
    let beta_unemployment <- sample(addr!("beta_unemployment"), Normal::new(0.0, 2.0).unwrap());
    let beta_interest_rate <- sample(addr!("beta_interest_rate"), Normal::new(0.0, 2.0).unwrap());
    let intercept <- sample(addr!("intercept"), Normal::new(2.0, 1.0).unwrap()); // Prior: ~2% growth

    let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

    // Quarterly GDP growth observations (plate over the dataset)
    let _obs <- plate!(row in economic_data.iter().enumerate() => {
        let (i, (inflation, unemployment, interest_rate, gdp_growth)) = row;
        let expected_growth = intercept +
                            beta_inflation * inflation +
                            beta_unemployment * unemployment +
                            beta_interest_rate * interest_rate;
        observe(addr!("gdp", i), Normal::new(expected_growth, sigma).unwrap(), *gdp_growth)
    });

    pure((intercept, beta_inflation, beta_unemployment, beta_interest_rate))
};

Medical Research

// Example: Drug dose-response modeling
// (dose_response_data: assumed slice of (log_dose, response) pairs)
let dose_response_model = prob! {
    // Log-linear dose-response
    let log_ic50 <- sample(addr!("log_ic50"), Normal::new(0.0, 2.0).unwrap()); // IC50 concentration
    let hill_slope <- sample(addr!("hill_slope"), Normal::new(1.0, 0.5).unwrap()); // Cooperativity
    let baseline <- sample(addr!("baseline"), Normal::new(100.0, 10.0).unwrap()); // No-drug response
    let max_effect <- sample(addr!("max_effect"), Normal::new(0.0, 10.0).unwrap()); // Maximum inhibition

    let sigma <- sample(addr!("sigma"), Gamma::new(2.0, 0.5).unwrap());

    let _obs <- plate!(row in dose_response_data.iter().enumerate() => {
        let (i, (log_dose, response)) = row;
        // Hill equation: E = baseline + (max_effect - baseline) / (1 + 10^(hill_slope * (log_ic50 - log_dose)))
        let hill_term = hill_slope * (log_ic50 - log_dose);
        let expected_response = baseline + (max_effect - baseline) / (1.0 + (10.0_f64).powf(hill_term));
        observe(addr!("response", i), Normal::new(expected_response, sigma).unwrap(), *response)
    });

    pure((log_ic50, hill_slope, baseline, max_effect))
};

Testing Your Understanding

Exercise 1: Multiple Regression

Extend basic linear regression to handle multiple predictors:

use fugue::*;

fn multiple_regression_model(
    x_data: Vec<Vec<f64>>, // Matrix: n observations × p predictors
    y_data: Vec<f64>
) -> Model<Vec<f64>> {
    let p = x_data[0].len(); // number of predictors

    prob!(
        // TODO: Implement multiple regression
        // - Create coefficient vector of length p
        // - Use matrix multiplication for linear predictor
        // - Add appropriate priors for high-dimensional case

        pure(vec![0.0; p]) // Placeholder
    )
}

Exercise 2: Heteroscedastic Regression

Model non-constant error variance:

use fugue::*;

fn heteroscedastic_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>
) -> Model<(f64, f64, f64, f64)> {
    prob!(
        // TODO: Implement regression with non-constant variance
        // - Model log(σ²) as linear function of x
        // - σ²ᵢ = exp(γ₀ + γ₁ * xᵢ)
        // - Use different variance for each observation

        pure((0.0, 0.0, 0.0, 0.0)) // Placeholder
    )
}

Exercise 3: Bayesian Variable Selection

Implement spike-and-slab priors for variable selection:

use fugue::*;

fn variable_selection_model(
    x_data: Vec<Vec<f64>>,
    y_data: Vec<f64>,
    inclusion_probability: f64
) -> Model<(Vec<f64>, Vec<bool>)> {
    let p = x_data[0].len();

    prob!(
        // TODO: Implement variable selection
        // - γⱼ ~ Bernoulli(π) for inclusion indicators  
        // - βⱼ | γⱼ ~ γⱼ * Normal(0, τ²) + (1-γⱼ) * δ₀
        // - Spike-and-slab prior structure

        pure((vec![0.0; p], vec![false; p])) // Placeholder
    )
}

Key Takeaways

Linear Regression Mastery

  1. Bayesian Framework: Uncertainty quantification through posterior distributions
  2. Model Extensions: Robustness, nonlinearity, and regularization through prior specification
  3. Model Selection: Principled comparison using marginal likelihood and Bayes factors
  4. Scalability: Hierarchical models and efficient computation for high-dimensional problems
  5. Real-World Applications: Flexible framework adaptable to diverse scientific domains
  6. Production Ready: Cross-validation, diagnostics, and robust inference workflows

Core Techniques:

  • Basic Regression with uncertainty quantification
  • Robust Methods for outlier resistance
  • Polynomial Modeling for nonlinear relationships
  • Bayesian Model Selection for optimal complexity
  • Ridge Regression for high-dimensional problems
  • Hierarchical Extensions for grouped data
  • Production Deployment with diagnostics and validation

Linear regression in Fugue provides a solid foundation for more complex statistical models. The Bayesian approach naturally handles uncertainty, enables model comparison, and scales to modern high-dimensional problems through principled regularization.

Further Reading