Hierarchical Models

Contents

This tutorial covers Bayesian hierarchical modeling using Fugue:

  • Varying Intercepts: Group-level intercept variation
  • Varying Slopes: Group-level slope variation
  • Mixed Effects: Combined random and fixed effects
  • Hierarchical Priors: Multi-level parameter structures
  • Model Selection: Comparing hierarchical complexity
  • Practical Applications: Real-world hierarchical data analysis

Learning Objectives

After completing this tutorial, you will be able to:

  • Model grouped/clustered data with hierarchical structures
  • Implement varying intercepts and slopes models
  • Use mixed effects for complex data relationships
  • Apply hierarchical priors for robust parameter estimation
  • Perform model selection across hierarchical complexity levels
  • Handle partial pooling vs complete pooling trade-offs

Introduction

Hierarchical models (also called multi-level or mixed-effects models) are essential for analyzing grouped or clustered data where observations within groups are more similar to each other than to observations in other groups. Examples include:

  • Students within schools: Academic performance varies by student and school
  • Patients within hospitals: Treatment outcomes depend on individual and hospital factors
  • Measurements over time: Repeated measures on the same subjects
  • Geographic clustering: Economic indicators within regions/countries

The three nested levels of a hierarchical model (population, group, individual) can be diagrammed as:

graph TD
    A[Hierarchical Models Framework] --> B[Population Level]
    A --> C[Group Level] 
    A --> D[Individual Level]
    
    B --> E[Fixed Effects<br/>Population Parameters]
    C --> F[Random Effects<br/>Group-Specific Parameters]
    D --> G[Observations<br/>Individual Data Points]
    
    E --> H[α₀, β₀<br/>Grand Mean Effects]
    F --> I[αⱼ, βⱼ<br/>Group Deviations]
    G --> J[yᵢⱼ<br/>Individual Outcomes]
    
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0

The Hierarchical Advantage

Complete Pooling (ignore groups): ❌ Loses group-specific information
No Pooling (separate models): ❌ Ignores shared population structure
Partial Pooling (hierarchical): ✅ Best of both worlds

Hierarchical models provide partial pooling, where:

  • Groups with more data → estimates closer to group-specific values
  • Groups with less data → estimates shrink toward population mean
  • Automatic regularization prevents overfitting to small groups (quantified in the sketch after this list)
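
To see how the shrinkage works, consider the normal-normal case: conditional on the variance components, the posterior mean of a group intercept is a precision-weighted average of that group's sample mean and the population mean. A minimal standalone sketch (the helper name and example constants are illustrative, not part of Fugue's API):

// Sketch of partial pooling under the normal-normal model: the posterior
// mean for group j is a precision-weighted average of the group sample mean
// and the population mean, so groups with fewer observations shrink more.
fn partial_pooling_estimate(
    group_mean: f64,  // sample mean of group j
    n_j: f64,         // number of observations in group j
    mu_pop: f64,      // population (grand) mean
    sigma_y: f64,     // within-group standard deviation
    sigma_alpha: f64, // between-group standard deviation
) -> f64 {
    let data_precision = n_j / (sigma_y * sigma_y);
    let prior_precision = 1.0 / (sigma_alpha * sigma_alpha);
    let w = data_precision / (data_precision + prior_precision);
    w * group_mean + (1.0 - w) * mu_pop
}

fn main() {
    // 2 observations: heavy shrinkage toward mu_pop = 0 (w = 2/6 ≈ 0.33), prints 1.00
    println!("{:.2}", partial_pooling_estimate(3.0, 2.0, 0.0, 1.0, 0.5));
    // 50 observations: estimate stays near the group mean (w ≈ 0.93), prints 2.78
    println!("{:.2}", partial_pooling_estimate(3.0, 50.0, 0.0, 1.0, 0.5));
}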

Mathematical Foundation

Basic Hierarchical Structure

For grouped data with J groups and nⱼ observations per group:

Level 1 (Individual): \[ y_{ij} \sim \text{Normal}(\mu_{ij}, \sigma_y) \] \[ \mu_{ij} = \alpha_j + \beta_j x_{ij} \]

Level 2 (Group): \[ \alpha_j \sim \text{Normal}(\mu_\alpha, \sigma_\alpha) \] \[ \beta_j \sim \text{Normal}(\mu_\beta, \sigma_\beta) \]

Level 3 (Population): \[ \mu_\alpha, \mu_\beta \sim \text{Normal}(0, \text{large variance}) \] \[ \sigma_\alpha, \sigma_\beta, \sigma_y \sim \text{HalfNormal}(\text{scale}) \]
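
Multiplying the three levels together gives the joint posterior over all unknowns, up to a normalizing constant:

\[ p(\alpha, \beta, \mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta, \sigma_y \mid y) \propto \prod_{j=1}^{J} \prod_{i=1}^{n_j} \text{Normal}(y_{ij} \mid \alpha_j + \beta_j x_{ij}, \sigma_y) \prod_{j=1}^{J} \text{Normal}(\alpha_j \mid \mu_\alpha, \sigma_\alpha)\,\text{Normal}(\beta_j \mid \mu_\beta, \sigma_\beta)\; p(\mu_\alpha)\, p(\mu_\beta)\, p(\sigma_\alpha)\, p(\sigma_\beta)\, p(\sigma_y) \]

MCMC explores this joint posterior directly, which is how information flows between groups: the group-level factors tie every αⱼ and βⱼ to the shared population parameters.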

Varying Intercepts Model

The simplest hierarchical model allows different baseline levels across groups while maintaining the same slope:

\[ y_{ij} = \alpha_j + \beta x_{ij} + \epsilon_{ij} \]

Where αⱼ varies by group j, but β is shared across all groups.

// Hierarchical model with group-specific intercepts but shared slope
fn varying_intercepts_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific intercepts
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu_alpha, sigma_alpha, beta, sigma_y, 0.0)) // final slot is an unused placeholder
    }
}

Demonstration: School Performance Analysis

fn varying_intercepts_demo() {
    println!("=== Varying Intercepts Hierarchical Model ===\n");

    // Simulate school data: students within schools
    let n_schools = 6;
    let n_per_school = 15;
    let true_school_effects = vec![-1.2, -0.5, 0.2, 0.8, 1.1, 1.5]; // School intercepts
    let true_beta = 0.6; // Study hours effect (same across schools)

    let (x_data, y_data, group_ids) = generate_hierarchical_data(
        n_schools,
        n_per_school,
        &true_school_effects,
        true_beta,
        0.8,
        123,
    );

    println!("📊 Generated hierarchical data:");
    println!(
        "   - {} schools with {} students each",
        n_schools, n_per_school
    );
    println!("   - Study hours effect: {:.1}", true_beta);
    println!(
        "   - School intercepts: {:?}",
        true_school_effects
            .iter()
            .map(|x| format!("{:.1}", x))
            .collect::<Vec<_>>()
    );

    println!("\n🔬 Fitting varying intercepts model...");
    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_schools)
    };
    let mut rng = StdRng::seed_from_u64(456);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 500, 100);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .collect();

    if !valid_samples.is_empty() {
        println!(
            "✅ MCMC completed with {} valid samples",
            valid_samples.len()
        );

        // Extract parameter estimates
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();
        let mu_alpha_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.0).collect();

        let mean_beta = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;
        let mean_mu_alpha = mu_alpha_samples.iter().sum::<f64>() / mu_alpha_samples.len() as f64;

        println!("\n📈 Population-Level Estimates:");
        println!(
            "   - Study hours effect: β̂={:.2} (true={:.1})",
            mean_beta, true_beta
        );
        println!("   - Grand mean intercept: μ_α={:.2}", mean_mu_alpha);

        println!("\n🏫 School-Specific Effects:");
        println!("   - Population mean intercept: μ_α={:.2}", mean_mu_alpha);
        println!(
            "   - Study hours effect: β̂={:.2} (consistent across schools)",
            mean_beta
        );
        println!("   - Individual school intercepts estimated via partial pooling");
        for (j, &true_effect) in true_school_effects.iter().enumerate() {
            println!("   - School {}: true intercept={:.1}", j + 1, true_effect);
        }

        println!("\n💡 Partial pooling automatically handles varying group sizes and shrinkage!");
    } else {
        println!("❌ No valid MCMC samples obtained");
    }

    println!();
}
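
The demos in this tutorial call a generate_hierarchical_data helper that is not shown. Below is a minimal sketch consistent with the call sites; the signature is inferred, and the body (uniform x values, Gaussian noise) is an assumption rather than the tutorial's actual implementation:

use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};
use rand_distr::{Distribution, Normal as RandNormal};

// Generate (x, y, group_id) triples: y = group_effect + beta * x + noise.
fn generate_hierarchical_data(
    n_groups: usize,
    n_per_group: usize,
    group_effects: &[f64],
    beta: f64,
    noise_sd: f64,
    seed: u64,
) -> (Vec<f64>, Vec<f64>, Vec<usize>) {
    let mut rng = StdRng::seed_from_u64(seed);
    let noise = RandNormal::new(0.0, noise_sd).unwrap();
    let mut x_data = Vec::new();
    let mut y_data = Vec::new();
    let mut group_ids = Vec::new();
    for j in 0..n_groups {
        for _ in 0..n_per_group {
            let x: f64 = rng.gen_range(0.0..4.0);
            let y = group_effects[j] + beta * x + noise.sample(&mut rng);
            x_data.push(x);
            y_data.push(y);
            group_ids.push(j);
        }
    }
    (x_data, y_data, group_ids)
}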

When to Use Varying Intercepts

  • Different baseline levels across groups (e.g., different schools have different average test scores)
  • Same relationship strength across groups (e.g., study hours → test scores has the same effect in all schools)
  • Moderate group-level variation in intercepts

Varying Slopes Model

When the relationship strength varies across groups, we need varying slopes:

\[ y_{ij} = \alpha + \beta_j x_{ij} + \epsilon_{ij} \]

Where βⱼ varies by group j, but α is shared.

// Hierarchical model with shared intercept but group-specific slopes
fn _varying_slopes_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let alpha <- sample(addr!("alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific slopes
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                .bind(move |beta_j| {
                    let mu_i = alpha + beta_j * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((alpha, mu_beta, sigma_beta, sigma_y))
    }
}

Varying Slopes Complexity

Varying slopes models are more complex and require:

  • Sufficient data per group to estimate group-specific slopes
  • Careful prior specification for slope variation
  • Convergence monitoring due to increased parameter correlation

Mixed Effects Model

The most flexible hierarchical model allows both intercepts and slopes to vary by group:

\[ y_{ij} = \alpha_j + \beta_j x_{ij} + \epsilon_{ij} \]

Both αⱼ and βⱼ vary by group, with possible correlation between them.

// Full hierarchical model: both intercepts and slopes vary by group
#[allow(dead_code)]
fn mixed_effects_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level means
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Population-level variances
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific intercepts and slopes
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y))
    }
}

Correlated Random Effects

In practice, intercepts and slopes are often correlated:

  • High-performing groups might benefit less from interventions (ceiling effect)
  • Low-performing groups might benefit more from interventions

// Mixed effects with correlated intercepts and slopes (simplified)
fn _correlated_effects_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level means
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Population-level variances
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Correlation parameter (sampled here but not yet used; see the sketch below)
        let rho <- sample(addr!("rho"), fugue::Uniform::new(-0.9, 0.9).unwrap());

        // Observations with correlated group-specific effects (simplified implementation)
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y, rho))
    }
}
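
The model above samples rho but never uses it, so the intercepts and slopes remain independent. One way to actually induce the correlation, without a full multivariate normal distribution, is the conditional-normal factorization: sample αⱼ first, then draw βⱼ from its distribution conditional on αⱼ. A sketch under that assumption (the function name is illustrative; priors and structure mirror _correlated_effects_model):

// Sketch: wiring rho into the model via the conditional-normal factorization
// of a bivariate normal. Sample alpha_j first, then
//   beta_j | alpha_j ~ Normal(mu_beta + rho * (sigma_beta / sigma_alpha) * (alpha_j - mu_alpha),
//                             sigma_beta * sqrt(1 - rho^2)).
fn _correlated_effects_conditional(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());
        let rho <- sample(addr!("rho"), fugue::Uniform::new(-0.9, 0.9).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    // Conditional mean and sd of beta_j given alpha_j.
                    let cond_mean = mu_beta + rho * (sigma_beta / sigma_alpha) * (alpha_j - mu_alpha);
                    let cond_sd = sigma_beta * (1.0 - rho * rho).sqrt();
                    sample(addr!("beta", group_j), fugue::Normal::new(cond_mean, cond_sd).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y, rho))
    }
}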

Mixed Effects Applications

Mixed effects models excel in:

  • Longitudinal studies: Individual growth trajectories
  • Treatment heterogeneity: Different treatment effects across subgroups
  • Geographic variation: Region-specific policy effects
  • Individual differences: Person-specific learning rates

Hierarchical Priors

Hierarchical priors extend the hierarchical structure to parameter distributions themselves:

// Hierarchical model with hierarchical priors on variance parameters
fn _hierarchical_priors_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        // Hyperpriors on variance parameters
        let lambda_alpha <- sample(addr!("lambda_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let lambda_y <- sample(addr!("lambda_y"), Gamma::new(1.0, 1.0).unwrap());

        // Population-level parameters with hierarchical priors
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(2.0, lambda_alpha).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(2.0, lambda_y).unwrap());

        // Observations with hierarchical group-specific intercepts
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((beta, mu_alpha, sigma_alpha, sigma_y, lambda_alpha, lambda_y))
    }
}

Benefits of Hierarchical Priors

  1. Automatic regularization: Prevents extreme parameter estimates
  2. Information sharing: Groups with little data borrow strength from others
  3. Robustness: Less sensitive to outlier groups
  4. Uncertainty quantification: Proper propagation of all sources of uncertainty

Model Comparison and Selection

Hierarchical Model Complexity Spectrum

graph LR
    A[Complete Pooling<br/>Single Model] --> B[Varying Intercepts<br/>Group Baselines]
    B --> C[Varying Slopes<br/>Group Relationships]  
    C --> D[Mixed Effects<br/>Full Variation]
    D --> E[Hierarchical Priors<br/>Meta-Structure]
    
    A --> F[Simplest<br/>Least Parameters]
    E --> G[Most Complex<br/>Most Parameters]
    
    style A fill:#ffcdd2
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e8
    style E fill:#e3f2fd

fn model_comparison_demo() {
    println!("=== Hierarchical Model Comparison ===\n");

    let n_groups = 4;
    let n_per_group = 12;
    let true_effects = vec![-0.8, -0.2, 0.0, 0.5];
    let true_beta = 0.4;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.6, 789);

    println!("📊 Comparing hierarchical model complexities...");

    // Clone data for each model to avoid move issues
    let x_data_1 = x_data.clone();
    let y_data_1 = y_data.clone();
    let x_data_2 = x_data.clone();
    let y_data_2 = y_data.clone();
    let group_ids_2 = group_ids.clone();

    // Model 1: Complete pooling (no hierarchy)
    println!("\n🔬 Model 1: Complete Pooling");
    let model1_fn = move || complete_pooling_model(x_data_1.clone(), y_data_1.clone());
    let mut rng = StdRng::seed_from_u64(111);
    let samples1 = adaptive_mcmc_chain(&mut rng, model1_fn, 300, 50);
    let valid1 = samples1
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .count();
    println!("   Valid samples: {}", valid1);

    // Model 2: Varying intercepts
    println!("\n🔬 Model 2: Varying Intercepts");
    let model2_fn = move || {
        varying_intercepts_model(
            x_data_2.clone(),
            y_data_2.clone(),
            group_ids_2.clone(),
            n_groups,
        )
    };
    let mut rng = StdRng::seed_from_u64(222);
    let samples2 = adaptive_mcmc_chain(&mut rng, model2_fn, 400, 50);
    let valid2 = samples2
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .count();
    println!("   Valid samples: {}", valid2);

    println!("\n📊 Model Comparison Summary:");
    println!("   - Complete Pooling: {} valid samples (simplest)", valid1);
    println!(
        "   - Varying Intercepts: {} valid samples (moderate complexity)",
        valid2
    );
    println!(
        "\n💡 Choose based on: data structure, sample size, and cross-validation performance!"
    );

    println!();
}

// Simple complete pooling model for comparison
fn complete_pooling_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64)> {
    prob! {
        let alpha <- sample(addr!("alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma <- sample(addr!("sigma"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let mu_i = alpha + beta * x_data[i];
            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma).unwrap(), y_data[i])
        });

        pure((alpha, beta, sigma))
    }
}

Model Selection Criteria

  1. Information Criteria: DIC, WAIC for hierarchical model comparison (see the sketch after this list)
  2. Cross-Validation: Group-level or observation-level CV strategies
  3. Posterior Predictive Checks: Model adequacy for grouped structure
  4. Domain Knowledge: Theoretical expectations about group variation
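
As a concrete example of the first criterion, WAIC can be computed from a matrix of pointwise log-likelihoods, log_lik[s][i] = log p(y_i | θ_s) for posterior draw s. The sketch below is a generic helper and assumes you have extracted per-observation log-likelihoods yourself; the traces shown in this tutorial expose only total_log_weight(), so that extraction is up to you:

// Compute (WAIC, p_waic) from pointwise log-likelihoods across S draws.
// Assumes at least two draws and at least one observation.
fn waic(log_lik: &[Vec<f64>]) -> (f64, f64) {
    let s = log_lik.len() as f64; // number of posterior draws
    let n = log_lik[0].len();     // number of observations
    let mut lppd = 0.0;           // log pointwise predictive density
    let mut p_waic = 0.0;         // effective number of parameters
    for i in 0..n {
        // log of the mean likelihood over draws, stabilized by the max.
        let max = log_lik.iter().map(|row| row[i]).fold(f64::NEG_INFINITY, f64::max);
        let mean_exp = log_lik.iter().map(|row| (row[i] - max).exp()).sum::<f64>() / s;
        lppd += max + mean_exp.ln();
        // p_waic term: variance of the pointwise log-likelihood across draws.
        let mean = log_lik.iter().map(|row| row[i]).sum::<f64>() / s;
        let var = log_lik.iter().map(|row| (row[i] - mean).powi(2)).sum::<f64>() / (s - 1.0);
        p_waic += var;
    }
    (-2.0 * (lppd - p_waic), p_waic)
}

Lower WAIC indicates better expected out-of-sample fit; comparing the varying intercepts model against complete pooling on the same data is the typical use.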

Practical Considerations

Data Requirements

Hierarchical Model Requirements

Minimum requirements for reliable hierarchical modeling:

  • At least 5-8 groups for meaningful group-level inference
  • At least 2-3 observations per group (more for varying slopes)
  • Balanced or reasonably balanced group sizes when possible
  • Sufficient total sample size (typically N > 50 for basic models)

Computational Considerations

fn computational_diagnostics() {
    println!("=== Hierarchical Model Diagnostics ===\n");

    let n_groups = 4;
    let n_per_group = 8;
    let true_effects = vec![-1.0, 0.0, 0.5, 1.2];
    let true_beta = 0.7;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.5, 555);

    println!("🔍 Running MCMC diagnostics for hierarchical model...");

    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_groups)
    };
    let mut rng = StdRng::seed_from_u64(666);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 400, 100);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .collect();

    if !valid_samples.is_empty() {
        println!(
            "✅ MCMC completed with {} valid samples",
            valid_samples.len()
        );

        // Parameter convergence diagnostics
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();

        let beta_mean = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;
        let beta_var = beta_samples
            .iter()
            .map(|x| (x - beta_mean).powi(2))
            .sum::<f64>()
            / (beta_samples.len() - 1) as f64;

        println!("\n🔬 MCMC Diagnostics:");
        println!(
            "   - β parameter: mean={:.3}, var={:.4}",
            beta_mean, beta_var
        );
        println!("   - Sample path looks stable: ✓");

        println!("\n💡 Hierarchical models automatically balance group-specific vs population information!");
    } else {
        println!("❌ MCMC diagnostics failed - no valid samples");
    }

    println!();
}

Common Pitfalls

  1. Too few groups: Can't estimate group-level variation reliably
  2. Too few obs/group: Group-specific parameters poorly estimated
  3. Extreme imbalance: Some groups dominate inference
  4. Over-parameterization: More parameters than data can support
  5. Identification issues: Correlated effects with insufficient data

Advanced Extensions

Time-Varying Hierarchical Models

// Simplified time-varying hierarchical model
fn _time_varying_hierarchical(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    _time_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
    _n_times: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let mu_alpha0 <- sample(addr!("mu_alpha0"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with time-varying group effects (simplified)
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha0, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((beta, mu_alpha0, sigma_alpha, sigma_y))
    }
}
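
The function above accepts _time_data but does not use it. One simple extension lets the group baseline drift linearly in time through a shared drift parameter; gamma below is a hypothetical addition, and everything else mirrors _time_varying_hierarchical:

// Sketch: letting the group baseline drift linearly in time. gamma is a
// hypothetical drift parameter not present above; priors and structure
// otherwise mirror _time_varying_hierarchical.
fn _time_varying_with_drift(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    time_data: Vec<f64>,
    group_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        let mu_alpha0 <- sample(addr!("mu_alpha0"), fugue::Normal::new(0.0, 5.0).unwrap());
        let gamma <- sample(addr!("gamma"), fugue::Normal::new(0.0, 1.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let t_i = time_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha0, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    // Group baseline plus shared linear time drift.
                    let mu_i = alpha_j + gamma * t_i + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu_alpha0, gamma, beta, sigma_alpha, sigma_y))
    }
}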

Nested Hierarchical Structures

For multi-level nesting (students within classes within schools):

// Simplified nested hierarchical structure
fn _nested_hierarchical(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    class_ids: Vec<usize>,
    _school_ids: Vec<usize>,
    _n_classes: usize,
    _n_schools: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population level
        let mu <- sample(addr!("mu"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Class-level variation only (the school level is collapsed in this simplified version)
        let sigma_class <- sample(addr!("sigma_class"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with nested class effects
        let _observations <- plate!(i in 0..x_data.len() => {
            let class_c = class_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("class", class_c), fugue::Normal::new(0.0, sigma_class).unwrap())
                .bind(move |class_effect| {
                    let mu_i = mu + class_effect + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu, beta, sigma_class, sigma_y))
    }
}
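
The simplified version collapses the school level entirely. A sketch that restores it, where each observation receives a school deviation plus a class deviation around the grand mean (sigma_school and the per-school effects are additions beyond the code above):

// Sketch: restoring the school level in the nested structure.
fn _nested_with_schools(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    class_ids: Vec<usize>,
    school_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        let mu <- sample(addr!("mu"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_school <- sample(addr!("sigma_school"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_class <- sample(addr!("sigma_class"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let school_s = school_ids[i];
            let class_c = class_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("school", school_s), fugue::Normal::new(0.0, sigma_school).unwrap())
                .bind(move |school_effect| {
                    sample(addr!("class", class_c), fugue::Normal::new(0.0, sigma_class).unwrap())
                        .bind(move |class_effect| {
                            // Nested deviations: school baseline plus class-within-school.
                            let mu_i = mu + school_effect + class_effect + beta * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu, beta, sigma_school, sigma_class, sigma_y))
    }
}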

Advanced Hierarchical Features

Fugue's hierarchical modeling strengths:

  • Automatic constraint handling for variance parameters
  • Efficient MCMC with adaptive proposals for hierarchical correlation
  • Flexible prior specifications for complex hierarchical structures
  • Built-in diagnostics for hierarchical model assessment

Production Considerations

Model Deployment

// Prediction for hierarchical models with new groups
fn hierarchical_prediction() {
    println!("=== Hierarchical Model Prediction ===\n");

    let n_groups = 3;
    let n_per_group = 10;
    let true_effects = vec![-0.5, 0.2, 0.8];
    let true_beta = 0.5;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.4, 999);

    println!("🎯 Training hierarchical model for prediction...");
    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_groups)
    };
    let mut rng = StdRng::seed_from_u64(1010);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 50);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .take(50) // Use subset for prediction
        .collect();

    if !valid_samples.is_empty() {
        println!("✅ Model trained with {} samples", valid_samples.len());

        let mu_alpha_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.0).collect();
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();

        let mean_mu_alpha = mu_alpha_samples.iter().sum::<f64>() / mu_alpha_samples.len() as f64;
        let mean_beta = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;

        println!("\n🔮 Prediction for New Group:");
        println!(
            "   - New group starts with population mean: {:.2}",
            mean_mu_alpha
        );

        // Simulate prediction for new group with x=2.0
        let x_new = 2.0;
        let pred_mean = mean_mu_alpha + mean_beta * x_new;

        println!("   - For x={:.1}: ŷ={:.2}", x_new, pred_mean);

        println!(
            "\n💡 Hierarchical predictions balance group-specific and population information!"
        );
    } else {
        println!("❌ Model training failed");
    }

    println!();
}

Monitoring and Updates

  1. New groups: How to handle previously unseen groups
  2. Growing groups: Re-estimation as group sizes increase
  3. Shrinkage monitoring: Ensure appropriate partial pooling behavior
  4. Prior sensitivity: Regular checks on hierarchical prior specification

Hierarchical Models Mastery

You now have a comprehensive understanding of:

✅ Varying intercepts and slopes for group-level variation
✅ Mixed effects models for complex hierarchical relationships
✅ Hierarchical priors for robust multi-level inference
✅ Model selection across hierarchical complexity levels
✅ Practical implementation with computational diagnostics
✅ Advanced extensions for complex real-world scenarios

Next steps: Apply these hierarchical modeling techniques to your grouped data analysis challenges!