Hierarchical Models

Contents

This tutorial covers Bayesian hierarchical modeling using Fugue:

  • Varying Intercepts: Group-level intercept variation
  • Varying Slopes: Group-level slope variation
  • Mixed Effects: Combined random and fixed effects
  • Hierarchical Priors: Multi-level parameter structures
  • Model Selection: Comparing hierarchical complexity
  • Practical Applications: Real-world hierarchical data analysis

Learning Objectives

After completing this tutorial, you will be able to:

  • Model grouped/clustered data with hierarchical structures
  • Implement varying intercepts and slopes models
  • Use mixed effects for complex data relationships
  • Apply hierarchical priors for robust parameter estimation
  • Perform model selection across hierarchical complexity levels
  • Handle partial pooling vs complete pooling trade-offs

Introduction

Hierarchical models (also called multi-level or mixed-effects models) are essential for analyzing grouped or clustered data where observations within groups are more similar to each other than to observations in other groups. Examples include:

  • Students within schools: Academic performance varies by student and school
  • Patients within hospitals: Treatment outcomes depend on individual and hospital factors
  • Measurements over time: Repeated measures on the same subjects
  • Geographic clustering: Economic indicators within regions/countries

The three nested levels of a hierarchical model (population, group, individual) can be diagrammed as:

graph TD
    A[Hierarchical Models Framework] --> B[Population Level]
    A --> C[Group Level] 
    A --> D[Individual Level]
    
    B --> E[Fixed Effects<br/>Population Parameters]
    C --> F[Random Effects<br/>Group-Specific Parameters]
    D --> G[Observations<br/>Individual Data Points]
    
    E --> H[α₀, β₀<br/>Grand Mean Effects]
    F --> I[αⱼ, βⱼ<br/>Group Deviations]
    G --> J[yᵢⱼ<br/>Individual Outcomes]
    
    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#e8f5e8
    style D fill:#fff3e0

The Hierarchical Advantage

Complete Pooling (ignore groups): ❌ Loses group-specific information
No Pooling (separate models): ❌ Ignores shared population structure
Partial Pooling (hierarchical): ✅ Best of both worlds

Hierarchical models provide partial pooling, where:

  • Groups with more data → estimates closer to group-specific values
  • Groups with less data → estimates shrink toward population mean
  • Automatic regularization prevents overfitting to small groups (quantified in the sketch after this list)
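
To see how the shrinkage works, consider the normal-normal case: conditional on the variance components, the posterior mean of a group intercept is a precision-weighted average of that group's sample mean and the population mean. A minimal standalone sketch (the helper name and example constants are illustrative, not part of Fugue's API):

// Sketch of partial pooling under the normal-normal model: the posterior
// mean for group j is a precision-weighted average of the group sample mean
// and the population mean, so groups with fewer observations shrink more.
fn partial_pooling_estimate(
    group_mean: f64,  // sample mean of group j
    n_j: f64,         // number of observations in group j
    mu_pop: f64,      // population (grand) mean
    sigma_y: f64,     // within-group standard deviation
    sigma_alpha: f64, // between-group standard deviation
) -> f64 {
    let data_precision = n_j / (sigma_y * sigma_y);
    let prior_precision = 1.0 / (sigma_alpha * sigma_alpha);
    let w = data_precision / (data_precision + prior_precision);
    w * group_mean + (1.0 - w) * mu_pop
}

fn main() {
    // 2 observations: heavy shrinkage toward mu_pop = 0 (w = 2/6 ≈ 0.33), prints 1.00
    println!("{:.2}", partial_pooling_estimate(3.0, 2.0, 0.0, 1.0, 0.5));
    // 50 observations: estimate stays near the group mean (w ≈ 0.93), prints 2.78
    println!("{:.2}", partial_pooling_estimate(3.0, 50.0, 0.0, 1.0, 0.5));
}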

Mathematical Foundation

Basic Hierarchical Structure

For grouped data with J groups and nⱼ observations per group:

Level 1 (Individual): \[ y_{ij} \sim \text{Normal}(\mu_{ij}, \sigma_y) \] \[ \mu_{ij} = \alpha_j + \beta_j x_{ij} \]

Level 2 (Group): \[ \alpha_j \sim \text{Normal}(\mu_\alpha, \sigma_\alpha) \] \[ \beta_j \sim \text{Normal}(\mu_\beta, \sigma_\beta) \]

Level 3 (Population): \[ \mu_\alpha, \mu_\beta \sim \text{Normal}(0, \text{large variance}) \] \[ \sigma_\alpha, \sigma_\beta, \sigma_y \sim \text{HalfNormal}(\text{scale}) \]
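
Multiplying the three levels together gives the joint posterior over all unknowns, up to a normalizing constant:

\[ p(\alpha, \beta, \mu_\alpha, \mu_\beta, \sigma_\alpha, \sigma_\beta, \sigma_y \mid y) \propto \prod_{j=1}^{J} \prod_{i=1}^{n_j} \text{Normal}(y_{ij} \mid \alpha_j + \beta_j x_{ij}, \sigma_y) \prod_{j=1}^{J} \text{Normal}(\alpha_j \mid \mu_\alpha, \sigma_\alpha)\,\text{Normal}(\beta_j \mid \mu_\beta, \sigma_\beta)\; p(\mu_\alpha)\, p(\mu_\beta)\, p(\sigma_\alpha)\, p(\sigma_\beta)\, p(\sigma_y) \]

MCMC explores this joint posterior directly, which is how information flows between groups: the group-level factors tie every αⱼ and βⱼ to the shared population parameters.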

Varying Intercepts Model

The simplest hierarchical model allows different baseline levels across groups while maintaining the same slope:

\[ y_{ij} = \alpha_j + \beta x_{ij} + \epsilon_{ij} \]

Where αⱼ varies by group j, but β is shared across all groups.

// Hierarchical model with group-specific intercepts but shared slope
fn varying_intercepts_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific intercepts
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu_alpha, sigma_alpha, beta, sigma_y, 0.0)) // final slot is an unused placeholder
    }
}

Demonstration: School Performance Analysis

fn varying_intercepts_demo() {
    println!("=== Varying Intercepts Hierarchical Model ===\n");

    // Simulate school data: students within schools
    let n_schools = 6;
    let n_per_school = 15;
    let true_school_effects = vec![-1.2, -0.5, 0.2, 0.8, 1.1, 1.5]; // School intercepts
    let true_beta = 0.6; // Study hours effect (same across schools)

    let (x_data, y_data, group_ids) = generate_hierarchical_data(
        n_schools,
        n_per_school,
        &true_school_effects,
        true_beta,
        0.8,
        123,
    );

    println!("📊 Generated hierarchical data:");
    println!(
        "   - {} schools with {} students each",
        n_schools, n_per_school
    );
    println!("   - Study hours effect: {:.1}", true_beta);
    println!(
        "   - School intercepts: {:?}",
        true_school_effects
            .iter()
            .map(|x| format!("{:.1}", x))
            .collect::<Vec<_>>()
    );

    println!("\n🔬 Fitting varying intercepts model...");
    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_schools)
    };
    let mut rng = StdRng::seed_from_u64(456);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 500, 100);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .collect();

    if !valid_samples.is_empty() {
        println!(
            "✅ MCMC completed with {} valid samples",
            valid_samples.len()
        );

        // Extract parameter estimates
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();
        let mu_alpha_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.0).collect();

        let mean_beta = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;
        let mean_mu_alpha = mu_alpha_samples.iter().sum::<f64>() / mu_alpha_samples.len() as f64;

        println!("\n📈 Population-Level Estimates:");
        println!(
            "   - Study hours effect: β̂={:.2} (true={:.1})",
            mean_beta, true_beta
        );
        println!("   - Grand mean intercept: μ_α={:.2}", mean_mu_alpha);

        println!("\n🏫 School-Specific Effects:");
        println!("   - Population mean intercept: μ_α={:.2}", mean_mu_alpha);
        println!(
            "   - Study hours effect: β̂={:.2} (consistent across schools)",
            mean_beta
        );
        println!("   - Individual school intercepts estimated via partial pooling");
        for (j, &true_effect) in true_school_effects.iter().enumerate() {
            println!("   - School {}: true intercept={:.1}", j + 1, true_effect);
        }

        println!("\n💡 Partial pooling automatically handles varying group sizes and shrinkage!");
    } else {
        println!("❌ No valid MCMC samples obtained");
    }

    println!();
}
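
The demos in this tutorial call a generate_hierarchical_data helper that is not shown. Below is a minimal sketch consistent with the call sites; the signature is inferred, and the body (uniform x values, Gaussian noise) is an assumption rather than the tutorial's actual implementation:

use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};
use rand_distr::{Distribution, Normal as RandNormal};

// Generate (x, y, group_id) triples: y = group_effect + beta * x + noise.
fn generate_hierarchical_data(
    n_groups: usize,
    n_per_group: usize,
    group_effects: &[f64],
    beta: f64,
    noise_sd: f64,
    seed: u64,
) -> (Vec<f64>, Vec<f64>, Vec<usize>) {
    let mut rng = StdRng::seed_from_u64(seed);
    let noise = RandNormal::new(0.0, noise_sd).unwrap();
    let mut x_data = Vec::new();
    let mut y_data = Vec::new();
    let mut group_ids = Vec::new();
    for j in 0..n_groups {
        for _ in 0..n_per_group {
            let x: f64 = rng.gen_range(0.0..4.0);
            let y = group_effects[j] + beta * x + noise.sample(&mut rng);
            x_data.push(x);
            y_data.push(y);
            group_ids.push(j);
        }
    }
    (x_data, y_data, group_ids)
}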

When to Use Varying Intercepts

  • Different baseline levels across groups (e.g., different schools have different average test scores)
  • Same relationship strength across groups (e.g., study hours → test scores has the same effect in all schools)
  • Moderate group-level variation in intercepts

Varying Slopes Model

When the relationship strength varies across groups, we need varying slopes:

\[ y_{ij} = \alpha + \beta_j x_{ij} + \epsilon_{ij} \]

Where βⱼ varies by group j, but α is shared.

// Hierarchical model with shared intercept but group-specific slopes
fn _varying_slopes_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let alpha <- sample(addr!("alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific slopes
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                .bind(move |beta_j| {
                    let mu_i = alpha + beta_j * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((alpha, mu_beta, sigma_beta, sigma_y))
    }
}

Varying Slopes Complexity

Varying slopes models are more complex and require:

  • Sufficient data per group to estimate group-specific slopes
  • Careful prior specification for slope variation
  • Convergence monitoring due to increased parameter correlation

Mixed Effects Model

The most flexible hierarchical model allows both intercepts and slopes to vary by group:

\[ y_{ij} = \alpha_j + \beta_j x_{ij} + \epsilon_{ij} \]

Both αⱼ and βⱼ vary by group, with possible correlation between them.

// Full hierarchical model: both intercepts and slopes vary by group
#[allow(dead_code)]
fn mixed_effects_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level means
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Population-level variances
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with group-specific intercepts and slopes
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y))
    }
}

Correlated Random Effects

In practice, intercepts and slopes are often correlated:

  • High-performing groups might benefit less from interventions (ceiling effect)
  • Low-performing groups might benefit more from interventions

// Mixed effects with correlated intercepts and slopes (simplified)
fn _correlated_effects_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        // Population-level means
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Population-level variances
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Correlation parameter (sampled here but not yet used; see the sketch below)
        let rho <- sample(addr!("rho"), fugue::Uniform::new(-0.9, 0.9).unwrap());

        // Observations with correlated group-specific effects (simplified implementation)
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    sample(addr!("beta", group_j), fugue::Normal::new(mu_beta, sigma_beta).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y, rho))
    }
}
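
The model above samples rho but never uses it, so the intercepts and slopes remain independent. One way to actually induce the correlation, without a full multivariate normal distribution, is the conditional-normal factorization: sample αⱼ first, then draw βⱼ from its distribution conditional on αⱼ. A sketch under that assumption (the function name is illustrative; priors and structure mirror _correlated_effects_model):

// Sketch: wiring rho into the model via the conditional-normal factorization
// of a bivariate normal. Sample alpha_j first, then
//   beta_j | alpha_j ~ Normal(mu_beta + rho * (sigma_beta / sigma_alpha) * (alpha_j - mu_alpha),
//                             sigma_beta * sqrt(1 - rho^2)).
fn _correlated_effects_conditional(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let mu_beta <- sample(addr!("mu_beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_beta <- sample(addr!("sigma_beta"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());
        let rho <- sample(addr!("rho"), fugue::Uniform::new(-0.9, 0.9).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    // Conditional mean and sd of beta_j given alpha_j.
                    let cond_mean = mu_beta + rho * (sigma_beta / sigma_alpha) * (alpha_j - mu_alpha);
                    let cond_sd = sigma_beta * (1.0 - rho * rho).sqrt();
                    sample(addr!("beta", group_j), fugue::Normal::new(cond_mean, cond_sd).unwrap())
                        .bind(move |beta_j| {
                            let mu_i = alpha_j + beta_j * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu_alpha, mu_beta, sigma_alpha, sigma_beta, sigma_y, rho))
    }
}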

Mixed Effects Applications

Mixed effects models excel in:

  • Longitudinal studies: Individual growth trajectories
  • Treatment heterogeneity: Different treatment effects across subgroups
  • Geographic variation: Region-specific policy effects
  • Individual differences: Person-specific learning rates

Hierarchical Priors

Hierarchical priors extend the hierarchical structure to parameter distributions themselves:

// Hierarchical model with hierarchical priors on variance parameters
fn _hierarchical_priors_model(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
) -> Model<(f64, f64, f64, f64, f64, f64)> {
    prob! {
        // Hyperpriors on variance parameters
        let lambda_alpha <- sample(addr!("lambda_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let lambda_y <- sample(addr!("lambda_y"), Gamma::new(1.0, 1.0).unwrap());

        // Population-level parameters with hierarchical priors
        let mu_alpha <- sample(addr!("mu_alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(2.0, lambda_alpha).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(2.0, lambda_y).unwrap());

        // Observations with hierarchical group-specific intercepts
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((beta, mu_alpha, sigma_alpha, sigma_y, lambda_alpha, lambda_y))
    }
}

Benefits of Hierarchical Priors

  1. Automatic regularization: Prevents extreme parameter estimates
  2. Information sharing: Groups with little data borrow strength from others
  3. Robustness: Less sensitive to outlier groups
  4. Uncertainty quantification: Proper propagation of all sources of uncertainty

Model Comparison and Selection

Hierarchical Model Complexity Spectrum

graph LR
    A[Complete Pooling<br/>Single Model] --> B[Varying Intercepts<br/>Group Baselines]
    B --> C[Varying Slopes<br/>Group Relationships]  
    C --> D[Mixed Effects<br/>Full Variation]
    D --> E[Hierarchical Priors<br/>Meta-Structure]
    
    A --> F[Simplest<br/>Least Parameters]
    E --> G[Most Complex<br/>Most Parameters]
    
    style A fill:#ffcdd2
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e8f5e8
    style E fill:#e3f2fd

fn model_comparison_demo() {
    println!("=== Hierarchical Model Comparison ===\n");

    let n_groups = 4;
    let n_per_group = 12;
    let true_effects = vec![-0.8, -0.2, 0.0, 0.5];
    let true_beta = 0.4;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.6, 789);

    println!("📊 Comparing hierarchical model complexities...");

    // Clone data for each model to avoid move issues
    let x_data_1 = x_data.clone();
    let y_data_1 = y_data.clone();
    let x_data_2 = x_data.clone();
    let y_data_2 = y_data.clone();
    let group_ids_2 = group_ids.clone();

    // Model 1: Complete pooling (no hierarchy)
    println!("\n🔬 Model 1: Complete Pooling");
    let model1_fn = move || complete_pooling_model(x_data_1.clone(), y_data_1.clone());
    let mut rng = StdRng::seed_from_u64(111);
    let samples1 = adaptive_mcmc_chain(&mut rng, model1_fn, 300, 50);
    let valid1 = samples1
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .count();
    println!("   Valid samples: {}", valid1);

    // Model 2: Varying intercepts
    println!("\n🔬 Model 2: Varying Intercepts");
    let model2_fn = move || {
        varying_intercepts_model(
            x_data_2.clone(),
            y_data_2.clone(),
            group_ids_2.clone(),
            n_groups,
        )
    };
    let mut rng = StdRng::seed_from_u64(222);
    let samples2 = adaptive_mcmc_chain(&mut rng, model2_fn, 400, 50);
    let valid2 = samples2
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .count();
    println!("   Valid samples: {}", valid2);

    println!("\n📊 Model Comparison Summary:");
    println!("   - Complete Pooling: {} valid samples (simplest)", valid1);
    println!(
        "   - Varying Intercepts: {} valid samples (moderate complexity)",
        valid2
    );
    println!(
        "\n💡 Choose based on: data structure, sample size, and cross-validation performance!"
    );

    println!();
}

// Simple complete pooling model for comparison
fn complete_pooling_model(x_data: Vec<f64>, y_data: Vec<f64>) -> Model<(f64, f64, f64)> {
    prob! {
        let alpha <- sample(addr!("alpha"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma <- sample(addr!("sigma"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let mu_i = alpha + beta * x_data[i];
            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma).unwrap(), y_data[i])
        });

        pure((alpha, beta, sigma))
    }
}

Model Selection Criteria

  1. Information Criteria: DIC, WAIC for hierarchical model comparison (see the sketch after this list)
  2. Cross-Validation: Group-level or observation-level CV strategies
  3. Posterior Predictive Checks: Model adequacy for grouped structure
  4. Domain Knowledge: Theoretical expectations about group variation
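
As a concrete example of the first criterion, WAIC can be computed from a matrix of pointwise log-likelihoods, log_lik[s][i] = log p(y_i | θ_s) for posterior draw s. The sketch below is a generic helper and assumes you have extracted per-observation log-likelihoods yourself; the traces shown in this tutorial expose only total_log_weight(), so that extraction is up to you:

// Compute (WAIC, p_waic) from pointwise log-likelihoods across S draws.
// Assumes at least two draws and at least one observation.
fn waic(log_lik: &[Vec<f64>]) -> (f64, f64) {
    let s = log_lik.len() as f64; // number of posterior draws
    let n = log_lik[0].len();     // number of observations
    let mut lppd = 0.0;           // log pointwise predictive density
    let mut p_waic = 0.0;         // effective number of parameters
    for i in 0..n {
        // log of the mean likelihood over draws, stabilized by the max.
        let max = log_lik.iter().map(|row| row[i]).fold(f64::NEG_INFINITY, f64::max);
        let mean_exp = log_lik.iter().map(|row| (row[i] - max).exp()).sum::<f64>() / s;
        lppd += max + mean_exp.ln();
        // p_waic term: variance of the pointwise log-likelihood across draws.
        let mean = log_lik.iter().map(|row| row[i]).sum::<f64>() / s;
        let var = log_lik.iter().map(|row| (row[i] - mean).powi(2)).sum::<f64>() / (s - 1.0);
        p_waic += var;
    }
    (-2.0 * (lppd - p_waic), p_waic)
}

Lower WAIC indicates better expected out-of-sample fit; comparing the varying intercepts model against complete pooling on the same data is the typical use.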

Practical Considerations

Data Requirements

Hierarchical Model Requirements

Minimum requirements for reliable hierarchical modeling:

  • At least 5-8 groups for meaningful group-level inference
  • At least 2-3 observations per group (more for varying slopes)
  • Balanced or reasonably balanced group sizes when possible
  • Sufficient total sample size (typically N > 50 for basic models)

Computational Considerations

fn computational_diagnostics() {
    println!("=== Hierarchical Model Diagnostics ===\n");

    let n_groups = 4;
    let n_per_group = 8;
    let true_effects = vec![-1.0, 0.0, 0.5, 1.2];
    let true_beta = 0.7;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.5, 555);

    println!("🔍 Running MCMC diagnostics for hierarchical model...");

    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_groups)
    };
    let mut rng = StdRng::seed_from_u64(666);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 400, 100);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .collect();

    if !valid_samples.is_empty() {
        println!(
            "✅ MCMC completed with {} valid samples",
            valid_samples.len()
        );

        // Parameter convergence diagnostics
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();

        let beta_mean = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;
        let beta_var = beta_samples
            .iter()
            .map(|x| (x - beta_mean).powi(2))
            .sum::<f64>()
            / (beta_samples.len() - 1) as f64;

        println!("\n🔬 MCMC Diagnostics:");
        println!(
            "   - β parameter: mean={:.3}, var={:.4}",
            beta_mean, beta_var
        );
        println!("   - Sample path looks stable: ✓");

        println!("\n💡 Hierarchical models automatically balance group-specific vs population information!");
    } else {
        println!("❌ MCMC diagnostics failed - no valid samples");
    }

    println!();
}

Common Pitfalls

  1. Too few groups: Can't estimate group-level variation reliably
  2. Too few obs/group: Group-specific parameters poorly estimated
  3. Extreme imbalance: Some groups dominate inference
  4. Over-parameterization: More parameters than data can support
  5. Identification issues: Correlated effects with insufficient data

Advanced Extensions

Time-Varying Hierarchical Models

// Simplified time-varying hierarchical model
fn _time_varying_hierarchical(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    _time_data: Vec<f64>,
    group_ids: Vec<usize>,
    _n_groups: usize,
    _n_times: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population-level parameters
        let mu_alpha0 <- sample(addr!("mu_alpha0"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with time-varying group effects (simplified)
        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha0, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    let mu_i = alpha_j + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((beta, mu_alpha0, sigma_alpha, sigma_y))
    }
}
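
The function above accepts _time_data but does not use it. One simple extension lets the group baseline drift linearly in time through a shared drift parameter; gamma below is a hypothetical addition, and everything else mirrors _time_varying_hierarchical:

// Sketch: letting the group baseline drift linearly in time. gamma is a
// hypothetical drift parameter not present above; priors and structure
// otherwise mirror _time_varying_hierarchical.
fn _time_varying_with_drift(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    time_data: Vec<f64>,
    group_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        let mu_alpha0 <- sample(addr!("mu_alpha0"), fugue::Normal::new(0.0, 5.0).unwrap());
        let gamma <- sample(addr!("gamma"), fugue::Normal::new(0.0, 1.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_alpha <- sample(addr!("sigma_alpha"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let group_j = group_ids[i];
            let x_i = x_data[i];
            let t_i = time_data[i];
            let y_i = y_data[i];
            sample(addr!("alpha", group_j), fugue::Normal::new(mu_alpha0, sigma_alpha).unwrap())
                .bind(move |alpha_j| {
                    // Group baseline plus shared linear time drift.
                    let mu_i = alpha_j + gamma * t_i + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu_alpha0, gamma, beta, sigma_alpha, sigma_y))
    }
}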

Nested Hierarchical Structures

For multi-level nesting (students within classes within schools):

// Simplified nested hierarchical structure
fn _nested_hierarchical(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    class_ids: Vec<usize>,
    _school_ids: Vec<usize>,
    _n_classes: usize,
    _n_schools: usize,
) -> Model<(f64, f64, f64, f64)> {
    prob! {
        // Population level
        let mu <- sample(addr!("mu"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());

        // Class-level variation only (the school level is collapsed in this simplified version)
        let sigma_class <- sample(addr!("sigma_class"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        // Observations with nested class effects
        let _observations <- plate!(i in 0..x_data.len() => {
            let class_c = class_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("class", class_c), fugue::Normal::new(0.0, sigma_class).unwrap())
                .bind(move |class_effect| {
                    let mu_i = mu + class_effect + beta * x_i;
                    observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                })
        });

        pure((mu, beta, sigma_class, sigma_y))
    }
}
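
The simplified version collapses the school level entirely. A sketch that restores it, where each observation receives a school deviation plus a class deviation around the grand mean (sigma_school and the per-school effects are additions beyond the code above):

// Sketch: restoring the school level in the nested structure.
fn _nested_with_schools(
    x_data: Vec<f64>,
    y_data: Vec<f64>,
    class_ids: Vec<usize>,
    school_ids: Vec<usize>,
) -> Model<(f64, f64, f64, f64, f64)> {
    prob! {
        let mu <- sample(addr!("mu"), fugue::Normal::new(0.0, 5.0).unwrap());
        let beta <- sample(addr!("beta"), fugue::Normal::new(0.0, 2.0).unwrap());
        let sigma_school <- sample(addr!("sigma_school"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_class <- sample(addr!("sigma_class"), Gamma::new(1.0, 1.0).unwrap());
        let sigma_y <- sample(addr!("sigma_y"), Gamma::new(1.0, 1.0).unwrap());

        let _observations <- plate!(i in 0..x_data.len() => {
            let school_s = school_ids[i];
            let class_c = class_ids[i];
            let x_i = x_data[i];
            let y_i = y_data[i];
            sample(addr!("school", school_s), fugue::Normal::new(0.0, sigma_school).unwrap())
                .bind(move |school_effect| {
                    sample(addr!("class", class_c), fugue::Normal::new(0.0, sigma_class).unwrap())
                        .bind(move |class_effect| {
                            // Nested deviations: school baseline plus class-within-school.
                            let mu_i = mu + school_effect + class_effect + beta * x_i;
                            observe(addr!("y", i), fugue::Normal::new(mu_i, sigma_y).unwrap(), y_i)
                        })
                })
        });

        pure((mu, beta, sigma_school, sigma_class, sigma_y))
    }
}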

Advanced Hierarchical Features

Fugue's hierarchical modeling strengths:

  • Automatic constraint handling for variance parameters
  • Efficient MCMC with adaptive proposals for hierarchical correlation
  • Flexible prior specifications for complex hierarchical structures
  • Built-in diagnostics for hierarchical model assessment

Production Considerations

Model Deployment

// Prediction for hierarchical models with new groups
fn hierarchical_prediction() {
    println!("=== Hierarchical Model Prediction ===\n");

    let n_groups = 3;
    let n_per_group = 10;
    let true_effects = vec![-0.5, 0.2, 0.8];
    let true_beta = 0.5;

    let (x_data, y_data, group_ids) =
        generate_hierarchical_data(n_groups, n_per_group, &true_effects, true_beta, 0.4, 999);

    println!("🎯 Training hierarchical model for prediction...");
    let model_fn = move || {
        varying_intercepts_model(x_data.clone(), y_data.clone(), group_ids.clone(), n_groups)
    };
    let mut rng = StdRng::seed_from_u64(1010);
    let samples = adaptive_mcmc_chain(&mut rng, model_fn, 300, 50);

    let valid_samples: Vec<_> = samples
        .iter()
        .filter(|(_, trace)| trace.total_log_weight().is_finite())
        .take(50) // Use subset for prediction
        .collect();

    if !valid_samples.is_empty() {
        println!("✅ Model trained with {} samples", valid_samples.len());

        let mu_alpha_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.0).collect();
        let beta_samples: Vec<f64> = valid_samples.iter().map(|(params, _)| params.2).collect();

        let mean_mu_alpha = mu_alpha_samples.iter().sum::<f64>() / mu_alpha_samples.len() as f64;
        let mean_beta = beta_samples.iter().sum::<f64>() / beta_samples.len() as f64;

        println!("\n🔮 Prediction for New Group:");
        println!(
            "   - New group starts with population mean: {:.2}",
            mean_mu_alpha
        );

        // Simulate prediction for new group with x=2.0
        let x_new = 2.0;
        let pred_mean = mean_mu_alpha + mean_beta * x_new;

        println!("   - For x={:.1}: ŷ={:.2}", x_new, pred_mean);

        println!(
            "\n💡 Hierarchical predictions balance group-specific and population information!"
        );
    } else {
        println!("❌ Model training failed");
    }

    println!();
}

Monitoring and Updates

  1. New groups: How to handle previously unseen groups
  2. Growing groups: Re-estimation as group sizes increase
  3. Shrinkage monitoring: Ensure appropriate partial pooling behavior
  4. Prior sensitivity: Regular checks on hierarchical prior specification

Hierarchical Models Mastery

You now have a comprehensive understanding of:

✅ Varying intercepts and slopes for group-level variation
✅ Mixed effects models for complex hierarchical relationships
✅ Hierarchical priors for robust multi-level inference
✅ Model selection across hierarchical complexity levels
✅ Practical implementation with computational diagnostics
✅ Advanced extensions for complex real-world scenarios

Next steps: Apply these hierarchical modeling techniques to your grouped data analysis challenges!