Working with Distributions

Fugue's type-safe distribution system represents a principled approach to probabilistic programming, eliminating entire classes of runtime errors through rigorous type theory while preserving the full expressiveness of statistical modeling. This guide demonstrates the mathematical foundations and practical applications of Fugue's distribution architecture.

Type Theory Foundation

Fugue's distribution system is grounded in parametric typing: each distribution is parameterized not just by its parameters θ, but also by its support type T. A `Distribution<T>` can only produce samples of type T and only score values of type T, which eliminates the need for runtime type checking or unsafe casting operations.
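The idea can be sketched in plain Rust with a simplified, hypothetical trait (illustrative names and signatures only, not Fugue's exact definitions, which also cover sampling):

```rust
// Sketch of a support-typed distribution trait: the type parameter T is the
// distribution's support, so a Bernoulli is a Distribution<bool> and its
// log_prob takes a &bool rather than a casted f64.
trait Distribution<T> {
    fn log_prob(&self, value: &T) -> f64;
}

struct Bernoulli {
    p: f64,
}

impl Distribution<bool> for Bernoulli {
    fn log_prob(&self, value: &bool) -> f64 {
        // log P(X = value) for a coin with success probability p
        if *value { self.p.ln() } else { (1.0 - self.p).ln() }
    }
}

fn main() {
    let coin = Bernoulli { p: 0.25 };
    println!("log P(heads) = {}", coin.log_prob(&true));  // ln 0.25
    println!("log P(tails) = {}", coin.log_prob(&false)); // ln 0.75
}
```

Because the support type is fixed at the trait level, passing an `f64` to a `Distribution<bool>` is a compile error rather than a runtime surprise.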

Type Safety in Practice

Traditional probabilistic programming libraries return f64 for everything, leading to casting overhead and runtime errors. Fugue distributions return their natural types:

    // Demonstrate natural return types
    let coin = Bernoulli::new(0.5).unwrap();
    let flip: bool = coin.sample(&mut rng); // Natural boolean

    if flip {
        println!("🪙 Coin flip result: Heads!");
    } else {
        println!("🪙 Coin flip result: Tails!");
    }

No casting, no comparisons with floating-point values—just natural boolean logic.

Continuous Distributions

Continuous distributions in Fugue model phenomena over uncountable domains such as ℝ or intervals of it. The probability density function f satisfies the normalization condition ∫ f(x) dx = 1 over the support.

For computational stability, Fugue operates in log-space by default, computing log f(x) instead of f(x) to avoid numerical underflow:

Log-Space Computation

Working directly with densities can cause severe numerical issues when f(x) is close to zero. Fugue's log_prob() method computes log f(x), which remains numerically stable even for extreme tail probabilities.

    // Working with continuous distributions
    let standard_normal = Normal::new(0.0, 1.0).unwrap();
    let sample: f64 = standard_normal.sample(&mut rng);
    println!("📊 Standard normal sample: {:.3}", sample);

    // Compute log-probability density
    let log_density = standard_normal.log_prob(&0.0); // Peak of standard normal
    println!(
        "📈 Log-density at x=0: {:.3} (peak of standard normal)",
        log_density
    );

    // Custom parameters
    let measurement_model = Normal::new(10.0, 0.5).unwrap();
    let measurement = measurement_model.sample(&mut rng);
    println!("🔬 Sensor measurement (μ=10.0, σ=0.5): {:.3}", measurement);

Key Points:

  • sample() returns f64 for direct arithmetic
  • log_prob() computes log-density (avoids numerical underflow)
  • Parameter validation happens at construction time

Tip

Always work with log-probabilities for numerical stability. Only convert to regular probabilities when necessary for interpretation.
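The underflow this tip guards against is easy to demonstrate with plain f64 arithmetic, independent of any distribution library:

```rust
/// Product of n copies of probability p, accumulated directly in
/// probability space: underflows to exactly 0.0 for small p and large n.
fn direct_product(p: f64, n: usize) -> f64 {
    (0..n).map(|_| p).product()
}

/// The same quantity accumulated in log-space: n * ln(p),
/// which stays finite for any p > 0.
fn log_space_product(p: f64, n: usize) -> f64 {
    (0..n).map(|_| p.ln()).sum()
}

fn main() {
    let (p, n) = (1e-3, 500);
    // 1e-3^500 = 1e-1500, far below f64's smallest subnormal (~5e-324)
    println!("direct product:    {}", direct_product(p, n));    // 0
    println!("log-space product: {}", log_space_product(p, n)); // ~= -3453.9
}
```

Every factor is perfectly representable on its own; only the direct product is lost, which is why accumulation should happen in log-space.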

Discrete Distributions

Discrete distributions operate over countable support sets such as ℕ = {0, 1, 2, …} or finite sets {0, 1, …, K−1}. The probability mass function satisfies P(X = k) ≥ 0 for every k in the support and Σₖ P(X = k) = 1.

Fugue enforces these constraints at construction time and leverages natural integer types to eliminate precision loss from floating-point representation:

Integer Precision Preservation

Unlike floating-point representations that can introduce rounding errors, Fugue's native u64 and usize types preserve exact integer values. This is crucial for count data, where large counts must remain precisely representable.
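The precision claim can be checked with nothing but the standard library: 2⁵³ + 1 is the first integer that a f64 silently rounds away, while u64 represents it exactly:

```rust
fn main() {
    // 2^53 + 1 = 9007199254740993 is the first integer f64 cannot represent.
    let exact: u64 = (1u64 << 53) + 1;
    let round_trip = exact as f64 as u64; // through f64 and back

    println!("u64 value: {}", exact);
    println!("via f64:   {}", round_trip); // silently off by one
    assert_ne!(exact, round_trip);

    // 2^53 itself still round-trips exactly.
    assert_eq!((1u64 << 53) as f64 as u64, 1u64 << 53);
}
```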

    // Working with discrete distributions

    // Count data
    let event_rate = Poisson::new(3.0).unwrap();
    let count: u64 = event_rate.sample(&mut rng);
    println!("📅 Event count (λ=3.0): {} events", count);

    // Log-probability mass
    let log_prob_3_events = event_rate.log_prob(&3);
    println!(
        "🎯 Log-probability of exactly 3 events: {:.3}",
        log_prob_3_events
    );

    // Use counts directly in calculations
    let total_cost = count * 50; // Direct arithmetic with u64
    println!("💰 Total cost ({} events × $50): ${}", count, total_cost);

Benefits:

  • u64 counts support direct arithmetic without casting
  • No precision loss from floating-point representations
  • Natural integration with Rust's type system

Safe Categorical Sampling

Categorical distributions return usize for safe array indexing:

    // Safe categorical sampling
    let choices = vec![0.3, 0.5, 0.2]; // Three categories
    let categorical = Categorical::new(choices).unwrap();
    let selected: usize = categorical.sample(&mut rng);

    // The sampled index is guaranteed in-bounds for the weight vector
    let options = ["Option A", "Option B", "Option C"];
    println!(
        "🎲 Categorical choice (weights: 0.3, 0.5, 0.2): {}",
        options[selected]
    );

    // Uniform categorical
    let uniform_choice = Categorical::uniform(5).unwrap();
    let idx: usize = uniform_choice.sample(&mut rng);
    println!("🎯 Uniform random index (0-4): {}", idx);

Note

The usize return type eliminates bounds checking errors—the sampled index is guaranteed to be valid for the probability vector length.
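One way such a guarantee can be implemented is inverse-CDF sampling over the cumulative weights. The sketch below is illustrative, not Fugue's actual code; by construction the returned index is always less than `weights.len()`:

```rust
/// Inverse-CDF sampling of a category index from non-negative weights
/// (assumed non-empty), given a uniform draw u in [0, 1].
fn sample_index(weights: &[f64], u: f64) -> usize {
    let total: f64 = weights.iter().sum();
    let mut acc = 0.0;
    for (i, w) in weights.iter().enumerate() {
        acc += w;
        // Return the first index whose cumulative weight exceeds u * total.
        if u * total < acc {
            return i;
        }
    }
    weights.len() - 1 // guard against round-off at u = 1.0
}

fn main() {
    let weights = [0.3, 0.5, 0.2];
    println!("u = 0.1 -> index {}", sample_index(&weights, 0.1)); // 0
    println!("u = 0.4 -> index {}", sample_index(&weights, 0.4)); // 1
    println!("u = 0.9 -> index {}", sample_index(&weights, 0.9)); // 2
}
```

Every return path yields an index into `weights`, which is why a usize result can be used for array access without extra validation.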

Parameter Validation

Fugue enforces mathematical constraints through compile-time and runtime validation. Each distribution family has a parameter space defining valid configurations:

    graph TD
        A[Parameter Input θ] --> B{θ ∈ Θ?}
        B -->|Yes| C[Distribution Construction]
        B -->|No| D[ValidationError]
        C --> E[Type-Safe Sampling]
        D --> F[Early Failure Detection]

Constraint Examples:

  • Normal Distribution: requires σ > 0
  • Beta Distribution: requires α > 0 and β > 0
  • Categorical Distribution: requires pᵢ ≥ 0 for all i and Σᵢ pᵢ = 1

    // Distribution parameter validation

    // This will return an error
    match Normal::new(0.0, -1.0) {
        Ok(_) => println!("✅ Normal(μ=0.0, σ=-1.0) created successfully"),
        Err(e) => println!("❌ Normal(μ=0.0, σ=-1.0) failed: {:?}", e),
    }

    // Beta distribution parameters must be positive
    match Beta::new(0.0, 1.0) {
        Ok(_) => println!("✅ Beta(α=0.0, β=1.0) created successfully"),
        Err(e) => println!("❌ Beta(α=0.0, β=1.0) failed: {:?}", e),
    }

    // Poisson rate must be non-negative
    match Poisson::new(-1.0) {
        Ok(_) => println!("✅ Poisson(λ=-1.0) created successfully"),
        Err(e) => println!("❌ Poisson(λ=-1.0) failed: {:?}", e),
    }

Validation Rules:

  • Normal: σ > 0
  • Beta: α > 0, β > 0
  • Poisson: λ ≥ 0
  • Categorical: probabilities sum to 1, all non-negative

Storing Mixed Distributions

Use trait objects for collections of distributions with the same return type:

    // Storing different distributions together
    let continuous_dists: Vec<Box<dyn Distribution<f64>>> = vec![
        Normal::new(0.0, 1.0).unwrap().clone_box(),
        Beta::new(2.0, 5.0).unwrap().clone_box(),
        Uniform::new(-1.0, 1.0).unwrap().clone_box(),
    ];

    // Sample from each
    for (i, dist) in continuous_dists.iter().enumerate() {
        let sample = dist.sample(&mut rng);
        let dist_name = match i {
            0 => "Normal(0,1)",
            1 => "Beta(2,5)",
            2 => "Uniform(-1,1)",
            _ => "Unknown",
        };
        println!("📦 {} sample: {:.3}", dist_name, sample);
    }

This enables dynamic distribution selection and model composition patterns.

Practical Modeling Patterns

Common modeling scenarios demonstrate natural type usage:

    // Practical modeling examples

    // Model a sensor with noise
    let true_temperature = 20.5; // True value
    let sensor_noise = Normal::new(0.0, 0.2).unwrap(); // Measurement error
    let measured_temp = true_temperature + sensor_noise.sample(&mut rng);
    println!(
        "🌡️  True temperature: {:.2}°C → Measured: {:.2}°C",
        true_temperature, measured_temp
    );

    // Count model for arrivals
    let arrival_rate = Poisson::new(2.5).unwrap(); // 2.5 arrivals per hour
    let hourly_arrivals = arrival_rate.sample(&mut rng);
    println!(
        "🚪 Expected arrivals: 2.5/hour → Actual: {} arrivals",
        hourly_arrivals
    );

    // Decision model
    let decision_prob = 0.7;
    let decision = Bernoulli::new(decision_prob).unwrap();
    let will_buy = decision.sample(&mut rng);
    if will_buy {
        println!("🛒 Customer decision (p=0.7): Will make a purchase");
    } else {
        println!("🚶 Customer decision (p=0.7): Will not purchase");
    }

Each distribution serves its natural domain without artificial conversions.

Working with Log-Probabilities

Logarithmic probability computation is essential for numerical stability in probabilistic programming. Consider the log-sum-exp operation for computing:

log Σᵢ exp(xᵢ) = m + log Σᵢ exp(xᵢ − m)

where m = maxᵢ xᵢ. This formulation prevents overflow when the xᵢ are large:

Numerical Stability Theorem

Direct computation of a joint probability ∏ᵢ p(xᵢ), where each p(xᵢ) ≪ 1, underflows to machine zero after only a moderate number of factors. Log-space computation of Σᵢ log p(xᵢ) remains stable for arbitrarily small probabilities, preserving the 15-17 significant digits of IEEE 754 double precision.
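The log-sum-exp identity above can be written as a self-contained function, independent of Fugue:

```rust
/// Numerically stable log(sum_i exp(x_i)) via the max-shift trick:
/// subtracting m = max_i x_i keeps every exponent at or below zero.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if m == f64::NEG_INFINITY {
        return f64::NEG_INFINITY; // empty input, or all terms are -inf
    }
    m + xs.iter().map(|x| (x - m).exp()).sum::<f64>().ln()
}

fn main() {
    // Naive exp(1000.0) overflows f64 to +inf; the shifted form is finite.
    let xs = [1000.0, 1000.0];
    println!("log-sum-exp = {}", log_sum_exp(&xs)); // 1000 + ln 2
}
```

This is the standard building block for normalizing log-weights, e.g. when converting importance weights or mixture responsibilities back to probabilities.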

    // Working with log-probabilities

    let normal = Normal::new(100.0, 15.0).unwrap();

    // Multiple observations
    let observations = vec![98.5, 102.1, 99.8, 101.5, 97.2];
    let mut total_log_prob = 0.0;

    println!(
        "📋 Evaluating {} observations under Normal(μ=100.0, σ=15.0):",
        observations.len()
    );
    for obs in &observations {
        let log_p = normal.log_prob(obs);
        total_log_prob += log_p;
        println!("   x={}: log P(x) = {:.3}", obs, log_p);
    }

    println!("🔢 Joint log-probability: {:.3}", total_log_prob);

    // Convert back to probability (be careful with underflow!)
    if total_log_prob > -700.0 {
        // Avoid underflow
        let probability = total_log_prob.exp();
        println!("📊 Joint probability: {:.2e}", probability);
    } else {
        println!("⚠️  Joint probability too small to represent as f64");
    }

Warning

Converting large negative log-probabilities back to regular probabilities can underflow to zero. Keep computations in log-space when possible.

Advanced Patterns

For complex modeling scenarios, see these patterns:

Hierarchical Models

    // Hierarchical prior structure
    let global_mean = Normal::new(0.0, 10.0).unwrap();
    let mu = global_mean.sample(&mut rng);

    let group_precision = Gamma::new(2.0, 0.5).unwrap();
    let tau = group_precision.sample(&mut rng);
    let sigma = (1.0 / tau).sqrt(); // Convert precision to std dev

    // Individual observations from hierarchical model
    let individual = Normal::new(mu, sigma).unwrap();
    let observation = individual.sample(&mut rng);

    println!("🌐 Global mean: {:.3}", mu);
    println!("📊 Group std dev: {:.3}", sigma);
    println!("👤 Individual observation: {:.3}", observation);

Mixture Components

    // Mixture model components
    let mixture_weights = vec![0.6, 0.3, 0.1];
    let component_selector = Categorical::new(mixture_weights).unwrap();
    let selected_component: usize = component_selector.sample(&mut rng);

    // Different components
    let components = [
        Normal::new(-2.0, 0.5).unwrap(),
        Normal::new(0.0, 1.0).unwrap(),
        Normal::new(3.0, 0.8).unwrap(),
    ];

    let sample = components[selected_component].sample(&mut rng);
    println!(
        "🎯 Selected component {}: sample = {:.3}",
        selected_component, sample
    );

Conjugate Priors

    // Beta-Bernoulli conjugacy
    let prior_alpha = 2.0;
    let prior_beta = 8.0;
    let prior = Beta::new(prior_alpha, prior_beta).unwrap();
    let p: f64 = prior.sample(&mut rng);

    // Simulate some trials
    let trials = 20;
    let mut successes = 0;
    let bernoulli = Bernoulli::new(p).unwrap();

    for _ in 0..trials {
        if bernoulli.sample(&mut rng) {
            successes += 1;
        }
    }

    // Posterior parameters (conjugate update)
    let posterior_alpha = prior_alpha + successes as f64;
    let posterior_beta = prior_beta + (trials - successes) as f64;
    let posterior = Beta::new(posterior_alpha, posterior_beta).unwrap();
    let updated_p = posterior.sample(&mut rng);

    println!("🎲 Prior p: {:.3}", p);
    println!("📈 Observed: {}/{} successes", successes, trials);
    println!("🔄 Posterior p: {:.3}", updated_p);

Testing Your Distributions

Always test distribution properties and parameter validation:

    #[test]
    fn test_distribution_properties() {
        let mut rng = thread_rng();

        // Test type safety
        let coin = Bernoulli::new(0.5).unwrap();
        let flip: bool = coin.sample(&mut rng);
        assert!(flip == true || flip == false); // Must be boolean

        // Test parameter validation
        assert!(Normal::new(0.0, -1.0).is_err());
        assert!(Beta::new(0.0, 1.0).is_err());
        assert!(Poisson::new(-1.0).is_err());

        // Test valid distributions
        let normal = Normal::new(0.0, 1.0).unwrap();
        let sample = normal.sample(&mut rng);
        let log_prob = normal.log_prob(&sample);
        assert!(log_prob.is_finite());
    }

Common Pitfalls

  1. Underflow in probability space: Use log-probabilities for accumulation
  2. Parameter validation: Check constructor errors, don't assume success
  3. Precision with counts: Use u64 return types directly, avoid f64 conversion
  4. Categorical indexing: Trust the usize return—it's guaranteed valid

Next Steps

The type-safe distribution system eliminates entire classes of runtime errors while making statistical code more readable and maintainable.