Forecasting · Featured Project

Call volume forecasting per operator

2024
Client: International Banking Group

Bayesian hierarchical model for predicting call volumes at the operator level, enabling data-driven workforce planning and staffing decisions.

Key result
From heuristic rules to uncertainty-aware forecasts

The challenge

A large international banking group operates a multi-channel customer service center handling thousands of inbound calls daily. Accurate call volume estimation at the operator level is critical to ensure adequate staffing and service levels, optimize workforce planning, avoid over- or under-utilization of agents, and control operational costs while maintaining customer satisfaction.

The challenge was to predict the number of calls handled per operator over a given time period, accounting for differences in schedules, experience, and operating conditions.

As a first baseline, a linear regression model was considered due to its simplicity and interpretability. However, exploratory data analysis quickly revealed structural issues: call volumes are discrete counts (0, 1, 2, …), not continuous; predictions from linear regression could be negative or non-integer, which is not meaningful in this context; and variance increased with the mean, violating linear regression assumptions. These observations indicated a model–data mismatch.
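As a rough illustration of that diagnostic (using simulated data as a stand-in for the client's records, so the column names and values below are purely hypothetical), a quick per-operator mean-variance check makes the mismatch visible:

```python
import numpy as np
import pandas as pd

# Simulated stand-in for operator-period call counts (illustrative only).
rng = np.random.default_rng(0)
operator_id = np.repeat(np.arange(20), 30)
lam = np.repeat(np.linspace(2, 30, 20), 30)   # busier operators have higher rates
df = pd.DataFrame({"operator_id": operator_id, "calls": rng.poisson(lam)})

# For count data the variance grows with the mean, so the constant-variance
# assumption behind ordinary linear regression does not hold.
stats = df.groupby("operator_id")["calls"].agg(["mean", "var"])
print(stats.head())
print("correlation between per-operator mean and variance:",
      round(stats["mean"].corr(stats["var"]), 2))
```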

The solution

Given the nature of the target variable (call counts), the problem was reframed using a count-based probabilistic model. The approach started with Poisson regression as a natural framework for count data, using a log-link function to ensure strictly positive predictions. An exposure offset accounted for different working durations (e.g., hours worked per operator), and a hierarchical structure modeled systematic differences between operators while sharing information across the population.

The final solution was implemented as a Bayesian hierarchical count model using PyMC, allowing for operator-level random effects (experience, efficiency, behavioral differences), global effects from contextual variables (shift type, workload indicators, tenure), and full uncertainty quantification via posterior distributions.
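A minimal PyMC sketch of this structure is shown below. It uses simulated data and hypothetical variable names (an operator index, hours worked as the exposure, a peak-shift indicator) rather than the production feature set, and encodes the log-link with an offset as log(expected calls) = log(hours worked) + global effects + operator effect:

```python
import numpy as np
import pymc as pm

# Hypothetical example data: calls handled per operator-shift.
rng = np.random.default_rng(42)
n_operators = 20
n_obs = 400
operator_idx = rng.integers(0, n_operators, size=n_obs)
hours_worked = rng.uniform(4, 9, size=n_obs)        # exposure
shift_is_peak = rng.integers(0, 2, size=n_obs)      # contextual covariate
true_rate = np.exp(1.0 + 0.3 * shift_is_peak
                   + rng.normal(0, 0.2, n_operators)[operator_idx])
calls = rng.poisson(true_rate * hours_worked)

with pm.Model() as model:
    # Global (population-level) effects
    intercept = pm.Normal("intercept", mu=0.0, sigma=2.0)
    beta_peak = pm.Normal("beta_peak", mu=0.0, sigma=1.0)

    # Hierarchical operator-level random effects (partial pooling)
    sigma_op = pm.HalfNormal("sigma_op", sigma=1.0)
    op_effect = pm.Normal("op_effect", mu=0.0, sigma=sigma_op, shape=n_operators)

    # Log-link with exposure offset: log(mu) = linear predictor + log(hours)
    log_mu = (intercept
              + beta_peak * shift_is_peak
              + op_effect[operator_idx]
              + np.log(hours_worked))
    mu = pm.math.exp(log_mu)

    # Poisson likelihood for count outcomes
    pm.Poisson("calls", mu=mu, observed=calls)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```

The partial pooling on `op_effect` is what shares information across operators: operators with few observations are shrunk toward the population mean instead of getting noisy standalone estimates.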

When over-dispersion was detected, the model was extended to a Negative Binomial formulation, improving robustness and predictive performance. This approach aligns the statistical assumptions of the model with the real-world data-generating process.
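Continuing the same hypothetical sketch (and reusing the simulated arrays defined above), the over-dispersed variant only changes the likelihood, adding a dispersion parameter:

```python
with pm.Model() as model_nb:
    intercept = pm.Normal("intercept", mu=0.0, sigma=2.0)
    beta_peak = pm.Normal("beta_peak", mu=0.0, sigma=1.0)
    sigma_op = pm.HalfNormal("sigma_op", sigma=1.0)
    op_effect = pm.Normal("op_effect", mu=0.0, sigma=sigma_op, shape=n_operators)

    mu = pm.math.exp(intercept + beta_peak * shift_is_peak
                     + op_effect[operator_idx] + np.log(hours_worked))

    # Negative Binomial likelihood: mean mu, dispersion alpha
    # (smaller alpha allows more extra-Poisson variation).
    alpha = pm.Exponential("alpha", 1.0)
    pm.NegativeBinomial("calls", mu=mu, alpha=alpha, observed=calls)

    idata_nb = pm.sample(1000, tune=1000, target_accept=0.9)
```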

Technical approach

  • Poisson regression as the foundation for count data, with log-link for strictly positive predictions
  • Exposure offset to account for varying operator working hours
  • Hierarchical structure to model operator-level effects while pooling information across the population
  • Bayesian inference via PyMC for full uncertainty quantification
  • Extension to Negative Binomial when over-dispersion was detected
  • Posterior predictive checks for model validation
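As a brief sketch of the last point, posterior predictive checks can be run directly on the fitted model from the examples above (again purely illustrative, assuming the `model_nb` and `idata_nb` objects from the Negative Binomial sketch):

```python
import arviz as az

# Draw replicated call counts from the fitted model and compare
# them against the observed distribution.
with model_nb:
    idata_nb.extend(pm.sample_posterior_predictive(idata_nb))

az.plot_ppc(idata_nb, var_names=["calls"])
```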

Results

  • Prediction quality: no invalid outputs; eliminated negative or fractional call predictions
  • Distribution fit: improved alignment; better match with observed call distributions
  • Low-volume stability: hierarchical shrinkage; more stable forecasts for operators with sparse data
  • Decision support: uncertainty intervals; transparent confidence bounds for planning

Overall impact

The model enabled operations and workforce teams to move from heuristic staffing rules to data-driven forecasts, simulate staffing scenarios under different schedules or demand assumptions, and better anticipate peak-load risk while avoiding unnecessary overstaffing. By matching the statistical model to the operational reality, the project demonstrated how sound modeling fundamentals can materially improve decision quality.

Key lessons

  1. Sometimes the difference between a misleading model and a useful decision tool is not more complexity but choosing the right statistical framework for the data.
  2. Count data requires count models: forcing continuous methods on discrete outcomes leads to invalid predictions.
  3. Hierarchical modeling provides natural regularization for sparse data segments while still capturing individual differences.
  4. Uncertainty quantification is not optional for operational decisions; it enables risk-aware planning.

Tech stack

Python · PyMC · Bayesian Inference · Poisson Regression · Negative Binomial · Posterior Predictive Checks

Similar project?

Need help with a similar challenge? Let's discuss how I can help.

Get in Touch