Reliable maintenance of machine learning models

Hello!

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com

Maintaining ML models is NOTHING LIKE THIS

Maintaining ML models is never done

ML models are both software and statistical products

What does performance mean?

My model is performing well!

👩🏼‍🔧 My model returns predictions quickly, doesn’t use too much memory or processing power, and doesn’t have outages.

Metrics

  • latency
  • memory and CPU usage
  • uptime
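
As a rough illustration of the latency metric, the sketch below times repeated single-row predictions in base R. The `lm()` fit on `mtcars` is only a hypothetical stand-in for a deployed model, and a real deployment would usually measure latency at the API layer rather than inside the R session.

```r
# Toy model standing in for a deployed one (hypothetical stand-in)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Time 200 single-row predictions; keep the elapsed wall-clock seconds
latencies <- vapply(seq_len(200), function(i) {
  new_row <- mtcars[sample(nrow(mtcars), 1), ]
  system.time(predict(fit, new_row))[["elapsed"]]
}, numeric(1))

# Median and tail latency are usually more informative than the mean
quantile(latencies, c(0.5, 0.95, 0.99))
```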

My model is performing well!

👩🏽‍🔬 My model returns predictions that are close to the true values for the predicted quantity.

Metrics

  • accuracy
  • ROC AUC
  • F1 score
  • RMSE
  • log loss
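
To make the statistical metrics concrete, here is a minimal base R sketch that computes each one by hand. All data values are made up for illustration; in practice a package such as yardstick would typically do this work.

```r
# Hypothetical observed classes (1 = positive) and predicted probabilities
truth <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob  <- c(0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3)
pred  <- as.integer(prob >= 0.5)  # classify at a 0.5 cutoff

# Accuracy: share of predictions that match the observed class
accuracy <- mean(pred == truth)

# F1 score: harmonic mean of precision and recall
tp        <- sum(pred == 1 & truth == 1)
precision <- tp / sum(pred == 1)
recall    <- tp / sum(truth == 1)
f1        <- 2 * precision * recall / (precision + recall)

# ROC AUC via the rank (Mann-Whitney) formulation
n1  <- sum(truth == 1)
n0  <- sum(truth == 0)
auc <- (sum(rank(prob)[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Log loss: penalizes confident wrong probabilities
log_loss <- -mean(truth * log(prob) + (1 - truth) * log(1 - prob))

# RMSE for a hypothetical regression model
obs  <- c(3.1, 2.4, 5.0, 4.2)
est  <- c(2.9, 2.8, 4.5, 4.0)
rmse <- sqrt(mean((obs - est)^2))
```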

Failures in statistical performance can be silent

MODEL DRIFT

DATA DRIFT

Monitor your inputs
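
One possible way to monitor an input, sketched under assumptions: compare the distribution of a feature at training time with its distribution in recent traffic using a two-sample Kolmogorov-Smirnov test. The feature name, the simulated values, and the 0.01 cutoff are all hypothetical, and this is only one of several reasonable drift checks.

```r
set.seed(123)

# Hypothetical feature values at training time and in recent production traffic
training_amount <- rnorm(1000, mean = 50, sd = 10)
recent_amount   <- rnorm(1000, mean = 58, sd = 10)  # shifted on purpose

# The KS test compares the two empirical distributions
drift_check <- ks.test(training_amount, recent_amount)

drift_check$statistic        # largest gap between the two empirical CDFs
drift_check$p.value < 0.01   # TRUE suggests the input distribution has moved
```

With large samples even tiny shifts become statistically significant, so it can help to track the test statistic over time rather than rely on the p-value alone.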

CONCEPT DRIFT

Monitor your outputs

library(vetiver)

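# Weekly metrics: `date` is the time index, `customer` the observed class, `.pred` the predicted class
laundry_service_monitoring |> 
  vetiver_compute_metrics(date, "week", customer, .pred)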
#> # A tibble: 30 × 5
#>    .index        .n .metric  .estimator .estimate
#>    <date>     <int> <chr>    <chr>          <dbl>
#>  1 2023-03-05    14 accuracy binary         0.857
#>  2 2023-03-05    14 kap      binary         0.708
#>  3 2023-03-09    34 accuracy binary         0.882
#>  4 2023-03-09    34 kap      binary         0.767
#>  5 2023-03-16    25 accuracy binary         0.8  
#>  6 2023-03-16    25 kap      binary         0.525
#>  7 2023-03-23    32 accuracy binary         0.844
#>  8 2023-03-23    32 kap      binary         0.685
#>  9 2023-03-30    36 accuracy binary         0.806
#> 10 2023-03-30    36 kap      binary         0.611
#> # ℹ 20 more rows

Monitor your outputs

library(vetiver)

laundry_service_monitoring |> 
  vetiver_compute_metrics(date, "week", customer, .pred) |> 
  vetiver_plot_metrics()
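
Once metrics are computed per period, a simple next step is to flag periods that fall below an agreed level. The sketch below uses base R on a small data frame shaped like the metrics tibble, with the accuracy estimates copied from the first ten rows of output shown earlier. The 0.82 threshold is a hypothetical choice, not a recommendation.

```r
# Accuracy rows copied from the monitoring output shown earlier
weekly <- data.frame(
  .index    = as.Date(c("2023-03-05", "2023-03-09", "2023-03-16",
                        "2023-03-23", "2023-03-30")),
  .metric   = "accuracy",
  .estimate = c(0.857, 0.882, 0.8, 0.844, 0.806)
)

threshold <- 0.82  # hypothetical alerting threshold

# Keep only the periods where accuracy dropped below the threshold
alerts <- weekly[weekly$.metric == "accuracy" & weekly$.estimate < threshold, ]
alerts
```

A check like this is easy to run by hand at first and later move into a scheduled report.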

Feedback loops 🔁

Deployment of an ML model may cause data and/or concept drift

Examples

  • Movie recommendation systems on Netflix, Disney+, Hulu
  • Identifying fraudulent credit card transactions at Stripe
  • Predictive policing models

Stages of model monitoring maturity

  1. Manual 🙂

  2. Reproducible 🤓

  3. Automated 🤩

Resilient models that are successful in the long term

Learn more

Thank you!
