MLOps with vetiver

R-Ladies Rome

Julia Silge

2023-01-24

Hello!

Who are you?

  • Data scientist 👩‍💻
  • Statistician 🌟
  • Data analyst 📈
  • Software engineer 🛠️

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should operationalize that model!

Housing in Seattle 🏘️

library(dplyr)
glimpse(housing_prices)
#> Rows: 21,613
#> Columns: 6
#> $ price       <dbl> 221900, 538000, 180000, 604000, 510000, 1225000, 257500, 2…
#> $ bedrooms    <int> 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2, 3…
#> $ bathrooms   <dbl> 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.50…
#> $ sqft_living <int> 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890,…
#> $ yr_built    <int> 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 2003…
#> $ date        <dttm> 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02-1…

Housing in Seattle 🏘️

housing_wf
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> price ~ bedrooms + bathrooms + sqft_living + yr_built
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~200,      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  200 
#> Sample size:                      11756 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       58388723094 
#> R squared (OOB):                  0.5796486
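
The slide shows the fitted workflow but not the code that produced it. A minimal sketch of how a workflow of this shape could be fit, assuming the tidymodels and ranger packages are installed. The data here is simulated (`housing_sim` is a hypothetical stand-in, not the Seattle data), and the 80/20 split is an assumption.

```r
library(tidymodels)

# Simulated stand-in for `housing_prices`: same column names, made-up values
set.seed(2023)
n <- 500
housing_sim <- tibble(
  bedrooms    = sample(1:5, n, replace = TRUE),
  bathrooms   = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE),
  sqft_living = as.integer(round(runif(n, 600, 4000))),
  yr_built    = sample(1900:2015, n, replace = TRUE),
  price       = 150 * sqft_living + 20000 * bathrooms + rnorm(n, sd = 50000)
)

# Hold out some rows for later evaluation (the proportion is an assumption)
housing_split <- initial_split(housing_sim, prop = 0.8)
housing_train <- training(housing_split)
housing_test  <- testing(housing_split)

# Random forest with 200 trees; ranger is the default engine for rand_forest()
rf_spec <- rand_forest(trees = 200, mode = "regression")

# Formula preprocessor + model, fit on the training rows
sim_wf <- workflow(
  price ~ bedrooms + bathrooms + sqft_living + yr_built,
  rf_spec
) %>%
  fit(data = housing_train)

sim_preds <- predict(sim_wf, housing_test)
head(sim_preds)
```

A trained workflow like `sim_wf` is the kind of object that can be passed to `vetiver_model()`.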

What is MLOps? 🤔

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

MLOps with vetiver

library(vetiver)
v <- vetiver_model(housing_wf, "home-prices")
v
#> 
#> ── home-prices ─ <bundled_workflow> model for deployment 
#> A ranger regression modeling workflow using 4 features

Make it easy to do the right thing

  • Robust and human-friendly checking of new data
  • Track and document software dependencies of models
  • Model cards for transparent, responsible reporting
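
For the model card bullet, a minimal sketch of drafting a card from the R Markdown template that ships with vetiver. The file name and temporary directory are hypothetical choices for illustration.

```r
library(vetiver)

# Draft a model card from the "vetiver_model_card" template.
# edit = FALSE keeps the file from opening in an editor.
card_path <- rmarkdown::draft(
  file.path(tempdir(), "housing_model_card.Rmd"),  # hypothetical file name
  template = "vetiver_model_card",
  package = "vetiver",
  edit = FALSE
)

# Path of the drafted file, ready to fill in and knit
card_path
```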

MLOps is…

  • Versioning

✅ managing change in models

MLOps is…

  • Versioning
  • Deploying

🎯 putting models in REST APIs

MLOps is…

  • Versioning
  • Deploying
  • Monitoring

👀 tracking model performance

Version your model

library(pins)
board <- board_connect()
board %>% vetiver_pin_write(v)
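
To try versioning without a Connect server, here is a minimal sketch using a temporary pins board. The `lm()` models on `mtcars` and the pin name "cars-mpg" are stand-ins chosen for illustration, not the housing model from these slides.

```r
library(vetiver)
library(pins)

# A throwaway local board that keeps every version of a pin
board <- board_temp(versioned = TRUE)

# First version of the model
v1 <- vetiver_model(lm(mpg ~ wt, data = mtcars), "cars-mpg")
vetiver_pin_write(board, v1)

# A retrained model written under the same name becomes a new version
v2 <- vetiver_model(lm(mpg ~ wt + hp, data = mtcars), "cars-mpg")
vetiver_pin_write(board, v2)

# List the stored versions
model_versions <- pin_versions(board, "cars-mpg")
model_versions

# Read a model back from the board (no version given, so pins picks one)
v_latest <- vetiver_pin_read(board, "cars-mpg")
```

Passing `version =` to `vetiver_pin_read()` retrieves a specific earlier version instead.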

Create a REST API

library(plumber)
pr() %>%
  vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/ping (GET)
#> └──/predict (POST)
## next pipe to `pr_run()` for local API
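
Once an API like this is running, it can be called from another R session. A minimal sketch, assuming the API is served locally on port 8080 (the URL and the `new_homes` values are hypothetical). The `predict()` call is commented out because it needs a live server.

```r
library(vetiver)

# Point at the /predict route of a running vetiver API (URL is an assumption)
endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")
endpoint

# New data with the same predictor columns the model was trained on
new_homes <- data.frame(
  bedrooms    = 3L,
  bathrooms   = 2,
  sqft_living = 1800L,
  yr_built    = 1990L
)

# With the API running, this sends new_homes to the endpoint and
# returns the predictions:
# predict(endpoint, new_homes)
```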

Where does vetiver work?

  • Posit’s pro products, like Connect: vetiver_deploy_rsconnect()

  • A public or private cloud, using Docker: vetiver_prepare_docker()
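
Both deployment routes start from a generated plumber file that reads the pinned model. A minimal sketch of producing that file by hand with `vetiver_write_plumber()`, using a local folder board and a stand-in `lm()` model (the paths and pin name are hypothetical):

```r
library(vetiver)
library(pins)

# A folder-backed board, so the generated file can record where the model lives
board <- board_folder(file.path(tempdir(), "model-board"), versioned = TRUE)

# Stand-in model for illustration, not the housing workflow
v <- vetiver_model(lm(mpg ~ wt + hp, data = mtcars), "cars-mpg")
vetiver_pin_write(board, v)

# Write a plumber.R that reads the pinned model and serves it as an API
plumber_file <- file.path(tempdir(), "plumber.R")
vetiver_write_plumber(board, "cars-mpg", file = plumber_file)

# Inspect the generated file
cat(readLines(plumber_file), sep = "\n")
```

`vetiver_prepare_docker()` goes further and also writes a Dockerfile and an renv lockfile alongside the plumber file.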

Monitor your model

new_metrics <-
  augment(v, housing_val) %>%
  vetiver_compute_metrics(date, "week", price, .pred)

vetiver_pin_metrics(
  board,
  new_metrics, 
  "julia.silge/housing-metrics",
  overwrite = TRUE
)
#> # A tibble: 90 × 5
#>    .index                 .n .metric .estimator  .estimate
#>    <dttm>              <int> <chr>   <chr>           <dbl>
#>  1 2014-11-02 00:00:00   224 rmse    standard   202771.   
#>  2 2014-11-02 00:00:00   224 rsq     standard        0.426
#>  3 2014-11-02 00:00:00   224 mae     standard   139488.   
#>  4 2014-11-06 00:00:00   373 rmse    standard   222177.   
#>  5 2014-11-06 00:00:00   373 rsq     standard        0.554
#>  6 2014-11-06 00:00:00   373 mae     standard   150809.   
#>  7 2014-11-13 00:00:00   427 rmse    standard   255889.   
#>  8 2014-11-13 00:00:00   427 rsq     standard        0.554
#>  9 2014-11-13 00:00:00   427 mae     standard   148054.   
#> 10 2014-11-20 00:00:00   376 rmse    standard   251007.   
#> # … with 80 more rows
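
To see what `vetiver_compute_metrics()` does without a deployed model, here is a minimal sketch on simulated predictions. The data frame `sim_preds` and its values are made up for illustration. The function only needs a date column, a truth column, and a prediction column.

```r
library(vetiver)

# Simulated daily observations with a true price and a noisy prediction
set.seed(1)
n <- 120
sim_preds <- data.frame(
  date  = seq(as.Date("2023-01-01"), by = "day", length.out = n),
  price = runif(n, 2e5, 8e5)
)
sim_preds$.pred <- sim_preds$price + rnorm(n, sd = 5e4)

# Aggregate by week; the default metric set for this function is rmse, rsq, mae
sim_metrics <- vetiver_compute_metrics(sim_preds, date, "week", price, .pred)
head(sim_metrics)
```

Each period contributes one row per metric, with `.n` recording how many observations fell into that period.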

Monitor your model

new_metrics %>%
  ## you can operate on your metrics as needed:
  filter(.metric %in% c("rmse", "mae"), .n > 20) %>%
  vetiver_plot_metrics() + 
  ## you can also operate on the ggplot:
  scale_size(range = c(2, 5))

Using vetiver

  • allows those new to MLOps to get started quickly
  • supports scaling safely as an org matures

What does vetiver do?

  • Version

  • Deploy

  • Monitor

your R and Python models

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

Why should data practitioners be excited about MLOps?

  • Connect your work to the “real world”
  • Scale your impact

Learn more

Thank you!

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com