MLOps with vetiver

R-Ladies Rome

Julia Silge

2023-01-24

Hello!

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com

Who are you?

Data scientist 👩‍💻
Statistician 🌟
Data analyst 📈
Software engineer 🛠️

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should operationalize that model!

Housing in Seattle 🏘️

glimpse(housing_prices)
#> Rows: 21,613
#> Columns: 6
#> $ price       <dbl> 221900, 538000, 180000, 604000, 510000, 1225000, 257500, 2…
#> $ bedrooms    <int> 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2, 3…
#> $ bathrooms   <dbl> 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.50…
#> $ sqft_living <int> 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890,…
#> $ yr_built    <int> 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 2003…
#> $ date        <dttm> 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02-1…

Housing in Seattle 🏘️

housing_wf
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> price ~ bedrooms + bathrooms + sqft_living + yr_built
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~200,      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  200 
#> Sample size:                      11756 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       58388723094 
#> R squared (OOB):                  0.5796486
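The printed workflow above does not show the code that built it. As a hedged sketch only, a comparable trained workflow (a ranger random forest with 200 trees and a formula preprocessor) could be fit with parsnip and workflows. Here the built-in `mtcars` data stands in for the Seattle housing data, which is an assumption because `housing_prices` is not included, and `cars_wf` is a hypothetical name.

```r
library(parsnip)
library(workflows)

set.seed(123)

## random forest regression with 200 trees, like the printed housing_wf;
## parsnip's default engine for rand_forest() is ranger
rf_spec <- rand_forest(mode = "regression", trees = 200)

## hypothetical workflow: mtcars stands in for the housing data
cars_wf <- fit(
  workflow(mpg ~ cyl + disp + hp + wt, rf_spec),
  data = mtcars
)
cars_wf
```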

What is MLOps? 🤔

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

MLOps with vetiver

library(vetiver)
v <- vetiver_model(housing_wf, "home-prices")
v
#> 
#> ── home-prices ─ <bundled_workflow> model for deployment 
#> A ranger regression modeling workflow using 4 features

Make it easy to do the right thing

Robust and human-friendly checking of new data
Track and document software dependencies of models
Model cards for transparent, responsible reporting
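A small sketch of what these look like in practice, using a simple `lm()` on `mtcars` as a stand-in model (an assumption, not the talk's housing workflow; `cars_lm` and `v_cars` are hypothetical names). `vetiver_model()` stores an input data prototype alongside the model, which vetiver uses to check new data at prediction time, plus metadata such as required packages. A model card can be started from the R Markdown template that ships with vetiver.

```r
library(vetiver)

## hypothetical stand-in model on built-in data
cars_lm <- lm(mpg ~ cyl + disp, data = mtcars)
v_cars <- vetiver_model(cars_lm, "cars-mpg")

## input prototype used to check new data; columns match the predictors
v_cars$prototype

## metadata stored with the model
v_cars$metadata

## start a model card from vetiver's template (this writes a new file):
## rmarkdown::draft("my_model_card.Rmd", template = "vetiver_model_card", package = "vetiver")
```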

MLOps is…

Versioning: ✅ managing change in models
Deploying: 🎯 putting models in REST APIs
Monitoring: 👀 tracking model performance

Version your model

library(pins)
## board_connect() needs credentials for a Posit Connect server
board <- board_connect()
board %>% vetiver_pin_write(v)
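To try versioning without a Posit Connect server, a temporary pins board can stand in for `board_connect()`. This is a minimal sketch under that assumption: it writes two versions of a hypothetical `lm()` model on `mtcars` to `board_temp(versioned = TRUE)`, lists them with `pin_versions()`, and reads one back by version with `vetiver_pin_read()`.

```r
library(vetiver)
library(pins)

## temporary board standing in for board_connect(); keeps every write
board <- board_temp(versioned = TRUE)

## first version: hypothetical model on all of mtcars
cars_lm <- lm(mpg ~ cyl + disp, data = mtcars)
vetiver_pin_write(board, vetiver_model(cars_lm, "cars-mpg"))

## second version: retrain on a subset and write under the same name
cars_lm2 <- lm(mpg ~ cyl + disp, data = mtcars[1:20, ])
vetiver_pin_write(board, vetiver_model(cars_lm2, "cars-mpg"))

## list stored versions, then read back the first one listed
versions <- pin_versions(board, "cars-mpg")
versions
v_old <- vetiver_pin_read(board, "cars-mpg", version = versions$version[1])
```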

https://colorado.posit.co/rsc/seattle-housing-pin/

Create a REST API

library(plumber)
pr() %>%
  vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/ping (GET)
#> └──/predict (POST)
## next, pipe to `pr_run()` to run the API locally

Where does vetiver work?

Posit’s pro products, like Connect: vetiver_deploy_rsconnect()
A public or private cloud, using Docker: vetiver_prepare_docker()

https://colorado.posit.co/rsc/seattle-housing/

Monitor your model

new_metrics <-
  augment(v, housing_val) %>%
  vetiver_compute_metrics(date, "week", price, .pred)

vetiver_pin_metrics(
  board,
  new_metrics, 
  "julia.silge/housing-metrics",
  overwrite = TRUE
)
#> # A tibble: 90 × 5
#>    .index                 .n .metric .estimator  .estimate
#>    <dttm>              <int> <chr>   <chr>           <dbl>
#>  1 2014-11-02 00:00:00   224 rmse    standard   202771.   
#>  2 2014-11-02 00:00:00   224 rsq     standard        0.426
#>  3 2014-11-02 00:00:00   224 mae     standard   139488.   
#>  4 2014-11-06 00:00:00   373 rmse    standard   222177.   
#>  5 2014-11-06 00:00:00   373 rsq     standard        0.554
#>  6 2014-11-06 00:00:00   373 mae     standard   150809.   
#>  7 2014-11-13 00:00:00   427 rmse    standard   255889.   
#>  8 2014-11-13 00:00:00   427 rsq     standard        0.554
#>  9 2014-11-13 00:00:00   427 mae     standard   148054.   
#> 10 2014-11-20 00:00:00   376 rmse    standard   251007.   
#> # … with 80 more rows
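The housing validation data is not included here, so this hedged sketch runs `vetiver_compute_metrics()` on simulated predictions instead. The data frame `preds`, its 56 daily dates, and the noise level are all hypothetical. Metrics are computed per calendar week using the default `yardstick::metrics` set (RMSE, R squared, and MAE for a numeric outcome).

```r
library(vetiver)

## hypothetical predictions: 56 days of simulated prices and noisy estimates
set.seed(123)
preds <- data.frame(
  date = seq(as.Date("2014-11-03"), by = "day", length.out = 56),
  price = runif(56, 2e5, 8e5)
)
preds$.pred <- preds$price + rnorm(56, sd = 5e4)

## one row per metric per week; dates must be in ascending order
weekly <- vetiver_compute_metrics(preds, date, "week", price, .pred)
weekly
```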

Monitor your model

new_metrics %>%
  ## you can operate on your metrics as needed:
  filter(.metric %in% c("rmse", "mae"), .n > 20) %>%
  vetiver_plot_metrics() + 
  ## you can also operate on the ggplot:
  scale_size(range = c(2, 5))

https://colorado.posit.co/rsc/seattle-housing-dashboard/

Using vetiver

allows those new to MLOps to get started quickly
supports scaling safely as an org matures

What does vetiver do?

Version
Deploy
Monitor

your R and Python models

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

Why should data practitioners be excited about MLOps?

Connect your work to the “real world”
Scale your impact

Learn more

Documentation at https://vetiver.rstudio.com/
Isabel Zimmerman’s rstudio::conf(2022) talk, Demystifying MLOps
Webinar by Julia and Isabel for Posit Enterprise Meetup
Julia’s recent screencast on deploying a model with Docker
End-to-end demos from Posit Solution Engineering in R and Python

Thank you!
