MLOps with vetiver

What is “production” anyway?

Hello!

Who are you?

  • Data scientist 👩‍💻
  • Statistician 🌟
  • Data analyst 📈
  • Software engineer 🛠️

If you develop a model…

you can operationalize that model!

If you develop a model…

you likely should operationalize that model!

Housing in Seattle 🏘️

glimpse(housing_prices)
#> Rows: 21,613
#> Columns: 6
#> $ price       <dbl> 221900, 538000, 180000, 604000, 510000, 1225000, 257500, 2…
#> $ bedrooms    <int> 3, 3, 2, 4, 3, 4, 3, 3, 3, 3, 3, 2, 3, 3, 5, 4, 3, 4, 2, 3…
#> $ bathrooms   <dbl> 1.00, 2.25, 1.00, 3.00, 2.00, 4.50, 2.25, 1.50, 1.00, 2.50…
#> $ sqft_living <int> 1180, 2570, 770, 1960, 1680, 5420, 1715, 1060, 1780, 1890,…
#> $ yr_built    <int> 1955, 1951, 1933, 1965, 1987, 2001, 1995, 1963, 1960, 2003…
#> $ date        <dttm> 2014-10-13, 2014-12-09, 2015-02-25, 2014-12-09, 2015-02-1…

Housing in Seattle 🏘️

housing_wf
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: rand_forest()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> price ~ bedrooms + bathrooms + sqft_living + yr_built
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~200,      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  200 
#> Sample size:                      11756 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 5 
#> Variable importance mode:         none 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       58092312589 
#> R squared (OOB):                  0.5817825

What is MLOps? 🤔

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

MLOps with vetiver

Vetiver, the oil of tranquility, is used as a stabilizing ingredient in perfumery to preserve more volatile fragrances.

MLOps with vetiver

library(vetiver)
v <- vetiver_model(housing_wf, "home-prices")
v
#> 
#> ── home-prices ─ <bundled_workflow> model for deployment 
#> A ranger regression modeling workflow using 4 features

Make it easy to do the right thing

  • Robust and human-friendly checking of new data
  • Track and document software dependencies of models
  • Model cards for transparent, responsible reporting
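
As a rough illustration of the first point: when a model is bundled, vetiver stores a zero-row "prototype" of the predictors, which a deployed API can use to check incoming data. The sketch below uses a small `lm()` fit on `mtcars` as a stand-in (an assumption, not the housing workflow from these slides), and the `prototype` field name may differ in older vetiver releases.

```r
library(vetiver)

# Stand-in model (assumption): a linear model on the built-in mtcars data
cars_fit <- lm(mpg ~ wt + hp, data = mtcars)
v_cars <- vetiver_model(cars_fit, "cars-lm")

# Zero-row data frame recording the expected predictor names and types
input_prototype <- v_cars$prototype
input_prototype
```

New data whose columns cannot be matched to this prototype can then be rejected with a readable error rather than failing deep inside the model code.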

MLOps is…

  • Versioning
    • Managing change in models ✅
  • Deploying
    • Putting models in REST APIs 🎯
  • Monitoring
    • Tracking model performance 👀

Version your model

library(pins)
board <- board_connect()
board |> vetiver_pin_write(v)
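
To try versioning without a Posit Connect server, a minimal sketch under stated assumptions: a temporary versioned board stands in for `board_connect()`, and two small `lm()` fits on `mtcars` stand in for successive versions of a model. The pin name `"cars-lm"` is hypothetical.

```r
library(vetiver)
library(pins)

# Assumption: a temporary, versioned board stands in for board_connect()
board <- board_temp(versioned = TRUE)

# Two stand-in models (assumption) with different predictors
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Writing twice under the same name stores two versions of the pin
board |> vetiver_pin_write(vetiver_model(fit_small, "cars-lm"))
Sys.sleep(1)  # keep the two version timestamps distinct
board |> vetiver_pin_write(vetiver_model(fit_large, "cars-lm"))

# One row per stored version
versions <- pin_versions(board, "cars-lm")
versions

# Reading without a version argument returns the most recent version
v_latest <- vetiver_pin_read(board, "cars-lm")
```

Passing a `version` value from `pin_versions()` to `vetiver_pin_read()` retrieves an earlier model instead.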

Create a REST API

library(plumber)
pr() |>
  vetiver_api(v)
#> # Plumber router with 4 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/vetiver
#> ├──/metadata (GET)
#> ├──/ping (GET)
#> ├──/predict (POST)
#> └──/prototype (GET)
## next pipe to `pr_run()` for local API
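
Once an API like this is running, predictions can be requested from another R session. The sketch below builds a router for a stand-in `lm()` model and defines a hypothetical helper, `predict_from_api()`, for querying it. The URL and port are assumptions and must match wherever the API is actually served.

```r
library(vetiver)
library(plumber)

# Stand-in model (assumption), not the housing workflow from the slides
cars_fit <- lm(mpg ~ wt + hp, data = mtcars)
v_cars <- vetiver_model(cars_fit, "cars-lm")

# Build the router; piping this to pr_run(port = 8080) would serve it locally
api <- pr() |> vetiver_api(v_cars)

# Hypothetical helper: query a running API from another R session.
# The default URL is an assumption, not something vetiver chooses for you.
predict_from_api <- function(new_data, url = "http://127.0.0.1:8080/predict") {
  endpoint <- vetiver_endpoint(url)
  predict(endpoint, new_data)
}
```

With the API running in a separate process, `predict_from_api(mtcars[1:3, c("wt", "hp")])` would send three rows to the `/predict` endpoint.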

Where does vetiver work?

  • Posit’s pro products, like Connect: vetiver_deploy_rsconnect()

  • AWS SageMaker: vetiver_deploy_sagemaker()

  • A public or private cloud, using Docker: vetiver_prepare_docker()

Monitor your model

new_metrics <-
  augment(v, housing_val) |>
  vetiver_compute_metrics(date, "week", price, .pred)

vetiver_pin_metrics(
  board,
  new_metrics,
  "julia.silge/housing-metrics",
  overwrite = TRUE
)
#> # A tibble: 90 × 5
#>    .index                 .n .metric .estimator  .estimate
#>    <dttm>              <int> <chr>   <chr>           <dbl>
#>  1 2014-11-02 00:00:00   224 rmse    standard   206519.   
#>  2 2014-11-02 00:00:00   224 rsq     standard        0.414
#>  3 2014-11-02 00:00:00   224 mae     standard   139904.   
#>  4 2014-11-06 00:00:00   373 rmse    standard   222259.   
#>  5 2014-11-06 00:00:00   373 rsq     standard        0.555
#>  6 2014-11-06 00:00:00   373 mae     standard   150022.   
#>  7 2014-11-13 00:00:00   427 rmse    standard   253473.   
#>  8 2014-11-13 00:00:00   427 rsq     standard        0.562
#>  9 2014-11-13 00:00:00   427 mae     standard   145938.   
#> 10 2014-11-20 00:00:00   376 rmse    standard   251856.   
#> # ℹ 80 more rows

Monitor your model

new_metrics |>
  ## you can operate on your metrics as needed:
  filter(.metric %in% c("rmse", "mae"), .n > 20) |>
  vetiver_plot_metrics() +
  ## you can also operate on the ggplot:
  scale_size(range = c(2, 5))
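
To experiment with the metrics step without a deployed model, a minimal sketch on simulated data may help. The data frame `sim` and its columns are assumptions made up for illustration. The call mirrors the `vetiver_compute_metrics()` usage in the monitoring code above and needs the slider package installed.

```r
library(vetiver)

set.seed(123)

# Simulated data (assumption): 60 days of observed values and predictions
sim <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 60),
  truth = rnorm(60, mean = 100, sd = 10)
)
sim$pred <- sim$truth + rnorm(60, sd = 5)

# Aggregate by week; for a numeric outcome the default metric set
# reports rmse, rsq, and mae
sim_metrics <- vetiver_compute_metrics(sim, date, "week", truth, pred)
sim_metrics
```

The result has one row per period and metric, in the same shape as the pinned housing metrics, so it can be passed to `vetiver_plot_metrics()` in the same way.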

Using vetiver

  • allows those new to MLOps to get started quickly
  • supports scaling safely as an org matures

What does vetiver do?

  • Version

  • Deploy

  • Monitor

your R and Python models

MLOps is…

a set of practices to deploy and maintain machine learning models in production reliably and efficiently

Why should data practitioners be excited about MLOps?

  • Connect your work to the “real world”
  • Scale your impact

Learn more

Thank you!

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com