The right tool for the job

SciPy 2024 | Julia Silge

Hello!

@juliasilge

@juliasilge@fosstodon.org

youtube.com/juliasilge

juliasilge.com

Tools for data science

Using multiple programming languages

  • What does it cost?
  • What do you gain?
  • What can we give?

COST

For the individual

  • It is expensive to learn new things
  • There are benefits to specialization

In an organization

  • Consistency
  • Complexity

There should be one, and preferably only one, obvious way to do it

Pins 📌


Python

import pins

# Temporary board, handy for demos
board = pins.board_temp()
# very_nice_data: e.g. a pandas DataFrame
board.pin_write(
    very_nice_data,
    "important-stuff",
    type="parquet",
)

R

library(pins)

board <- board_temp()
board |> pin_write(
  very_nice_data, 
  "important-stuff", 
  type = "parquet")

Pins 📌


Python

import pins

board = pins.board_temp()
board.pin_read("important-stuff")

R

library(pins)

board <- board_temp()
board |> pin_read("important-stuff")


  • Cost for individuals
  • Cost for our organization

GAIN

In an organization

  • Everyone can be more productive

Practicality beats purity

Vetiver 🏺


Python

from vetiver import VetiverModel, VetiverAPI

# Bundle the fitted model with its name and example input data
v = VetiverModel(
    model_fit,
    "my-important-model",
    prototype_data=X_train,
)

# Serve the model as a REST API
api = VetiverAPI(v)
api.run()

R

library(vetiver)
library(plumber)

v <- vetiver_model(
  model_fit, 
  "my-important-model")

pr() |>
  vetiver_api(v) |>
  pr_run()

MLOps is…

  • Versioning
    • Managing change in models ✅
  • Deploying
    • Putting models in REST APIs 🎯
  • Monitoring
    • Tracking model performance 👀
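
To make two of the three pieces above concrete, here is a rough standard-library sketch of versioning and monitoring. It is not how pins or vetiver work internally: ToyModelStore, accuracy_by_month, and the dict standing in for a model are all hypothetical names invented for illustration, and deploying is left out.

```python
import hashlib
import pickle
from collections import defaultdict
from datetime import date


class ToyModelStore:
    """Hypothetical in-memory store: a list of versions per model name."""

    def __init__(self):
        self._versions = {}

    def write(self, name, model):
        payload = pickle.dumps(model)
        # Version id is a short hash of the serialized model (an assumption
        # made for this sketch, not the scheme any real tool uses).
        version = hashlib.sha256(payload).hexdigest()[:8]
        history = self._versions.setdefault(name, [])
        # Record a new version only when the model has not been stored before.
        if version not in [v for v, _ in history]:
            history.append((version, payload))
        return version

    def versions(self, name):
        return [v for v, _ in self._versions[name]]

    def read(self, name, version=None):
        history = self._versions[name]
        # Default to the most recently written version.
        payload = history[-1][1] if version is None else dict(history)[version]
        return pickle.loads(payload)


def accuracy_by_month(records):
    """Monitoring: accuracy per calendar month.

    records is an iterable of (date, truth, prediction) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for day, truth, prediction in records:
        month = day.strftime("%Y-%m")
        totals[month] += 1
        hits[month] += int(truth == prediction)
    return {month: hits[month] / totals[month] for month in sorted(totals)}


store = ToyModelStore()
first = store.write("my-important-model", {"threshold": 0.5})
second = store.write("my-important-model", {"threshold": 0.7})
print(store.versions("my-important-model"))
print(accuracy_by_month([(date(2024, 6, 1), 1, 1), (date(2024, 6, 2), 0, 1)]))
```

Reading without a version returns the latest model, and passing an older version id returns that one, which is the "managing change" idea in miniature. A real setup would persist versions to shared storage and track richer metrics.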

For the individual

  • You can scale your impact
  • Consider the long term
  • Increase your vocabulary

GIVE

Building tools

  • Learn from one community
  • Bring to a different one

Positron

Thank you!
