6 - Tuning Hyperparameters

Machine learning with tidymodels

Hyperparameters

Some model or preprocessing parameters cannot be estimated directly from your data. These hyperparameters have to be chosen by trying candidate values and measuring how well each one performs.

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = 4)

How do we know that 4️⃣ is a good value?

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = tune())

Splines and nonlinear relationships

Use the tune_*() functions to tune models

The two main strategies for optimization are:

  • Grid search 💠 which tests a pre-defined set of candidate values

  • Iterative search 🌀 which suggests/estimates new values of candidate parameters to evaluate
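Both strategies share the same interface in tidymodels. A minimal sketch of each, using the spline_wf workflow and ring_folds resamples defined later in this section:

# Grid search: evaluate a pre-defined set of candidates
res_grid <- tune_grid(spline_wf, resamples = ring_folds, grid = 10)

# Iterative (Bayesian) search: propose new candidates as results come in
res_bayes <- tune_bayes(spline_wf, resamples = ring_folds, iter = 10)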

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = tune())

spline_wf <- workflow(ring_rec, linear_reg())
spline_wf
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> 2 Recipe Steps
#> 
#> • step_dummy()
#> • step_ns()
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm
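To confirm which parameters were flagged by tune(), you can extract the parameter set from the workflow; a quick check using the tidymodels helper:

# Lists the tuning parameters in the workflow (deg_free here)
extract_parameter_set_dials(spline_wf)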

Choose the best parameter

set.seed(123)
spline_res <- tune_grid(spline_wf, ring_folds)
spline_res
#> # Tuning results
#> # 5-fold cross-validation using stratification 
#> # A tibble: 5 × 4
#>   splits             id    .metrics          .notes          
#>   <list>             <chr> <list>            <list>          
#> 1 <split [2670/670]> Fold1 <tibble [18 × 5]> <tibble [0 × 3]>
#> 2 <split [2672/668]> Fold2 <tibble [18 × 5]> <tibble [0 × 3]>
#> 3 <split [2672/668]> Fold3 <tibble [18 × 5]> <tibble [0 × 3]>
#> 4 <split [2673/667]> Fold4 <tibble [18 × 5]> <tibble [0 × 3]>
#> 5 <split [2673/667]> Fold5 <tibble [18 × 5]> <tibble [0 × 3]>

Your turn

Use tune_grid() to tune your workflow with a recipe.

Collect the metrics from the results.

Use autoplot() to visualize the results.

Try show_best() to understand which parameter values are best.

05:00

Tuning results

collect_metrics(spline_res)
#> # A tibble: 18 × 7
#>    deg_free .metric .estimator  mean     n std_err .config             
#>       <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#>  1       13 rmse    standard   2.19      5 0.0397  Preprocessor1_Model1
#>  2       13 rsq     standard   0.540     5 0.00888 Preprocessor1_Model1
#>  3        8 rmse    standard   2.18      5 0.0395  Preprocessor2_Model1
#>  4        8 rsq     standard   0.541     5 0.00836 Preprocessor2_Model1
#>  5       11 rmse    standard   2.18      5 0.0402  Preprocessor3_Model1
#>  6       11 rsq     standard   0.541     5 0.00895 Preprocessor3_Model1
#>  7        4 rmse    standard   2.18      5 0.0403  Preprocessor4_Model1
#>  8        4 rsq     standard   0.542     5 0.00790 Preprocessor4_Model1
#>  9        7 rmse    standard   2.18      5 0.0398  Preprocessor5_Model1
#> 10        7 rsq     standard   0.542     5 0.00836 Preprocessor5_Model1
#> 11       14 rmse    standard   2.19      5 0.0409  Preprocessor6_Model1
#> 12       14 rsq     standard   0.540     5 0.00921 Preprocessor6_Model1
#> 13        2 rmse    standard   2.20      5 0.0428  Preprocessor7_Model1
#> 14        2 rsq     standard   0.535     5 0.00820 Preprocessor7_Model1
#> 15        6 rmse    standard   2.18      5 0.0406  Preprocessor8_Model1
#> 16        6 rsq     standard   0.542     5 0.00805 Preprocessor8_Model1
#> 17        3 rmse    standard   2.18      5 0.0411  Preprocessor9_Model1
#> 18        3 rsq     standard   0.542     5 0.00843 Preprocessor9_Model1

Tuning results

collect_metrics(spline_res, summarize = FALSE)
#> # A tibble: 90 × 6
#>    id    deg_free .metric .estimator .estimate .config             
#>    <chr>    <int> <chr>   <chr>          <dbl> <chr>               
#>  1 Fold1       13 rmse    standard       2.11  Preprocessor1_Model1
#>  2 Fold1       13 rsq     standard       0.513 Preprocessor1_Model1
#>  3 Fold2       13 rmse    standard       2.24  Preprocessor1_Model1
#>  4 Fold2       13 rsq     standard       0.537 Preprocessor1_Model1
#>  5 Fold3       13 rmse    standard       2.31  Preprocessor1_Model1
#>  6 Fold3       13 rsq     standard       0.544 Preprocessor1_Model1
#>  7 Fold4       13 rmse    standard       2.11  Preprocessor1_Model1
#>  8 Fold4       13 rsq     standard       0.569 Preprocessor1_Model1
#>  9 Fold5       13 rmse    standard       2.15  Preprocessor1_Model1
#> 10 Fold5       13 rsq     standard       0.540 Preprocessor1_Model1
#> # … with 80 more rows
#> # ℹ Use `print(n = ...)` to see more rows
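The summarized view is just the mean and standard error of these per-fold estimates. A sketch of the same computation done by hand, assuming dplyr is loaded:

collect_metrics(spline_res, summarize = FALSE) %>%
  group_by(deg_free, .metric) %>%
  summarize(
    mean    = mean(.estimate),
    std_err = sd(.estimate) / sqrt(n()),
    .groups = "drop"
  )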

Tuning results

autoplot(spline_res, metric = "rmse")
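autoplot() draws performance against deg_free. An equivalent hand-rolled plot for RMSE only, assuming ggplot2 and dplyr are loaded:

collect_metrics(spline_res) %>%
  filter(.metric == "rmse") %>%
  ggplot(aes(x = deg_free, y = mean)) +
  geom_point() +
  geom_line() +
  labs(y = "RMSE (cross-validation)")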

Tuning results

show_best(spline_res)
#> # A tibble: 5 × 7
#>   deg_free .metric .estimator  mean     n std_err .config             
#>      <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1        3 rmse    standard    2.18     5  0.0411 Preprocessor9_Model1
#> 2        6 rmse    standard    2.18     5  0.0406 Preprocessor8_Model1
#> 3        4 rmse    standard    2.18     5  0.0403 Preprocessor4_Model1
#> 4        7 rmse    standard    2.18     5  0.0398 Preprocessor5_Model1
#> 5       11 rmse    standard    2.18     5  0.0402 Preprocessor3_Model1

Optimize tuning parameters

  • Try different values and measure their performance

  • Find good values for these parameters

  • Finalize the model by fitting the model with these parameters to the entire training set

Tree depth in a decision tree?

Yes ✅

Number of PCA components to retain?

Yes ✅

Bayesian priors for model parameters?

Hmmmm, probably not ❌

Is the random seed a tuning parameter?

Nope ❌

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_regular(list(deg_free = spline_degree()), levels = 5)
#> # A tibble: 5 × 1
#>   deg_free
#>      <int>
#> 1        1
#> 2        3
#> 3        5
#> 4        7
#> 5       10
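Any tibble whose column names match the tune() ids also works as a grid. A hand-made alternative to grid_regular(), as a sketch:

# Column name must match the tuning parameter id ("deg_free")
my_grid <- tibble(deg_free = c(2, 4, 6, 8, 10))
spline_res_custom <- tune_grid(spline_wf, ring_folds, grid = my_grid)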

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_regular(list(deg_free = spline_degree(), tree_depth()), levels = 5)
#> # A tibble: 25 × 2
#>    deg_free tree_depth
#>       <int>      <int>
#>  1        1          1
#>  2        3          1
#>  3        5          1
#>  4        7          1
#>  5       10          1
#>  6        1          4
#>  7        3          4
#>  8        5          4
#>  9        7          4
#> 10       10          4
#> # … with 15 more rows
#> # ℹ Use `print(n = ...)` to see more rows

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_latin_hypercube(list(deg_free = spline_degree(), tree_depth()), size = 5)
#> # A tibble: 5 × 2
#>   deg_free tree_depth
#>      <int>      <int>
#> 1        4          8
#> 2        6         13
#> 3        2          2
#> 4        7         12
#> 5        9          5
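Passing an integer to the grid argument of tune_grid() asks tune to construct a space-filling design of that size automatically; that is what grid = 15 relies on in the xgboost tuning later in this section. To control the design yourself, build it first and pass the tibble (a sketch):

set.seed(11)  # space-filling designs are randomized
my_sf_grid <- grid_latin_hypercube(list(deg_free = spline_degree()), size = 10)
spline_res_sf <- tune_grid(spline_wf, ring_folds, grid = my_sf_grid)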

Boosted trees 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳🌲

Boosted trees 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵

  • Ensemble many decision tree models

Review how a decision tree model works:

  • Series of splits or if/then statements based on predictors

  • First the tree grows until some condition is met (e.g., maximum depth reached, too few data points left to split)

  • Then the tree is pruned to reduce its complexity

Single decision tree
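For reference, a single regression tree can be specified directly with parsnip; a minimal sketch, assuming the rpart engine is installed:

tree_spec <-
  decision_tree(tree_depth = 4, min_n = 20) %>%
  set_mode("regression") %>%
  set_engine("rpart")

tree_fit <- fit(tree_spec, rings ~ ., data = ring_train)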

Boosted trees 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵

Boosting methods fit a sequence of tree-based models:

  • Each tree is dependent on the one before and tries to compensate for any poor results in the previous trees

  • This is analogous to gradient descent methods: each new tree takes a step that reduces the ensemble’s loss

Boosted tree tuning parameters

Most modern boosting methods have a lot of tuning parameters!

  • For tree growth and pruning (min_n, tree_depth, loss_reduction, etc.)

  • For boosting (trees, stop_iter, learn_rate)

We’ll use early stopping to halt boosting when several consecutive iterations fail to improve performance on a validation set.

Comparing tree ensembles

Random forest

  • Independent trees
  • Bootstrapped data
  • No pruning
  • Thousands of trees

Boosting

  • Dependent trees
  • Tune tree parameters
  • Far fewer trees
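For contrast with the boosted tree specification below, a random forest typically needs little tuning; a sketch, assuming the ranger engine is installed:

rf_spec <-
  rand_forest(trees = 1000) %>%   # defaults for mtry and min_n often work well
  set_mode("regression") %>%
  set_engine("ranger")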

Build an xgboost workflow

xgb_spec <-
  boost_tree(
    trees = 500, min_n = tune(), stop_iter = tune(), tree_depth = tune(),
    learn_rate = tune(), loss_reduction = tune()
  ) %>%
  set_mode("regression") %>% 
  set_engine("xgboost", validation = 0.1)

xgb_rec <- 
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors())

xgb_wf <- workflow(xgb_rec, xgb_spec) 

Your turn

Create your boosted tree workflow.

03:00

Tuning

This will take some time to run ⏳

set.seed(9)
ctrl_abalone <- control_grid(save_pred = TRUE)
xgb_res <-
  tune_grid(xgb_wf, resamples = ring_folds, grid = 15, control = ctrl_abalone)

Your turn

Start tuning the boosted tree model!

We won’t wait for everyone’s tuning to finish, but take this time to get it started before we move on.

03:00

Tuning results

xgb_res
#> # Tuning results
#> # 5-fold cross-validation using stratification 
#> # A tibble: 5 × 5
#>   splits             id    .metrics          .notes           .predictions
#>   <list>             <chr> <list>            <list>           <list>      
#> 1 <split [2670/670]> Fold1 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 2 <split [2672/668]> Fold2 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 3 <split [2672/668]> Fold3 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 4 <split [2673/667]> Fold4 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 5 <split [2673/667]> Fold5 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>

Tuning results

autoplot(xgb_res)

Compare models

Best linear regression results:

spline_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, .estimator, mean, n, std_err, .config)
#> # A tibble: 1 × 6
#>   .metric .estimator  mean     n std_err .config             
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1 rmse    standard    2.18     5  0.0411 Preprocessor9_Model1

Best boosting results:

xgb_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, .estimator, mean, n, std_err, .config)
#> # A tibble: 1 × 6
#>   .metric .estimator  mean     n std_err .config              
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
#> 1 rmse    standard    2.17     5  0.0589 Preprocessor1_Model14

Your turn

Can you get better RMSE results with xgboost?

Try increasing learn_rate beyond the original range.

20:00
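One possible approach, as a sketch: dials parameter objects can be updated before tuning, and learn_rate() expresses its range on the log10 scale:

xgb_param <-
  xgb_wf %>%
  extract_parameter_set_dials() %>%
  update(learn_rate = learn_rate(range = c(-2, -0.5)))  # 0.01 to ~0.32

set.seed(9)
xgb_res_wider <-
  tune_grid(xgb_wf, resamples = ring_folds, grid = 15, param_info = xgb_param)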

Finalize and fit the model

best_rmse <- select_best(spline_res, metric = "rmse")
best_rmse
#> # A tibble: 1 × 2
#>   deg_free .config             
#>      <int> <chr>               
#> 1        3 Preprocessor9_Model1

Finalize and fit the model

best_rmse <- select_best(spline_res, metric = "rmse")

final_res <-
  spline_wf %>% 
  finalize_workflow(best_rmse) %>%
  last_fit(ring_split)

final_res
#> # Resampling results
#> # Manual resampling 
#> # A tibble: 1 × 6
#>   splits             id               .metrics .notes   .predictions .workflow 
#>   <list>             <chr>            <list>   <list>   <list>       <list>    
#> 1 <split [3340/837]> train/test split <tibble> <tibble> <tibble>     <workflow>

Remember that last_fit() fits one time with the training set, then evaluates one time with the testing set.

Your turn

Finalize your workflow with the best parameters.

You could use either the spline or xgboost workflow.

Create a final fit.

08:00

Estimates of RMSE

Holdout results from tuning:

spline_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, mean, n, std_err)
#> # A tibble: 1 × 4
#>   .metric  mean     n std_err
#>   <chr>   <dbl> <int>   <dbl>
#> 1 rmse     2.18     5  0.0411

Test set results:

final_res %>% collect_metrics()
#> # A tibble: 2 × 4
#>   .metric .estimator .estimate .config             
#>   <chr>   <chr>          <dbl> <chr>               
#> 1 rmse    standard       2.23  Preprocessor1_Model1
#> 2 rsq     standard       0.534 Preprocessor1_Model1

Final fitted workflow

Extract the final fitted workflow (fit using the training set):

fitted_wf <- extract_workflow(final_res)

# use this object to predict or deploy
predict(fitted_wf, ring_test[1:3,])
#> # A tibble: 3 × 1
#>   .pred
#>   <dbl>
#> 1 11.4 
#> 2  7.82
#> 3 10.0
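If you want the predictors and the observed outcome alongside the predictions, augment() also works on a fitted workflow; a quick sketch:

# Returns the rows of ring_test with a .pred column appended
augment(fitted_wf, ring_test[1:3, ])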

Next steps