6 - Tuning Hyperparameters

Machine learning with tidymodels

Hyperparameters

Some model or preprocessing parameters cannot be estimated directly from your data. These hyperparameters have to be chosen by trying candidate values and measuring how well each one performs.

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = 4)

How do we know that 4️⃣ is a good value?

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = tune())

Splines and nonlinear relationships

Use the tune_*() functions to tune models

The two main strategies for optimization are:

  • Grid search 💠 which tests a pre-defined set of candidate values

  • Iterative search 🌀 which suggests/estimates new values of candidate parameters to evaluate
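Both strategies share the same interface in tidymodels. A minimal sketch of each, using the spline_wf workflow and ring_folds resamples defined later in this section:

# Grid search: evaluate a pre-defined set of candidates
res_grid <- tune_grid(spline_wf, resamples = ring_folds, grid = 10)

# Iterative (Bayesian) search: propose new candidates as results come in
res_bayes <- tune_bayes(spline_wf, resamples = ring_folds, iter = 10)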

Choose the best parameter

ring_rec <-
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_ns(shucked_weight, deg_free = tune())

spline_wf <- workflow(ring_rec, linear_reg())
spline_wf
#> ══ Workflow ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: linear_reg()
#> 
#> ── Preprocessor ──────────────────────────────────────────────────────
#> 2 Recipe Steps
#> 
#> • step_dummy()
#> • step_ns()
#> 
#> ── Model ─────────────────────────────────────────────────────────────
#> Linear Regression Model Specification (regression)
#> 
#> Computational engine: lm
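To confirm which parameters were flagged by tune(), you can extract the parameter set from the workflow; a quick check using the tidymodels helper:

# Lists the tuning parameters in the workflow (deg_free here)
extract_parameter_set_dials(spline_wf)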

Choose the best parameter

set.seed(123)
spline_res <- tune_grid(spline_wf, ring_folds)
spline_res
#> # Tuning results
#> # 5-fold cross-validation using stratification 
#> # A tibble: 5 × 4
#>   splits             id    .metrics          .notes          
#>   <list>             <chr> <list>            <list>          
#> 1 <split [2670/670]> Fold1 <tibble [18 × 5]> <tibble [0 × 3]>
#> 2 <split [2672/668]> Fold2 <tibble [18 × 5]> <tibble [0 × 3]>
#> 3 <split [2672/668]> Fold3 <tibble [18 × 5]> <tibble [0 × 3]>
#> 4 <split [2673/667]> Fold4 <tibble [18 × 5]> <tibble [0 × 3]>
#> 5 <split [2673/667]> Fold5 <tibble [18 × 5]> <tibble [0 × 3]>

Your turn

Use tune_grid() to tune your workflow with a recipe.

Collect the metrics from the results.

Use autoplot() to visualize the results.

Try show_best() to understand which parameter values are best.

05:00

Tuning results

collect_metrics(spline_res)
#> # A tibble: 18 × 7
#>    deg_free .metric .estimator  mean     n std_err .config             
#>       <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#>  1       13 rmse    standard   2.19      5 0.0397  Preprocessor1_Model1
#>  2       13 rsq     standard   0.540     5 0.00888 Preprocessor1_Model1
#>  3        8 rmse    standard   2.18      5 0.0395  Preprocessor2_Model1
#>  4        8 rsq     standard   0.541     5 0.00836 Preprocessor2_Model1
#>  5       11 rmse    standard   2.18      5 0.0402  Preprocessor3_Model1
#>  6       11 rsq     standard   0.541     5 0.00895 Preprocessor3_Model1
#>  7        4 rmse    standard   2.18      5 0.0403  Preprocessor4_Model1
#>  8        4 rsq     standard   0.542     5 0.00790 Preprocessor4_Model1
#>  9        7 rmse    standard   2.18      5 0.0398  Preprocessor5_Model1
#> 10        7 rsq     standard   0.542     5 0.00836 Preprocessor5_Model1
#> 11       14 rmse    standard   2.19      5 0.0409  Preprocessor6_Model1
#> 12       14 rsq     standard   0.540     5 0.00921 Preprocessor6_Model1
#> 13        2 rmse    standard   2.20      5 0.0428  Preprocessor7_Model1
#> 14        2 rsq     standard   0.535     5 0.00820 Preprocessor7_Model1
#> 15        6 rmse    standard   2.18      5 0.0406  Preprocessor8_Model1
#> 16        6 rsq     standard   0.542     5 0.00805 Preprocessor8_Model1
#> 17        3 rmse    standard   2.18      5 0.0411  Preprocessor9_Model1
#> 18        3 rsq     standard   0.542     5 0.00843 Preprocessor9_Model1

Tuning results

collect_metrics(spline_res, summarize = FALSE)
#> # A tibble: 90 × 6
#>    id    deg_free .metric .estimator .estimate .config             
#>    <chr>    <int> <chr>   <chr>          <dbl> <chr>               
#>  1 Fold1       13 rmse    standard       2.11  Preprocessor1_Model1
#>  2 Fold1       13 rsq     standard       0.513 Preprocessor1_Model1
#>  3 Fold2       13 rmse    standard       2.24  Preprocessor1_Model1
#>  4 Fold2       13 rsq     standard       0.537 Preprocessor1_Model1
#>  5 Fold3       13 rmse    standard       2.31  Preprocessor1_Model1
#>  6 Fold3       13 rsq     standard       0.544 Preprocessor1_Model1
#>  7 Fold4       13 rmse    standard       2.11  Preprocessor1_Model1
#>  8 Fold4       13 rsq     standard       0.569 Preprocessor1_Model1
#>  9 Fold5       13 rmse    standard       2.15  Preprocessor1_Model1
#> 10 Fold5       13 rsq     standard       0.540 Preprocessor1_Model1
#> # … with 80 more rows
#> # ℹ Use `print(n = ...)` to see more rows
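The summarized view is just the mean and standard error of these per-fold estimates. A sketch of the same computation done by hand, assuming dplyr is loaded:

collect_metrics(spline_res, summarize = FALSE) %>%
  group_by(deg_free, .metric) %>%
  summarize(
    mean    = mean(.estimate),
    std_err = sd(.estimate) / sqrt(n()),
    .groups = "drop"
  )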

Tuning results

autoplot(spline_res, metric = "rmse")
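autoplot() draws performance against deg_free. An equivalent hand-rolled plot for RMSE only, assuming ggplot2 and dplyr are loaded:

collect_metrics(spline_res) %>%
  filter(.metric == "rmse") %>%
  ggplot(aes(x = deg_free, y = mean)) +
  geom_point() +
  geom_line() +
  labs(y = "RMSE (cross-validation)")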

Tuning results

show_best(spline_res)
#> # A tibble: 5 × 7
#>   deg_free .metric .estimator  mean     n std_err .config             
#>      <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1        3 rmse    standard    2.18     5  0.0411 Preprocessor9_Model1
#> 2        6 rmse    standard    2.18     5  0.0406 Preprocessor8_Model1
#> 3        4 rmse    standard    2.18     5  0.0403 Preprocessor4_Model1
#> 4        7 rmse    standard    2.18     5  0.0398 Preprocessor5_Model1
#> 5       11 rmse    standard    2.18     5  0.0402 Preprocessor3_Model1

Optimize tuning parameters

  • Try different values and measure their performance

  • Find good values for these parameters

  • Finalize the model by fitting the model with these parameters to the entire training set

Tree depth in a decision tree?

Yes ✅

Number of PCA components to retain?

Yes ✅

Bayesian priors for model parameters?

Hmmmm, probably not ❌

Is the random seed a tuning parameter?

Nope ❌

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_regular(list(deg_free = spline_degree()), levels = 5)
#> # A tibble: 5 × 1
#>   deg_free
#>      <int>
#> 1        1
#> 2        3
#> 3        5
#> 4        7
#> 5       10
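Any tibble whose column names match the tune() ids also works as a grid. A hand-made alternative to grid_regular(), as a sketch:

# Column name must match the tuning parameter id ("deg_free")
my_grid <- tibble(deg_free = c(2, 4, 6, 8, 10))
spline_res_custom <- tune_grid(spline_wf, ring_folds, grid = my_grid)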

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_regular(list(deg_free = spline_degree(), tree_depth()), levels = 5)
#> # A tibble: 25 × 2
#>    deg_free tree_depth
#>       <int>      <int>
#>  1        1          1
#>  2        3          1
#>  3        5          1
#>  4        7          1
#>  5       10          1
#>  6        1          4
#>  7        3          4
#>  8        5          4
#>  9        7          4
#> 10       10          4
#> # … with 15 more rows
#> # ℹ Use `print(n = ...)` to see more rows

Customize grid search

  • You can control the grid used to search the parameter space

  • Use the grid_*() functions, or create your own tibble

grid_latin_hypercube(list(deg_free = spline_degree(), tree_depth()), size = 5)
#> # A tibble: 5 × 2
#>   deg_free tree_depth
#>      <int>      <int>
#> 1        4          8
#> 2        6         13
#> 3        2          2
#> 4        7         12
#> 5        9          5
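Passing an integer to the grid argument of tune_grid() asks tune to construct a space-filling design of that size automatically; that is what grid = 15 relies on in the xgboost tuning later in this section. To control the design yourself, build it first and pass the tibble (a sketch):

set.seed(11)  # space-filling designs are randomized
my_sf_grid <- grid_latin_hypercube(list(deg_free = spline_degree()), size = 10)
spline_res_sf <- tune_grid(spline_wf, ring_folds, grid = my_sf_grid)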

Boosted trees 🌳🌲🌴🌵🌴🌳🌳🌴🌲🌵🌴🌲🌳🌴🌳🌵🌵🌴🌲🌲🌳🌴🌳🌴🌲🌴🌵🌴🌲🌴🌵🌲🌵🌴🌲🌳🌴🌵🌳🌴🌳🌲

Boosted trees 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵

  • Ensemble many decision tree models

Review how a decision tree model works:

  • Series of splits or if/then statements based on predictors

  • First the tree grows until some condition is met (e.g., maximum depth reached, too few data points left to split)

  • Then the tree is pruned to reduce its complexity

Single decision tree
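For reference, a single regression tree can be specified directly with parsnip; a minimal sketch, assuming the rpart engine is installed:

tree_spec <-
  decision_tree(tree_depth = 4, min_n = 20) %>%
  set_mode("regression") %>%
  set_engine("rpart")

tree_fit <- fit(tree_spec, rings ~ ., data = ring_train)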

Boosted trees 🌳🌲🌴🌵🌳🌳🌴🌲🌵🌴🌳🌵

Boosting methods fit a sequence of tree-based models:

  • Each tree is dependent on the one before and tries to compensate for any poor results in the previous trees

  • This is analogous to gradient descent methods: each new tree takes a step that reduces the ensemble’s loss

Boosted tree tuning parameters

Most modern boosting methods have a lot of tuning parameters!

  • For tree growth and pruning (min_n, tree_depth, loss_reduction, etc.)

  • For boosting (trees, stop_iter, learn_rate)

We’ll use early stopping to halt boosting when several consecutive iterations fail to improve performance on a validation set.

Comparing tree ensembles

Random forest

  • Independent trees
  • Bootstrapped data
  • No pruning
  • Thousands of trees

Boosting

  • Dependent trees
  • Tune tree parameters
  • Far fewer trees
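For contrast with the boosted tree specification below, a random forest typically needs little tuning; a sketch, assuming the ranger engine is installed:

rf_spec <-
  rand_forest(trees = 1000) %>%   # defaults for mtry and min_n often work well
  set_mode("regression") %>%
  set_engine("ranger")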

Build an xgboost workflow

xgb_spec <-
  boost_tree(
    trees = 500, min_n = tune(), stop_iter = tune(), tree_depth = tune(),
    learn_rate = tune(), loss_reduction = tune()
  ) %>%
  set_mode("regression") %>% 
  set_engine("xgboost", validation = 0.1)

xgb_rec <- 
  recipe(rings ~ ., data = ring_train) %>%
  step_dummy(all_nominal_predictors())

xgb_wf <- workflow(xgb_rec, xgb_spec) 

Your turn

Create your boosted tree workflow.

03:00

Tuning

This will take some time to run ⏳

set.seed(9)
ctrl_abalone <- control_grid(save_pred = TRUE)
xgb_res <-
  tune_grid(xgb_wf, resamples = ring_folds, grid = 15, control = ctrl_abalone)

Your turn

Start tuning the boosted tree model!

We won’t wait for everyone’s tuning to finish, but take this time to get it started before we move on.

03:00

Tuning results

xgb_res
#> # Tuning results
#> # 5-fold cross-validation using stratification 
#> # A tibble: 5 × 5
#>   splits             id    .metrics          .notes           .predictions
#>   <list>             <chr> <list>            <list>           <list>      
#> 1 <split [2670/670]> Fold1 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 2 <split [2672/668]> Fold2 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 3 <split [2672/668]> Fold3 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 4 <split [2673/667]> Fold4 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>    
#> 5 <split [2673/667]> Fold5 <tibble [30 × 9]> <tibble [0 × 3]> <tibble>

Tuning results

autoplot(xgb_res)

Compare models

Best linear regression results:

spline_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, .estimator, mean, n, std_err, .config)
#> # A tibble: 1 × 6
#>   .metric .estimator  mean     n std_err .config             
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1 rmse    standard    2.18     5  0.0411 Preprocessor9_Model1

Best boosting results:

xgb_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, .estimator, mean, n, std_err, .config)
#> # A tibble: 1 × 6
#>   .metric .estimator  mean     n std_err .config              
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>                
#> 1 rmse    standard    2.17     5  0.0589 Preprocessor1_Model14

Your turn

Can you get better RMSE results with xgboost?

Try increasing learn_rate beyond the original range.

20:00
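One possible approach, as a sketch: dials parameter objects can be updated before tuning, and learn_rate() expresses its range on the log10 scale:

xgb_param <-
  xgb_wf %>%
  extract_parameter_set_dials() %>%
  update(learn_rate = learn_rate(range = c(-2, -0.5)))  # 0.01 to ~0.32

set.seed(9)
xgb_res_wider <-
  tune_grid(xgb_wf, resamples = ring_folds, grid = 15, param_info = xgb_param)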

Finalize and fit the model

best_rmse <- select_best(spline_res, metric = "rmse")
best_rmse
#> # A tibble: 1 × 2
#>   deg_free .config             
#>      <int> <chr>               
#> 1        3 Preprocessor9_Model1

Finalize and fit the model

best_rmse <- select_best(spline_res, metric = "rmse")

final_res <-
  spline_wf %>% 
  finalize_workflow(best_rmse) %>%
  last_fit(ring_split)

final_res
#> # Resampling results
#> # Manual resampling 
#> # A tibble: 1 × 6
#>   splits             id               .metrics .notes   .predictions .workflow 
#>   <list>             <chr>            <list>   <list>   <list>       <list>    
#> 1 <split [3340/837]> train/test split <tibble> <tibble> <tibble>     <workflow>

Remember that last_fit() fits one time with the training set, then evaluates one time with the testing set.

Your turn

Finalize your workflow with the best parameters.

You could use either the spline or xgboost workflow.

Create a final fit.

08:00

Estimates of RMSE

Holdout results from tuning:

spline_res %>% 
  show_best(metric = "rmse", n = 1) %>% 
  select(.metric, mean, n, std_err)
#> # A tibble: 1 × 4
#>   .metric  mean     n std_err
#>   <chr>   <dbl> <int>   <dbl>
#> 1 rmse     2.18     5  0.0411

Test set results:

final_res %>% collect_metrics()
#> # A tibble: 2 × 4
#>   .metric .estimator .estimate .config             
#>   <chr>   <chr>          <dbl> <chr>               
#> 1 rmse    standard       2.23  Preprocessor1_Model1
#> 2 rsq     standard       0.534 Preprocessor1_Model1

Final fitted workflow

Extract the final fitted workflow (fit using the training set):

fitted_wf <- extract_workflow(final_res)

# use this object to predict or deploy
predict(fitted_wf, ring_test[1:3,])
#> # A tibble: 3 × 1
#>   .pred
#>   <dbl>
#> 1 11.4 
#> 2  7.82
#> 3 10.0
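If you want the predictors and the observed outcome alongside the predictions, augment() also works on a fitted workflow; a quick sketch:

# Returns the rows of ring_test with a .pred column appended
augment(fitted_wf, ring_test[1:3, ])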

Next steps