Add validation_data to a rulelist — set_validation_data

Returns a rulelist with three new attributes set: validation_data, y_name and weight. Methods such as augment, calculate, prune and reorder require these attributes to be set.

set_validation_data(x, validation_data, y_name, weight = 1)

Arguments

x: A rulelist
validation_data: (dataframe) Data to be used for computing some metrics. It is expected to contain the y_name column.
y_name: (string) Name of the dependent variable column.
weight: (non-negative numeric vector, default: 1) Weight per observation/row of validation_data. This is expected to have the same length as the number of rows in validation_data. The only exception is a single positive number, which means that all rows have equal weight.
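
The rule for weight described above can be sketched in base R. This is only an illustration of the stated contract, not the package's actual code: the helper name expand_weight and the toy data frame val are hypothetical.

```r
# Hypothetical helper (not part of the package): check a weight argument
# against the number of rows of the validation data and expand a scalar.
expand_weight = function(weight, n_rows) {
  # weights must be numeric, non-missing and non-negative
  stopifnot(is.numeric(weight), !anyNA(weight), all(weight >= 0))
  if (length(weight) == 1) {
    stopifnot(weight > 0)        # a single weight must be positive
    return(rep(weight, n_rows))  # every row gets the same weight
  }
  # otherwise one weight per row is expected
  stopifnot(length(weight) == n_rows)
  weight
}

# toy validation data with three rows (made up for this sketch)
val = data.frame(x = c(1, 2, 3), y = c("a", "b", "a"))

w_equal  = expand_weight(1, nrow(val))             # scalar, recycled to all rows
w_custom = expand_weight(c(0.5, 2, 0), nrow(val))  # one weight per row
```

A vector whose length differs from the number of rows (for example, two weights for three rows) fails the length check in this sketch.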

Value

A rulelist with the validation_data, y_name and weight attributes set.
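
To make the idea of "a rulelist with attributes set" concrete, here is a minimal base R sketch. It assumes the attributes behave like ordinary R attributes; the function set_validation_sketch and the data frames rules and val are hypothetical stand-ins, not the package's implementation.

```r
# Hypothetical stand-in for a rulelist: a plain data frame of rules.
rules = data.frame(rule_nbr = 1:2, LHS = c("( x > 1 )", "( x <= 1 )"))

# Hypothetical setter (illustration only): attach the three attributes
# to a copy of x and return that copy.
set_validation_sketch = function(x, validation_data, y_name, weight = 1) {
  stopifnot(is.data.frame(validation_data),
            is.character(y_name), length(y_name) == 1,
            y_name %in% colnames(validation_data))
  attr(x, "validation_data") = validation_data
  attr(x, "y_name") = y_name
  attr(x, "weight") = weight
  x
}

# toy validation data (made up for this sketch)
val = data.frame(x = c(0, 2, 3), y = c("no", "yes", "yes"))
rules_2 = set_validation_sketch(rules, val, y_name = "y")

attr(rules_2, "y_name")                  # attribute is present on the result
is.null(attr(rules, "validation_data"))  # the input object is not modified
```

Because R modifies a copy inside the function, the input object stays unchanged, which matches the "not altered" behaviour shown in the examples.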

Examples

library(tidyrules) # provides tidy() for C5 models and set_validation_data()
att = modeldata::attrition
set.seed(100)
index = sample(c(TRUE, FALSE), nrow(att), replace = TRUE)
model_c5 = C50::C5.0(Attrition ~., data = att[index, ], rules = TRUE)

tidy_c5 = tidy(model_c5)
tidy_c5
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: FALSE
#> 
#> 
#>    rule_nbr trial_nbr LHS                         RHS   support confidence  lift
#>       <int>     <int> <chr>                       <fct>   <int>      <dbl> <dbl>
#>  1        1         1 ( Age > 30 ) & ( DistanceF… No         69      0.986   1.2
#>  2        2         1 ( DistanceFromHome <= 12 )… No        149      0.960   1.1
#>  3        3         1 ( Department == 'Research_… No        211      0.953   1.1
#>  4        4         1 ( Age > 30 ) & ( DistanceF… No        249      0.948   1.1
#>  5        5         1 ( JobInvolvement %in% c('M… No        353      0.944   1.1
#>  6        6         1 ( OverTime == 'No' ) & ( S… No        263      0.943   1.1
#>  7        7         1 ( Education %in% c('Master… No        101      0.942   1.1
#>  8        8         1 ( OverTime == 'No' ) & ( R… No         95      0.938   1.1
#>  9        9         1 ( BusinessTravel %in% c('N… No        352      0.915   1.1
#> 10       10         1 ( Education %in% c('Below_… No        265      0.910   1.1
#> # ℹ 13 more rows
#> ----------------------------------------------

tidy_c5_2 = set_validation_data(tidy_c5,
                                validation_data = att[!index, ],
                                y_name = "Attrition",
                                weight = 1 # default
                                )
tidy_c5_2
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: TRUE
#> 
#> 
#>    rule_nbr trial_nbr LHS                         RHS   support confidence  lift
#>       <int>     <int> <chr>                       <fct>   <int>      <dbl> <dbl>
#>  1        1         1 ( Age > 30 ) & ( DistanceF… No         69      0.986   1.2
#>  2        2         1 ( DistanceFromHome <= 12 )… No        149      0.960   1.1
#>  3        3         1 ( Department == 'Research_… No        211      0.953   1.1
#>  4        4         1 ( Age > 30 ) & ( DistanceF… No        249      0.948   1.1
#>  5        5         1 ( JobInvolvement %in% c('M… No        353      0.944   1.1
#>  6        6         1 ( OverTime == 'No' ) & ( S… No        263      0.943   1.1
#>  7        7         1 ( Education %in% c('Master… No        101      0.942   1.1
#>  8        8         1 ( OverTime == 'No' ) & ( R… No         95      0.938   1.1
#>  9        9         1 ( BusinessTravel %in% c('N… No        352      0.915   1.1
#> 10       10         1 ( Education %in% c('Below_… No        265      0.910   1.1
#> # ℹ 13 more rows
#> ----------------------------------------------
tidy_c5 # not altered
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: FALSE
#> 
#> 
#>    rule_nbr trial_nbr LHS                         RHS   support confidence  lift
#>       <int>     <int> <chr>                       <fct>   <int>      <dbl> <dbl>
#>  1        1         1 ( Age > 30 ) & ( DistanceF… No         69      0.986   1.2
#>  2        2         1 ( DistanceFromHome <= 12 )… No        149      0.960   1.1
#>  3        3         1 ( Department == 'Research_… No        211      0.953   1.1
#>  4        4         1 ( Age > 30 ) & ( DistanceF… No        249      0.948   1.1
#>  5        5         1 ( JobInvolvement %in% c('M… No        353      0.944   1.1
#>  6        6         1 ( OverTime == 'No' ) & ( S… No        263      0.943   1.1
#>  7        7         1 ( Education %in% c('Master… No        101      0.942   1.1
#>  8        8         1 ( OverTime == 'No' ) & ( R… No         95      0.938   1.1
#>  9        9         1 ( BusinessTravel %in% c('N… No        352      0.915   1.1
#> 10       10         1 ( Education %in% c('Below_… No        265      0.910   1.1
#> # ℹ 13 more rows
#> ----------------------------------------------
