Returns a rulelist with three new attributes set: validation_data, y_name, and weight. Methods such as augment, calculate, prune, and reorder require these attributes to be set.
set_validation_data(x, validation_data, y_name, weight = 1)
x: A rulelist.
validation_data: (dataframe) Data used for computing some metrics. It is expected to contain the y_name column.
y_name: (string) Name of the dependent variable column.
weight: (non-negative numeric vector, default: 1) Weight per observation/row of validation_data. This is expected to have the same length as the number of rows in validation_data. The only exception is when it is a single positive number, which means that all rows have equal weight.
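The rule for the weight argument can be summarised as a small check. The following is a minimal sketch in base R, not the package's own validation code; the helper name check_weight_sketch is hypothetical and only encodes the two cases described above (a single positive number, or one non-negative value per row of validation_data).

```r
# Hypothetical helper (not part of the package): returns TRUE when `weight`
# satisfies the rules described for set_validation_data().
check_weight_sketch <- function(weight, validation_data) {
  n <- nrow(validation_data)
  if (!is.numeric(weight) || anyNA(weight)) {
    return(FALSE)
  }
  if (length(weight) == 1) {
    # a single positive number means every row gets the same weight
    return(weight > 0)
  }
  # otherwise: exactly one non-negative weight per row
  length(weight) == n && all(weight >= 0)
}

df <- data.frame(y = c("a", "b", "a"))
check_weight_sketch(1, df)           # TRUE: scalar, equal weights
check_weight_sketch(c(1, 0, 2), df)  # TRUE: one weight per row
check_weight_sketch(c(1, 2), df)     # FALSE: wrong length
check_weight_sketch(0, df)           # FALSE: a scalar must be positive
```

Note that the sketch treats any length-one weight as the scalar case, so a one-row validation set with weight 0 is rejected; the package may handle that edge case differently.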
A rulelist with the extra attributes validation_data, y_name, and weight set.
att = modeldata::attrition
set.seed(100)
index = sample(c(TRUE, FALSE), nrow(att), replace = TRUE)
model_c5 = C50::C5.0(Attrition ~ ., data = att[index, ], rules = TRUE)
tidy_c5 = tidy(model_c5)
tidy_c5
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: FALSE
#>
#>
#> rule_nbr trial_nbr LHS RHS support confidence lift
#> <int> <int> <chr> <fct> <int> <dbl> <dbl>
#> 1 1 1 ( Age > 30 ) & ( DistanceF… No 69 0.986 1.2
#> 2 2 1 ( DistanceFromHome <= 12 )… No 149 0.960 1.1
#> 3 3 1 ( Department == 'Research_… No 211 0.953 1.1
#> 4 4 1 ( Age > 30 ) & ( DistanceF… No 249 0.948 1.1
#> 5 5 1 ( JobInvolvement %in% c('M… No 353 0.944 1.1
#> 6 6 1 ( OverTime == 'No' ) & ( S… No 263 0.943 1.1
#> 7 7 1 ( Education %in% c('Master… No 101 0.942 1.1
#> 8 8 1 ( OverTime == 'No' ) & ( R… No 95 0.938 1.1
#> 9 9 1 ( BusinessTravel %in% c('N… No 352 0.915 1.1
#> 10 10 1 ( Education %in% c('Below_… No 265 0.910 1.1
#> # ℹ 13 more rows
#> ----------------------------------------------
tidy_c5_2 = set_validation_data(tidy_c5,
validation_data = att[!index, ],
y_name = "Attrition",
weight = 1 # default
)
tidy_c5_2
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: TRUE
#>
#>
#> rule_nbr trial_nbr LHS RHS support confidence lift
#> <int> <int> <chr> <fct> <int> <dbl> <dbl>
#> 1 1 1 ( Age > 30 ) & ( DistanceF… No 69 0.986 1.2
#> 2 2 1 ( DistanceFromHome <= 12 )… No 149 0.960 1.1
#> 3 3 1 ( Department == 'Research_… No 211 0.953 1.1
#> 4 4 1 ( Age > 30 ) & ( DistanceF… No 249 0.948 1.1
#> 5 5 1 ( JobInvolvement %in% c('M… No 353 0.944 1.1
#> 6 6 1 ( OverTime == 'No' ) & ( S… No 263 0.943 1.1
#> 7 7 1 ( Education %in% c('Master… No 101 0.942 1.1
#> 8 8 1 ( OverTime == 'No' ) & ( R… No 95 0.938 1.1
#> 9 9 1 ( BusinessTravel %in% c('N… No 352 0.915 1.1
#> 10 10 1 ( Education %in% c('Below_… No 265 0.910 1.1
#> # ℹ 13 more rows
#> ----------------------------------------------
tidy_c5 # not altered
#> ---- Rulelist --------------------------------
#> ▶ Keys: trial_nbr
#> ▶ Number of distinct keys: 1
#> ▶ Number of rules: 23
#> ▶ Model type: C5
#> ▶ Estimation type: classification
#> ▶ Is validation data set: FALSE
#>
#>
#> rule_nbr trial_nbr LHS RHS support confidence lift
#> <int> <int> <chr> <fct> <int> <dbl> <dbl>
#> 1 1 1 ( Age > 30 ) & ( DistanceF… No 69 0.986 1.2
#> 2 2 1 ( DistanceFromHome <= 12 )… No 149 0.960 1.1
#> 3 3 1 ( Department == 'Research_… No 211 0.953 1.1
#> 4 4 1 ( Age > 30 ) & ( DistanceF… No 249 0.948 1.1
#> 5 5 1 ( JobInvolvement %in% c('M… No 353 0.944 1.1
#> 6 6 1 ( OverTime == 'No' ) & ( S… No 263 0.943 1.1
#> 7 7 1 ( Education %in% c('Master… No 101 0.942 1.1
#> 8 8 1 ( OverTime == 'No' ) & ( R… No 95 0.938 1.1
#> 9 9 1 ( BusinessTravel %in% c('N… No 352 0.915 1.1
#> 10 10 1 ( Education %in% c('Below_… No 265 0.910 1.1
#> # ℹ 13 more rows
#> ----------------------------------------------
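The example shows that tidy_c5 still reports "Is validation data set: FALSE" after the call. This follows from R's copy-on-modify semantics: setting attributes on a function argument changes a local copy, which is then returned. The sketch below illustrates that behaviour with plain data frames, assuming the three values are stored as ordinary R attributes; set_validation_attrs_sketch is a hypothetical stand-in, not the package's implementation.

```r
# Hypothetical stand-in (not the package's implementation): attaches the
# three attributes to a copy of `x` and returns that copy.
set_validation_attrs_sketch <- function(x, validation_data, y_name, weight = 1) {
  attr(x, "validation_data") <- validation_data
  attr(x, "y_name") <- y_name
  attr(x, "weight") <- weight
  x  # the caller's object is untouched because R copies on modify
}

rules <- data.frame(rule_nbr = 1:2, RHS = c("No", "Yes"))
valid <- data.frame(Attrition = c("No", "Yes", "No"))

rules_2 <- set_validation_attrs_sketch(rules, valid, y_name = "Attrition")

attr(rules_2, "y_name")  # "Attrition"
attr(rules, "y_name")    # NULL: the original was not altered
```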