A rulelist is an ordered list of rules stored as a dataframe. Each row
specifies a rule (LHS), its expected outcome (RHS) and some other details.
It has these mandatory columns:
rule_nbr
: (integer vector) Rule number
LHS
: (character vector) A rule is a string that can be parsed using base::parse()
RHS
: (character vector or a literal) Expected outcome of the rule
| rule_nbr|LHS |RHS | support| confidence| lift|
|--------:|:--------------------------------------------------------------------|:---------|-------:|----------:|--------:|
| 1|( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 ) |Gentoo | 122| 1.0000000| 2.774193|
| 2|( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 ) |Adelie | 46| 0.9565217| 2.164760|
| 3|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 ) |Chinstrap | 65| 0.9538462| 4.825339|
| 4|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 ) |Adelie | 111| 0.9459459| 2.140825|
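Because each LHS is a string that base::parse() can parse, a rule can be checked against a dataframe with base R alone. The following is a minimal sketch of that idea, not the package's internal implementation; `penguins_toy` is a hypothetical three-row dataframe that only borrows column names from the table above.

```r
# Hypothetical toy data; only the column names come from the example table
penguins_toy <- data.frame(
  island = c("Biscoe", "Biscoe", "Dream"),
  flipper_length_mm = c(210, 190, 195),
  stringsAsFactors = FALSE
)

# LHS of rule 1, copied from the table
lhs <- "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )"

# Parse the string into an expression, then evaluate it with the
# dataframe columns in scope: one TRUE/FALSE per row
rule_expr <- parse(text = lhs)[[1]]
covered <- eval(rule_expr, envir = penguins_toy)

print(covered)
# TRUE FALSE FALSE: only the first row satisfies rule 1
```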
A rulelist can be created using tidy() on some supported model fits
(run: utils::methods(tidy)). It can also be created manually from an
existing dataframe using as_rulelist.
Columns identified as 'keys', together with rule_nbr, form a unique
combination; each distinct combination of key values identifies a group of
rules. For example, a rule-based C5 model with multiple trials creates a set
of rules for each trial_nbr. The predict method understands 'keys' and
therefore predicts a rule number (for each row in new data / test data)
within each trial_nbr.
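The role of 'keys' can be sketched in base R: split the rules by the key column, then find the first applicable rule for every row within each group. This illustrates the idea only and is not the package's predict method; the rules, the column `x` and the helper `first_rule_in_group` are hypothetical.

```r
# Hypothetical rules from two trials; trial_nbr plays the role of a key
rules <- data.frame(
  trial_nbr = c(1L, 1L, 2L, 2L),
  rule_nbr  = c(1L, 2L, 1L, 2L),
  LHS       = c("x > 5", "x <= 5", "x > 3", "x <= 3"),
  stringsAsFactors = FALSE
)
new_data <- data.frame(x = c(2, 4, 8))

# Hypothetical helper: first applicable rule per row within one group of rules
first_rule_in_group <- function(group, data) {
  # covers[i, j] is TRUE when rule j of the group applies to row i of data
  covers <- sapply(
    group$LHS,
    function(lhs) eval(parse(text = lhs)[[1]], envir = data),
    USE.NAMES = FALSE
  )
  # which(hit)[1] is NA when no rule applies, giving an NA rule number
  apply(covers, 1, function(hit) group$rule_nbr[which(hit)[1]])
}

# One vector of predicted rule numbers per value of the key
pred <- lapply(split(rules, rules$trial_nbr), first_rule_in_group, data = new_data)
print(pred)
```

Each element of `pred` holds one rule number per row of `new_data`, so the same row can map to different rule numbers in different trials.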
A rulelist has these mandatory attributes:
estimation_type
: One among regression, classification
A rulelist has these optional attributes:
keys
: (character vector) Names of the columns that form a key.
model_type
: (string) Name of the model
A few methods such as augment, calculate, prune and reorder require a few additional attributes, which can be set using set_validation_data.
Predict: Given a dataframe (possibly without a dependent variable column,
aka 'test data'), predicts the first applicable rule (as ordered in the
rulelist) per 'keys' for each row. When multiple = TRUE, returns all rules
applicable to a row (per key).
Augment: Outputs summary statistics per rule over validation data and returns a rulelist with a new dataframe-column.
Calculate: Computes metrics for a rulelist in a cumulative manner, such as
cumulative_coverage, cumulative_overlap, cumulative_accuracy.
Prune: Suggests pruning a rulelist such that some expectations are met (based on metrics). Example: cumulative_coverage of 80% can be met with the first few rules.
Reorder: Reorders a rulelist in order to maximize a metric.
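The cumulative metrics and the pruning example can be illustrated with base R. The sketch below assumes that cumulative coverage after the k-th rule means the fraction of rows covered by at least one of the first k rules; that definition, the toy data and all variable names are assumptions made for illustration, and the package's calculate and prune methods may define and report things differently.

```r
# Rules copied from the example table
rules <- data.frame(
  rule_nbr = 1:4,
  LHS = c(
    "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
    "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )",
    "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 )",
    "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 )"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical validation data with five rows
validation <- data.frame(
  island = c("Biscoe", "Biscoe", "Biscoe", "Dream", "Torgersen"),
  flipper_length_mm = c(210, 215, 190, 195, 191),
  bill_length_mm = c(46.5, 48.0, 38.2, 49.0, 39.1),
  stringsAsFactors = FALSE
)

# covers[i, j] is TRUE when rule j applies to row i
covers <- sapply(
  rules$LHS,
  function(lhs) eval(parse(text = lhs)[[1]], envir = validation),
  USE.NAMES = FALSE
)

# Walk the rules in order, tracking which rows are covered so far
covered_so_far <- rep(FALSE, nrow(validation))
cumulative_coverage <- numeric(nrow(rules))
for (j in seq_len(nrow(rules))) {
  covered_so_far <- covered_so_far | covers[, j]
  cumulative_coverage[j] <- mean(covered_so_far)
}
print(cumulative_coverage)  # 0.4 0.6 0.8 1.0

# Pruning idea: smallest number of leading rules reaching 80% coverage
n_rules_needed <- which(cumulative_coverage >= 0.8)[1]
print(n_rules_needed)  # 3
```

On this toy data the first three rules already cover 80% of the rows, which is the kind of suggestion the Prune description refers to.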
Rulelists are essentially dataframes. Hence, any dataframe operation that preserves attributes will output a rulelist. as_rulelist and as.data.frame help in moving back and forth between the rulelist and dataframe worlds.
as_rulelist: Create a rulelist from a dataframe with some mandatory columns.
set_keys: Set or Unset 'keys' of a rulelist.
to_sql_case: Outputs a SQL case statement for a rulelist.
convert_rule_flavor: Converts R-parsable rule strings to python/SQL parsable rule strings.
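To make the last two utilities concrete, here is a rough base R sketch of turning R-parsable rule strings into a SQL CASE statement with plain string substitution. It is not the implementation of to_sql_case or convert_rule_flavor, and its output format is not necessarily theirs; the helper `to_sql_condition`, the output column name `output` and the `ELSE NULL` branch are assumptions. The substitutions handle only the operators that appear in the example table.

```r
# Two rules copied from the example table
rules <- data.frame(
  LHS = c(
    "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
    "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )"
  ),
  RHS = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: naive R-to-SQL translation of a rule string
to_sql_condition <- function(lhs) {
  out <- gsub("%in% c(", "IN (", lhs, fixed = TRUE)  # set membership
  out <- gsub("&", "AND", out, fixed = TRUE)         # logical and
  out
}

# One WHEN branch per rule, in rulelist order, so the first match wins
when_lines <- sprintf(
  "  WHEN %s THEN '%s'",
  to_sql_condition(rules$LHS),
  rules$RHS
)
sql_case <- paste(
  c("CASE", when_lines, "  ELSE NULL", "END AS output"),
  collapse = "\n"
)
cat(sql_case, "\n")
```

Since a SQL CASE expression returns the first branch whose condition holds, keeping the branches in rulelist order mirrors the "first applicable rule" behaviour described for Predict.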