Structure

A rulelist is an ordered list of rules stored as a dataframe. Each row specifies a rule (LHS), the expected outcome (RHS) and some other details.

It has these mandatory columns:

  • rule_nbr: (integer vector) Rule number

  • LHS: (character vector) Each rule is a string that can be parsed using base::parse()

  • RHS: (character vector or a literal)

Example

| rule_nbr|LHS                                                                  |RHS       | support| confidence|     lift|
|--------:|:--------------------------------------------------------------------|:---------|-------:|----------:|--------:|
|        1|( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )            |Gentoo    |     122|  1.0000000| 2.774193|
|        2|( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )           |Adelie    |      46|  0.9565217| 2.164760|
|        3|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 )  |Chinstrap |      65|  0.9538462| 4.825339|
|        4|( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 ) |Adelie    |     111|  0.9459459| 2.140825|
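Because each LHS is an R-parsable string, its meaning can be checked with base R alone. The following is a minimal sketch, not the package's internal code: it parses the four rules from the table above and evaluates them against a small, made-up dataframe (`penguins_small` and its values are hypothetical).

```r
# Hypothetical rows that mimic the columns used by the rules in the table
penguins_small <- data.frame(
  island            = c("Biscoe", "Biscoe", "Dream", "Torgersen"),
  flipper_length_mm = c(210, 190, 195, 188),
  bill_length_mm    = c(47.5, 38.2, 49.0, 39.1),
  stringsAsFactors  = FALSE
)

lhs <- c(
  "( island %in% c('Biscoe') ) & ( flipper_length_mm > 203 )",
  "( island %in% c('Biscoe') ) & ( flipper_length_mm <= 203 )",
  "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm > 44.1 )",
  "( island %in% c('Dream', 'Torgersen') ) & ( bill_length_mm <= 44.1 )"
)

# Parse each rule string into an expression and evaluate it with the
# dataframe's columns in scope. Each rule yields one logical per data row.
covers <- sapply(lhs, function(rule) {
  eval(parse(text = rule)[[1]], envir = penguins_small)
}, USE.NAMES = FALSE)

# Logical matrix: rows are data rows, columns are rules
print(covers)
```

In this toy data each row is covered by exactly one rule, so the matrix has TRUE only on its diagonal.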

Create a rulelist

A rulelist can be created using tidy() on some supported model fits (run: utils::methods(tidy)). It can also be created manually from an existing dataframe using as_rulelist.
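As a rough sketch of the manual route, the dataframe below carries the three mandatory columns and is passed to as_rulelist. The rules themselves are made up, and the call assumes that as_rulelist accepts an `estimation_type` argument matching the mandatory attribute described on this page; check the as_rulelist help page for the exact signature. The package call is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical rules with the mandatory columns rule_nbr, LHS and RHS
rules_df <- data.frame(
  rule_nbr = 1:2,  # integer vector
  LHS      = c("flipper_length_mm > 203", "flipper_length_mm <= 203"),
  RHS      = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

if (requireNamespace("tidyrules", quietly = TRUE)) {
  # Assumption: estimation_type is supplied as an argument here
  rl <- tidyrules::as_rulelist(rules_df, estimation_type = "classification")
  print(class(rl))

  # Move back to the dataframe world
  print(as.data.frame(rl))
}
```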

Keys and attributes

Columns identified as 'keys', along with rule_nbr, form a unique combination: each distinct value of the keys identifies a group of rules. For example, a rule-based C5 model with multiple trials creates a set of rules for each trial_nbr. The predict method understands 'keys', so it predicts a rule number (for each row in new data / test data) within each trial_nbr separately.
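The idea of keys can be illustrated with base R only. This is a sketch of the concept, not of how the package implements it; the rules, the `new_data` values and the use of `trial_nbr` as the single key are made up for illustration.

```r
# Hypothetical rules from two trials; rule_nbr restarts within each trial
rules <- data.frame(
  trial_nbr = c(1L, 1L, 2L, 2L),
  rule_nbr  = c(1L, 2L, 1L, 2L),
  LHS       = c("x > 5", "x <= 5", "x > 7", "x <= 7"),
  RHS       = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
keys <- "trial_nbr"

# rule_nbr alone repeats, but (keys, rule_nbr) is unique
rule_nbr_repeats <- anyDuplicated(rules$rule_nbr) > 0
key_pair_unique  <- anyDuplicated(rules[c(keys, "rule_nbr")]) == 0

new_data <- data.frame(x = c(6, 8))

# Within each key group, find the first rule that covers each row
per_key <- lapply(split(rules, rules[[keys]]), function(group) {
  sapply(seq_len(nrow(new_data)), function(i) {
    row <- new_data[i, , drop = FALSE]
    hit <- vapply(group$LHS, function(rule) {
      eval(parse(text = rule)[[1]], envir = row)
    }, logical(1))
    group$rule_nbr[which(hit)[1]]
  })
})

print(per_key)
```

Each row of `new_data` gets one rule number per trial, which is why a prediction has to be read together with its key.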

A rulelist has these mandatory attributes:

  • estimation_type: One of regression or classification

A rulelist has these optional attributes:

  • keys: (character vector) Names of the columns that form a key.

  • model_type: (string) Name of the model

Set validation data

A few methods, such as augment, calculate, prune and reorder, require additional attributes, which can be set using set_validation_data.

Methods for rulelist

  1. Predict: Given a dataframe (possibly without a dependent variable column aka 'test data'), predicts the first rule (as ordered in the rulelist) per 'keys' that is applicable for each row. When multiple = TRUE, returns all rules applicable for a row (per key).

  2. Augment: Computes summary statistics per rule over the validation data and returns a rulelist with a new dataframe-column.

  3. Calculate: Computes metrics for a rulelist in a cumulative manner such as cumulative_coverage, cumulative_overlap, cumulative_accuracy.

  4. Prune: Suggests pruning a rulelist such that some expectations are met (based on metrics). Example: a cumulative_coverage of 80% can be met with the first few rules.

  5. Reorder: Reorders a rulelist in order to maximize a metric.
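The ordering semantics behind predict, calculate and prune can be sketched in base R. This is an illustration under stated assumptions rather than the package's implementation: the rules and data are made up, there are no keys, and cumulative coverage is taken here to mean the fraction of rows covered by at least one of the first k rules.

```r
# Hypothetical ordered rules and new data
rules <- data.frame(
  rule_nbr = 1:3,
  LHS      = c("x > 8", "x > 3", "x <= 0"),
  RHS      = c("high", "mid", "low"),
  stringsAsFactors = FALSE
)
new_data <- data.frame(x = c(1, 5, 9, 12))

# Logical matrix: rows are data rows, columns are rules (in rulelist order)
covers <- sapply(rules$LHS, function(rule) {
  eval(parse(text = rule)[[1]], envir = new_data)
}, USE.NAMES = FALSE)

# First applicable rule per row; NA when no rule covers the row
first_rule <- apply(covers, 1, function(hit) {
  if (any(hit)) rules$rule_nbr[which(hit)[1]] else NA_integer_
})

# All applicable rules per row (analogous to multiple = TRUE)
all_rules <- lapply(seq_len(nrow(covers)), function(i) {
  rules$rule_nbr[covers[i, ]]
})

# Assumed definition: share of rows covered by any of the first k rules
cumulative_coverage <- sapply(seq_len(nrow(rules)), function(k) {
  mean(rowSums(covers[, seq_len(k), drop = FALSE]) > 0)
})

print(first_rule)
print(cumulative_coverage)
```

Here the third rule adds no coverage beyond the first two, which is the kind of observation a pruning suggestion is based on.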

Manipulating a rulelist

Rulelists are essentially dataframes. Hence, dataframe operations will output a rulelist, provided they preserve attributes. as_rulelist and as.data.frame help in moving back and forth between the rulelist and dataframe worlds.

Utilities for a rulelist

  1. as_rulelist: Create a rulelist from a dataframe with some mandatory columns.

  2. set_keys: Set or unset 'keys' of a rulelist.

  3. to_sql_case: Outputs a SQL case statement for a rulelist.

  4. convert_rule_flavor: Converts R-parsable rule strings to Python/SQL-parsable rule strings.
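To show what an ordered rulelist looks like as a SQL case statement, here is a hand-rolled base R sketch. It does not call to_sql_case or convert_rule_flavor and does not claim to reproduce their output: `sketch_sql_case` is a hypothetical helper that only handles simple comparisons joined by `&`, and the rules are made up.

```r
# Hypothetical rules restricted to comparisons that are valid in R and SQL
rules <- data.frame(
  rule_nbr = 1:2,
  LHS      = c("( flipper_length_mm > 203 ) & ( bill_length_mm > 44.1 )",
               "( flipper_length_mm <= 203 )"),
  RHS      = c("Gentoo", "Adelie"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one WHEN branch per rule, in rule_nbr order
sketch_sql_case <- function(rules) {
  rules <- rules[order(rules$rule_nbr), ]
  # Only the R operator `&` is translated; anything else is left as-is
  conditions <- gsub("&", "AND", rules$LHS, fixed = TRUE)
  when_lines <- sprintf("  WHEN %s THEN '%s'", conditions, rules$RHS)
  paste(c("CASE", when_lines, "  ELSE NULL", "END"), collapse = "\n")
}

sql <- sketch_sql_case(rules)
cat(sql, "\n")
```

A case statement returns the first matching branch, so keeping the branches in rule_nbr order mirrors the first-applicable-rule behaviour of an ordered rulelist.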

See also