Data Mining - Rule Based Classification
IF-THEN Rules
A rule-based classifier uses a set of IF-THEN rules for classification. We can express a rule in
the following form:
IF condition THEN conclusion
Let us consider a rule R1,
R1: IF age = youth AND student = yes THEN buys_computer = yes
Points to remember:
The IF part of the rule is called the rule antecedent or precondition.
The THEN part of the rule is called the rule consequent.
The antecedent consists of one or more attribute tests, and these
tests are logically ANDed.
The consequent consists of a class prediction.
Note: We can also write rule R1 as follows:
R1: (age = youth) ∧ (student = yes) ⇒ (buys_computer = yes)
If the condition holds true for a given tuple, then the antecedent is satisfied.
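As a rough illustration of how a rule such as R1 could be represented and checked against a tuple, here is a minimal Python sketch. The Rule class, its covers method, and the two customer records are hypothetical names and data introduced only for this example; they are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: dict   # attribute -> required value; the tests are ANDed
    consequent: tuple  # (class attribute, predicted value)

    def covers(self, record: dict) -> bool:
        # The antecedent is satisfied only if every attribute test holds.
        return all(record.get(attr) == value
                   for attr, value in self.antecedent.items())


# R1: IF age = youth AND student = yes THEN buys_computer = yes
r1 = Rule({"age": "youth", "student": "yes"}, ("buys_computer", "yes"))

# Hypothetical tuples used only to exercise the rule.
customer_a = {"age": "youth", "student": "yes", "income": "low"}
customer_b = {"age": "youth", "student": "no", "income": "high"}

print(r1.covers(customer_a))  # True: both attribute tests hold
print(r1.covers(customer_b))  # False: the student test fails
```

Because the tests are ANDed, a single failing attribute test is enough for the rule not to apply to a tuple.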
Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from a
decision tree.
To extract rules from a decision tree:
1. One rule is created for each path from the root to a leaf node.
2. To form the rule antecedent, the splitting criteria along the path are logically ANDed.
3. The leaf node holds the class prediction, which forms the rule consequent.
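The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tree is a hypothetical toy example stored as nested dictionaries (an internal node has an "attribute" and its "branches", a leaf is a plain class label), and extract_rules is a helper name invented here.

```python
def extract_rules(node, path=()):
    """Return one (antecedent, class_label) pair per root-to-leaf path."""
    if not isinstance(node, dict):
        # Leaf node: it holds the class prediction (the rule consequent).
        return [(list(path), node)]
    rules = []
    attribute = node["attribute"]
    for value, child in node["branches"].items():
        # Each splitting criterion on the path is added to the antecedent.
        rules.extend(extract_rules(child, path + ((attribute, value),)))
    return rules


# Hypothetical toy tree; a leaf is simply the predicted class label.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student",
                  "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "yes", "excellent": "no"}},
    },
}

for antecedent, label in extract_rules(tree):
    condition = " AND ".join(f"{attr} = {value}" for attr, value in antecedent)
    print(f"IF {condition} THEN buys_computer = {label}")
```

The toy tree has five leaves, so the sketch yields five rules, one per root-to-leaf path.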
Advantages of Rule Based Data Mining Classifiers
Highly expressive.
Easy to interpret.
Easy to generate.
Can classify new records rapidly.
Performance is comparable to that of other classifiers.
Better suited for handling imbalanced classes.
Disadvantages
The rule-based classifier has the following disadvantages:
1. Missing values in the test set are harder to handle.
2. If the rule set is large, applying the rules for classification becomes complex.
3. For a large training set, the large number of generated rules requires a large amount of memory.
4. During rule generation, extra computation is needed to simplify and prune the rules.
Applications of Rule-Based Classification
Credit Scoring: Assessing creditworthiness by analyzing factors like income, credit history, and
debt-to-income ratio.
Predictive Maintenance: Predicting when maintenance is needed before equipment breaks
down.
Spam Filtering: Identifying and blocking unwanted spam messages based on keywords or
sender information.
Quality Control: Analyzing product quality data to identify defects and improve production
processes.
Medical Diagnosis: Assisting in diagnosing diseases based on symptoms and test results.
Fraud Detection: Identifying fraudulent transactions or activities.
Algorithm for a Rule-Based Classifier
Step 1: For each attribute A in the list of attributes:
i) For each possible attribute value V in Attribute_Values[A]:
Step 2: Create a rule R with the condition "IF A = V".
Step 3: Calculate the accuracy of the rule R.
Step 4: Select the rule R with the highest accuracy.
Step 5: Add R to the set of rules.
Step 6: Stop.
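One possible reading of these steps is sketched below in Python. Several details are assumptions, because the steps do not spell them out: the consequent of "IF A = V" is taken to be the majority class among the records the rule covers, accuracy is the fraction of covered records with that class, and the best rule is selected once per attribute. The function name learn_rules and the training records are hypothetical.

```python
from collections import Counter


def learn_rules(records, attributes, target):
    """Return one (attribute, value, predicted_class, accuracy) per attribute."""
    rules = []
    for attribute in attributes:                                # Step 1
        best = None
        for value in sorted({r[attribute] for r in records}):  # Step 1 (i)
            covered = [r[target] for r in records if r[attribute] == value]
            # Step 2: rule "IF attribute = value", predicting the majority
            # class among the covered records (an assumption of this sketch).
            label, hits = Counter(covered).most_common(1)[0]
            accuracy = hits / len(covered)                      # Step 3
            if best is None or accuracy > best[3]:              # Step 4
                best = (attribute, value, label, accuracy)
        rules.append(best)                                      # Step 5
    return rules


# Hypothetical training records, for illustration only.
records = [
    {"age": "youth", "student": "no", "buys_computer": "no"},
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "middle_aged", "student": "no", "buys_computer": "yes"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
    {"age": "senior", "student": "no", "buys_computer": "no"},
    {"age": "middle_aged", "student": "yes", "buys_computer": "yes"},
]

for attribute, value, label, accuracy in learn_rules(
        records, ["age", "student"], "buys_computer"):
    print(f"IF {attribute} = {value} THEN buys_computer = {label} "
          f"(accuracy {accuracy:.2f})")
```

On this toy data the sketch keeps "IF age = middle_aged" and "IF student = yes", each of which classifies all of its covered records correctly. Ties in accuracy are resolved in favour of the first value in sorted order, which is a design choice of the sketch rather than part of the algorithm as stated.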