Technology · Mar 28, 2026 · 9 min read

Decision Tree and Random Forest: The "20 Questions" Game Is an ML Algorithm!

What are Decision Trees and Random Forests? Understand Gini Impurity, Bootstrap Sampling, and Feature Importance through the "20 Questions" game — plus their use in loan approval, cancer detection, and fraud detection. A simple guide with Python code.


Did you play the "20 Questions" game as a kid?

"Is it an animal?" — Yes
"Does it have 4 legs?" — Yes
"Can you keep it at home?" — Yes
"Does it bark?" — Yes
"Dog!" 🐕

Congratulations — you just ran a Decision Tree!

In the ML world, a Decision Tree works on exactly this game's logic — Questions → Answers → Final Decision.


Decision Tree — A "Question Tree"

Decision Tree = ask questions based on the data → branch → final answer.

Structure:

                [Loan Approved?]
                       |
            ________________________
           |                        |
     [Income > 50k?]          [Income ≤ 50k?]
           |                        |
      _____|_____              _____|_____
     |           |            |           |
[Credit    [Credit        [Job Stable?] [Reject]
Score>700] Score≤700]         |
     |           |         _____|_____
  [Approve]  [Reject]    |           |
                       [Approve]  [Reject]

3 Parts:

  • Root Node — the first question (the most important feature)
  • Branch — the direction of an answer (Yes/No, >/<)
  • Leaf Node — the final decision (Approve/Reject, Cat/Dog)
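These three parts map directly onto plain code. A minimal sketch of the loan tree above as if/else rules — the thresholds (50k income, 700 credit score) and the job-stability rule are read off the diagram, and the function name is hypothetical:

```python
def loan_decision(income, credit_score, job_stable):
    """Walk the tree: each if/else is one branch, each return is a leaf."""
    if income > 50_000:          # Root Node — the first question
        if credit_score > 700:   # Branch — high-income direction
            return "Approve"     # Leaf Node — final decision
        return "Reject"
    if job_stable:               # Branch — low-income direction
        return "Approve"
    return "Reject"

print(loan_decision(income=80_000, credit_score=750, job_stable=True))   # Approve
print(loan_decision(income=30_000, credit_score=650, job_stable=False))  # Reject
```

A trained Decision Tree is exactly this kind of nested if/else — the algorithm's job is only to learn which questions and thresholds to use.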

Real Example — Titanic Survival

[Male or Female?]
      |
   ___________
  |           |
[Male]     [Female]
  |           |
[Age > 9?] [Survive ✅]
  |
________
|       |
[Yes] [No ≤ 9]
  |       |
[Siblings [Survive ✅]
 > 2?]
  |
______
|     |
[No] [Yes]
 |     |
[✅] [❌]

Only 3 questions — and Titanic survival is predicted!


When Does a Decision Tree Split?

How does the model decide the "best question"? — via Information Gain / Gini Impurity.

Gini Impurity (simplified):

  • Pure node = only one class = Gini = 0 (best)
  • Mixed node = 50-50 mix = Gini = 0.5 (worst)

The model selects the feature whose split produces the purest nodes.

💡 Analogy: sorting books in a library — by color? by author? by genre? Sorting by genre is the most useful split, because readers find their book fastest. A Decision Tree uses the same logic.
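The Gini formula is short enough to write by hand — impurity is 1 minus the sum of squared class proportions. A minimal sketch (the spam/ham labels are hypothetical), including the weighted impurity of a candidate split, which is what the tree compares when choosing its "best question":

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["spam", "spam", "spam", "spam"]))  # 0.0 → pure node (best)
print(gini(["spam", "spam", "ham", "ham"]))    # 0.5 → 50-50 mix (worst)

# Weighted impurity of a split: lower = a better question
left, right = ["spam"] * 4, ["ham"] * 3 + ["spam"]
n = len(left) + len(right)
split_gini = len(left) / n * gini(left) + len(right) / n * gini(right)
print(split_gini)  # 0.1875 — much purer than before the split
```

The tree tries every feature and threshold, computes this weighted impurity for each, and splits on whichever gives the lowest value.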


Decision Tree — Python Code

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Data Load
iris = load_iris()
X, y = iris.data, iris.target

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Model Train
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# Predict & Evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Accuracy: 0.9667
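To actually see the "questions" the trained tree learned, scikit-learn's export_text prints the tree as if/else-style rules — a small sketch on the same iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)

# Print the learned questions and thresholds as text rules
rules = export_text(model, feature_names=iris.feature_names)
print(rules)
```

Each indented line is one question ("petal width <= …"), and each "class:" line is a leaf — the 20-questions structure made visible.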

Problems with Decision Trees

Problem 1 — Overfitting:

A deep tree memorizes every exception in the training data:

# Too deep — memorizes the training data
model = DecisionTreeClassifier(max_depth=None)  # ❌

# Better — limit the depth
model = DecisionTreeClassifier(max_depth=5)     # ✅
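The gap shows up clearly in numbers. A small sketch on hypothetical noisy synthetic data (20% flipped labels, so memorization cannot generalize) comparing the two settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical demo data with 20% label noise to make overfitting visible
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

deep = DecisionTreeClassifier(max_depth=None, random_state=42).fit(X_tr, y_tr)
limited = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_tr, y_tr)

# The deep tree scores a perfect 1.0 on training data (pure memorization),
# but that advantage does not carry over to unseen test data
print("Deep    — train:", deep.score(X_tr, y_tr),
      " test:", round(deep.score(X_te, y_te), 2))
print("Limited — train:", round(limited.score(X_tr, y_tr), 2),
      " test:", round(limited.score(X_te, y_te), 2))
```

The unlimited tree hits 100% train accuracy by memorizing noise; the depth-limited tree gives some of that up and typically holds up better on the test set.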

Problem 2 — Unstable:

A small change in the data → a completely different tree!

Original Data:  Income > 50k → Root Node
+1 Data Point:  Age > 30    → Root Node Changed!

Problem 3 — Biased:

Features with many distinct values (e.g., Age: 1–100) get an unfair advantage over simple features (e.g., Gender: M/F) when splits are chosen.


Random Forest — "Many Trees, One Decision"

The solution to the Decision Tree's problems = Random Forest.

Random Forest = many Decision Trees combined → majority vote → final answer.

🌳🌳🌳🌳🌳🌳🌳 → "Loan Approve?" → 4 Trees: Yes, 3 Trees: No → Majority: Yes ✅


How Is a Random Forest Built?

Step 1 — Bootstrap Sampling (Bagging):

1000 rows of data → random 800 rows (sampled with replacement) → Tree 1
1000 rows of data → random 800 rows (sampled with replacement) → Tree 2
...
1000 rows of data → random 800 rows (sampled with replacement) → Tree 100
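Bootstrap sampling means drawing rows with replacement, so some rows repeat and others are left out entirely — which is what makes every tree's dataset slightly different. A minimal sketch with NumPy (classic bagging draws a sample the same size as the dataset; roughly 63% of rows end up unique):

```python
import numpy as np

rng = np.random.default_rng(42)
n_rows = 1000

# Draw n_rows row-indices WITH replacement — some rows appear
# multiple times, others not at all (the "out-of-bag" rows)
sample_idx = rng.choice(n_rows, size=n_rows, replace=True)
unique_rows = len(np.unique(sample_idx))
print(f"Unique rows in this bootstrap sample: {unique_rows} / {n_rows}")
```

Each of the 100 trees gets its own such sample, so no two trees see exactly the same data — that diversity is what the forest averages over.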

Step 2 — Random Feature Selection: 10 features total → at each split, only a random 3 features are considered → different trees, different perspectives.

Step 3 — Voting:

Tree 1:  Spam ✅
Tree 2:  Not Spam ❌
Tree 3:  Spam ✅
Tree 4:  Spam ✅
Tree 5:  Not Spam ❌
──────────────────
Final:   Spam ✅ (3 vs 2 — Majority)
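The voting step above is just a majority count — a minimal sketch with the same five (hypothetical) tree predictions:

```python
from collections import Counter

# The five tree predictions from the diagram above
votes = ["Spam", "Not Spam", "Spam", "Spam", "Not Spam"]

# Majority vote: the most common prediction wins
final, count = Counter(votes).most_common(1)[0]
print(f"Final: {final} ({count} vs {len(votes) - count})")  # Final: Spam (3 vs 2)
```

For regression problems, a Random Forest averages the trees' numeric predictions instead of voting.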

💡 Analogy — a doctor's second opinion: one doctor's opinion = a Decision Tree; the majority of five doctors' opinions = a Random Forest. More experts = a better decision!


Random Forest — Python Code

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Random Forest — 100 Trees
rf_model = RandomForestClassifier(
    n_estimators=100,    # number of trees
    max_depth=5,         # Tree Depth Limit
    random_state=42
)
rf_model.fit(X_train, y_train)

y_pred = rf_model.predict(X_test)
print(classification_report(y_test, y_pred))

Output:

              precision  recall  f1-score
    setosa         1.00    1.00      1.00
versicolor         1.00    1.00      1.00
 virginica         1.00    1.00      1.00
  accuracy                           1.00

Feature Importance — "Which Feature Matters Most?"

A bonus power of Random Forest — feature importance — which column influences the model's decisions the most?

import pandas as pd

# Feature Importance
importances = rf_model.feature_importances_
feature_names = iris.feature_names

fi_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values('Importance', ascending=False)

print(fi_df)
#              Feature  Importance
# petal length (cm)      0.4423
# petal width  (cm)      0.4187
# sepal length (cm)      0.0954
# sepal width  (cm)      0.0436

💡 Business use: predicting customer churn — "price, service, competitor" — which factor matters most? Random Forest feature importance answers instantly!


Decision Tree vs Random Forest

                    Decision Tree     Random Forest
Trees               1                 100+
Overfitting         High risk         Lower (averaging)
Accuracy            Moderate          High
Speed               Fast              Slower (many trees)
Interpretable       ✅ Visual         ❌ Black box
Feature Importance  Basic             ✅ Reliable
Small Data          ✅ Good           Moderate
Large Data          ❌ Overfits       ✅ Handles well

Which Algorithm When?

Situation                                     Use
Decision must be explained (bank, medical)    Decision Tree
Maximum accuracy                              Random Forest
Fast training                                 Decision Tree
Feature importance                            Random Forest
Visual / presentation                         Decision Tree
Production model                              Random Forest

Real World Applications

🏦 Banking — Loan Approval:
Features: Income, Credit Score, Age, Job Stability
Random Forest → Approve/Reject + why (feature importance)

🏥 Medical — Disease Prediction:
Features: Symptoms, Age, Test Results
Decision Tree → logic the doctor can explain

📧 Spam Detection:
Features: Words, Links, Sender
Random Forest → high-accuracy spam filter

🛒 E-commerce — Customer Churn:
Features: Purchase History, Activity, Complaints
Random Forest → "This customer is about to leave — send an offer!"

💳 Fraud Detection:
Features: Amount, Location, Time, Merchant
Random Forest → real-time transaction fraud alerts


Conclusion

🌱 Decision Tree = 1 tree, simple, explainable, overfit risk
🌲🌲🌲 Random Forest = many trees, accurate, robust, black box

Simple problem + needs explaining  →  Decision Tree
Complex problem + accuracy first   →  Random Forest

"20 Questions Game" → Decision Tree → Random Forest — your first powerful ML algorithms — zero math, pure logic! 🎯

