In this article
Model Evaluation: Accuracy, Precision, Recall, F1 — Is the Model Good or Bad?
The ML model is trained. Now what?
"The model is good" — how do you know? "95% Accuracy" — is that enough?
Accuracy often misleads. An example:
🏥 Cancer Detection Model — 100 Patients:
- 95 Healthy, 5 Cancer
- Model predicts: everyone Healthy
- Accuracy: 95% — "Wow!"
- But 5 cancer patients are missed — the model is useless!
"High Accuracy = Good Model" — this assumption is wrong. Proper evaluation needs Precision, Recall, and F1.
Confusion Matrix — Where Evaluation Begins
Confusion Matrix = the model's prediction report card — 4 boxes:

                        ACTUAL
                   Positive  Negative
PREDICTED Positive |   TP   |   FP   |
          Negative |   FN   |   TN   |
The 4 Terms:
| Term | Full Form | Meaning | Example |
|---|---|---|---|
| TP | True Positive | Correct Positive | Has cancer, model said Cancer ✅ |
| TN | True Negative | Correct Negative | Healthy, model said Healthy ✅ |
| FP | False Positive | Wrong Positive | Healthy, model said Cancer ❌ |
| FN | False Negative | Wrong Negative | Has cancer, model said Healthy ❌ |
💡 Simple rule: the first word = whether the prediction was correct (True) or wrong (False). The second word = what the model predicted (Positive/Negative).
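The four counts can be tallied directly from actual/predicted label pairs. A minimal sketch (same labels as the sklearn example later in this article):

```python
# Tally TP/TN/FP/FN from actual vs predicted labels (1 = Positive, 0 = Negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct Positive
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct Negative
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarm
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # miss

print(tp, tn, fp, fn)  # 4 4 1 1
```

Every metric below is just a different ratio of these four counts.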
Accuracy — The "Overall Score"
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Example — Email Spam Filter (100 Emails):
- 90 Normal, 10 Spam
- Model: 88 Normal correct (TN), 8 Spam correct (TP), 2 Normal flagged as Spam (FP), 2 Spam missed (FN)
Accuracy = (88 + 8) / 100 = 96%
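A quick check of that arithmetic, plugging the spam-filter counts straight into the formula:

```python
# Spam filter example: TP=8, TN=88, FP=2, FN=2
tp, tn, fp, fn = 8, 88, 2, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.96
```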
When is Accuracy enough?
- Classes are balanced — e.g. Spam 50%, Normal 50%
- FP and FN have roughly equal cost
When does Accuracy mislead?
- Imbalanced data — 95% Healthy, 5% Cancer
- FP and FN have different costs
Precision — "Of the Positive Predictions, How Many Were Right?"
Precision = TP / (TP + FP)
Meaning: when the model predicts "Positive" — what % are actually Positive?
Example — Spam Filter: the model flags 15 emails as Spam:
- 12 are really Spam (TP)
- 3 Normal emails wrongly flagged as Spam (FP)
Precision = 12 / (12 + 3) = 12/15 = 80%
When the model predicts Spam, it is right 80% of the time.
💡 When is Precision important? — When FP (a false alarm) is costly.
- Spam Filter — a normal email deleted = problem
- Court System — an innocent person found guilty = problem
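The spam-filter numbers above, as code:

```python
# Spam filter: 15 emails flagged as Spam; 12 really Spam (TP), 3 Normal (FP)
tp, fp = 12, 3
precision = tp / (tp + fp)
print(precision)  # 0.8
```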
Recall — "Of the Actual Positives, How Many Were Caught?"
Recall = TP / (TP + FN)
Meaning: of the truly Positive cases — what % did the model catch?
Example — Cancer Detection: 100 patients, 10 have cancer:
- Model detects 8 cancers (TP)
- 2 cancers missed (FN)
Recall = 8 / (8 + 2) = 8/10 = 80%
The model detects 80% of cancer patients — and misses 20%.
💡 When is Recall important? — When FN (a miss) is costly.
- Cancer Detection — a miss = life-threatening
- Fraud Detection — a miss = money lost
- COVID Test — a miss = spread
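And the cancer-detection numbers, the same way:

```python
# Cancer detection: 10 actual cancer cases; 8 detected (TP), 2 missed (FN)
tp, fn = 8, 2
recall = tp / (tp + fn)
print(recall)  # 0.8
```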
Precision vs Recall — the Trade-off
Precision ↑ often means Recall ↓ — it's frequently a trade-off!
Example — Spam Filter:
High Precision (conservative model):
- Flags only emails it is 100% sure are Spam → few FP → high Precision
- Misses a lot of Spam → FN rises → low Recall
High Recall (aggressive model):
- Flags many emails as Spam → few Spam missed → high Recall
- Normal emails also get flagged → FP rises → low Precision
🎯 Analogy — Casting a Net:
- Wide net (high Recall): lots of fish caught — but lots of garbage too
- Small net (high Precision): little garbage — but many fish escape
- Smart net (F1 balance): catches the good fish, keeps the garbage low
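The trade-off can be made concrete by sweeping the decision threshold over a model's predicted probabilities. The scores below are toy values, purely illustrative; raising the threshold makes the model more conservative:

```python
# Sweep the decision threshold: higher threshold -> higher precision, lower recall
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.45, 0.55, 0.6, 0.7, 0.8, 0.9]

for thr in (0.3, 0.5, 0.7):
    y_pred = [1 if p >= thr else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={thr}: precision={precision:.2f}, recall={recall:.2f}")

# threshold=0.3: precision=0.62, recall=1.00   <- aggressive: wide net
# threshold=0.5: precision=0.80, recall=0.80   <- balanced
# threshold=0.7: precision=1.00, recall=0.60   <- conservative: small net
```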
F1 Score — Balancing Precision and Recall
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
F1 = the harmonic mean of Precision and Recall — a balance measure.
Examples:
Precision = 80%, Recall = 80%
F1 = 2 × (0.80 × 0.80) / (0.80 + 0.80) = 0.80 = 80%
Precision = 90%, Recall = 50%
F1 = 2 × (0.90 × 0.50) / (0.90 + 0.50) = 0.64 = 64%
When to use F1?
- Imbalanced classes
- Both Precision and Recall matter
- You need a single score for comparison
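Both worked examples above can be checked with a tiny helper. Note how the harmonic mean punishes imbalance: 90%/50% gives 64%, well below the 70% an ordinary average would suggest.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.80, 0.80), 2))  # 0.8
print(round(f1(0.90, 0.50), 2))  # 0.64
```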
All Four Metrics — Which One, When?
| Metric | Use When? | Example |
|---|---|---|
| Accuracy | Balanced classes, FP cost = FN cost | General classification |
| Precision | FP is costly | Spam filter, legal |
| Recall | FN is costly | Cancer, fraud, COVID |
| F1 Score | Imbalanced + both matter | Medical, NLP, most ML |
Python Code — Evaluation with sklearn
from sklearn.metrics import (accuracy_score, precision_score,
recall_score, f1_score,
confusion_matrix, classification_report)
# Actual vs Predicted
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
# Individual Metrics
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall: ", recall_score(y_true, y_pred))
print("F1 Score: ", f1_score(y_true, y_pred))
# Confusion Matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_true, y_pred))
# Full Report (Best!)
print("\nClassification Report:")
print(classification_report(y_true, y_pred))
Output:
Accuracy:  0.8
Precision: 0.8
Recall:    0.8
F1 Score:  0.8

Confusion Matrix:
[[4 1]
 [1 4]]

Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.80      0.80         5
           1       0.80      0.80      0.80         5

    accuracy                           0.80        10
💡 classification_report() — one line, the full picture — use it whenever you evaluate a production model!
Real-World Scenarios
Scenario 1 — COVID Test:
- FN = COVID positive but the test says negative = dangerous — it spreads!
- Maximize Recall — don't miss cases
Scenario 2 — YouTube Recommendation:
- FP = wrong video recommended = user annoyed
- FN = good video missed = acceptable
- Precision matters
Scenario 3 — Bank Loan Approval:
- FP = loan to a bad customer = NPA (loss)
- FN = good customer rejected = lost business
- Balance — F1 Score
Scenario 4 — Fire Alarm:
- FN = there is a fire but no alarm = disaster!
- FP = no fire but the alarm rings = inconvenience
- Maximize Recall
AUC-ROC — Advanced Evaluation (Bonus)
ROC Curve = a plot of True Positive Rate (Recall) vs False Positive Rate at different thresholds.
AUC (Area Under the Curve):
- AUC = 1.0 → perfect model
- AUC = 0.5 → random guessing (useless)
- AUC = 0.85+ → good model
from sklearn.metrics import roc_auc_score

# y_prob = the model's predicted probabilities for the positive class
# (e.g. from model.predict_proba(X)[:, 1]); toy values shown here
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_prob)
print(f"AUC-ROC: {auc:.2f}")  # AUC-ROC: 0.75
💡 AUC-ROC does not depend on any single threshold — it measures the model's overall discriminating power.
Summary — 1-Minute Cheat Sheet
Accuracy = Overall Correct / Total
Precision = TP / (TP + FP) → want fewer FP
Recall = TP / (TP + FN) → want fewer FN
F1 Score = balance of P & R → both matter
Imbalanced data? → F1 / Recall
FP costly? → Precision
FN costly? → Recall
Balanced data? → Accuracy is OK
Conclusion
Accuracy alone = an incomplete picture. Precision + Recall + F1 = the full story.
Cancer detection — Recall. Spam filter — Precision. Most cases — F1. Choosing the right metric = making the right decision.
An ML model's "report card" = Confusion Matrix + Classification Report — 2 lines of sklearn, full evaluation! 🎯