Description:
I am using EasyOCR to extract data from Egyptian ID cards. While names and addresses are mostly fine, the 14-digit National ID number is consistently failing.
The Problem:
- Digit Dropout: The OCR typically returns only 10 or 11 digits out of 14.
- Misinterpretation: Zeros `(0)` are often skipped or read as whitespace. Digits like `1` and `8` are frequently swapped.
- Fragmentation: The gaps between number groups in the ID layout cause the reader to split the ID into multiple parts.
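
To make the fragmentation point concrete, here is a minimal sketch of stitching the pieces back together after recognition. It assumes the results are `(bbox, text, confidence)` tuples with four `[x, y]` corner points per box (the default `readtext` output), that the ID number sits on a single line, and that the digits read left to right. `join_fragments` is a hypothetical helper, and the sample fragments are made up for illustration, not real OCR output.

```python
def join_fragments(results):
    """Concatenate OCR fragments left to right into one string.

    `results` is assumed to be a list of (bbox, text, confidence) tuples,
    where bbox is four [x, y] corner points.
    """
    # Sort by the leftmost x coordinate of each bounding box.
    ordered = sorted(results, key=lambda r: min(pt[0] for pt in r[0]))
    # Drop whitespace inside each fragment, then join everything.
    return "".join("".join(text.split()) for _, text, _ in ordered)


# Made-up fragments for illustration only (not real OCR output).
fragments = [
    ([[120, 10], [200, 10], [200, 40], [120, 40]], "1027", 0.80),
    ([[10, 10], [110, 10], [110, 40], [10, 40]], "286 ", 0.90),
    ([[210, 10], [330, 10], [330, 40], [210, 40]], "1500813", 0.70),
]
print(join_fragments(fragments))  # prints 28610271500813
```

This only repairs the ordering and the gaps between groups. It cannot recover digits that the recognizer dropped.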
Pre-processing Steps Taken:
```python
import cv2
import numpy as np

# 1. Image Enhancement: unsharp-mask the L channel, then detail-enhance
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
l_sharpened = cv2.addWeighted(l, 1.5, cv2.GaussianBlur(l, (9, 9), 10.0), -0.5, 0)
enhanced = cv2.detailEnhance(cv2.merge((l_sharpened, a, b)), sigma_s=10, sigma_r=0.15)

# 2. Binarization: invert, threshold, open with a 2x2 kernel, invert back
gray = cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(cv2.bitwise_not(gray), 135, 255, cv2.THRESH_BINARY)
cleaned = cv2.bitwise_not(cv2.morphologyEx(thresh, cv2.MORPH_OPEN, np.ones((2, 2))))
```
OCR Attempt:
```python
# Tested various combinations of these
reader.readtext(crop, allowlist='0123456789٠١٢٣٤٥٦٧٨٩', paragraph=False, mag_filter=True)
```
Expected Result: 28610271500813 (14 digits)
Actual Result: 18621527138 (11 digits, misread and missing zeros)
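
Since the allowlist mixes ASCII and Arabic-Indic digits, a small normalization step makes it easier to compare a read against the expected 14 digits. The sketch below is only a post-check. `normalize_id` and `looks_complete` are hypothetical helper names, not part of EasyOCR.

```python
# Map Arabic-Indic digits (U+0660..U+0669) to ASCII 0-9.
ARABIC_INDIC_TO_ASCII = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")


def normalize_id(text):
    """Return only the digits of `text`, converted to ASCII."""
    ascii_text = text.translate(ARABIC_INDIC_TO_ASCII)
    # Keep ASCII digits only; spaces and any other characters are dropped.
    return "".join(ch for ch in ascii_text if ch in "0123456789")


def looks_complete(text, expected_len=14):
    """True when the normalized read has the expected number of digits."""
    return len(normalize_id(text)) == expected_len


print(looks_complete("28610271500813"))  # expected result from above
print(looks_complete("18621527138"))     # actual result from above
```

A length check like this flags digit dropout, but it says nothing about swapped digits such as 1 and 8.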
How can I force EasyOCR to recognize the full sequence including the zeros and small gaps?
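
For reference, these are the detection-side `readtext` parameters that seem most relevant to small gaps and faint glyphs, collected as a keyword dictionary. The values are untested starting points chosen for illustration, not tuned settings, and `digit_kwargs` is a hypothetical name.

```python
# Hypothetical starting values, not tuned on real ID crops.
digit_kwargs = {
    "allowlist": "0123456789٠١٢٣٤٥٦٧٨٩",  # same allowlist as in the attempt above
    "paragraph": False,
    "detail": 1,        # return (bbox, text, confidence) tuples
    "mag_ratio": 2.0,   # magnify the image before detection (default 1.0)
    "width_ths": 2.0,   # merge boxes across wider horizontal gaps (default 0.5)
    "add_margin": 0.2,  # extend each detected box slightly (default 0.1)
    "low_text": 0.3,    # lower text score bound, to keep faint glyphs (default 0.4)
}

# Usage, assuming `reader` is an easyocr.Reader and `crop` is the ID-number region:
# results = reader.readtext(crop, **digit_kwargs)
```

Raising `width_ths` targets the fragmentation between number groups, while `mag_ratio` and `low_text` target small or faint digits. Whether they help with the missing zeros on these cards is an open question.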