
[Bug] Egyptian ID National Number (14 digits) dropped and misread by EasyOCR #1453

@mohieyelkiouty

Description:

I am using EasyOCR to extract data from Egyptian ID cards. Names and addresses are mostly read correctly, but recognition of the 14-digit National ID number fails consistently.

The Problem:

  • Digit Dropout: The OCR typically returns only 10 or 11 digits out of 14.
  • Misinterpretation: Zeros (0) are often skipped, as if they were whitespace, and digits such as 1 and 8 are frequently confused with each other.
  • Fragmentation: The gaps between number groups in the ID layout cause the reader to break the ID into multiple parts.
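The fragmentation described above can be worked around after recognition by stitching the pieces back together. The sketch below assumes the `detail=1` output format of `readtext()` (a list of `(bbox, text, confidence)` tuples, with `bbox` given as four `[x, y]` corners), assumes every returned box belongs to the number line, and uses a hypothetical helper name, `merge_number_fragments`. EasyOCR's own `width_ths` argument (maximum horizontal distance for merging boxes) may also reduce the splitting at detection time.

```python
from typing import List, Sequence, Tuple

# One EasyOCR readtext() result with detail=1: (four [x, y] corners, text, confidence)
Result = Tuple[Sequence[Sequence[float]], str, float]


def merge_number_fragments(results: List[Result], min_conf: float = 0.0) -> str:
    """Join digit fragments of one printed line into a single digit string.

    Assumes every result belongs to the ID-number line and that the digits
    are laid out left to right (most significant digit on the left).
    """
    # Drop low-confidence fragments (min_conf is a hypothetical cut-off; 0.0 keeps all).
    kept = [r for r in results if r[2] >= min_conf]
    # Order fragments by the leftmost x coordinate of their boxes.
    kept = sorted(kept, key=lambda r: min(point[0] for point in r[0]))
    # Keep only digit characters, so spaces between digit groups disappear.
    return "".join(ch for _, text, _ in kept for ch in text if ch.isdigit())


# Made-up boxes for illustration, deliberately returned out of order.
fragments = [
    ([[200, 0], [300, 0], [300, 20], [200, 20]], "500813", 0.90),
    ([[0, 0], [90, 0], [90, 20], [0, 20]], "2861", 0.95),
    ([[100, 0], [190, 0], [190, 20], [100, 20]], "0271", 0.85),
]
print(merge_number_fragments(fragments))  # 28610271500813
```

Sorting by x only makes sense on a crop that holds a single line; on a full-card image the boxes would first need to be grouped by row.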

Pre-processing Steps Taken:

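```python
import cv2
import numpy as np

img = cv2.imread("id_card.jpg")  # hypothetical path; BGR photo of the card

# 1. Image Enhancement: unsharp mask on the L (lightness) channel
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
l_sharpened = cv2.addWeighted(l, 1.5, cv2.GaussianBlur(l, (9, 9), 10.0), -0.5, 0)
# Convert back to BGR so the COLOR_BGR2GRAY conversion below receives BGR data, not LAB
bgr_sharpened = cv2.cvtColor(cv2.merge((l_sharpened, a, b)), cv2.COLOR_LAB2BGR)
enhanced = cv2.detailEnhance(bgr_sharpened, sigma_s=10, sigma_r=0.15)

# 2. Binarization: inverted gray > 135 (i.e. gray < 120) becomes white foreground
_, thresh = cv2.threshold(cv2.bitwise_not(cv2.cvtColor(enhanced, cv2.COLOR_BGR2GRAY)), 135, 255, cv2.THRESH_BINARY)
# Opening removes white specks smaller than the 2x2 kernel; invert back to dark text on white
cleaned = cv2.bitwise_not(cv2.morphologyEx(thresh, cv2.MORPH_OPEN, np.ones((2, 2), np.uint8)))
```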

OCR Attempt:

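```python
import easyocr
reader = easyocr.Reader(['ar', 'en'])  # assumed setup; 'ar' covers the Arabic-Indic digits
# Tested various combinations of these; crop = region containing the 14-digit number
reader.readtext(crop, allowlist='0123456789٠١٢٣٤٥٦٧٨٩', paragraph=False, mag_ratio=2.0)  # readtext() accepts mag_ratio (image magnification), not mag_filter; 2.0 is an example value
```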
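Because the allowlist admits both Western and Arabic-Indic digits, a single result can mix the two scripts, while the expected value is written with Western digits. A minimal normalization step, sketched here with a hypothetical `normalize_digits` helper, maps U+0660..U+0669 to ASCII 0-9 and drops everything else before the result is compared or validated:

```python
# Arabic-Indic digits U+0660..U+0669 mapped to ASCII "0".."9".
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
TO_ASCII = str.maketrans(ARABIC_INDIC_DIGITS, "0123456789")


def normalize_digits(text: str) -> str:
    """Map Arabic-Indic digits to ASCII and drop every non-digit character."""
    return "".join(ch for ch in text.translate(TO_ASCII) if ch in "0123456789")


# Mixed-script input: the first group is written with Arabic-Indic digits.
print(normalize_digits("\u0662\u0668\u0666\u0661\u0660 27150 0813"))  # 28610271500813
```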

Expected Result: 28610271500813 (14 digits)
Actual Result: 18621527138 (11 digits, misread and missing zeros)
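When testing many preprocessing and `readtext()` combinations, a character error rate against the known number gives one comparable score per combination instead of eyeballing outputs. The helpers below (`levenshtein`, `char_error_rate`) are illustrative plain-Python implementations of edit distance, not part of EasyOCR:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(expected: str, actual: str) -> float:
    """Edit distance normalized by the length of the ground-truth string."""
    return levenshtein(expected, actual) / len(expected)


print(char_error_rate("28610271500813", "18621527138"))
```

A length difference alone (14 vs. 11 digits here) puts a lower bound of 3 edits on the distance, so dropped digits are always penalized even when the surviving digits happen to be correct.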

How can I get EasyOCR to recognize the full 14-digit sequence, including the zeros, instead of splitting it at the gaps between digit groups?
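A short reading such as the 11-digit result above can at least be rejected automatically, for example to trigger a retry with different settings. The check below assumes a commonly cited layout of the Egyptian national number (century digit, YYMMDD birth date, two-digit governorate code, then serial and check digits); that layout is an assumption, not something stated in this issue, and the check digit is not verified. `plausible_egyptian_id` is a hypothetical helper name.

```python
import datetime


def plausible_egyptian_id(number: str) -> bool:
    """Rough plausibility check for a 14-digit Egyptian national number.

    ASSUMED layout: digit 1 = century (2 -> 1900s, 3 -> 2000s),
    digits 2-7 = birth date YYMMDD, digits 8-9 = governorate code,
    digits 10-14 = serial and check digit. The check digit is NOT verified.
    """
    # Expect ASCII digits only (normalize Arabic-Indic digits first).
    if len(number) != 14 or not (number.isascii() and number.isdigit()):
        return False
    century = {"2": 1900, "3": 2000}.get(number[0])
    if century is None:
        return False
    try:
        # Reject impossible birth dates such as month 13 or day 32.
        datetime.date(century + int(number[1:3]), int(number[3:5]), int(number[5:7]))
    except ValueError:
        return False
    return True


print(plausible_egyptian_id("28610271500813"))  # True
print(plausible_egyptian_id("18621527138"))     # False (only 11 digits)
```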
