Have you ever watched background noise and bleedthrough completely derail your data extraction results?
Picture this: you're trying to extract text from scanned documents, but shadows, numbers, letters, and words from the page behind are bleeding through, random specks are scattered across the image, and that Safeway ad on the back is now corrupting the numbers on the front.
What should have been clean, readable text turns into a garbled mess of misidentified characters and phantom words.
Like that:
If you've worked with document digitization, you know this frustration all too well. Background noise, bleedthrough, uneven lighting, and image artifacts don't just make documents look messy; they wreak havoc on data extraction accuracy.
Your accuracy and user satisfaction numbers get dragged down by nonsensical extraction results that turn "Invoice #12345" into "1nv0ice #l2B4S" and make your extracted data practically unusable.
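To put a number on that kind of damage, one common approach is a character error rate: the edit distance between the expected and the extracted text, divided by the length of the expected text. The sketch below is only an illustration in plain Python; the function names `levenshtein` and `character_error_rate` are hypothetical helpers and not part of any Veryfi API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(expected: str, extracted: str) -> float:
    """Edit distance normalized by the length of the expected text."""
    if not expected:
        raise ValueError("expected text must not be empty")
    return levenshtein(expected, extracted) / len(expected)


expected = "Invoice #12345"
extracted = "1nv0ice #l2B4S"
distance = levenshtein(expected, extracted)
cer = character_error_rate(expected, extracted)
print(f"edit distance: {distance}, character error rate: {cer:.0%}")
```

For the example strings, five of the fourteen characters are wrong, which is roughly a 36% character error rate on a single short field.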
But what if there were a way to clean up these images before they ever reach our data extraction engine? What if Veryfi could automatically remove noise, eliminate bleedthrough, and enhance text clarity in one process?
We've solved this problem. Veryfi's image cleanup algorithm handles comprehensive image preprocessing and cleanup, transforming noisy, problematic scans into crisp, readable images.
What is bleedthrough and why is it a problem?
Bleedthrough occurs when text or images from the reverse side of a document become visible on the front side. This creates visual noise that can significantly interfere with OCR and, in turn, reduce the accuracy of data extraction.
Common issues caused by bleedthrough include:
- Misplaced Information: OCR may confuse bleedthrough text with legitimate document content, leading to incorrect data placement or extraction.
- Loss of Context: Overlapping text can obscure important details like transaction dates, vendor names, or item descriptions.
- False Positives and Negatives: Bleedthrough can cause OCR to identify information that isn't actually present, or to miss legitimate information that is obscured. For example, the price of a bottled-water line item can become "35" instead of "3" simply because a stray digit from the ad on the back of the receipt leaked through to the front.
As noted in the Accuracy factors documentation, document quality significantly impacts extraction results.
Other document quality issues & challenges:
- Image Dewarping and Line Items corrections
- Blur detection
What is the bleedthrough text removal feature?
The bleedthrough text removal feature is an enhancement to Veryfi's image processing and data extraction capabilities. It identifies and eliminates text and other noise that has "bled through" from the reverse side to the front side of documents, most often receipts. Veryfi's smart image processing and cleaning technology improves the accuracy of data extraction by reducing the visual noise that interferes with text recognition.
How does the bleedthrough removal feature work?
- Detect text patterns consistent with bleedthrough
- Apply targeted noise reduction algorithms to the affected areas of the image
- Preserve the integrity of the legitimate text on the document
- Improve overall OCR accuracy by reducing confusion from overlapping text
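To make the idea concrete, here is a deliberately simplified sketch of one classic way to suppress bleedthrough: intensity thresholding. It is not Veryfi's algorithm, which is not described in this article. It rests on the assumption that text showing through from the back is usually fainter (mid-gray) than ink printed on the front (dark). The function name `suppress_bleedthrough` and the threshold values `ink_max` and `paper_min` are hypothetical and chosen only for illustration.

```python
from typing import List

# A grayscale image as rows of pixel values: 0 = black ink, 255 = white paper.
Image = List[List[int]]


def suppress_bleedthrough(image: Image, ink_max: int = 90, paper_min: int = 200) -> Image:
    """Return a copy of the image with faint mid-gray pixels whitened.

    Assumption: pixels at or below ink_max are front-side ink, pixels at or
    above paper_min are paper, and everything in between is bleedthrough.
    """
    cleaned: Image = []
    for row in image:
        new_row = []
        for px in row:
            if px <= ink_max or px >= paper_min:
                new_row.append(px)   # dark ink or bright paper: keep as-is
            else:
                new_row.append(255)  # faint mid-gray: treat as bleedthrough
        cleaned.append(new_row)
    return cleaned


# Tiny synthetic "scan": dark front-side strokes (30-40) plus faint
# show-through from the back (145-160) on near-white paper.
scan = [
    [255, 250, 150, 250],
    [30, 40, 160, 255],
    [255, 145, 35, 245],
]

cleaned = suppress_bleedthrough(scan)
for row in cleaned:
    print(row)
```

A fixed global threshold like this breaks down under uneven lighting or when the front-side print is itself faded, which is why production pipelines typically rely on more adaptive, region-aware techniques.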
Let's see it in action:
Before
After
Which Data Extraction APIs support this feature?
The bleedthrough text removal feature is currently available for:
Bleedthrough detection and removal is in Beta; we will be happy to enable it for your account. Just reach out to [email protected].