Blur detection and image quality

The "is_blurry" field and meta.ocr_score provide insight into image quality

Updated over a week ago

Blur detection is part of the Veryfi Fraud Detection and Prevention Framework and an important signal Veryfi returns about document quality.

Given that the accuracy of document extraction heavily relies on image quality, it becomes crucial to have control over the image quality of the content provided by users, ensuring the reliability of extraction results.

Within the Veryfi JSON response, the 'is_blurry' field indicates image quality by distinguishing between clear and blurry images. It returns a boolean value (true or false) for each page of the document.

Note that the 'is_blurry' field is not activated by default for user accounts. If you wish to include this field in your JSON response, please get in touch with our support team. Once it is enabled for your account, the field is immediately available in your JSON response, giving you more control over image quality assessment.

Why should I pay attention to image quality and blur detection?

  1. Image Quality Matters: The quality of the receipt/invoice images directly affects the accuracy of data extraction. Blurriness makes it challenging to recognize and extract text accurately. Distorted characters or blurred shapes can lead to OCR errors or misinterpretation of the text, affecting the integrity of the extracted data.

  2. Enhance User Experience: When is_blurry is returned as true in your JSON response, it helps you sort and flag documents that may have poor data extraction results. If Veryfi powers your Expense Management, CPG loyalty, or other product, the is_blurry field lets you choose whether to pass the submission on to your product or to give end users a friendly warning that the extraction results might need to be verified or reviewed.

    As your trusted partner, Veryfi guarantees that is_blurry can help you enable smooth automation, improve the accuracy of the data you pass to your users, and ultimately enhance the overall experience by managing expectations for the data extraction results.

While the is_blurry flag is something we return after processing the submission, you might be interested in preventing your users from submitting blurry or low-quality images using Veryfi Lens for mobile.

How to Interpret & Assumptions

When you submit a single document for processing, the response will contain a list consisting of one flag.

However, if you submit multiple receipt URLs, a multi-page document, or a zip file within the same request, the response will include a list of multiple flags. Each flag corresponds to a specific page, allowing you to assess the status of each page individually.

Response examples:
For one image:

"is_blurry": [false]

Meaning that we think this image is not blurry.


For a zip that has 3 images:

"is_blurry": [
  true,
  true,
  false
]

Meaning that the first two pages are blurry and the third one is OK.
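
As a small illustration of reading these flags, here is a Python sketch that lists the pages flagged as blurry. It assumes the JSON response has already been parsed into a dict; the function name blurry_pages is made up for this example and is not part of any Veryfi SDK. Because is_blurry is not enabled by default, the sketch treats a missing field as "no flags available".

```python
from typing import Any


def blurry_pages(response: dict[str, Any]) -> list[int]:
    """Return the 1-based page numbers whose is_blurry flag is true.

    Returns an empty list when the field is absent, since is_blurry
    is not enabled for every account.
    """
    flags = response.get("is_blurry") or []
    return [page for page, flag in enumerate(flags, start=1) if flag]


# Hand-written sample payloads that mirror the two examples above.
single_image = {"is_blurry": [False]}
zip_of_three = {"is_blurry": [True, True, False]}

print(blurry_pages(single_image))  # []
print(blurry_pages(zip_of_three))  # [1, 2]
print(blurry_pages({}))            # [] (field not enabled)
```

A non-empty result is a natural place to show the end user a warning or to hold the submission for review.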


Beta feature: meta ocr_score and image quality score

meta.ocr_score is a default field in the meta object for api/v8/partner/documents.
API Docs:

There are at least six indirect causes of poor data extraction related to image quality (blur, bleed-through, crumpling, wrinkles, skew, etc.). meta.ocr_score can serve as an image quality trust signal, though with some important considerations:

  1. High OCR Score might indicate:

    • Clear, legible text

    • Good image resolution

    • Minimal noise or distortion

    • Proper document orientation

    • Good contrast between text and background

  2. Low OCR Score (<0.92) might indicate:

    • Blurry images

    • Low-resolution scans

    • Poor lighting/contrast

    • Document skew or warping

    • Potential tampering or manipulation

    • Poor-quality scans/photos
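
To illustrate the 0.92 threshold above, the following Python sketch reads meta.ocr_score from a response that has already been parsed into a dict. The function name and the "low"/"ok" labels are invented for this example; only the meta.ocr_score field and the 0.92 value come from this article. Treat the result as a hint to review the document, not as proof of any one cause in the list.

```python
from typing import Any, Optional

LOW_OCR_SCORE = 0.92  # threshold quoted in the list above


def ocr_quality_hint(response: dict[str, Any]) -> Optional[str]:
    """Classify meta.ocr_score as "low" or "ok".

    Returns None when the score is missing from the response.
    """
    meta = response.get("meta") or {}
    score = meta.get("ocr_score")
    if score is None:
        return None
    return "low" if score < LOW_OCR_SCORE else "ok"


print(ocr_quality_hint({"meta": {"ocr_score": 0.97}}))  # ok
print(ocr_quality_hint({"meta": {"ocr_score": 0.85}}))  # low
print(ocr_quality_hint({}))                             # None
```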
      ​

What is behind meta.ocr_score?

This is a composite score that combines two aspects of OCR text quality:

  1. First component - average (ocr_score of all extracted fields):

  • This looks at the ocr_score specifically for the text in extracted fields

  • These are fields that have been identified as containing specific information. For example, fields like "Invoice Number", "Date", "Total Amount", etc.

  2. Second component - average (ocr_score of all OCR text):

  • This considers the ocr_score for ALL text detected in the document

  • Includes both extracted fields and any other text

  • Gives a general measure of overall text recognition quality

The final ocr_score is a number less than or equal to 1.
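
The two components described above are both plain averages, which can be sketched in Python as follows. This article does not say how Veryfi combines the two averages into the final meta.ocr_score, so the sketch stops at computing them. The function name and the idea of passing the scores in as two lists are assumptions made for illustration.

```python
from statistics import fmean


def ocr_score_components(
    field_scores: list[float],
    all_text_scores: list[float],
) -> tuple[float, float]:
    """Return (average over extracted fields, average over all OCR text).

    Both lists must be non-empty. How the two averages are merged into
    the final score is not documented here, so they are returned as is.
    """
    if not field_scores or not all_text_scores:
        raise ValueError("both score lists must contain at least one value")
    return fmean(field_scores), fmean(all_text_scores)


# Made-up scores: two extracted fields, three pieces of OCR text overall.
fields_avg, text_avg = ocr_score_components([0.75, 1.0], [0.5, 0.75, 1.0])
print(fields_avg)  # 0.875
print(text_avg)    # 0.75
```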


Build your own document trust score logic

  • Confidence Details

    a) ocr_score (per field): part of the confidence details for important fields. If the ocr_score is lower than 0.8, it is a signal that the extraction results may be less reliable, and you should carefully review and validate the extracted data for accuracy.

    b) score (per field): a confidence score that represents the confidence of mapping an extracted value to a particular field in the JSON.

  • Image Size and Resolution

    Consider the image size in terms of width and height. If either the width or the height is 500 pixels or less, it's a factor to take into account. The page dimensions are available in meta.pages_height and meta.pages_width.
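
The checks above could be combined along these lines. This is a minimal Python sketch, not Veryfi's own logic: the function and class names are hypothetical, and it assumes you have already pulled the per-field ocr_score values into a dict and that the page widths and heights are lists with one entry per page. Only the 0.8 and 500-pixel thresholds come from this article.

```python
from dataclasses import dataclass, field

FIELD_OCR_SCORE_MIN = 0.8  # per-field ocr_score threshold from the text
MIN_DIMENSION_PX = 500     # width or height at or below this is a warning sign


@dataclass
class TrustReport:
    """Collects the reasons a document may need manual review."""
    reasons: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        return bool(self.reasons)


def assess_document(
    field_ocr_scores: dict[str, float],
    pages_width: list[int],
    pages_height: list[int],
) -> TrustReport:
    report = TrustReport()
    # Flag every field whose ocr_score falls below the threshold.
    for name, score in field_ocr_scores.items():
        if score < FIELD_OCR_SCORE_MIN:
            report.reasons.append(f"{name}: ocr_score {score} is below {FIELD_OCR_SCORE_MIN}")
    # Flag every page that is too small in either dimension.
    for page, (width, height) in enumerate(zip(pages_width, pages_height), start=1):
        if width <= MIN_DIMENSION_PX or height <= MIN_DIMENSION_PX:
            report.reasons.append(f"page {page}: {width}x{height} px is small")
    return report


report = assess_document({"total": 0.97, "date": 0.71}, [480], [1200])
print(report.needs_review)  # True
for reason in report.reasons:
    print(reason)
```

Returning the reasons rather than a single number keeps the decision explainable, so you can show end users why a document was held for review. The is_blurry flags and meta.ocr_score could be added as further reasons in the same way.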

📌 Read More about Confidence Details
Refer to API Docs for JSON structure and schema

Have questions? Please contact support@veryfi.com.
