Confidence Score Explained

Be confident about data extraction results

Updated over a week ago

What is a confidence score?

A confidence score is the level of certainty or reliability associated with an extracted value. Scores indicate how accurate the extracted information is likely to be. By using the confidence details provided by Veryfi, you can assess each data extraction prediction and make informed decisions on how to handle it.

📍It is important to note that confidence details are not absolute measures of accuracy but serve as indicators or probabilities of reliability for relevant fields.

Confidence details are supported on Veryfi OCR APIs

Receipts / Invoices OCR API

API Docs https://api.veryfi.com/api/v8/partner/documents

W-9 Forms OCR API

API Docs https://api.veryfi.com/api/v8/partner/w9s

Bank Statements OCR API

API Docs https://api.veryfi.com/api/v8/partner/bank-statements

Bank Checks OCR API

API Docs https://api.veryfi.com/api/v8/partner/checks

What confidence details does Veryfi return?

confidence_details is a POST request parameter. By default, it is set to False. If you set it to True, the API response includes additional fields for each extracted value: "ocr_score", "score", and "enriched". You can also specify this parameter in a GET request; if you omit it, the setting from the original POST request is used.

  • "ocr_score" - the OCR confidence score, a measure of how confident the Veryfi OCR system is that the text was recognized correctly. Each character identified by the Veryfi OCR engine is assigned a confidence score, and ocr_score indicates the system's overall certainty in the recognized text.

  • "score" - the confidence score for mapping an extracted value to a particular JSON field.

  • "enriched" - equal to true if the value was enriched (using other fields or data from other documents). It is absent otherwise.

📍 The JSON response structure changes when you enable confidence details. If your current implementation does not support confidence details, you may need to adjust it before using them in production. Please refer to the API Docs for more details.
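As a rough illustration of enabling the parameter, here is a minimal Python sketch that prepares a POST request to the Receipts / Invoices endpoint with confidence_details switched on. The header names (CLIENT-ID, AUTHORIZATION), the "apikey username:key" format, and the file_url parameter are assumptions made for this example, and build_request is a hypothetical helper, so please check the API Docs before relying on any of them.

```python
import json
import urllib.request

# Endpoint taken from the API Docs link for Receipts / Invoices.
DOCUMENTS_URL = "https://api.veryfi.com/api/v8/partner/documents"


def build_request(file_url, client_id, username, api_key):
    """Prepare (but do not send) a POST request with confidence_details on."""
    payload = {
        "file_url": file_url,        # assumed parameter name for the document
        "confidence_details": True,  # the default is False
    }
    headers = {
        # Header names and the auth format below are assumptions.
        "CLIENT-ID": client_id,
        "AUTHORIZATION": f"apikey {username}:{api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    return urllib.request.Request(
        DOCUMENTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_request(
    "https://example.com/receipt.jpg", "my-client-id", "my-username", "my-api-key"
)
# To actually send it: urllib.request.urlopen(request)
```

The request is only built here, not sent, so you can inspect the payload before wiring in real credentials.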

How to interpret the score

Let's take a look at the price field in the Receipts/Invoices API.

First line item:

"price": {
"enriched": true,
"ocr_score": 0.96,
"score": 0.99,
"value": 4.97
}

Second line item:

"price": {
"ocr_score": 0.96,
"rotation": 0,
"score": 0.99,
"value": 4.97
}

Here, only the second line item has explicit price information, but our smart post-processing recognizes this structure and uses that information to enrich the price of the first line item.

ocr_score - The probability that the value 4.97 was recognized correctly from the image/document, and not as 9.97 or 4.87, for example

score - The probability that the value 4.97 corresponds to the price field of the line item, and not to total, subtotal, or tax, for example

value - The extracted value, 4.97

enriched - The value was not captured directly, but calculated or enriched in post-processing

Or take the date field in the Bank Checks API:

"date": {
"ocr_score": 0.99,
"score": 0.97,
"value": "2024-12-18"
}
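To make the field layout concrete, here is a small Python sketch that reads the objects shown above into one structure. FieldConfidence and read_field are hypothetical names used only for illustration; the one behavior taken from this article is that "enriched" is absent unless the value was enriched, so a missing key is treated as False.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldConfidence:
    value: Any
    score: Optional[float]      # confidence of the field mapping
    ocr_score: Optional[float]  # confidence of the text recognition
    enriched: bool              # True only when the key is present and true


def read_field(field: dict) -> FieldConfidence:
    """Read one field object returned with confidence details enabled."""
    return FieldConfidence(
        value=field.get("value"),
        score=field.get("score"),
        ocr_score=field.get("ocr_score"),
        enriched=bool(field.get("enriched", False)),
    )


first_price = read_field(
    {"enriched": True, "ocr_score": 0.96, "score": 0.99, "value": 4.97}
)
second_price = read_field(
    {"ocr_score": 0.96, "rotation": 0, "score": 0.99, "value": 4.97}
)
check_date = read_field({"ocr_score": 0.99, "score": 0.97, "value": "2024-12-18"})

print(first_price.enriched, second_price.enriched)  # True False
```

Keys that this sketch does not model, such as rotation, are simply ignored.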

Both scores range from 0 to 1, with higher values indicating greater confidence. They can be read as percentages:

ocr_score: 1.0 -> 100%

score: 0.74 -> 74%

💡 Pro Tip: Use scores to build business validation logic for data handling. You can use either Veryfi Business Rules or any rules engine you have in-house. The minimum recommended threshold is between 0.7 and 0.9, depending on how strict your use case is, but don't rely on it blindly: the right threshold varies with data quality and the endpoint you use. We recommend making informed decisions backed up by your own data.
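A threshold check along these lines could be sketched in Python as follows. The needs_review helper and the default of 0.8 are illustrative assumptions, not Veryfi recommendations for your data, and treating a field with no scores as "needs review" is a deliberately conservative choice you may want to change.

```python
def needs_review(field: dict, threshold: float = 0.8) -> bool:
    """Return True when any score present on the field is below the threshold."""
    scores = [field[key] for key in ("score", "ocr_score") if key in field]
    if not scores:
        # No confidence information at all: send to review to be safe.
        return True
    return min(scores) < threshold


confident = {"ocr_score": 0.99, "score": 0.97, "value": "2024-12-18"}
doubtful = {"ocr_score": 1.0, "score": 0.74, "value": 4.97}

print(needs_review(confident))  # False
print(needs_review(doubtful))   # True
```

Using the lowest of the available scores means a value is accepted only when both the recognition and the field mapping clear the threshold.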

Things to note

When working with Veryfi APIs, you may observe different patterns in how confidence scores are returned. The system provides different types of confidence scores depending on how the data was obtained. It is important to understand these variations and take them into account during implementation.

1. Fields Directly Extracted from Document / Image

When information is explicitly visible in the document itself, the system returns both OCR and mapping confidence scores.

"description": {
  "ocr_score": 0.96,
  "score": 1,
  "value": "S CAT CHOW COMPLETE"
}

2. Fields Inferred from Context & Post-Processed Values

For fields derived through inference rather than direct extraction, only the "score" value appears. This occurs with fields such as categories, currency codes, and document types, which are determined through contextual analysis and smart post-processing logic.

"currency_code": {
  "score": 0.97,
  "value": "USD"
}

3. Unsupported or Empty Fields

No confidence scores appear in two scenarios:

  • When a field type doesn't support confidence scoring:

    "barcodes": [
    {
    "data": "99902081500020139414",
    "type": "CODE93"
    }
    ]
  • When a supported field isn't found in the document:

    "due_date": ""
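The three patterns above can be told apart in code with a check like the following Python sketch. classify_field and its labels ("extracted", "inferred", "unscored") are hypothetical names chosen for this example, not part of the API.

```python
def classify_field(field) -> str:
    """Label a response field by which confidence scores it carries."""
    if isinstance(field, dict) and "score" in field:
        if "ocr_score" in field:
            return "extracted"  # read directly from the document or image
        return "inferred"       # derived from context or post-processing
    return "unscored"           # unsupported field type or empty value


response = {
    "description": {"ocr_score": 0.96, "score": 1, "value": "S CAT CHOW COMPLETE"},
    "currency_code": {"score": 0.97, "value": "USD"},
    "barcodes": [{"data": "99902081500020139414", "type": "CODE93"}],
    "due_date": "",
}

for name, field in response.items():
    print(name, classify_field(field))
```

Checking the shape of each field before reading its scores helps avoid errors when a field arrives as a plain value or a list.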

4. line_items and tax_lines

line_items and tax_lines are always returned without scores. For ease of use, all scores for these objects are returned in separate objects, which appear only when confidence_details is enabled: line_items_with_scores for line items, and tax_lines_with_scores for tax lines.

For example, line_items.total contains only the response value, while line_items_with_scores.total includes bounding_box, bounding_region, ocr_score, score, and value.
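As an illustration of working with the separate scores object, the Python sketch below scans line_items_with_scores for totals whose mapping score falls below a threshold. The response shown is a trimmed, made-up sample that follows the structure described above, and low_confidence_totals and the 0.8 threshold are assumptions for the example only.

```python
def low_confidence_totals(document: dict, threshold: float = 0.8) -> list:
    """Return (index, value, score) for line item totals scored below threshold."""
    flagged = []
    for index, item in enumerate(document.get("line_items_with_scores", [])):
        total = item.get("total")
        if not isinstance(total, dict):
            continue  # total missing or returned without scores
        score = total.get("score")
        if score is not None and score < threshold:
            flagged.append((index, total.get("value"), score))
    return flagged


# Made-up sample; bounding_box and bounding_region are omitted for brevity.
document = {
    "line_items": [{"total": 4.97}, {"total": 12.5}],
    "line_items_with_scores": [
        {"total": {"ocr_score": 0.96, "score": 0.99, "value": 4.97}},
        {"total": {"ocr_score": 0.71, "score": 0.65, "value": 12.5}},
    ],
}

print(low_confidence_totals(document))  # [(1, 12.5, 0.65)]
```

The same approach should carry over to tax_lines_with_scores, assuming its entries follow the same layout.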

Have any questions? Please contact us at [email protected].
