Confidence Score Explained

Be confident in the data extraction results

Written by Helen Birulia
Updated over a week ago

Want to be confident in your data extraction results?

A confidence score indicates the level of certainty or reliability associated with extracted data, that is, how accurate the extracted information is likely to be. By using the confidence details that Veryfi provides, you can assess the accuracy of data extraction and make informed decisions about how far to rely on the extracted information.

📍Note that confidence details are not absolute measures of accuracy. They are indicators, or probabilities, of reliability for the fields that matter to your use case.

What is a confidence score at Veryfi?

confidence_details is a request parameter. When it is added, each supported extracted value is returned with two additional lines: "ocr_score" and "score".

  • "ocr_score" - the OCR confidence score, a measure of how confident the Veryfi OCR system is in the accuracy of the recognized text. Each character recognized by the Veryfi OCR engine is assigned a confidence score that indicates how certain the system is that the recognition is accurate.

  • "score" - a confidence score, represents the confidence of mapping an extracted value to a particular field in JSON.

📍The JSON response structure changes when you request confidence details. If your current implementation does not support confidence details, you may need to adjust it before using this in production. Please refer to the API Docs for more details.

How to add confidence details to your JSON response

All you need to do is add the confidence_details parameter to your request: "confidence_details": true

API v8 is the current production version: /api/v8/partner/documents

API v7 is in maintenance mode: /api/v7/partner/documents
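
As an illustration, the request body could be assembled as in the sketch below. Only the confidence_details parameter and the two endpoint paths come from this article. The build_request helper and the file_url field name are assumptions made for the example, and the host and authentication headers are left out, so please check the API Docs for the exact request format.

```python
import json
from typing import Tuple

# Endpoint paths quoted in this article; the host is deliberately left out.
ENDPOINTS = {
    "v8": "/api/v8/partner/documents",
    "v7": "/api/v7/partner/documents",
}


def build_request(file_url: str, version: str = "v8") -> Tuple[str, str]:
    """Return (endpoint path, JSON body) for a request that asks for confidence details."""
    body = {
        "file_url": file_url,        # assumed field name for the document source
        "confidence_details": True,  # the parameter described in this article
    }
    return ENDPOINTS[version], json.dumps(body)


path, body = build_request("https://example.com/invoice.pdf")
print(path)
print(body)
```

The returned path and body can then be passed to whichever HTTP client you already use.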

How to interpret the score

Let's take a look at the "total" field:

"total": {
"ocr_score": 1.0,
"score": 0.74,
"value": 147.38
},

"ocr_score" - The probability that value 147.38 is extracted correctly

"score" - The probability that value 147.38 is "total"

"value"- 147.38

Both scores range from 0 to 1, and a higher score indicates greater confidence.

"ocr_score": 1.0 > 1 = 100%

"score": 0.74 > 0.74 = 74%

💡 Pro Tip: Flag documents for manual verification if the score is below 0.7.
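
The Pro Tip above can be sketched as a small check. This is a minimal sketch that assumes the response has already been parsed into a Python dict. The needs_review helper is a hypothetical name, and applying the 0.7 threshold to both "score" and "ocr_score" is an assumption of the example rather than a rule from Veryfi.

```python
REVIEW_THRESHOLD = 0.7  # from the Pro Tip; tune it for your use case


def needs_review(field: dict, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when any confidence value present on the field is below threshold."""
    # Assumption: both "score" and "ocr_score" are checked when they are present.
    scores = [field[key] for key in ("score", "ocr_score") if key in field]
    return any(score < threshold for score in scores)


total = {"ocr_score": 1.0, "score": 0.74, "value": 147.38}
print(f"{total['score']:.0%}")   # the score formatted as a percentage
print(needs_review(total))       # 0.74 is not below 0.7, so no review is flagged
```

A field with "score": 0.65 would be flagged by the same call.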


Assumptions

You may notice that the system returns scores for some fields but not for others.

1. The system returns "ocr_score", "score", and "value"

The data was extracted from the document and can be found on the document image.

e.g: "date", "invoice_number", "total", etc.

"date": {
"ocr_score": 1.0,
"score": 0.95,
"value": "2021-09-01 00:00:00"
},

OR

"total": {
"ocr_score": 1.0,
"score": 0.74,
"value": 147.38
},

*Applicable for both v7 and v8

2. The system returns "score" only

Some fields have only "score", which means that the value of the field was not extracted from the document but inferred from the whole document data.

e.g: "category", "currency_code", "vendor_type", etc.

"default_category": {
"score": 0.87,
"value": "Job Supplies"
},

OR

"document_type": {
"score": 0.85,
"value": "invoice"
},

*Applicable for v7 only
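
The two shapes described in cases 1 and 2 can be told apart by the keys they carry. The sketch below assumes the response has been parsed into Python dicts. The field_origin helper and its labels are hypothetical names used only for illustration.

```python
def field_origin(field: dict) -> str:
    """Classify a confidence-details field by the keys it carries."""
    if "ocr_score" in field and "score" in field:
        return "extracted"  # case 1: value was read from the document image
    if "score" in field:
        return "inferred"   # case 2: value was inferred from the whole document
    return "unknown"


date = {"ocr_score": 1.0, "score": 0.95, "value": "2021-09-01 00:00:00"}
document_type = {"score": 0.85, "value": "invoice"}
print(field_origin(date))
print(field_origin(document_type))
```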

Exceptions for v7

3. The system returns an empty value and no score

(a) The system doesn't return a score for this parameter because the field doesn't support confidence scores.

"barcodes": [],

*Please find the full list of supported parameters below.

(b) The system returns an empty value for a parameter that supports confidence scores. In this case, the model most likely did not find the data for this parameter in the document.

"due_date": "",

"due_date" is missing in the invoice sample.

4. The system returns the value for the parameter that supports confidence scores but without the score.

"subtotal": 133.98

We may assume that in this case the score for "subtotal" could not be returned because of poor image quality or because the value was not found. Most likely, "subtotal" was calculated at the post-processing level.

Exceptions for v8

3. The system returns an empty value and no score

(a) The system doesn't return a score for this parameter because the field doesn't support confidence scores.

"barcodes": [],

*Please find the full list of supported parameters below.

(b) The system returns an empty value for a parameter that supports confidence scores. In this case, the model most likely did not find the data for this parameter in the document.

"due_date": "",

"due_date" is missing in the invoice sample.

4. The system returns the value for the parameter that supports confidence scores but without the score.

"total": 133.98

In v8, the system returns a value without a score only if the value of the field was not extracted from the document but was enriched or calculated at the post-processing level.

e.g.: the system failed to extract "total", so "total" was calculated at the post-processing level from the document context.
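
Because a field can arrive as an object with scores, as a bare value, or as an empty value, client code may want to normalize all of these shapes before reading them. The following is a minimal sketch under that assumption. FieldResult, normalize, and the status labels are hypothetical names for the example and are not part of the Veryfi API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldResult:
    value: Any
    score: Optional[float] = None
    ocr_score: Optional[float] = None

    @property
    def status(self) -> str:
        if self.value in ("", None, []):
            return "missing"   # case 3: empty value and no score
        if self.score is None:
            return "no_score"  # case 4: value returned without a score
        return "scored"        # cases 1 and 2


def normalize(raw: Any) -> FieldResult:
    """Wrap a raw response field, whether it is an object with scores or a bare value."""
    if isinstance(raw, dict) and "value" in raw:
        return FieldResult(raw["value"], raw.get("score"), raw.get("ocr_score"))
    return FieldResult(raw)


# Sample fields copied from the examples in this article.
response = {
    "date": {"ocr_score": 1.0, "score": 0.95, "value": "2021-09-01 00:00:00"},
    "due_date": "",
    "barcodes": [],
    "total": 133.98,
}
for name, raw in response.items():
    print(name, normalize(raw).status)
```

Fields with the "no_score" status are good candidates for the same manual verification applied to low-scoring fields, since their values were not read directly from the document.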

Have any questions? Please contact us at support@veryfi.com.
