What is an Accuracy Report?
Accuracy reports help you measure how well Veryfi models are performing in extracting information from your documents. You can create and view different reports in the "Analytics" section to track performance across various initiatives, vendors, or fields, and to compare one model version against another.
This article provides a detailed explanation of the inner workings of the accuracy reports tool; it does not cover how to create reports or how to use the tool more generally.
What is Ground Truth and why is it important?
Ground truth is the correct, manually reviewed data that Veryfi uses as a benchmark to measure accuracy. To measure accuracy you must first establish what the correct output is, so a set of documents that has been manually reviewed for the fields of interest is what we call the "ground truth".
Think of it as your "gold standard" data:
It's created by human reviewers who carefully check and verify the correct values
You can tag these documents (the data set) for easy filtering and navigation in the future
Only evaluate fields and documents you've verified.
Why is it important to only evaluate fields and documents you have verified?
Let's say you have an invoice where you've carefully checked and confirmed that the vendor name is "Acme Corp." However, you haven't yet reviewed whether the line items on that invoice are correct.
In this case:
✅ DO: Use this invoice to check how well Veryfi extracts vendor names
❌ DON'T: Use this same invoice to check line item accuracy (don't include line items in the list of fields to track accuracy for)
Why? Because while you know for certain the vendor name is correct (you checked it!), you haven't verified if the line items are correct. Using unverified data as "ground truth" could give you misleading accuracy results.
A practical example:
You have an invoice from Staples
You've manually verified:
The vendor name is correct
The invoice date is correct
But you haven't checked:
The line items
The tax amounts
Therefore, only use this invoice to measure accuracy for vendor name and date extraction - not for line items or tax amounts.
Think of it like grading a test - you can only grade the questions you have the correct answers for. If you don't have the answer key for certain questions, you can't accurately score those parts.
How to read reports?
Each report shows:
The fields being measured (those you included in the report)
Number of extractions
Model version used and Date of analysis
F1 score for each field
You can click "Show Detail" to see side-by-side comparisons of what the model extracted versus the ground truth.
🧑🏻🔬 The expected number of extractions is based on the ground truth, and the score shown for each field is the F1 score. F1 is a metric commonly used to summarize the performance of AI models because it takes both precision and recall into account; it is explained in further detail below.
How do we measure accuracy?
At Veryfi we use several methods to ensure fair and practical accuracy measurements, including but not limited to fuzzy matching, the F1 score, Levenshtein distance, and the Hunt-Szymanski algorithm.
A. Fuzzy matching (document level)
"Fuzzy matching" is used for certain fields where small differences don't impact usability of a result in most cases, the document-level fields that don’t require an exact match are the following: addresses, phone numbers, and names.
Addresses:
Pre-processing steps:
Remove all instances of "USA" and "United States" from the end of the address
Remove all dashes and last 4 digits from zip codes (e.g. "12345-6789" becomes "12345")
Replace full state names with abbreviations
Replace address terms with abbreviations (e.g. "st" for "street")
Remove common variations like "P.O. Box"
Remove punctuation and spaces from the address
Matching
In short: addresses are considered matching if they're at least 85% similar
🧑🏻🏫 Explained: We calculate the Levenshtein distance between the ground truth and extracted values and divide it by the average length of the two strings. It is considered a match if this value is at or below 0.15, i.e. at least 85% similarity.
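As an illustration of the address rules above, here is a minimal Python sketch (not Veryfi's actual implementation) of the pre-processing steps and the 0.15 Levenshtein-ratio check. The function names are illustrative, and only a handful of states and street terms are included for brevity.

```python
import re

# Illustrative lookup tables; the real lists cover all states and many more street terms.
STATE_ABBREVIATIONS = {"california": "ca", "new york": "ny", "texas": "tx"}
ADDRESS_TERMS = {"street": "st", "boulevard": "blvd", "avenue": "ave", "suite": "ste"}


def normalize_address(address: str) -> str:
    """Apply the address pre-processing steps before fuzzy matching."""
    addr = address.lower().strip()
    addr = re.sub(r"[,\s]*\b(usa|united states)\b\.?$", "", addr)  # drop country from the end
    addr = re.sub(r"\b(\d{5})-\d{4}\b", r"\1", addr)               # "12345-6789" -> "12345"
    for full, abbr in {**STATE_ABBREVIATIONS, **ADDRESS_TERMS}.items():
        addr = re.sub(rf"\b{full}\b", abbr, addr)
    addr = addr.replace("p.o. box", "").replace("po box", "")
    return re.sub(r"[^a-z0-9]", "", addr)                          # strip punctuation and spaces


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def addresses_match(ground_truth: str, extracted: str, threshold: float = 0.15) -> bool:
    """Match if Levenshtein distance divided by the average string length is <= threshold."""
    gt, ex = normalize_address(ground_truth), normalize_address(extracted)
    avg_len = (len(gt) + len(ex)) / 2
    return avg_len > 0 and levenshtein(gt, ex) / avg_len <= threshold


print(addresses_match("5482 Wilshire Boulevard, Los Angeles, California 90036-1234, USA",
                      "5482 Wilshire Blvd Los Angeles CA 90036"))  # True
```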
Phone numbers:
Pre-processing steps:
We focus on the actual digits
For numbers with 8+ digits, we match the last 8 digits
For shorter numbers, all digits must match
Matching
If the ground truth number has at least 8 digits, at least the last 8 digits must match. If the phone number is less than 8 digits, all digits must match.
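A minimal sketch of the phone number rule above; the function name is illustrative and this is not Veryfi's actual code.

```python
import re

def phones_match(ground_truth: str, extracted: str) -> bool:
    """Compare phone numbers using digits only; long numbers match on their last 8 digits."""
    gt_digits = re.sub(r"\D", "", ground_truth)
    ex_digits = re.sub(r"\D", "", extracted)
    if len(gt_digits) >= 8:
        return gt_digits[-8:] == ex_digits[-8:]
    return gt_digits == ex_digits

print(phones_match("+1 (415) 555-0123", "415-555-0123"))  # True: last 8 digits agree
print(phones_match("555-0123", "555-0124"))               # False: short numbers must match exactly
```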
Names:
This applies to the fields "vendor.name", "bill_to.name", "ship_to.name", and "vendor.raw_name".
Pre-processing steps:
We remove common business prefixes (e.g. "the", "sarl")
Remove common suffixes (e.g. "inc", "llc")
Remove all special characters and spaces, but keep multilingual characters (e.g. "é", Chinese characters)
Matching
In short: names are considered matching if they're at least 85% similar
🧑🏻🏫 Explained: We calculate the Levenshtein distance between the ground truth and extracted values and divide it by the average length of the two strings. It is considered a match if this value is at or below 0.15, i.e. at least 85% similarity. Additionally, if both strings start with the same characters and the overlapping characters cover at least 50% of the length of the longer of the two names, it is a match.
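The 50% prefix-overlap rule can be sketched as follows. This is not Veryfi's actual code: the prefix and suffix lists are illustrative subsets, and in practice the 85% Levenshtein check shown in the address example would be applied alongside this rule.

```python
import re

# Illustrative subsets; the real lists of business prefixes and suffixes are longer.
PREFIXES = ("the", "sarl")
SUFFIXES = ("inc", "llc", "ltd", "corp")


def normalize_name(name: str) -> str:
    """Drop business prefixes/suffixes, then strip spaces and special characters.
    \w keeps accented and CJK characters, so multilingual names survive."""
    words = name.lower().split()
    while words and words[0].strip(".,") in PREFIXES:
        words.pop(0)
    while words and words[-1].strip(".,") in SUFFIXES:
        words.pop()
    return re.sub(r"[^\w]|_", "", "".join(words))


def prefix_overlap_match(ground_truth: str, extracted: str) -> bool:
    """Secondary rule: a match if the shared leading characters cover
    at least half of the longer name."""
    gt, ex = normalize_name(ground_truth), normalize_name(extracted)
    common = 0
    for a, b in zip(gt, ex):
        if a != b:
            break
        common += 1
    longer = max(len(gt), len(ex))
    return longer > 0 and common >= 0.5 * longer


print(prefix_overlap_match("The Acme Corporation Inc.", "Acme Corporation"))  # True
```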
Dates:
Date fields in our JSON response are datetime objects, but only the date portion is used for matching; the time is ignored.
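For example, under this rule two datetimes that differ only in the time of day still match. A minimal illustration (not Veryfi's code):

```python
from datetime import datetime

def dates_match(ground_truth: datetime, extracted: datetime) -> bool:
    """Only the calendar date is compared; the time portion is ignored."""
    return ground_truth.date() == extracted.date()

print(dates_match(datetime(2024, 3, 15, 0, 0),
                  datetime(2024, 3, 15, 14, 30)))  # True: same date, different times
```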
B. Fuzzy matching (array-like fields) / Special Handling for Line Items
line_items and tax_lines can appear more than once in a document, so the individual items must be matched up before they can be compared; the matching algorithm is explained below. All fields of array-like objects use exact matching, except for the line item description.
Line_items.description
Pre-processing:
Remove all special characters and spaces, but keep multilingual characters (e.g. "é", Chinese characters)
Matching
In short: Descriptions are considered matching if they're at least 90% similar
🧑🏻🏫 Explained: We calculate the Levenshtein distance between the ground truth and extracted values and divide it by the average length of the two strings. It is considered a match if this value is at or below 0.1, i.e. at least 90% similarity. Additionally, if both strings start with the same characters and the overlapping characters cover at least 50% of the length of the longer of the two strings, it is a match.
Array field matching
Sometimes line_items can be skipped or broken down, so each line_item in the extraction must be matched to a line_item in the ground truth. We use a single field for this matching: the first one from this list that is present in the report: total, description, full_description, price. If no field from this list is in the report, the first field added to the report is used. The algorithm used for matching is Hunt-Szymanski.
Hunt-Szymanski algorithm:
Given two sentences, the algorithm identifies the longest common subsequence between them, allowing for gaps between matched words. As an example:
Let's say you have two sentences:
This is a good sample sequence for the example
There is a good sample sequence used for an example
An exact match between two words is required for them to count as matching; the resulting subsequence would be:
is a good sample sequence for example
Obtained from:
This is a good sample sequence for the example
There is a good sample sequence used for an example
In the case of line item totals, let's say the ground truth has five line items with the totals 1.0, 2.0, 3.0, 5.0, 6.0, and the model prediction has six line items with the totals 1.0, 1.0, 2.0, 4.0, 5.0, 6.0. The resulting match would be 1.0, 2.0, 5.0, 6.0, obtained from:
Ground truth: 1.0, 2.0, 3.0, 5.0, 6.0
Prediction: 1.0, 1.0, 2.0, 4.0, 5.0, 6.0
The line items in the report would be:
Index | Ground truth total | Extracted total |
1 | 1.0 | 1.0 |
2 | null | 1.0 |
3 | 2.0 | 2.0 |
4 | 3.0 | 4.0 |
5 | 5.0 | 5.0 |
6 | 6.0 | 6.0 |
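To make the alignment concrete, the sketch below reproduces the table above. It uses a plain dynamic-programming longest common subsequence for clarity rather than Hunt-Szymanski itself (which computes the same subsequence more efficiently), and the pairing of unmatched items into shared rows is an assumption based on how the example rows are shown; all names are illustrative.

```python
from itertools import zip_longest

def align_line_items(ground_truth, extracted):
    """Align two lists of totals on their longest common subsequence (LCS).
    Unmatched items between two matches are paired up row by row, with None
    (shown as null in the report) filling any leftover positions."""
    n, m = len(ground_truth), len(extracted)
    # dp[i][j] = length of the LCS of ground_truth[i:] and extracted[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if ground_truth[i] == extracted[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    rows, i, j = [], 0, 0
    gap_gt, gap_ex = [], []

    def flush():
        # Pair up the unmatched items collected since the last match.
        rows.extend(zip_longest(gap_gt, gap_ex))
        gap_gt.clear(); gap_ex.clear()

    while i < n and j < m:
        if ground_truth[i] == extracted[j]:
            flush()
            rows.append((ground_truth[i], extracted[j])); i += 1; j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            gap_gt.append(ground_truth[i]); i += 1
        else:
            gap_ex.append(extracted[j]); j += 1
    gap_gt.extend(ground_truth[i:]); gap_ex.extend(extracted[j:])
    flush()
    return rows

gt = [1.0, 2.0, 3.0, 5.0, 6.0]           # ground truth totals
pred = [1.0, 1.0, 2.0, 4.0, 5.0, 6.0]    # extracted totals
for index, (g, e) in enumerate(align_line_items(gt, pred), 1):
    print(index, g, e)   # reproduces the table: row 2 -> (None, 1.0), row 4 -> (3.0, 4.0)
```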
Levenshtein distance:
This distance is defined as the minimum number of edits necessary to transform one string of text into another, where an edit is an insertion, deletion, or substitution of a single character. Let's say you have two line item descriptions:
Starbucks large ch@ramel vanilla coldmacchiato
$tarbuckslarge caramel vanilla cold macchiato
The result is the same regardless of which string we start with, so let's take the second string and transform it into the first one:
First, we edit the first character, turning the $ into an S, and get:
Second string: Starbuckslarge caramel vanilla cold macchiato
Original string: Starbucks large ch@ramel vanilla coldmacchiato
Then, we insert a space between Starbucks and large:
Second string: Starbucks large caramel vanilla cold macchiato
Original string: Starbucks large ch@ramel vanilla coldmacchiato
Then, we insert an h in caramel:
Second string: Starbucks large charamel vanilla cold macchiato
Original string: Starbucks large ch@ramel vanilla coldmacchiato
Then, we replace the first a in charamel with an @:
Second string: Starbucks large ch@ramel vanilla cold macchiato
Original string: Starbucks large ch@ramel vanilla coldmacchiato
Finally, we delete the space between cold and macchiato:
Second string: Starbucks large ch@ramel vanilla coldmacchiato
Original string: Starbucks large ch@ramel vanilla coldmacchiato
We have made a total of five edits to go from one string to the other, so the Levenshtein distance between the two strings is 5. To turn this into the metric we commonly use, we divide by the average length of the two original strings: the first string is 46 characters long and the second is 45, so 5 edits over 45.5 is approximately 0.11. Since the description threshold is 0.1 (10%), these two strings would narrowly fail to match, although they would match under the 0.15 (15%) threshold used for names and addresses.
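You can verify the arithmetic with a short Python sketch (an illustrative implementation, not Veryfi's code):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

gt = "Starbucks large ch@ramel vanilla coldmacchiato"
extracted = "$tarbuckslarge caramel vanilla cold macchiato"
distance = levenshtein(gt, extracted)
ratio = distance / ((len(gt) + len(extracted)) / 2)
print(distance)           # 5 edits
print(round(ratio, 2))    # ~0.11, just above the 0.1 description threshold
```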
What does the F1 Score mean?
The F1 score is a metric commonly used to describe the performance of AI models. It is especially useful when dealing with unbalanced datasets, where one outcome is much more likely than the others. Let's begin with the definitions we use for our case:
True Positive: occurs when a ground truth value is not empty and matches the extracted value.
False Positive: occurs when a ground truth value is not empty and does not match the extracted value.
Recall is computed as: Number of True Positives / Number of cases where the ground truth value is neither empty nor null
Precision is computed as: Number of True Positives / (Number of True Positives + Number of False Positives)
F1 is computed as: 2 * Precision * Recall / (Precision + Recall)
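A small sketch of how these formulas combine, following the definitions above; the row format and function name are illustrative, not part of the Veryfi API.

```python
def f1_report(rows):
    """Compute precision, recall, and F1 from (ground_truth, extracted, is_match) rows,
    following the definitions above. Rows where the ground truth is null are not counted."""
    true_positives = sum(1 for gt, ex, match in rows if gt is not None and match)
    false_positives = sum(1 for gt, ex, match in rows if gt is not None and not match)
    gt_not_empty = sum(1 for gt, _, _ in rows if gt is not None)
    recall = true_positives / gt_not_empty if gt_not_empty else 0.0
    denom = true_positives + false_positives
    precision = true_positives / denom if denom else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 8 fields with a non-empty ground truth value, 6 extracted correctly,
# plus one null-to-null case that is ignored.
rows = [("a", "a", True)] * 6 + [("b", "c", False)] * 2 + [(None, None, True)]
print(f1_report(rows))  # precision 0.75, recall 0.75, F1 0.75
```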
Sample categorization of matching types:
GroundTruth Value | Extracted Value | Is match? | Match type |
5482 Wilshire Blvd Box 1589 Los Angeles, CA 90036 USA | 5482 Wilshire Blvd Box 1589 Los Angeles, CA 90036 | Yes | |
5482 Wilshire Blvd Box 1589 Los Angeles, CA 90036 USA | 5482 Wilshire Blvd Los Angeles, CA 90036 | No | |
null | 5482 Wilshire Blvd Los Angeles, CA 90036 | No | |
5482 Wilshire Blvd Box 1589 Los Angeles, CA 90036 USA | null | No | |
null | null | Yes | |
The values of any objects generated during the array field matching step (e.g. for unmatched line items) are all treated as null
For array-like fields, larger documents (with a high number of array values) will have a higher impact on the metrics.
Null-to-null matches are shown in gray in the extraction tab
Common Questions
Q: What happens if a field is empty in both the extraction and ground truth?
A: These are not counted in accuracy calculations since there's nothing to compare.
Q: How do you handle international characters and special symbols?
A: Our system preserves international characters (like "é") and handles special characters intelligently based on the field type.
Q: Can I create different reports for different document types or issues?
A: Yes! You can create separate reports for different document types, vendors, or fields to track performance across various scenarios.
Best Practices for Accuracy Reports
Setting Up Your Ground Truth
Create your ground truth data directly in Veryfi Hub using the New Document Details view
This ensures consistency and proper formatting
Makes it easier to track and manage verified data
Tag documents systematically and consistently
Use clear, consistent tags for your ground truth documents
Example tags: "verified_vendor_name", "verified_line_items", "2024_Q1_review"
This makes filtering and organizing documents much simpler
Verify before measuring
Always review and approve your ground truth data before using it in reports
Double-check that the verified values are correct
Creating Effective Reports
Give your reports clear names
Use descriptive names that indicate purpose and scope
Example: "Q1-2024_Vendor-Name-Accuracy" or "Healthcare-Invoice-Line-Items"
Add detailed descriptions
Include the scope of what's being measured
Note any special conditions or filters applied
Document why this report was created
Maintaining Quality
Focus on verified fields only
Only measure accuracy for fields you've manually reviewed
Track multiple aspects
Create separate reports for different document types/vendors/fields
Track accuracy changes over time
Need Help?
If you have specific questions about your Accuracy Reports or need help interpreting the results, please reach out to your Technical Account Manager for detailed guidance or send an email to [email protected]