When processing documents with Veryfi, two features help catch resubmissions: Duplicate Detection and Similarity Check. They work differently and catch different things.
Duplicate Detection
Duplicate Detection checks whether a newly submitted document matches one you've already processed. It runs automatically on every submission.
The system checks documents in a specific order:
1. Exact file check. If the uploaded file is byte-for-byte identical to a previously processed file, it's flagged immediately. This catches re-uploads of the exact same file.
2. Field-based check. The system compares extracted data fields from the new document against your existing documents. The default check requires all of these to match:
• Same vendor name
• Same date
• Same total
• Same invoice number
• Other identifying fields
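The two-step order above can be sketched as follows. This is a simplified illustration, not Veryfi's implementation. It assumes a SHA-256 digest for the byte-for-byte comparison and some hypothetical normalization rules: trimmed, case-insensitive vendor names, totals rounded to cents, and ISO date strings. It also uses a linear scan where a production system would use an index. The `Document` class and helper names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    doc_id: int
    file_bytes: bytes
    vendor: str
    date: str            # assumed ISO format, e.g. "2024-05-01"
    total: float
    invoice_number: str


def file_fingerprint(data: bytes) -> str:
    # Byte-for-byte identical files always yield the same SHA-256 digest.
    return hashlib.sha256(data).hexdigest()


def field_key(doc: Document) -> tuple:
    # Hypothetical normalization so trivial formatting differences don't block a match.
    return (
        doc.vendor.strip().casefold(),
        doc.date,
        round(doc.total, 2),
        doc.invoice_number.strip(),
    )


def find_duplicate(new: Document, existing: List[Document]) -> Optional[int]:
    """Return the ID of the first matching document in `existing`, or None."""
    # Step 1: exact file check.
    digest = file_fingerprint(new.file_bytes)
    for doc in existing:
        if file_fingerprint(doc.file_bytes) == digest:
            return doc.doc_id
    # Step 2: field-based check; every key field must match.
    key = field_key(new)
    for doc in existing:
        if field_key(doc) == key:
            return doc.doc_id
    return None


# A scanned receipt and the emailed PDF of the same receipt: different bytes, same fields.
scan = Document(1, b"scan-bytes", "Acme Corp", "2024-05-01", 42.50, "INV-100")
emailed = Document(2, b"pdf-bytes", "ACME CORP ", "2024-05-01", 42.5, "INV-100")
print(find_duplicate(emailed, [scan]))  # 1
```

Step 1 never fires for the scan/PDF pair because the bytes differ. The match comes entirely from the field comparison in step 2.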
Customization: You can customize which fields and logic the system uses to evaluate duplicates. For example, some clients use custom rules that check different combinations of fields (invoice number OR reference number, store number, etc.) with AND/OR logic. Reach out to [email protected] if you need this.
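Veryfi configures custom rules on request, and their format is not described here. Purely to illustrate the AND/OR logic, the sketch below expresses a hypothetical rule, (invoice number OR reference number) AND store number, as composable Python predicates. The helpers `same`, `all_of`, and `any_of` and the field names are illustrative assumptions, not part of Veryfi's API.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
Rule = Callable[[Doc, Doc], bool]


def same(field: str) -> Rule:
    # True only when both documents have a non-empty, equal value for `field`.
    def check(a: Doc, b: Doc) -> bool:
        va, vb = a.get(field), b.get(field)
        return va not in (None, "") and va == vb
    return check


def all_of(*rules: Rule) -> Rule:
    # AND: every sub-rule must match.
    return lambda a, b: all(rule(a, b) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    # OR: at least one sub-rule must match.
    return lambda a, b: any(rule(a, b) for rule in rules)


# Hypothetical custom rule: (invoice number OR reference number) AND store number.
custom_rule = all_of(
    any_of(same("invoice_number"), same("reference_number")),
    same("store_number"),
)

a = {"invoice_number": "INV-1", "reference_number": "R-9", "store_number": "042"}
b = {"invoice_number": "", "reference_number": "R-9", "store_number": "042"}
print(custom_rule(a, b))  # True: reference number and store number match
```

Treating empty values as non-matching keeps two documents that both lack an invoice number from being paired on that field alone.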
API response: When a duplicate is detected, the response includes:
{
"is_duplicate": true,
"duplicate_of": 12345678
}
duplicate_of contains the document ID of the original. In the Web Portal, duplicates are highlighted in red.
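Here is a minimal sketch of reading these fields from a parsed API response. It assumes `is_duplicate` and `duplicate_of` appear at the top level of the document JSON, as in the example response above. The function name `original_document_id` is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def original_document_id(response: Dict[str, Any]) -> Optional[int]:
    """Return the original document's ID if the response flags a duplicate, else None."""
    if response.get("is_duplicate"):
        return response.get("duplicate_of")
    return None


# Fields as they appear in the example response above.
response = json.loads('{"is_duplicate": true, "duplicate_of": 12345678}')
print(original_document_id(response))  # 12345678
```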
👉 Examples:
You upload the same invoice twice. If the fields match, Veryfi detects the second upload as a duplicate and flags it.
A user uploads a photo of a paper receipt and later uploads the digital version of the same receipt that the point-of-sale (POS) system sent by email. Even though one is a scan or photo and the other is a digital PDF, Veryfi flags the second upload as a duplicate if the extracted values match.
📕 Duplicate Detection is enabled for all accounts by default. 🔗 API Documentation
Similarity Check
Purpose: To identify documents that are not exactly the same, but very similar.
How it works: Veryfi compares the full OCR text of a new document with previously processed documents and flags it when the similarity reaches a configurable threshold (e.g., 90% or 95%).
👉 Example:
A user submits a receipt to claim a reward. Later, they digitally change the total or the date and resubmit it. Because the values differ, Duplicate Detection will not catch it. However, Similarity Check will flag it as highly similar to the original.
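Veryfi's exact similarity metric is not described here, so the sketch below substitutes Python's `difflib.SequenceMatcher` as a stand-in text-similarity measure. It compares OCR text against previously processed documents and keeps those at or above a threshold expressed as a fraction (0.90 for 90%). In the altered-total scenario, the edited receipt no longer matches on the total, yet it still scores well above the threshold. The function names, document IDs, and receipt text are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(text_a: str, text_b: str) -> float:
    # Stand-in metric: SequenceMatcher ratio in [0, 1], where 1.0 means identical text.
    return SequenceMatcher(None, text_a, text_b).ratio()


def find_similar(new_text: str, previous: Dict[int, str],
                 threshold: float = 0.90) -> List[Tuple[int, float]]:
    """Return (doc_id, score) pairs at or above the threshold, best match first."""
    scored = [(doc_id, similarity(new_text, text)) for doc_id, text in previous.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical OCR text of an original receipt and a copy with the total edited.
original = "ACME MARKET\nStore 042\n2024-05-01 14:32\nMilk 3.49\nBread 2.99\nTOTAL 6.48\nThank you!"
edited = original.replace("TOTAL 6.48", "TOTAL 9.48")

print(edited == original)                     # False: the values differ
print(find_similar(edited, {101: original}))  # one hit for document 101, score close to 1.0
```

A single changed digit barely moves a whole-text score, which is why this kind of check catches edits that defeat an exact field match.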
API Integration: When Similarity Check flags a document, the API response includes the following JSON fields:
{
"fraud": {
"color": "red",
"decision": "Fraud",
"types": ["similar documents"]
}
}
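Here is a minimal sketch of checking for this flag in a parsed response. It assumes the `fraud` object sits at the top level, as in the JSON above. `flagged_as_similar` is a hypothetical helper name.

```python
from typing import Any, Dict


def flagged_as_similar(response: Dict[str, Any]) -> bool:
    """True if the response's fraud block lists "similar documents" among its types."""
    fraud = response.get("fraud") or {}
    return "similar documents" in (fraud.get("types") or [])


response = {"fraud": {"color": "red", "decision": "Fraud", "types": ["similar documents"]}}
print(flagged_as_similar(response))  # True
```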
The documents similar to the submitted one are listed in the meta.duplicates field, each with its ID, similarity score, and URL:
{
"duplicates": [
{
"id": 313322741,
"score": 0.69,
"url": "URL"
},
{
"id": 313322743,
"score": 0.69,
"url": "URL"
},
{
"id": 313322742,
"score": 0.69,
"url": "URL"
}
]
}
📕 Similarity Check is an add-on feature that is part of the Fraud Suite. If meta.duplicates does not appear in your JSON responses, please reach out to [email protected].
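To work with the list of similar documents, here is a minimal sketch. It assumes meta.duplicates has the shape shown above, with each entry carrying `id`, `score`, and `url`. The helper name and the optional `min_score` filter are illustrative, not part of Veryfi's API.

```python
from typing import Any, Dict, List, Tuple


def similar_document_ids(response: Dict[str, Any],
                         min_score: float = 0.0) -> List[Tuple[int, float]]:
    """Return (id, score) pairs from meta.duplicates, highest score first."""
    entries = (response.get("meta") or {}).get("duplicates") or []
    pairs = [(e["id"], e["score"]) for e in entries if e.get("score", 0) >= min_score]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


response = {"meta": {"duplicates": [
    {"id": 313322741, "score": 0.69, "url": "URL"},
    {"id": 313322743, "score": 0.69, "url": "URL"},
]}}
print(similar_document_ids(response))  # [(313322741, 0.69), (313322743, 0.69)]
```

Python's sort is stable, so documents with equal scores keep the order in which the API returned them.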
Image Quality and Accuracy
Poor-quality images can cause both false positives and false negatives in Duplicate Detection and Similarity Check.
Blurry, cropped, or low-resolution images may cause Veryfi to misread fields, so documents may be incorrectly flagged as duplicates, or genuine duplicates may be missed.
Similarly, modifications or noise in the image may cause Similarity Check to either miss fraud attempts or incorrectly flag legitimate documents.
For more details, see:
🔗 What affects data extraction accuracy?
Duplicate Spike Alert
Another useful tool is the Duplicate Spike Alert, which notifies you if there’s an unusual surge in duplicate submissions. This can help you quickly identify issues such as:
• Users mistakenly uploading the same file multiple times
• System-level errors causing resubmissions
• Fraudulent behavior patterns
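Veryfi does not describe how the Duplicate Spike Alert decides that a surge is unusual. You may want a similar signal on your own side. One common approach compares today's duplicate count (for instance, the number of responses with `is_duplicate` set to true) against the mean and standard deviation of recent days. The sketch below shows that approach as a hypothetical example, not Veryfi's method. The function name and the `k` and `min_history` parameters are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import List


def is_duplicate_spike(daily_counts: List[int], today: int,
                       k: float = 3.0, min_history: int = 7) -> bool:
    """Flag `today` if it exceeds the historical mean by more than k standard deviations."""
    if len(daily_counts) < min_history:
        return False  # Not enough history to judge what "unusual" means.
    baseline = mean(daily_counts)
    spread = pstdev(daily_counts)
    # Floor the spread at 1 so a perfectly flat history doesn't flag every small bump.
    return today > baseline + k * max(spread, 1.0)


history = [4, 5, 3, 6, 4, 5, 4]  # hypothetical duplicate counts for the last 7 days
print(is_duplicate_spike(history, 25))  # True
print(is_duplicate_spike(history, 6))   # False
```

`k` and `min_history` are tuning knobs to adjust for your own submission volume.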
Quick Comparison
| Feature | What it Detects | How it Works |
|---|---|---|
| Duplicate Detection | Duplicate documents: the same file, or a different file with the same key field values | Checks for a byte-for-byte identical file, then matches key extracted fields against previously processed documents |
| Similarity Check | Near-duplicate or altered documents | Compares the entire OCR text of the documents and flags matches that reach a configurable similarity threshold (%) |

