When processing documents with Veryfi, you may come across two different features: Duplicate Detection and Similarity Check. While both help prevent duplicate or fraudulent submissions, they serve different purposes.
Duplicate Detection
Purpose: To identify if a document has already been processed before.
How it works: Veryfi uses several fields to determine whether a new document matches one that was already processed. If it does, the new one is flagged as a duplicate.
👉 Examples:
You upload the same invoice twice. Since all of the compared fields match, Veryfi will detect the second upload as a duplicate and flag it.
A user uploads a paper receipt and later uploads the digital version of the same receipt sent by the POS via email. Even though one is a scan/photo and the other is a digital PDF, if the values match, Veryfi will flag it as a duplicate.
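As a rough illustration of field-based matching (not Veryfi's actual implementation), the sketch below builds a comparison key from a handful of extracted fields and looks for an earlier document with the same key. The field names in KEY_FIELDS and both helper functions are hypothetical; this article does not list the fields Veryfi compares.

```python
from typing import Optional

# Hypothetical field names chosen for illustration only; the fields Veryfi
# actually compares are not listed in this article.
KEY_FIELDS = ("vendor", "date", "total", "invoice_number")


def duplicate_key(doc: dict) -> tuple:
    """Build a comparison key from the key fields, ignoring case and padding."""
    return tuple(str(doc.get(field, "")).strip().lower() for field in KEY_FIELDS)


def find_duplicate(new_doc: dict, processed: list[dict]) -> Optional[int]:
    """Return the id of the first processed document whose key fields all match."""
    new_key = duplicate_key(new_doc)
    for doc in processed:
        if duplicate_key(doc) == new_key:
            return doc["id"]
    return None


# A scanned paper receipt and the emailed PDF of the same purchase.
paper_scan = {"id": 1, "vendor": "Acme Store", "date": "2024-05-01",
              "total": "42.10", "invoice_number": "A-100"}
emailed_pdf = {"id": 2, "vendor": "ACME STORE ", "date": "2024-05-01",
               "total": "42.10", "invoice_number": "A-100"}

print(find_duplicate(emailed_pdf, [paper_scan]))
```

Because the comparison runs on extracted values rather than on the file itself, a photo and a digital PDF of the same receipt produce the same key.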
Customization: Duplicate Detection can be customized for your use case. Clients can define which fields and logic should be checked when evaluating duplicates. If you are interested in customized duplicate detection, please reach out to [email protected].
API Integration: When a duplicate is detected, the API response will include the following JSON fields:
{
  "duplicate_of": document_id,
  "is_duplicate": true
}
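A minimal sketch of how client code might act on these two fields, assuming the API response has already been parsed into a Python dict. The function name and the sample ids are hypothetical.

```python
from typing import Optional


def original_document_id(response: dict) -> Optional[int]:
    """Return the id of the original document if this one is flagged as a duplicate."""
    if response.get("is_duplicate"):
        return response.get("duplicate_of")
    return None


# Sample payload with made-up ids, shaped like the fields shown above.
response = {"id": 1002, "is_duplicate": True, "duplicate_of": 1001}

original = original_document_id(response)
if original is not None:
    print(f"Document {response['id']} duplicates document {original}")
```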
Similarity Check
Purpose: To identify documents that are not exactly the same, but highly similar.
How it works: Veryfi compares the entire OCR text of a new document against previously processed ones and flags the new document when the similarity exceeds a configurable threshold (e.g., 90%, 95%).
Use case: Mainly for fraud detection, especially in scenarios like loyalty programs.
👉 Example:
A user submits a receipt to claim a reward. Later, they digitally change the total or the date and resubmit it. Since the values differ, it will not be caught by duplicate detection. However, the Similarity Check will flag it as highly similar to the original, signaling potential fraud.
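To make the idea of a similarity threshold concrete, here is a small sketch that compares two OCR texts with Python's standard-library difflib. This is only an illustration: the article does not describe the similarity measure Veryfi uses, and the function names, sample text, and 0.90 default are assumptions.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two OCR texts."""
    return SequenceMatcher(None, text_a, text_b).ratio()


def is_highly_similar(new_text: str, old_text: str, threshold: float = 0.90) -> bool:
    """Flag the new text when it meets the threshold against the old one."""
    return similarity(new_text, old_text) >= threshold


original = "ACME STORE\n2024-05-01\nCoffee 4.50\nBagel 3.25\nTOTAL 7.75"
# The same receipt with the total digitally changed.
altered = original.replace("TOTAL 7.75", "TOTAL 9.75")

print(altered == original)                   # the texts are no longer identical
print(is_highly_similar(altered, original))  # but they remain nearly the same
```

A check on exact field values sees a different total and lets the altered receipt through, while a whole-text comparison still scores the two documents as nearly identical.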
API Integration: When a similarity check triggers, the API response includes the following JSON fields:
{
  "fraud": {
    "color": "red",
    "decision": "Fraud",
    "types": ["similar documents"]
  }
}
The documents it resembles are listed in the meta.duplicates field:
{
  "duplicates": [
    {
      "id": 313322741,
      "score": 0.69,
      "url": URL
    },
    {
      "id": 313322743,
      "score": 0.69,
      "url": URL
    },
    {
      "id": 313322742,
      "score": 0.69,
      "url": URL
    }
  ]
}
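The following sketch shows one way to read both pieces of information from a parsed response. It assumes the fraud object sits under meta next to duplicates; the article only states the path for meta.duplicates, so adjust the lookup if your responses differ. The function names are hypothetical, and the url values are left as placeholders.

```python
def flagged_as_similar(response: dict) -> bool:
    """True when the fraud block lists "similar documents" among its types.

    Assumption: the fraud object is located at response["meta"]["fraud"].
    """
    fraud = response.get("meta", {}).get("fraud", {})
    return "similar documents" in fraud.get("types", [])


def similar_documents(response: dict, min_score: float = 0.0) -> list[dict]:
    """Return entries from meta.duplicates at or above min_score, highest first."""
    entries = response.get("meta", {}).get("duplicates", [])
    matches = [e for e in entries if e.get("score", 0.0) >= min_score]
    return sorted(matches, key=lambda e: e.get("score", 0.0), reverse=True)


# Sample payload built from the fields shown above.
response = {
    "meta": {
        "fraud": {"color": "red", "decision": "Fraud",
                  "types": ["similar documents"]},
        "duplicates": [
            {"id": 313322741, "score": 0.69, "url": "URL"},
            {"id": 313322743, "score": 0.69, "url": "URL"},
            {"id": 313322742, "score": 0.69, "url": "URL"},
        ],
    }
}

if flagged_as_similar(response):
    for match in similar_documents(response):
        print(match["id"], match["score"])
```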
Image Quality and Accuracy
Poor-quality images can lead to both false positives and false negatives in Duplicate Detection and Similarity Check.
Blurry, cropped, or low-resolution images may cause Veryfi to misread fields, so documents can be incorrectly flagged as duplicates or genuine duplicates can be missed.
Similarly, modifications or noise in the image may cause the similarity check to either miss fraud attempts or incorrectly flag legitimate documents.
For more details, see:
🔗 What affects data extraction accuracy?
Duplicate Spike Alert
Another useful tool is the Duplicate Spike Alert, which notifies you if there’s an unusual surge in duplicate submissions. This can help you quickly identify issues such as:
Users mistakenly uploading the same file multiple times
System-level errors causing resubmissions
Fraudulent behavior patterns
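If you also want to track duplicate rates on your own side, a simple check could look like the sketch below. It is not how the Duplicate Spike Alert works internally, which this article does not describe; the function names, the multiplier, and the minimum sample size are all assumptions.

```python
def duplicate_rate(flags: list[bool]) -> float:
    """Share of documents flagged as duplicates (is_duplicate values)."""
    return sum(flags) / len(flags) if flags else 0.0


def is_spike(recent: list[bool], baseline_rate: float,
             factor: float = 3.0, min_documents: int = 20) -> bool:
    """True when the recent duplicate rate exceeds factor times the baseline.

    Small samples are ignored so a few repeats do not trigger an alert.
    """
    if len(recent) < min_documents:
        return False
    return duplicate_rate(recent) > baseline_rate * factor


# 8 duplicates among the last 20 documents, against a 5% historical rate.
recent_flags = [True] * 8 + [False] * 12
print(is_spike(recent_flags, baseline_rate=0.05))
```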
Quick Comparison
Feature | What it Detects | How it Works | Example Use Case |
Duplicate Detection | Exact duplicate documents | Matches key fields to determine whether a new document was already processed | Prevent double uploads |
Similarity Check | Near-duplicate / altered documents | Compares the entire OCR text of the documents against a configurable similarity threshold (%) | Fraud prevention (e.g., modified receipts) |
✅ In summary:
Use Duplicate Detection to prevent accidental resubmissions of the same document (customizable to your business rules).
Use Similarity Check to catch intentional or unintentional submissions of documents that are nearly identical but not exact matches.
Monitor image quality to reduce false positives/negatives.
Leverage Duplicate Spike Alerts for proactive monitoring.
Note: Similarity Check is part of our Fraud Suite. If you don’t see it in your account, please reach out to [email protected] and we’ll be happy to help you get it set up.