
What’s the Difference Between Duplicate Detection and Similarity Check in Veryfi?

Duplicate detection vs Similarity check

Updated yesterday

When processing documents with Veryfi, you may come across two different features: Duplicate Detection and Similarity Check. While both help prevent duplicate or fraudulent submissions, they serve different purposes.

Duplicate Detection

  • Purpose: To identify whether a document has already been processed.

  • How it works: Veryfi compares several fields on the new document against documents that were already processed. If they match, the new document is flagged as a duplicate.

👉 Examples:

  • You upload the same invoice twice. Because all of the compared fields match, Veryfi flags the second upload as a duplicate.

  • A user uploads a paper receipt and later uploads the digital version of the same receipt sent by the POS via email. Even though one is a scan/photo and the other is a digital PDF, if the values match, Veryfi will flag it as a duplicate.
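The field-matching idea can be sketched in a few lines of Python. This is only an illustration, not Veryfi's implementation: the four field names in KEY_FIELDS and the normalize helper are hypothetical, since the fields Veryfi actually compares are not listed here and can be customized.

```python
from typing import Optional

# Hypothetical key fields, chosen for illustration only.
KEY_FIELDS = ("vendor", "date", "total", "invoice_number")


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences are ignored."""
    return " ".join(value.lower().split())


def make_key(fields: dict) -> tuple:
    """Build a comparable key from the selected fields of an extracted document."""
    return tuple(normalize(str(fields.get(name, ""))) for name in KEY_FIELDS)


def find_duplicate(new_fields: dict, seen: dict) -> Optional[int]:
    """Return the id of an earlier document with the same key, or None."""
    return seen.get(make_key(new_fields))


# A paper receipt is processed first and remembered under a made-up id (101).
seen = {}
paper_scan = {"vendor": "Corner  Cafe", "date": "2024-03-01",
              "total": "12.50", "invoice_number": "A-17"}
seen[make_key(paper_scan)] = 101

# The emailed PDF of the same receipt yields the same values, so it matches.
digital_pdf = {"vendor": "corner cafe", "date": "2024-03-01",
               "total": "12.50", "invoice_number": "A-17"}
print(find_duplicate(digital_pdf, seen))
```

Because only extracted values are compared, the scan and the PDF match even though the files themselves differ, while a receipt with an edited total would produce a different key and pass through unflagged.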

Customization: Duplicate Detection can be customized for your use case. Clients can define which fields and logic should be checked when evaluating duplicates. If you are interested in customized duplicate detection, please reach out to [email protected].

API Integration: When a duplicate is detected, the API response will include the following JSON fields:


{
  "duplicate_of": document_id,
  "is_duplicate": true
}
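A minimal sketch of how a client might act on these two fields, assuming the API response has already been parsed into a Python dict. The helper name duplicate_target and the sample id 12345 are made up for illustration.

```python
import json
from typing import Optional

# Sample payload containing only the two fields described above.
# 12345 is a made-up document id.
raw_response = '{"duplicate_of": 12345, "is_duplicate": true}'


def duplicate_target(response: dict) -> Optional[int]:
    """Return the id of the original document, or None if this is not a duplicate."""
    if response.get("is_duplicate"):
        return response.get("duplicate_of")
    return None


response = json.loads(raw_response)
original_id = duplicate_target(response)
if original_id is not None:
    print(f"Skipping: duplicate of document {original_id}")
```

Using .get() keeps the check safe for responses where the duplicate fields are absent.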


Similarity Check

  • Purpose: To identify documents that are not exactly the same, but highly similar.

  • How it works: Veryfi uses a configurable threshold (e.g., 90%, 95%) to measure how similar a new document is compared to previously processed ones.

  • Use case: Mainly for fraud detection, especially in scenarios like loyalty programs.

👉 Example:
A user submits a receipt to claim a reward. Later, they digitally change the total or the date and resubmit it. Since the values differ, it will not be caught by duplicate detection. However, the Similarity Check will flag it as highly similar to the original, signaling potential fraud.
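To make the threshold idea concrete, here is a rough sketch that compares the text of two receipts with Python's standard difflib module. It is not Veryfi's algorithm; the receipt text is invented, and the 0.90 threshold simply mirrors the 90% example above.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.90  # mirrors the 90% example; the real value is configurable


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two text strings."""
    return SequenceMatcher(None, a, b).ratio()


# Invented OCR text for an original receipt.
original = "CORNER CAFE\n2024-03-01\nLatte 4.50\nBagel 8.00\nTOTAL 12.50"

# The same receipt with the date and the total digitally altered.
altered = (original
           .replace("2024-03-01", "2024-03-09")
           .replace("TOTAL 12.50", "TOTAL 72.50"))

score = similarity(original, altered)
is_similar = original != altered and score >= THRESHOLD
print(f"score={score:.2f} flagged={is_similar}")
```

The altered receipt no longer matches on its date or total, so a field-based duplicate check would let it through, yet nearly all of its text is unchanged and the score stays above the threshold.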

API Integration: When a similarity check triggers, the API response includes the following JSON fields:

{
  "fraud": {
    "color": "red",
    "decision": "Fraud",
    "types": ["similar documents"]
  }
}


The documents that the new one resembles are listed in the meta.duplicates field:

{
  "duplicates": [
    {
      "id": 313322741,
      "score": 0.69,
      "url": URL
    },
    {
      "id": 313322743,
      "score": 0.69,
      "url": URL
    },
    {
      "id": 313322742,
      "score": 0.69,
      "url": URL
    }
  ]
}
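The sketch below shows one way to read both parts of the response together. It assumes the parsed response is a dict with "fraud" and "meta" as top-level keys, which is inferred from the field names above rather than confirmed. The function name similar_documents and the min_score parameter are hypothetical.

```python
def similar_documents(response: dict, min_score: float = 0.0) -> list:
    """Return similar-document entries, highest score first.

    Returns an empty list unless the fraud block reports "similar documents".
    """
    fraud = response.get("fraud", {})
    if "similar documents" not in fraud.get("types", []):
        return []
    matches = response.get("meta", {}).get("duplicates", [])
    kept = [m for m in matches if m["score"] >= min_score]
    return sorted(kept, key=lambda m: m["score"], reverse=True)


# Sample response built from the snippets above; "URL" is left as a placeholder.
sample = {
    "fraud": {"color": "red", "decision": "Fraud", "types": ["similar documents"]},
    "meta": {
        "duplicates": [
            {"id": 313322741, "score": 0.69, "url": "URL"},
            {"id": 313322743, "score": 0.69, "url": "URL"},
            {"id": 313322742, "score": 0.69, "url": "URL"},
        ]
    },
}

for match in similar_documents(sample):
    print(match["id"], match["score"])
```

Passing a higher min_score lets a reviewer focus on the closest matches first.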

Image Quality and Accuracy

Poor-quality images can lead to both false positives and false negatives in Duplicate Detection and Similarity Check.

  • Blurry, cropped, or low-resolution images may cause Veryfi to misread fields, so that distinct documents are incorrectly flagged as duplicates or true duplicates are missed.

  • Similarly, modifications or noise in the image may cause the similarity check to either miss fraud attempts or incorrectly flag legitimate documents.

For more details, see:
🔗 What affects data extraction accuracy?


Duplicate Spike Alert

Another useful tool is the Duplicate Spike Alert, which notifies you if there’s an unusual surge in duplicate submissions. This can help you quickly identify issues such as:

  • Users mistakenly uploading the same file multiple times

  • System-level errors causing resubmissions

  • Fraudulent behavior patterns
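For teams that also want to track duplicate volume on their own side, a simple surge check might look like the sketch below. This is not how the Duplicate Spike Alert works internally; the is_spike function, the daily counts, and the factor k are all invented for illustration.

```python
from statistics import mean, pstdev


def is_spike(history: list, today: int, k: float = 3.0) -> bool:
    """Flag today's duplicate count if it is far above the recent baseline.

    The spread is floored at 1.0 so a very flat history does not
    trigger alerts on tiny changes.
    """
    baseline = mean(history)
    spread = max(pstdev(history), 1.0)
    return today > baseline + k * spread


# Invented daily counts of documents flagged as duplicates over the past week.
last_week = [2, 3, 1, 2, 4, 3, 2]

print(is_spike(last_week, today=4))
print(is_spike(last_week, today=15))
```

A count of 4 sits within normal variation for this history, while 15 is well above it and would be worth investigating.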


Quick Comparison

| Feature | What it Detects | How it Works | Example Use Case |
| --- | --- | --- | --- |
| Duplicate Detection | Exact duplicate documents | Match key fields to determine if a new document is a duplicate | Prevent double uploads |
| Similarity Check | Near-duplicate / altered documents | Checks similarity based on a configurable % and by comparing the entire OCR text of the documents | Fraud prevention (e.g., modified receipts) |


In summary:

  • Use Duplicate Detection to prevent accidental resubmissions of the same document (customizable to your business rules).

  • Use Similarity Check to catch intentional or unintentional submissions of documents that are nearly identical but not exact.

  • Monitor image quality to reduce false positives/negatives.

  • Leverage Duplicate Spike Alerts for proactive monitoring.

Just so you know: Similarity Check is part of our Fraud Suite. If you don’t see it in your account, please reach out to [email protected] and we’ll be happy to help you get it set up.
