
Document Classification API

Intelligent Document Classifier, Preprocess and Route

Updated this week

What is the Document Classification API?

Veryfi's Classification API is an intelligent document classifier that identifies document types without performing full file processing and data extraction.

This lightweight API returns the document_type (receipt, invoice, W-2, etc.) along with a confidence score, so you can route each document to the appropriate processing workflow, send it to custom handling, or filter out irrelevant content before full processing.

Prerequisites

The following file formats are supported: .webp, .heic, .txt, .gif, .htm, .avif, .xls, .ofd, .xlsx, .heif, .html, .zip, .csv, .jpg, .jpeg, .pdf, .png.

Maximum file size: 20 MB. Minimum file size: 0.25 KB.

Rate limit: 60 requests per second.
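
The limits above can be checked on the client before a file is uploaded. The following is a minimal sketch, not part of any Veryfi SDK: check_upload is a hypothetical helper, and it assumes 1 KB = 1024 bytes and 1 MB = 1024 KB, which the article does not specify.

```python
from pathlib import Path

# File formats listed in the prerequisites above.
SUPPORTED_EXTENSIONS = {
    ".webp", ".heic", ".txt", ".gif", ".htm", ".avif", ".xls", ".ofd",
    ".xlsx", ".heif", ".html", ".zip", ".csv", ".jpg", ".jpeg", ".pdf", ".png",
}

# Assumption: binary units (1 KB = 1024 bytes, 1 MB = 1024 KB).
MIN_BYTES = 256               # 0.25 KB
MAX_BYTES = 20 * 1024 * 1024  # 20 MB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file extension: {ext or '(none)'}")
    if size_bytes < MIN_BYTES:
        problems.append("file is smaller than 0.25 KB")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 20 MB")
    return problems
```

Rejecting a file locally avoids spending one of the 60 requests per second on an upload that cannot be classified.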

How Classification API Differs from Document Processing

The Classification API operates fundamentally differently from other Veryfi Data Extraction APIs. While the Data Extraction API extracts all available data fields from documents (amounts, dates, vendor names, line items, etc.), the Classification API focuses solely on identifying what type of document it's analyzing.

The Classification API is ideal for:

  • Pre-filtering documents before processing

  • Routing documents to specialized processing workflows

  • Rejecting irrelevant uploads

  • Bulk document sorting and organization

Data Extraction APIs are better for:

  • Complete data extraction from predefined document types

  • End-to-end document processing workflows

  • Situations where document type detection and data extraction happen simultaneously

  • Cases where preprocessing and/or classification of the document type is not required

What document types can the Classification API detect?

Standard document types include:

other, receipt, invoice, purchase_order, w9, statement, check, contract, w8, remittance_advice, business_card, bank_statement, w2, packing_slip, credit_note  

⚠️ Important Note: You can use either standard document types (pre-trained) or define custom types.

Use the document_types request parameter to define custom types.
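
As a rough illustration, a request body that carries custom types might be assembled as follows. This is a sketch under a stated assumption: the article names the document_types parameter but not its format, so sending it as a JSON array of type names is a guess to confirm against the API reference. build_classify_payload is a hypothetical helper.

```python
import json


def build_classify_payload(file_url: str, document_types=None) -> str:
    """Build a JSON request body for a classify call.

    Assumption: document_types is sent as a JSON array of type names.
    When it is omitted, the standard (pre-trained) types apply.
    """
    payload = {"file_url": file_url}
    if document_types:
        payload["document_types"] = list(document_types)
    return json.dumps(payload)
```

For example, build_classify_payload("https://example.com/scan.pdf", ["medical_record", "property_deed"]) produces a body that lists the two custom types alongside the file URL.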


What does a Classification API response look like?

Request:

curl -L -X POST 'https://api.veryfi.com/api/v8/partner/classify' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'CLIENT-ID: vrfY81111111111111yc6co' \
-H 'AUTHORIZATION: apikey en.apikeystest:2b886726a7710c3111110c41c460' \
--data-raw '{
"external_id": "string",
"package_path": "string",
"bucket": "string",
"file_data": "string",
"file_url": "https://veryfi-testing-public.s3.us-west-.amazonaws.com/receipt.jpg"
}'

JSON Response:

{
"document_type": {
"score": 0.97,
"value": "receipt"
}
}

The score indicates confidence (0.0 to 1.0), and value is the classified document type. In the example above:

  • Score 0.97 = 97% confidence that the classification is correct

  • Value "receipt" = the classified document type

Higher scores indicate a more reliable classification.

Learn more about Confidence Details here
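
For readers working in Python, the curl request and the sample response can be mirrored with the standard library. This is a minimal sketch rather than an official client: build_classify_request and parse_classification are hypothetical helpers, and it assumes that sending only file_url is enough, since the other fields in the curl example are placeholders.

```python
import json
import urllib.request

CLASSIFY_URL = "https://api.veryfi.com/api/v8/partner/classify"


def build_classify_request(client_id: str, authorization: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    # Assumption: file_url alone is a sufficient request body.
    body = json.dumps({"file_url": file_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "CLIENT-ID": client_id,
        # Same "apikey ..." value format as the AUTHORIZATION header in the curl example.
        "AUTHORIZATION": authorization,
    }
    return urllib.request.Request(CLASSIFY_URL, data=body, headers=headers, method="POST")


def parse_classification(response_text: str) -> tuple[str, float]:
    """Extract (value, score) from a response shaped like the sample above."""
    document_type = json.loads(response_text)["document_type"]
    return document_type["value"], document_type["score"]
```

Passing the built request to urllib.request.urlopen sends it; the decoded response body can then go to parse_classification.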

⚠️ Important: Choose your starting threshold based on whether your workflow prioritizes precision or recall. A higher threshold accepts fewer wrong classifications but sends more documents to fallback handling; a lower threshold does the opposite.

⚠️ Important: Standard document_types and custom document_types may need different thresholds.
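
One way to apply separate thresholds is sketched below. The threshold values are hypothetical starting points, not Veryfi recommendations, and is_confident is an illustrative helper.

```python
# Standard types listed earlier in this article.
STANDARD_TYPES = {
    "other", "receipt", "invoice", "purchase_order", "w9", "statement", "check",
    "contract", "w8", "remittance_advice", "business_card", "bank_statement",
    "w2", "packing_slip", "credit_note",
}

# Hypothetical starting values; tune them against your own documents.
STANDARD_THRESHOLD = 0.80
CUSTOM_THRESHOLD = 0.90


def is_confident(value: str, score: float) -> bool:
    """Apply one threshold to standard types and another to custom types."""
    threshold = STANDARD_THRESHOLD if value in STANDARD_TYPES else CUSTOM_THRESHOLD
    return score >= threshold
```

Any type name outside the standard list is treated as custom here, so a newly defined type automatically gets the stricter threshold.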

Custom vs Standard Types:

Standard Types: Ready to use, trained on millions of documents

Custom Types: Defined by you, for example "medical_record", "legal_contract", "property_deed"

Q: How do I handle documents classified as "other" or "null"?

A: Documents classified as "other" are typically:

  • Personal photos, screenshots

  • Non-document content

  • Highly damaged/unclear documents

A: Documents classified as "null" are:

  • Documents that contain fewer than 50 characters
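
The two cases above can be separated from normal results before routing. The sketch below is illustrative only: the handling labels are hypothetical, and it assumes a "null" classification may arrive either as JSON null (None in Python) or as the string "null", since the article does not show that response.

```python
def triage(document_type: dict) -> str:
    """Return a hypothetical handling label for a classification result.

    Assumption: "null" may arrive as JSON null (None) or as the string "null".
    """
    value = document_type.get("value")
    if value is None or value == "null":
        return "too_little_text"  # fewer than 50 characters in the document
    if value == "other":
        return "manual_review"    # photo, screenshot, or unreadable document
    return "process"
```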

Q: Can I improve the accuracy of custom document types?

A: Yes, contact Veryfi for custom model training for specialized industries or unique document types not covered by standard classifications.


Use Cases by Industry Vertical

Tax Forms & Accounting

Q: How can tax preparers use the Classification API?

A: Upload all client tax documents to the Classification API first, then route them automatically by detected type (for example w2, w9, or w8).

Financial Services & KYC

Q: How does the Classification API help with KYC (Know Your Customer) processes?

A: Create a single document upload endpoint for customers, then route each document based on its classification.

Compliance Benefits:

  • Faster customer onboarding

  • Reduced manual document review

  • Consistent compliance workflows

  • Audit trail for document classification

Q: Can I integrate the Classification API with compliance workflows?

A: Yes, use the Business Rules Engine to create automated compliance checks based on document types and confidence scores.

Learn more about KYC Toolkit at Veryfi

Expense Management

Q: How does the Classification API improve expense management workflows?

A: Implement "garbage in, garbage out" prevention:

  • Pre-classify all user uploads

  • Only process documents classified as receipt, invoice, or purchase_order

  • Reject business_card, other, or low-confidence classifications

  • Route legitimate documents to the Receipts & Invoices API

  • Save processing costs by filtering non-expense documents upfront
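
The filtering steps above can be sketched as a small gate placed in front of the extraction call. The 0.8 cutoff is a hypothetical value, and should_process_expense and split_uploads are illustrative helpers, not part of any Veryfi SDK.

```python
EXPENSE_TYPES = {"receipt", "invoice", "purchase_order"}
MIN_SCORE = 0.8  # hypothetical cutoff; tune it for your workflow


def should_process_expense(value, score: float) -> bool:
    """Accept only expense document types classified with enough confidence."""
    return value in EXPENSE_TYPES and score >= MIN_SCORE


def split_uploads(results):
    """Split (name, value, score) tuples into accepted and rejected file names."""
    accepted, rejected = [], []
    for name, value, score in results:
        target = accepted if should_process_expense(value, score) else rejected
        target.append(name)
    return accepted, rejected
```

Only the accepted files would then be sent on for full extraction; everything else, including business_card and other, is rejected without incurring processing cost.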

Healthcare & Insurance

Q: How can healthcare organizations use the Classification API?

A: Route medical documents efficiently:

  • receipt → Medical expense processing for HSA/FSA claims

  • invoice → Provider billing verification

  • statement → Insurance claim processing

  • other → Manual review for non-medical documents
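
The routing above maps naturally onto a lookup table. In this sketch the queue names are hypothetical, and sending any unlisted type to manual review is an assumption rather than documented behavior.

```python
# Queue names are hypothetical; the type-to-workflow mapping follows the list above.
HEALTHCARE_ROUTES = {
    "receipt": "hsa_fsa_expense_processing",
    "invoice": "provider_billing_verification",
    "statement": "insurance_claim_processing",
    "other": "manual_review",
}


def route_healthcare_document(value: str) -> str:
    """Return the workflow queue for a classified document type."""
    # Assumption: types outside the mapping also go to manual review.
    return HEALTHCARE_ROUTES.get(value, "manual_review")
```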

Healthcare Workflow Benefits:

  • Faster claim processing

  • Reduced manual document sorting

  • Better fraud detection

  • Improved patient experience


Getting Started

Quick Start Checklist:

  1. 🔑 Obtain API credentials from your Veryfi account

  2. 🧪 Test with sample documents from your use case

  3. 📊 Determine optimal confidence thresholds for your workflow

  4. ⚙️ Implement routing logic to appropriate processing APIs

  5. 📈 Set up monitoring and optimization processes
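
For step 3, one possible approach is to classify a hand-labeled sample set and compare candidate thresholds. The sketch below is illustrative: evaluate_threshold is a hypothetical helper, and "coverage" (the share of documents accepted at a given threshold) is a term introduced here, not one used by the API.

```python
def evaluate_threshold(samples, threshold):
    """Measure one candidate threshold on labeled samples.

    samples: list of (true_type, predicted_type, score) tuples.
    Returns (precision, coverage): precision is the share of accepted
    predictions that are correct; coverage is the share of all samples
    whose score meets the threshold.
    """
    accepted = [(true, pred) for true, pred, score in samples if score >= threshold]
    if not accepted:
        return 0.0, 0.0
    correct = sum(1 for true, pred in accepted if true == pred)
    return correct / len(accepted), len(accepted) / len(samples)
```

Running this for several thresholds shows the trade-off directly: raising the threshold usually improves precision while lowering coverage, which means more documents fall through to manual handling.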

Do I need different API keys for Classification vs Processing APIs?

No, use the same Veryfi API credentials for all APIs. Billing is tracked separately by API endpoint usage.
