Smart PDF Splitting
PDF splitting is the process of dividing a single multi-page PDF file into multiple separate PDF documents. This feature is especially useful for PDFs that contain several invoices or receipts that need to be treated as individual transactions in accounting software.
In - a PDF with multiple invoices/receipts
Out - a list of separate PDF files
Imagine a PDF document with multiple pages, each representing a different expense or invoice. Consolidating receipts into one PDF is convenient, but extracting them individually and importing them into accounting software can be challenging. Veryfi's Smart PDF Splitter API solves this problem by automatically splitting the PDF into individual documents, making them easier to manage.
How to Use
Use the PDF Splitter API by sending a POST request to: https://api.veryfi.com/api/v8/partner/documents-set-async.
Include the PDF file you want to split as part of the request.
Upon processing, a notification will be sent to your webhook.
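A minimal Python sketch of those steps, using only the standard library. The endpoint URL comes from this page; the CLIENT-ID and AUTHORIZATION headers and the file_name and file_data body fields are assumptions (taken to match the scheme of the /api/v8/partner/documents endpoint), so confirm the exact request contract in the Veryfi API Docs before relying on it.

```python
import base64
import json
import urllib.request

# Endpoint from the documentation above.
SPLITTER_URL = "https://api.veryfi.com/api/v8/partner/documents-set-async"


def build_split_request(pdf_bytes: bytes, file_name: str,
                        client_id: str, username: str,
                        api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the PDF Splitter endpoint.

    ASSUMPTION: header names and body field names below are not confirmed
    by this page; check the Veryfi API Docs for the real contract.
    """
    body = {
        "file_name": file_name,  # assumed field name
        # PDF content sent inline as base64 (assumed field name).
        "file_data": base64.b64encode(pdf_bytes).decode("ascii"),
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "CLIENT-ID": client_id,                              # assumed header
        "AUTHORIZATION": f"apikey {username}:{api_key}",     # assumed header
    }
    return urllib.request.Request(
        SPLITTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and return the immediate JSON response.

    Because the endpoint is asynchronous, the split documents are
    delivered to your webhook rather than in this response.
    """
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with placeholder credentials: read the file with open("invoices.pdf", "rb"), pass its bytes to build_split_request along with your client ID, username, and API key, then call submit on the result.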
API Documentation and Integration Guide
For full details on the request, body, response, and more, see the Veryfi API Docs:
Smart PDF Splitter API vs Receipts and Invoices API
The PDF Splitter endpoint (/api/v8/partner/documents-set-async) is designed specifically for processing PDF files containing multiple distinct receipts or invoices.
It differs from the Receipts and Invoices endpoint in the following ways:
Asynchronous Processing: Smart PDF Splitter API is available only as an asynchronous process, requiring a defined webhook.
File Format: Smart PDF Splitter API supports only PDF files, while /api/v8/partner/documents supports various common file types.
Charges: Charges are incurred per document, so each document split out from the original incurs a charge.
External Preprocessing: If your PDF contains pages that are not receipts or invoices, consider removing them in an external preprocessing step before sending the file to the PDF Splitter API.
The maximum file size supported is 50 MB.
The maximum number of pages in a PDF is 100.
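The limits above can be checked before upload. This sketch assumes "50 MB" means 50,000,000 bytes (the stricter reading) and leaves page counting to the caller, since counting PDF pages reliably requires a PDF parser; the function name is hypothetical.

```python
from typing import List

MAX_BYTES = 50 * 1000 * 1000  # assumption: decimal megabytes, the stricter reading of "50 MB"
MAX_PAGES = 100


def preflight_problems(pdf_bytes: bytes, page_count: int) -> List[str]:
    """Return reasons the file breaks the documented limits; an empty list means OK.

    page_count is supplied by the caller, e.g. from a PDF library of your choice.
    """
    problems = []
    # PDFs start with a "%PDF-" header; readers commonly tolerate a few
    # leading bytes, so search the first 1 KiB rather than only offset 0.
    if b"%PDF-" not in pdf_bytes[:1024]:
        problems.append("not a PDF file")
    if len(pdf_bytes) > MAX_BYTES:
        problems.append(f"file is {len(pdf_bytes)} bytes; limit is {MAX_BYTES}")
    if page_count > MAX_PAGES:
        problems.append(f"{page_count} pages; limit is {MAX_PAGES}")
    return problems
```

Files that return a non-empty list can be fixed locally (for example, split into smaller PDFs) instead of failing at the API.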
The Smart PDF Splitter API is part of the v8 API. If you use v7, you can still process files with v8 by running a GET request in v7. For more, see the v7 vs v8 FAQ.
PDF splitting is based on a set of extracted fields such as page number, invoice number, document title, and document date. The algorithm used for splitting is continually fine-tuned based on customer samples.
If the API fails to split a PDF file, you will receive a JSON result as if you had sent the file to the /api/v8/partner/documents endpoint, treating all pages as a single document.
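To make the idea of field-based splitting concrete, here is a toy boundary detector. It is not Veryfi's algorithm: it uses only two of the signals named above (a printed page number resetting to 1 and a change of invoice number), and the PageFields structure is hypothetical. When it finds no boundary, it returns all pages as one document, mirroring the fallback behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageFields:
    """Hypothetical fields extracted from one page (e.g. by OCR)."""
    page_number: Optional[int] = None     # printed "Page N" on the page, if any
    invoice_number: Optional[str] = None  # invoice number found on the page, if any


def split_pages(pages: List[PageFields]) -> List[List[int]]:
    """Group 0-based page indices into documents with two simple boundary rules.

    A page starts a new document if its printed page number is 1, or if its
    invoice number differs from the one already seen in the current document.
    With no boundaries, every page lands in a single document.
    """
    documents: List[List[int]] = []
    current_invoice: Optional[str] = None
    for index, page in enumerate(pages):
        invoice_changed = (
            page.invoice_number is not None
            and current_invoice is not None
            and page.invoice_number != current_invoice
        )
        if not documents or page.page_number == 1 or invoice_changed:
            documents.append([])      # open a new document
            current_invoice = None    # forget the previous document's invoice number
        documents[-1].append(index)
        if page.invoice_number is not None:
            current_invoice = page.invoice_number
    return documents
```

A production splitter would weigh more signals (document title, date) and tolerate OCR noise; rules this strict break easily on real scans.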
Currently, the PDF splitter focuses on English and Spanish documents. Splitting results for PDFs in other languages may be less reliable. For support in other languages, please contact Veryfi's support team.
Multiple Documents on One Page
The document splitter only splits pages. If you have multiple documents on a single page, Veryfi offers a different way of splitting PDFs or images into multiple documents, which can either be stitched into one PDF or treated as separate documents.
Handling Image-Only PDFs
The process relies on full-text pages to identify page breaks. If your PDF consists entirely of images with no text, the process may not work correctly.
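One way to spot this case before uploading is to check whether the pages have a text layer at all. This heuristic takes the text that a PDF text extractor of your choice returns for each page; the 20-character threshold and the function names are arbitrary assumptions.

```python
from typing import List


def textless_pages(page_texts: List[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indices of pages whose extracted text is (nearly) empty."""
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


def looks_image_only(page_texts: List[str], min_chars: int = 20) -> bool:
    """True if the PDF has pages and none of them carries meaningful text."""
    return bool(page_texts) and len(textless_pages(page_texts, min_chars)) == len(page_texts)
```

For example, with the third-party pypdf package, the per-page texts can be collected by calling extract_text() on each page of PdfReader("scan.pdf").pages (treating None as an empty string) and then passed to looks_image_only.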
Frequently Asked Questions
Q: Can I call the Document Sets Async endpoint synchronously?
A: No. This endpoint is only available as an asynchronous process. You must have a webhook defined in https://app.veryfi.com/api/settings/keys.
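For reference, a bare-bones webhook receiver built on Python's http.server. The notification payload format is not described on this page, so this sketch simply stores whatever JSON arrives; replying 200 on success is an assumption, and a production endpoint would also need HTTPS and should authenticate incoming requests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, List

# Notifications received so far; a real service would persist or queue them.
received: List[Any] = []


def parse_notification(raw: bytes) -> Any:
    """Decode a webhook body as JSON. Raises ValueError on malformed input."""
    return json.loads(raw.decode("utf-8"))


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON notifications and stores them in `received`."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            received.append(parse_notification(raw))
            self.send_response(200)  # assumed: a 2xx reply acknowledges receipt
        except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
            self.send_response(400)
        self.end_headers()


def run(port: int = 8000) -> None:
    """Serve forever on all interfaces (blocks the calling thread)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call run() to start the server, then register its public URL as your webhook on the settings page above.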
Q: Can I simply send all my documents to the documents-set-async endpoint instead of using /api/v8/partner/documents?
A: This is not typically recommended. The documents-set-async endpoint supports only PDF files, whereas /api/v8/partner/documents supports many common file types. If you send all documents to this endpoint, you will likely get multiple errors due to unsupported file formats. However, if you are processing only PDF files, the documents-set-async endpoint can be used in place of /api/v8/partner/documents.
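If you receive mixed file types, a small router can send only PDFs to the splitter. This sketch identifies PDFs by their %PDF- header rather than by file extension; the function names are hypothetical, and every PDF routed this way is processed asynchronously.

```python
# Endpoint paths as given in this document.
SPLITTER_ENDPOINT = "/api/v8/partner/documents-set-async"
DOCUMENTS_ENDPOINT = "/api/v8/partner/documents"


def is_pdf(file_bytes: bytes) -> bool:
    """PDFs start with a "%PDF-" header; search the first 1 KiB to tolerate leading bytes."""
    return b"%PDF-" in file_bytes[:1024]


def choose_endpoint(file_bytes: bytes) -> str:
    """Send PDFs to the asynchronous splitter and every other file type to /documents."""
    return SPLITTER_ENDPOINT if is_pdf(file_bytes) else DOCUMENTS_ENDPOINT
```

Checking content instead of the extension avoids routing a mislabeled JPEG named ".pdf" to an endpoint that would reject it.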
Q: Is it cheaper to use documents-set-async since it is only one API call?
A: No. The charges are per document. Each document that is split out incurs a charge.
Q: Could it be more expensive to use documents-set-async?
A: In most cases, there should be no cost difference between using documents-set-async and splitting the PDF into multiple files in an external preprocessing step, then sending each file to the /api/v8/partner/documents endpoint. However, a PDF may contain "noise" pages that are not receipts or invoices. These can get split out and processed as separate documents, each incurring a fee. For example, legal disclosures, packing lists, and quality reports may be processed as separate documents. If your PDF files are likely to contain such pages, consider an external preprocessing step to remove them before sending the files to Veryfi.
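A sketch of such an external preprocessing step, assuming you already have each page's text: it drops pages containing hypothetical noise phrases. The phrase list is illustrative and would need tuning for your documents. As a worked example of why this matters for cost: if a PDF holds three invoices plus two packing-list pages that each get split out, you are charged for five documents instead of three.

```python
from typing import Iterable, List

# Hypothetical phrases marking pages that are not receipts or invoices.
NOISE_MARKERS = ("packing list", "quality report", "legal disclosure", "terms and conditions")


def pages_to_keep(page_texts: List[str], markers: Iterable[str] = NOISE_MARKERS) -> List[int]:
    """Return 0-based indices of pages whose text contains none of the noise markers.

    Matching is case-insensitive substring search, so it is deliberately simple.
    """
    lowered = [m.lower() for m in markers]
    return [
        i for i, text in enumerate(page_texts)
        if not any(m in text.lower() for m in lowered)
    ]
```

The kept pages can then be written to a new PDF (for example, with pypdf's PdfWriter.add_page) before uploading it to the splitter.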
Q: How does the splitting work?
A: Splitting is based on a set of extracted fields that indicate a new document, for example page number, invoice number, document title, and document date. We are continuously fine-tuning the process, so please contact our support team if you find repeatable cases where splitting does not occur.
Q: What happens when it fails to split a PDF file?
A: You will get the same JSON result as if you had sent the file to the /api/v8/partner/documents endpoint. The Veryfi OCR API processes all pages as a single document, so the extracted fields often mix line items from the different documents in the file.
Q: Does documents-set-async work for documents in all the Veryfi-supported languages?
A: The focus has been on English and Spanish documents. Please reach out to our support team if you need support in other languages.
Q: Can the document splitter handle multiple documents on a single-page PDF?
A: This document splitter only splits pages. We offer a different way of splitting or cropping a PDF or an image into multiple documents, which can either be stitched into one PDF or treated as separate documents. For example, when there are multiple receipts in one picture: https://www.youtube.com/watch?v=F0huOZZr_mg
Q: What happens with a PDF made up of image-only pages with no text?
A: The process relies on full-text pages to identify page breaks. If your PDF consists entirely of images with no text, the process may not work correctly.
For any questions or assistance, please reach out to Veryfi's support team.