Multipage PDFs with multiple documents
PDFs rarely show up in the shape your extraction pipeline wants them in. A finance team scans a stack of invoices as one file. An expense report arrives as a 12-page PDF with hotel folios, flight receipts, and restaurant checks bundled together. A bill-pay user uploads a 20-page vendor statement hoping to settle just one line on page 9. These are all legitimate business workflows, and none of them produce the clean single-invoice files that OCR works best on.
When Veryfi receives one of these multi-document PDFs, the default behavior is to treat everything as one invoice. That works beautifully when the file really is one invoice spanning several pages, and badly when it isn't. This guide explains what happens under the hood when a bundled PDF comes in, why the results look the way they do, and the options available to handle it properly, ranked by how much control you have over the upload process.
Why bundled PDFs produce messy results
When you send a PDF, Veryfi reads it as one document by default. Every page is processed together and we return a single result: one vendor, one set of totals, one list of line items.
When a PDF contains several separate invoices, or a real invoice mixed with irrelevant supporting documents, the model sees conflicting signals it can't cleanly resolve. Faced with multiple candidate vendors, totals, and line items, it anchors on the strongest candidate and works around it, blending in fragments from the rest. The output ends up looking plausible but inaccurate.
Common data extraction symptoms of bundled submissions:
Totals from one invoice appearing as a line item on another
The vendor name matches only the most prominent invoice in the bundle
Invoice numbers swapped, duplicated, or missing
Duplicated line items when two invoices share a similar layout
If any of that sounds familiar, the sections below will help you pick a better approach.
Quick answers by situation
Not sure which option fits your use case? Start here, then dive into the relevant section below.
Your situation | What to do |
You send one invoice per PDF | Nothing to change. Keep using the standard endpoint. |
You or your users batch several invoices into one PDF, and you can control that | Educate end users on submission guidelines and split the PDF before sending. This gives the cleanest results. |
You receive pre-bundled PDFs from end users or third parties and can't control the format | It depends on your use case. See "When user behavior is unpredictable" below, because this is where things get nuanced. |
You already know which pages belong to which invoice and they are consistent | Split on your side using that knowledge, then send each segment as its own request. Most accurate option. |
Your bundles are a mix of invoices with packing slips, disclaimers, or other noise | Use the Smart PDF Splitter combined with the fraud signal "not a document". Ignore anything flagged as not a document, process the rest. |
Your options, side by side
Option | How it works | Great when... | Keep in mind |
Split before sending (recommended) | You break the PDF into individual invoices and send each as its own request to the standard endpoint | You control how files enter your pipeline | More API calls, but the cleanest results and the easiest to troubleshoot |
Give your users the ability to split at upload | Embed a splitter in your web upload UI so users can choose what to send and how to slice it | You can influence how end users submit files | Requires product work on your side |
Smart PDF Splitter | Veryfi detects document boundaries inside a bundle and returns one extraction per detected document | You can't pre-split, and the invoices follow a reasonably consistent layout | Async workflow. Every document detected is billed, including noise pages (can be flagged with the fraud signal "not a document") |
Send the PDF as-is | The entire PDF is treated as one document | It really is one invoice, just spread across several pages | Don't use this for bundles. Results will be blended. Set user expectations if you do |
Let's take a closer look at the options available
1. Split the PDF before you send it
This is the approach we recommend for most customers. You separate the bundle into individual PDFs on your side, then send each one as its own request to the standard Process a Document endpoint.
The standard endpoint (POST /api/v8/partner/documents) is documented in the Veryfi API reference.
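As a rough illustration of "one request per invoice", the sketch below builds (but does not send) a request for a single pre-split PDF using only the Python standard library. The host is taken from the Splitter URL shown later in this guide. The header names, the auth scheme, and the `file_name` / `file_data` body fields are assumptions here and should be checked against the API reference before use.

```python
import base64
import json
import urllib.request

# Path from this guide; the host is assumed to match the Splitter URL.
DOCUMENTS_URL = "https://api.veryfi.com/api/v8/partner/documents"


def build_document_request(pdf_bytes, file_name, client_id, username, api_key):
    """Build (but do not send) a POST request for one pre-split invoice PDF."""
    # ASSUMPTION: body field names; confirm them in the API reference.
    body = json.dumps({
        "file_name": file_name,
        "file_data": base64.b64encode(pdf_bytes).decode("ascii"),
    }).encode("utf-8")
    # ASSUMPTION: header names and auth scheme; confirm them in the API reference.
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "CLIENT-ID": client_id,
        "AUTHORIZATION": f"apikey {username}:{api_key}",
    }
    return urllib.request.Request(
        DOCUMENTS_URL, data=body, headers=headers, method="POST"
    )


# Placeholder bytes and credentials, for illustration only.
request = build_document_request(
    b"%PDF-1.4 placeholder", "invoice_001.pdf",
    "my-client-id", "my-username", "my-api-key",
)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

Calling this once per split file keeps each result tied to exactly one source PDF, which is what makes troubleshooting straightforward.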
Why it works well:
One clean result per invoice. No blending.
If something looks wrong, you know exactly which file caused it.
No dependency on automatic detection, which means no surprises.
The tradeoff is more API calls, one per invoice instead of one per bundle. For most pipelines, this is a small price for a meaningful jump in accuracy.
Best for: any customer who controls how files enter their pipeline, or can add a splitting step upstream of the API call.
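When you already know where each invoice starts, planning the split is mostly bookkeeping. The sketch below turns a list of first-page numbers into inclusive page ranges, one per invoice. The function name is hypothetical, and the actual page cutting (done with whatever PDF library you already use) is deliberately left out.

```python
def segments_from_first_pages(first_pages, total_pages):
    """Turn the 1-based first page of each invoice into inclusive (first, last) ranges."""
    if not first_pages:
        raise ValueError("at least one invoice start page is required")
    starts = sorted(set(first_pages))
    if starts[0] < 1 or starts[-1] > total_pages:
        raise ValueError("start pages must fall within the PDF")
    # Each invoice runs until the page before the next invoice starts;
    # the last invoice runs to the end of the file.
    ends = [next_start - 1 for next_start in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


# A 12-page bundle where invoices start on pages 1, 4, and 9.
print(segments_from_first_pages([1, 4, 9], 12))  # [(1, 3), (4, 8), (9, 12)]
```

Pages before the first listed start page (a cover sheet, for example) are simply not included in any segment, so they never reach the API.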
2. Give your users the ability to split at upload
If your end users upload documents through your web app, consider putting a splitter in the upload flow itself. Users pick the pages that belong together, label them, and submit each document cleanly. Veryfi has built this exact flow for our own internal expense tool and it works well.
This is the most reliable way to eliminate multi-invoice PDFs at the source, but it requires product work on your side. If you already have an upload UI, adding a splitter is a smaller lift than you might expect and it keeps the problem upstream of Veryfi entirely.
Best for: products with an interactive upload experience where users can reasonably be asked to identify invoice boundaries themselves.
3. Let Veryfi split it for you: Smart PDF Splitter Endpoint
If you can't split the PDF yourself and you can't push splitting upstream to your users, send the bundle to our Smart PDF Splitter. It scans the file for document boundaries using clues like page numbers, invoice numbers, document titles, and dates, and returns one extraction per document it detects.
URL: POST https://api.veryfi.com/api/v8/partner/documents-set
Full documentation: Split and process a PDF (API reference) and the Smart PDF Splitter FAQ
Before you route everything through the Splitter, pause.
The Splitter is an excellent tool for the specific case where you truly cannot split upstream. It is not a general-purpose upgrade for your whole pipeline.
If only 5-10% of your volume is bundled PDFs, running 100% of your volume through the Splitter is overkill. You pay the async overhead for every file, not just the ones that actually need splitting. If you can distinguish multi-document bundles from single invoices at intake, route only the bundles to the Splitter and send everything else to the standard endpoint. If you can't reliably tell them apart, that signal itself is worth investing in before scaling the Splitter up.
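The routing idea can be sketched in a few lines. This assumes you already have some intake signal that marks a file as a bundle; the `flagged_as_bundle` argument is a hypothetical stand-in for that signal, not something the API provides. The two endpoint paths are the ones named in this guide.

```python
STANDARD_ENDPOINT = "/api/v8/partner/documents"
SPLITTER_ENDPOINT = "/api/v8/partner/documents-set"


def choose_endpoint(file_name, page_count, flagged_as_bundle=False):
    """Pick an endpoint for one upload.

    flagged_as_bundle is a hypothetical intake signal (a user checkbox,
    an upload channel, a filename convention) that you supply yourself.
    """
    is_pdf = file_name.lower().endswith(".pdf")
    # Only multi-page PDFs that intake marked as bundles go to the Splitter.
    if is_pdf and page_count > 1 and flagged_as_bundle:
        return SPLITTER_ENDPOINT
    # Everything else stays on the standard, synchronous path.
    return STANDARD_ENDPOINT


print(choose_endpoint("trip_receipts.pdf", 12, flagged_as_bundle=True))
print(choose_endpoint("invoice_001.pdf", 3))
```

Keeping the decision in one small function makes it easy to tighten the bundle signal later without touching the rest of the pipeline.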
The Splitter is also only available on the Receipts and Invoices API. If you're using a different Veryfi product, this option isn't open to you.
Best for: pre-bundled PDFs from third parties with consistent formatting, when pre-splitting isn't an option and you can identify which files need splitting rather than routing everything through it.
4. Send the PDF as-is
This is the default behavior: the whole PDF becomes one document. It's the right choice only when the PDF really is a single invoice spread across multiple pages, not a bundle of separate ones.
Don't use this for bundles. If the file actually contains several separate invoices, this option will blend them together. Pick one of the options above, and set user expectations on accuracy if bundled PDFs do slip through.
When user behavior is unpredictable
Most of the trouble with multi-invoice PDFs doesn't come from a simple format problem. It comes from the fact that your end users submit whatever is convenient for them, not what's convenient for you or for Veryfi's extraction. Two scenarios come up constantly.
Expense management: the business trip bundle
Your user submits a single expense report for a business trip and attaches one PDF that contains flight tickets, hotel reservations, taxi receipts, and a restaurant bill. Some expense platforms require each expense to be submitted separately, but plenty of companies still reimburse the whole trip against a single total. You can't realistically block users from uploading a mixed-receipt PDF because the policy on your customer's side allows it.
What that means in practice: you need to expect bundled submissions and plan for them. The Smart PDF Splitter handles this case reasonably well because the documents inside are genuinely separate receipts, each with its own vendor, total, and line items. Be aware that every receipt detected will be billed, and that a trip bundle can produce more detected documents than the user expects.
Bill pay: the invoice inside the haystack
Your SMB user wants to pay one vendor for one thing, but the PDF they upload happens to be a 20-page statement where the relevant invoice sits on pages 7 to 9 alongside 19 other line items, reminders, remittance advice, and marketing inserts. The user wants one payment. You have one file. The extraction is going to be unreliable no matter what you do.
If you route this through the Splitter, you'll get back 5, 10, or 20 detected documents. You'll be billed for each one, and your user's monthly invoice allowance may get eaten up by a single upload. This is the case where the right answer is often to surface the problem to the user, either with an in-app splitter (see Option 2) or with guidance at upload time that asks them to isolate the invoice they want to pay.
The takeaway: the deeper you go into real customer workflows, the more the "right" answer depends on factors outside the API itself: what users will tolerate, what your billing model can absorb, whether you can distinguish bundle types at intake, and what your product team is willing to build. There's rarely a single technical setting that solves the whole problem.
A note on Veryfi Workflows: the LLM-prompt approach
If you're a Veryfi Workflows customer, you have another option worth knowing about. Workflows supports LLM-based steps that can inspect a document and take actions based on prompts you define, including routing or splitting logic driven by what the model sees in the file.
Learn more about Workflows
In practice this looks like a step that says something like: "If this document contains multiple invoices, split it and process each one separately." For consistent, predictable document patterns, this can work well and gives you more control than a generic splitter because the prompt is yours.
A few caveats:
It relies on pattern consistency. If your bundles follow a predictable structure, the prompt can be tuned to handle them. If every customer submits something different, prompt-based splitting becomes a moving target.
It's built for lower volumes. LLM steps add latency and cost per document. For high-volume pipelines, routing every file through a Veryfi LLM decision step may not be practical.
It's overkill if splitting is a small part of your problem. If only a fraction of your volume needs splitting, adding an LLM step to the full pipeline to handle that fraction is usually the wrong trade.
Workflows with LLM prompts is a great fit when bundle patterns are consistent, volumes are moderate, and splitting is a meaningful share of what you're trying to solve. For high-volume production pipelines where a few percent of files are bundles, a routing decision before the API call, sending bundles to the Splitter and everything else to the standard endpoint, is usually the cleaner architecture.
A note on page limits
Every account starts with a 15-page limit per PDF. If you send a longer file, we'll process the first 15 pages and stop. This is a safety default that protects you from runaway jobs, not a cap on what's possible. If you regularly work with longer documents, talk to us: the limit is soft.
There's also a parameter called max_pages_to_process that lets you cap processing at any number between 1 and 15 per request. It's useful when you want to skip known boilerplate at the end of a long document, or force single-page extraction from an image set.
Important: this parameter does not split the document. It simply tells us where to stop reading. Everything up to that page is still treated as one document. Full details in the max_pages_to_process FAQ.
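To make the interaction between the account limit and max_pages_to_process concrete, here is a small model of how many pages get read under the rules described above. The function is a hypothetical helper for reasoning about the behavior, not part of any Veryfi SDK, and it assumes the default 15-page account limit.

```python
ACCOUNT_PAGE_LIMIT = 15  # default per-PDF limit described in this guide


def pages_read(page_count, max_pages_to_process=None, account_limit=ACCOUNT_PAGE_LIMIT):
    """Return how many leading pages are read; the result is still ONE document."""
    if max_pages_to_process is not None and not 1 <= max_pages_to_process <= 15:
        raise ValueError("max_pages_to_process must be between 1 and 15")
    cap = account_limit
    if max_pages_to_process is not None:
        # The per-request cap can only lower the limit, never split the file.
        cap = min(cap, max_pages_to_process)
    return min(page_count, cap)


print(pages_read(20))     # 15: the default limit stops reading after page 15
print(pages_read(20, 3))  # 3: only the first three pages are read
print(pages_read(2, 5))   # 2: the file is shorter than the cap
```

In every case the pages that are read are extracted together as a single document, which is why this parameter is not a substitute for splitting.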
For a full list of supported file formats across Veryfi endpoints, see File formats Veryfi supports.
Frequently asked questions
Why did my long PDF only get partially processed?
Your account is probably at the default 15-page limit.
Can I use max_pages_to_process to split a document?
No. It only tells us where to stop reading. Everything up to that page is still treated as one document. To actually split, use one of the approaches above.
Will the Smart PDF Splitter save me money?
Usually no. We bill per extracted document, so pre-splitting and the Splitter cost the same for the same bundle. The Splitter can cost slightly more if it extracts noise pages as separate documents.
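As a worked example of that arithmetic, the sketch below counts billed documents for one bundle under two simplifying assumptions: pre-splitting sends only the real invoices, and the Splitter detects each invoice exactly once plus some number of noise documents. The function name and inputs are illustrative only.

```python
def billed_documents(invoice_count, noise_documents_detected=0):
    """Compare billed document counts for one bundle under per-document billing."""
    return {
        # Pre-splitting: you send only the real invoices.
        "pre_split": invoice_count,
        # Splitter: every detected document is billed, noise included.
        "splitter": invoice_count + noise_documents_detected,
    }


# A bundle with 4 invoices where the Splitter also picks up 2 noise documents.
print(billed_documents(4, noise_documents_detected=2))  # {'pre_split': 4, 'splitter': 6}
```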
What happens if the Splitter can't find any boundaries?
It falls back to treating the PDF as a single document. You won't get an error; you'll just get one result instead of several.
My bundles have packing slips or legal pages mixed in. What should I do?
Either remove those pages before sending, or use the Smart PDF Splitter combined with the fraud signal "not a document". Anything flagged as not a document can be ignored downstream, so you only act on the real invoices. Talk to us.
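Downstream filtering might look like the sketch below. The shape of the Splitter response is not described in this guide, so the code assumes each result is a dictionary and that the "not a document" signal has already been mapped to a boolean under a hypothetical key, `not_a_document`. Adjust the key to wherever that signal actually appears in your responses.

```python
def keep_real_documents(results, flag_key="not_a_document"):
    """Separate real documents from results flagged as not a document.

    flag_key is a hypothetical field name; map it to the real fraud signal
    in your own response handling.
    """
    kept, ignored = [], []
    for result in results:
        if result.get(flag_key):
            ignored.append(result)
        else:
            kept.append(result)
    return kept, ignored


# Illustrative results only, not real API output.
sample_results = [
    {"id": 1, "not_a_document": False},
    {"id": 2, "not_a_document": True},
    {"id": 3},
]
kept, ignored = keep_real_documents(sample_results)
print([r["id"] for r in kept])     # [1, 3]
print([r["id"] for r in ignored])  # [2]
```

Results without the flag are treated as real documents here, so a missing signal never silently drops an invoice.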
Can I use the Smart PDF Splitter with APIs other than Receipts and Invoices?
No. The Smart PDF Splitter is only available on the Receipts and Invoices API. If you're working with a different Veryfi product, you'll need to split on your side or use a Workflows step with LLM prompts if available in your plan.
Should I route my entire pipeline through the Splitter just to be safe?
Almost never. If only a small share of your volume is bundled PDFs, putting everything through the Splitter adds unnecessary latency and cost. The better pattern is to identify bundles at intake and route only those to the Splitter, with everything else going to the standard endpoint.
How does this interact with Veryfi Workflows?
If you're on Workflows, you can build LLM-prompt steps that inspect a document and decide what to do with it, including splitting logic. This works well for consistent patterns and moderate volumes. For high-volume pipelines, a routing decision before the API call is usually more efficient than running every file through an LLM step.
Still not sure?
If you're unsure which approach fits your workflow, or if you're seeing extraction issues that don't match the patterns above, reach out to your account team. A quick look at your use case and a sample file is almost always enough to recommend the right setup.
We're here to help tune the integration so the output matches what your downstream systems expect.
