Factors Affecting Processing Speed
What is latency?
Latency refers to the time it takes for the Veryfi OCR API to process a document and return the JSON response to the user. It measures the delay between the moment the Veryfi API receives the document and the moment Veryfi sends back the processed data as a JSON response. Lower latency indicates faster processing, while higher latency indicates longer processing times.
What is upload time?
The OCR API workflow has two distinct phases: (1) upload time and (2) processing time.
Upload time
Upload time refers to the duration it takes to transmit the document or file from the client's system to the Veryfi OCR API service. It measures the time from initiating the file transfer to the completion of the upload. During this phase, the client is responsible for sending the document to the API for processing.
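To see how much of the delay your users experience comes from each phase, you can time the whole request on your side and compare it with the processing time Veryfi reports. The sketch below is illustrative only: timed and estimate_upload_overhead are hypothetical helper names, and the difference it computes also includes network time for the response, so treat it as a rough estimate.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run `call` and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def estimate_upload_overhead(round_trip_s: float, processing_s: float) -> float:
    """Round-trip time minus processing time, floored at zero.

    Assumption: whatever is not processing time is upload and network overhead.
    """
    return max(0.0, round_trip_s - processing_s)
```

Wrap your own upload call in timed, for example timed(lambda: send_document(path)), where send_document stands in for whatever function performs the request in your application.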
Factors that impact upload time
Document itself
File size
File format
CDN
Network Connection
Other
File size
Optimizing upload time involves reducing file size, using efficient file transfer protocols, or applying compression to speed up data transmission. Smaller files transfer faster. The file size limit is 20 MB for direct API calls and 10 MB for the Web uploader. See the File size requirements FAQ.
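A size check before the upload avoids sending a file that will be rejected. The following is a minimal sketch built on the two limits quoted above. The names SIZE_LIMITS and fits_size_limit are hypothetical, and counting a megabyte as 1024 * 1024 bytes is an assumption, so confirm the exact limits in the File size requirements FAQ.

```python
MB = 1024 * 1024  # assumption: limits are counted in binary megabytes

# Limits quoted in this article; the dictionary keys are illustrative names.
SIZE_LIMITS = {
    "api": 20 * MB,
    "web_uploader": 10 * MB,
}


def fits_size_limit(size_bytes: int, channel: str = "api") -> bool:
    """Return True if a file of `size_bytes` is within the limit for `channel`."""
    return size_bytes <= SIZE_LIMITS[channel]
```

For a file on disk, pass os.path.getsize(path) as size_bytes.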
File Format
Choose an appropriate file format that balances file size and upload time. For example, using image formats like JPEG or PNG with efficient compression can reduce the file size while maintaining acceptable quality.
CDN (Network / Speed)
Leveraging a CDN can enhance upload speed by caching files closer to the user's geographical location, reducing latency. European Union (EU) Region Configuration: If your business is based in the EU, you can ask Veryfi to configure the EU region for data processing, which can potentially minimize latency. See Data centers location.
Network Connection
Network connection still plays a crucial role in the file upload process. When end users of your product upload files over a high-speed connection, upload time will be much better than over unstable Wi-Fi. While Veryfi is not responsible for upload time, a slow upload will directly affect the user experience of your application or product, even when processing takes only 2-3 seconds.
Other
Any other steps performed by the client before initiating the upload.
How to improve upload time
Fine-tuning the upload process can significantly enhance user experience.
To improve uploading time, users can consider the following strategies:
File Compression: Compressing files before uploading can significantly reduce their size and speed up the upload, because smaller files transfer faster. Note that file resolution and image quality play an important role in data extraction, so make sure compression does not lower the image quality. See the File requirements FAQ.
Optimal File Format: Choose an appropriate file format that balances file size and upload time. For example, using image formats like JPEG or PNG with efficient compression can reduce the file size while maintaining acceptable quality.
CDN (Content Delivery Network): Utilize a CDN to distribute files across multiple servers located in different regions. CDNs store cached copies of files closer to end-users, reducing the distance data needs to travel and accelerating upload times.
Resumable Uploads: While you cannot control network speed or reliability, we encourage you to implement resumable upload functionality, which allows you to resume an interrupted upload instead of starting from scratch. This is particularly helpful for large files or when network connectivity is unstable. At a minimum, resend the request if the call was interrupted by a network or other failure.
Preprocessing and Validation: Perform any necessary preprocessing and validation on your side before initiating the upload, including file format checks, size restrictions, and any required transformations. Veryfi validates every request on its side, and a request with an unsupported file format or size is returned with an error. Validating locally catches these problems before the upload and improves the user experience.
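The advice to resend an interrupted request can be sketched as a small retry wrapper. This is not a true resumable upload, since it restarts the transfer from the beginning, and it is only one possible approach. upload_with_retry is a hypothetical helper, and the choice of exceptions to retry on (ConnectionError and TimeoutError) is an assumption that you should adapt to the HTTP client you use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def upload_with_retry(
    upload: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `upload`, retrying on connection failures with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return upload()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Pass your own upload function as upload. The sleep parameter exists so that the delay can be replaced in tests.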
Try Veryfi Lens
Veryfi Lens (mobile and browser) already has preprocessing and validation mechanisms that set a small maximum file size to ensure fast upload and processing times. Read more about Veryfi Lens for Mobile and Lens for Browser.
What is processing time?
Processing Time
Processing time refers to the duration it takes for the Veryfi OCR API to analyze and extract data from the uploaded document. It encompasses the time from when the API receives the document to when it completes the processing and returns the extracted information or response back to the client. The processing time is solely determined by the OCR API service.
Processing time is influenced by various factors such as the complexity of the document, the number of pages, the file format, and the specific OCR algorithms employed. Additionally, certain client-side configurations and parameters, such as confidence details, duplicate detection, blur detection, or the number of fields extracted, can also affect processing time.
Factors that affect the processing time
The way you submit files
Document itself
File size
File format
Number of pages
Client-side configurations and request parameters
Number of fields
The way you submit files
Zipped files sent as binary through the "file" parameter in the OCR API generally perform faster than files sent through the "file_data" or "file_url" parameters. A Base64-encoded string is the least efficient way to submit files (see https://en.wikipedia.org/wiki/Data_URI_scheme).
Here's why:
Reduced Network Overhead: When sending a zipped file as binary data, the entire file is sent as a single payload. This reduces network overhead since there is no need to make additional requests to retrieve file data or download the file from a URL.
Efficient Data Transfer: Zipped files are compressed, resulting in a smaller file size compared to the original uncompressed files. Smaller file size leads to faster data transfer over the network, reducing the overall processing time.
Streamlined Processing: The OCR API can directly process the zipped file as binary data without the need for additional steps like decoding base64-encoded file data or retrieving the file from a URL. This streamlined processing eliminates unnecessary operations, resulting in faster performance.
Simplified Handling: Zipped files as binary data provide a straightforward and standardized method of submitting files. The API can efficiently handle the binary data without additional parsing or decoding steps, further enhancing processing speed.
It's important to note that the specific performance gains may vary depending on the implementation and use case. In general, however, using zipped files as binary data via the "file" parameter offers improved performance and faster processing times compared to other submission methods like "file_data" or "file_url". Please refer to the API Docs for the Veryfi API Schema.
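The size difference between these submission methods is easy to check locally. The sketch below uses only the Python standard library and a made-up text payload. zip_in_memory and base64_size are hypothetical helper names, and nothing here calls the Veryfi API.

```python
import base64
import io
import zipfile


def zip_in_memory(file_name: str, data: bytes) -> bytes:
    """Return a ZIP archive, as bytes, containing `data` stored under `file_name`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(file_name, data)
    return buffer.getvalue()


def base64_size(data: bytes) -> int:
    """Return the number of bytes `data` occupies once Base64-encoded."""
    return len(base64.b64encode(data))


# Illustrative payload only: repetitive text compresses far better than real scans.
sample = b"INVOICE LINE ITEM 0001\n" * 1000

print("raw bytes:   ", len(sample))
print("base64 bytes:", base64_size(sample))
print("zipped bytes:", len(zip_in_memory("invoice.txt", sample)))
```

Base64 encodes every 3 bytes as 4 characters, so it adds roughly a third to the payload. The gain from zipping depends on the content: formats that are already compressed, such as JPEG, shrink very little.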
File size
Optimizing upload time involves reducing file size, using efficient file transfer protocols, or applying compression to speed up data transmission. Smaller files transfer faster. The file size limit is 20 MB for direct API calls and 10 MB for the Web uploader. See File size requirements.
File Format
Popular image types like JPEG and PNG generally perform better, while PDFs usually take longer to process. See the List of file types supported.
Why do PDFs take longer to process?
PDF documents can potentially take longer to process compared to other file formats due to their inherent complexity and versatility. Here are a few reasons why processing PDF files may require more time:
Richer Format: PDF offers a richer format compared to image-based formats like JPEG or PNG. PDF files can contain various elements such as text, images, tables, and complex formatting. Extracting and interpreting the content from these elements can be more time-consuming.
Higher Quality: PDF documents generally have higher quality, which means they may contain higher-resolution images and sharper text. Processing higher-resolution images or performing OCR on clearer text can require additional computational resources and time.
Page Structure: PDF files often have a defined page structure with different layers, annotations, and interactive elements. Parsing and analyzing these structural components can add complexity and processing time.
Text Extraction: Extracting text from PDF files involves deciphering the text content embedded within the document's structure. This process may require additional steps compared to directly processing image-based formats.
Compression and Encoding: PDF files can be compressed and encoded in various ways to reduce file size. Decompressing and decoding the file before processing the content can contribute to increased processing time.
Despite potentially longer processing times, PDF documents offer advantages such as better text preservation, maintainable formatting, and the ability to store complex document structures. The average processing time for a one-page PDF is 3-4 seconds.
Number of pages
A multipage document is a PDF with multiple pages, a request with multiple URLs pointing to multiple images, or a Zip file that contains multiple documents. See How Veryfi works with multiple documents sent via a single API call and What file type to expect after a document is processed on Veryfi side.
Multipage documents can take more time to process due to several factors:
Increased Content: Multipage documents contain multiple pages of content, which means there is more data to process. Each page needs to be individually analyzed and extracted, leading to a higher workload for the OCR system.
Page Alignment and Orientation: In multipage documents, pages may have different alignments or orientations. Veryfi needs to handle variations in page layout and adjust accordingly, which can increase processing time compared to documents with consistent page alignment.
Page-Level Analysis: Veryfi performs page-level analysis to understand the overall structure and relationships between pages. This analysis can involve detecting headers, footers, page numbers, and other page-specific elements. Processing this additional contextual information for multipage documents requires more computational resources and time.
Sequential Processing: Multipage documents are processed sequentially, page by page. Each page must be loaded, analyzed, and processed before moving on to the next page. This sequential processing adds cumulative processing time for each page, making the overall processing time longer.
Note: In 85% of cases where latency was questioned, the delay was caused by the number of pages in PDFs. The average processing time and average number of pages can be found in the Analytics Dashboard inside your user account. Just drill down to the Analytics section.
Configurations, fields, and request parameters
Request parameters can affect processing time because they provide instructions and additional data that influence how the OCR API processes the request and extracts information from the uploaded document.
Here are a few reasons why request parameters can impact processing time:
Processing Configuration: Request parameters allow clients to configure various aspects of the OCR processing. For example, parameters for duplicate detection, confidence details, bounding boxes, or barcode detection can affect the complexity and depth of the processing performed by the API. Configuring these parameters differently can result in variations in processing time.
Field Extraction: Request parameters related to field extraction, such as categories, tags, or custom fields, define the specific information to be extracted from the document. The complexity and number of fields to be extracted affect processing time, as more extensive or intricate extractions may require additional computation and analysis.
Post-Processing Tasks: Some request parameters may trigger additional post-processing tasks or validations. For instance, address validation and vendor checks using maps introduce additional processing steps that take time to complete.
Data Validation and Enrichment: Request parameters related to data validation or enrichment processes, such as external ID or fraud detection mechanisms, may involve additional checks or queries against external databases or services. These processes can add to the overall processing time as the API interacts with external resources.
It's important to note that the impact of request parameters on processing time may vary depending on the specific service and its underlying algorithms. Understanding how different parameters influence processing time can help clients optimize their API usage and configure requests to achieve the desired balance between speed and accuracy.
How to optimize processing time
Compressing files before uploading can significantly reduce their size and expedite the upload process. Smaller file sizes lead to faster data transfer.
Optimal File Format
Choose an appropriate file format that balances file size and upload time. For example, using image formats like JPEG or PNG with efficient compression can reduce the file size while maintaining acceptable quality.
Control the number of pages to process
Multipage documents may take more time to process. To optimize processing time for multipage documents and improve overall efficiency, you can limit the number of pages processed using the max_pages_to_process parameter.
max_pages_to_process is a parameter that controls how many pages of the document Veryfi should read and extract. See the FAQ.
The current limit is 15 pages. If you submit a 20-page PDF document, Veryfi will extract data from the first 15 pages only. If max_pages_to_process is set to 1, Veryfi will read only the first page.
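The page-limit rule above can be written as a small function, which is handy for predicting how much of a document will be read before you send it. This is a sketch of the rule as described in this article: the function name pages_to_be_processed is hypothetical, and the 15-page value is the current limit stated above, which may change.

```python
from typing import Optional

SYSTEM_PAGE_LIMIT = 15  # current limit stated in this article


def pages_to_be_processed(
    total_pages: int, max_pages_to_process: Optional[int] = None
) -> int:
    """Return how many pages will be read for a document of `total_pages` pages."""
    limit = SYSTEM_PAGE_LIMIT
    if max_pages_to_process is not None:
        # The request parameter can lower the limit but not raise it.
        limit = min(limit, max_pages_to_process)
    return min(total_pages, limit)


print(pages_to_be_processed(20))     # 20-page PDF, no parameter
print(pages_to_be_processed(20, 1))  # only the first page is read
```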
Account Configuration / Request parameters
To optimize processing time, clients can explore options such as enabling boost mode, fine-tuning field extraction settings, and leveraging any available performance-enhancing features provided by the OCR API service.
It's important to note that upload time and processing time are distinct phases, and while the client can take steps to optimize upload time, the processing time is primarily dependent on the OCR API service's capabilities, infrastructure, and the document's complexity.
boost_mode explained
boost_mode skips:
vendor address validation
fields enrichment
vendor check using maps
vendor checks against government DB of registered businesses
logo search on the document
auto-rotation on the back end
blur detection
document conversion to readable PDF
barcode detection
duplicate detection
categorization
To get the most recent parameters and API Schema please check out API Docs and integration guide.
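As an illustration, the two speed-related parameters named in this article could be collected into one set of request options. This is a sketch under assumptions: build_request_options is a hypothetical helper, and sending boost_mode as a boolean is an assumption, so check the API Docs for the accepted values and for how options are passed with your client.

```python
from typing import Any, Dict, Optional


def build_request_options(
    boost_mode: bool = False, max_pages_to_process: Optional[int] = None
) -> Dict[str, Any]:
    """Collect speed-related request parameters into a dictionary.

    Assumption: boost_mode is sent as a boolean; confirm in the API Docs.
    """
    options: Dict[str, Any] = {"boost_mode": boost_mode}
    if max_pages_to_process is not None:
        if max_pages_to_process < 1:
            raise ValueError("max_pages_to_process must be at least 1")
        options["max_pages_to_process"] = max_pages_to_process
    return options


# Fastest configuration described in this article: boost mode, first page only.
print(build_request_options(boost_mode=True, max_pages_to_process=1))
```

Keep in mind that boost_mode skips every step listed above, so enable it only if you do not rely on those results.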
Number of Extracted Fields
The number of fields to extract can affect processing time. Customers can focus on the fields they need, reducing the overall processing workload. Veryfi supports 180+ fields, and depending on the use case, customers may not need all the fields we currently return.
Removing the ocr_text field from the list of JSON fields we return could significantly improve the speed for multi-page PDFs.
Optimizing ways to submit files
In general, using zipped files as binary data via the "file" parameter offers improved performance and faster processing times compared to other submission methods like "file_data" or "file_url."
Account or Request parameters
Removing additional services/microservices or features can potentially speed up processing time by reducing unnecessary computations.
Consider disabling additional services such as:
Barcode Detection
Blur Detection
Rotation & Documents crop
Duplicate Detection
Confidence details and bounding boxes
Parsing Address
Fraud detection
etc.
How to debug latency issues
1. Look at the meta object in your JSON.
Use the meta fields provided by Veryfi, such as processed_pages and total_pages. These fields show how many pages the document contains and how many of them were processed, which helps you monitor and manage the extraction workflow.
"meta": {
  "owner": "test",
  "processed_pages": 15,
  "source": "api",
  "total_pages": 19
}
Where:
owner - username
source - API or Lens
processed_pages - total number of pages that were processed. It cannot exceed the max_pages_to_process parameter, if included, or the system limit of 15 pages.
total_pages - total number of pages in the PDF file
The meta object is enabled for all users by default.
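A quick way to use these fields is to compare processed_pages with total_pages. The sketch below reads the sample meta values shown above from a JSON string. unprocessed_pages is a hypothetical helper, and the JSON text is a hand-written stand-in for a real response.

```python
import json


def unprocessed_pages(meta: dict) -> int:
    """Return how many pages of the document were not processed."""
    return max(0, meta["total_pages"] - meta["processed_pages"])


# Stand-in for a response body; only the meta object is shown.
response_text = (
    '{"meta": {"owner": "test", "processed_pages": 15, '
    '"source": "api", "total_pages": 19}}'
)
meta = json.loads(response_text)["meta"]
print(unprocessed_pages(meta))  # 19 - 15 = 4 pages were skipped
```

A result above zero means the document exceeded the page limit, so a long total_pages value is a likely explanation for high processing time.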
2. Look at Veryfi Analytics Dashboard.
The Veryfi Analytics Dashboard provides the most important stats and API usage insights on processing time and document-related metrics.
Processing time: Average and Median processing time and its distribution along with a trend chart that visualizes the processing time fluctuations over time.
Document-related metrics: File types and document source distribution, average file size, and average number of pages per document.
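If you also log request durations on your side, you can compute the same kind of summary locally and compare it with the dashboard. The following is a minimal sketch: summarize_latency is a hypothetical helper and the sample values are made up for illustration, not measurements.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize_latency(samples_s: Sequence[float]) -> Dict[str, float]:
    """Return the average, median, and maximum of latency samples in seconds."""
    if not samples_s:
        raise ValueError("at least one sample is required")
    return {
        "average_s": mean(samples_s),
        "median_s": median(samples_s),
        "max_s": max(samples_s),
    }


# Illustrative values only: one slow request pulls the average up.
samples = [3.0, 3.5, 4.0, 3.5, 12.0]
print(summarize_latency(samples))
```

A large gap between the average and the median usually points to a few slow outliers, for example long multipage PDFs, rather than a general slowdown.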
Contact us at support@veryfi.com if you need help.