Intelligent Document Processing has a Data Problem

Intelligent automation has a data problem: it typically takes enormous quantities of data to effectively automate processes that involve unstructured content. The solution lies in intelligent document automation platforms that make effective use of artificial intelligence (AI) technologies, including deep learning and transfer learning.

The problem with unstructured content

Unstructured content accounts for an estimated 80% or more of all the data in a typical company. Almost everything outside of highly structured content such as spreadsheets and databases is considered unstructured, including email, Word documents, PDFs, images and more. (See this previous post for a deeper dive on unstructured vs. structured content.)

The endless variety of potential formats and content inherent in unstructured data presents a significant problem when it comes to applying AI technology to automate processes that involve such data.

Approaches to training AI models

At a high level, AI technology works by training models, or algorithms, on how to perform a given function. In the case of automated document processing, that means training a model to understand what data you’re looking to extract from a given document. To automate the life insurance underwriting process, for example, you need to train a model to identify data points that are critical to that process – such as an applicant’s age, health history, occupation and so on.
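To make "data points" concrete, here is a minimal sketch of what one labeled training example might look like for the underwriting case. The class names, the `label` helper and the sample sentence are all hypothetical illustrations, not part of any Indico API; only the field names (age, occupation) come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSpan:
    name: str   # which data point this is, e.g. "age" (hypothetical field names)
    start: int  # character offset where the value begins
    end: int    # character offset just past the value


@dataclass(frozen=True)
class LabeledDocument:
    text: str
    spans: tuple  # tuple of LabeledSpan

    def values(self) -> dict:
        """Return each labeled data point mapped to the text it covers."""
        return {s.name: self.text[s.start:s.end] for s in self.spans}


def label(text: str, name: str, value: str) -> LabeledSpan:
    """Hypothetical helper: label the first occurrence of `value` in `text`."""
    start = text.index(value)
    return LabeledSpan(name, start, start + len(value))


# Made-up sample text standing in for one page of an application.
sample_text = "Applicant: Jane Doe. Age: 42. Occupation: electrician."
example = LabeledDocument(
    text=sample_text,
    spans=(
        label(sample_text, "age", "42"),
        label(sample_text, "occupation", "electrician"),
    ),
)

print(example.values())  # {'age': '42', 'occupation': 'electrician'}
```

A training set is simply many such documents, each with its data points marked.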

When applying AI to automate document processing, there are essentially two ways to train a model. One is to present numerous examples of the sorts of documents you’re dealing with and identify the data points you want to extract from each. Taking that approach, for a given process you’d need to present thousands or even hundreds of thousands of documents to train a model with any degree of accuracy.

The second way, deep learning, flips that approach on its head. You simply provide examples of what you want the end result to be, and the platform figures out how to create a model that produces the desired result. (Under the covers there’s lots of wonky technology involving things like neural networks, but we don’t need to go there.)

How Indico reduces data requirements

In practice, you do still need a sizeable database of labeled data points in order to create effective models. But a good intelligent document processing platform will have that base covered for you. The Indico Intelligent Process Automation platform, for example, is built on a database of some 500 million labeled data points. (We spent the first couple of years of our existence building that database.) That’s enough to provide context behind virtually any type of document or image you throw at it.

Transfer learning enables you to take our generalized model and apply it to your specific process using only a fraction of the data you would normally need to train a model. To automate that insurance underwriting process, for example, it would take just 100 to 200 of the documents that are actually involved in the process to train a model with accuracy rates of 90% or better.
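The general idea behind transfer learning can be sketched in a few lines. This is a toy illustration under loud assumptions, not how the Indico platform is implemented: the "pre-trained" part is a stand-in keyword counter rather than a neural network, the small trainable "head" is a simple perceptron, and the four example documents are invented. What it does show is the pattern: the general-purpose part stays frozen, and only a small task-specific part is trained on a handful of examples.

```python
# Stand-in for a large model pre-trained elsewhere; it is frozen here.
# (Assumption: a real platform would use a neural network, not keyword counts.)
PRETRAINED_VOCAB = ("diagnosis", "medication", "salary", "employer", "policy", "premium")


def pretrained_features(text):
    """Frozen feature extractor: never updated during task training."""
    words = text.lower().split()
    return [float(words.count(term)) for term in PRETRAINED_VOCAB]


def train_head(examples, epochs=20, lr=0.1):
    """Train only a small perceptron head on top of the frozen features."""
    weights = [0.0] * len(PRETRAINED_VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for text, target in examples:  # target: 1 = health record, 0 = other
            x = pretrained_features(text)
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = target - (1 if score > 0 else 0)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(head, text):
    """Classify a new document using the frozen features plus the trained head."""
    weights, bias = head
    x = pretrained_features(text)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# A handful of invented, task-specific examples is all the head is trained on.
few_examples = [
    ("diagnosis of asthma with daily medication", 1),
    ("medication list and prior diagnosis", 1),
    ("employer letter confirming salary", 0),
    ("policy premium schedule from employer", 0),
]
head = train_head(few_examples)
```

Because the feature extractor already "knows" which signals matter, the head has very little left to learn, which is why so few task-specific documents are needed.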

Compared with a traditional AI approach, that reduces your data requirements by 100x to 1,000x. In effect, it makes intelligent automation feasible for the vast majority of companies that don’t have the funds, time or ability to collect the required data on their own.
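As a quick sanity check on that range, the arithmetic can be worked through directly. The document counts below are illustrative picks from within the ranges quoted earlier (thousands to hundreds of thousands for a traditional approach, 100 to 200 with transfer learning); they are assumptions for the sake of the example, not measured figures.

```python
def reduction_factor(traditional_docs, transfer_docs):
    """How many times fewer documents the transfer-learning approach needs."""
    if transfer_docs <= 0:
        raise ValueError("transfer_docs must be positive")
    return traditional_docs / transfer_docs


# Illustrative counts only, chosen from the ranges mentioned in this post.
low_end = reduction_factor(20_000, 200)    # 100.0
high_end = reduction_factor(100_000, 100)  # 1000.0

print(f"{low_end:.0f}x to {high_end:.0f}x")  # 100x to 1000x
```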

Limited compute requirements

Another benefit is that the Indico approach dramatically decreases the level of compute power required for automated document processing. We’ve done most of the model training up front, so our platform can run on just one or two GPUs, not the 10 or more required with traditional AI approaches (an issue we covered in this previous post).

To check out the Indico IPA platform in action, click here to arrange a free demo. Or, feel free to contact us with any questions. We can even discuss those neural networks if you want.



