The Harsh Truth about Templated Approaches to Unstructured Content

A large insurance company recently came to Indico with a story that is all too familiar among firms trying to handle unstructured documents with rule-based approaches. The company was looking to automate a process that involved a wide variety of financial documents. It partnered with a consulting firm, ultimately paying millions for a small army of engineers to spend months writing countless rules in an attempt to account for every imaginable variation in the documents.

It worked – briefly. But not long after the consulting engagement ended, the automated process broke down. Why? Because it’s virtually impossible to account for every variation you may encounter when dealing with unstructured content – no matter how hard you try, or how much you spend.

That is the unvarnished truth when it comes to purely rule-based process automation software, and it’s a key reason templating tools will only get you so far. They are great at processing highly structured documents, where the same information is in the exact same place every time. But they don’t fare well with unstructured content, such as the various statements and financial documents in the insurance example above.

Optical character recognition: no panacea

You may hear that optical character recognition (OCR) technology is a solution to this issue. That’s not actually the case. OCR simply converts scanned documents, often in PDF format, into machine-readable text; modern OCR engines typically rely on machine learning to do so.

That’s all well and good, but what happens once the scan has been converted into a readable document? How do you extract the information you’re looking for? Most OCR products offer to produce rule-based templates built from a sample set of your documents. That’s known as an OCR templating solution. For that to work, you need to know exactly where the text you’re after is located within each document. In that respect, it’s hardly automation at all, because it requires enormous manual effort to come up with the hundreds or thousands of rules required to make it work.

Let’s say you want to automate a process that involves taking financial data from a PDF and putting it into a spreadsheet. If all the statements are from the same financial institution and all the data you’re after is in the same place on every single statement, then an OCR templating approach may be a viable solution.

But as soon as you introduce statements from another institution, or even statements from the same institution that vary from the norm, you now need a new set of rules or templates to handle those situations. It’s easy to see how coming up with all the required templates can quickly become unwieldy – and costly.
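To make that concrete, here is a minimal sketch of what a positional template boils down to. Everything in it is a hypothetical illustration rather than any vendor's actual format: the `FieldRule` class, the `BANK_A_TEMPLATE` rules, and both sample statements are made up, and real templating tools usually work with page coordinates instead of line and column offsets.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical template rule: a field lives at a fixed line and column span."""
    name: str
    line: int   # zero-based line index in the OCR text
    start: int  # first column of the value
    end: int    # one past the last column of the value


# Rules written against one made-up statement layout ("Bank A").
BANK_A_TEMPLATE = [
    FieldRule("account", line=0, start=9, end=17),
    FieldRule("balance", line=2, start=9, end=17),
]


def extract(ocr_text: str, template: List[FieldRule]) -> Dict[str, str]:
    """Apply each positional rule; a missing line yields an empty string."""
    lines = ocr_text.splitlines()
    result = {}
    for rule in template:
        text = lines[rule.line] if rule.line < len(lines) else ""
        result[rule.name] = text[rule.start:rule.end].strip()
    return result


# The layout the template was built for.
bank_a = "Account: 12345678\nDate:    01/01/2020\nBalance: 1,250.00"
# Same information, different institution, different layout.
bank_b = "Statement date: January 1, 2020\nAcct no. 87654321\nClosing balance  980.10"

print(extract(bank_a, BANK_A_TEMPLATE))
print(extract(bank_b, BANK_A_TEMPLATE))
```

The first call pulls out the account number and balance. The second raises no error at all; it quietly returns fragments of labels instead of values, which is why each new layout ends up needing its own set of rules.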

Robotic process automation (RPA) challenge: accounting for variation

On top of the requirement that the data be in the same place each time, you’ve also got to consider how the same information may be presented in different ways. Consider the date “Jan. 1, 2020.” That could be rendered in multiple formats, including:

1/1/20
01/01/20
01/01/2020
1-1-20
01-01-20
January 1, 2020
And on and on

Successfully automating a process with a templated approach means coming up with rules that account for each of those possibilities. Even if you succeed in getting a system working initially, chances are it won’t be long until something new comes along that throws a wrench into the works. Or maybe the automated process works on a small scale, such as in a proof-of-concept test, but quickly unravels in production.
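A rough sketch of the rule-per-format approach, using Python's standard `datetime.strptime`, shows how this plays out. The `DATE_RULES` list and the `parse_date` helper are assumptions made for illustration, the patterns assume US-style month-first ordering, and the month-name pattern assumes an English locale.

```python
from datetime import date, datetime
from typing import Optional

# One strptime pattern per rule. The list mirrors the formats listed above
# and is an illustrative assumption, not an exhaustive set.
DATE_RULES = [
    "%m/%d/%y",   # 1/1/20, 01/01/20
    "%m/%d/%Y",   # 01/01/2020
    "%m-%d-%y",   # 1-1-20, 01-01-20
    "%B %d, %Y",  # January 1, 2020 (English month names assumed)
]


def parse_date(text: str) -> Optional[date]:
    """Try each rule in order; return None when no rule matches."""
    for pattern in DATE_RULES:
        try:
            return datetime.strptime(text.strip(), pattern).date()
        except ValueError:
            continue  # this rule does not fit, try the next one
    return None


for sample in ["1/1/20", "01/01/2020", "01-01-20", "January 1, 2020", "Jan. 1, 2020"]:
    print(repr(sample), "->", parse_date(sample))
```

The first four samples parse to the same date. The last one, "Jan. 1, 2020", matches none of the rules and comes back as `None` until someone notices and writes yet another pattern.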

Benefits of intelligent process automation

A better approach is to apply natural language processing (NLP) combined with transfer learning to enable intelligent process automation (IPA). Because NLP models learn patterns from examples instead of relying on hand-written rules, they are able to take context into account. That means the model looks at surrounding information to predict that a date format it hasn’t seen before is still most likely a date, no matter where it appears in the document.

When you apply NLP tools after OCR, you can now create an automated process that really can “read” text even from unstructured documents and make sense of it – without requiring anyone to write hundreds of rules. That’s truly scalable, intelligent process automation.

To learn more about how IPA works in practice, and how it differs from templated approaches, check out the Everest Group white paper, Intelligent Document Processing for Unstructured Documents.
