
Structured to Unstructured Content: Finding the Right Process Automation Tool

June 11, 2020 | Artificial Intelligence, Intelligent Process Automation


Companies exploring intelligent document processing tools often struggle to work out which tool best fits the use case they have in mind. That’s understandable, given that the documents they deal with range from highly structured W-2 forms to ungainly, inherently unstructured financial reports.

The potential solutions likewise run the gamut. They include robotic process automation (RPA) tools and others that use a templated approach, both of which work well with highly structured content. Things get dicier when at least some of the content is unstructured, where you don’t know ahead of time exactly where the information is within a given document. Most of the content companies deal with – some 85% – is of the unstructured variety, including emails, reports, images, Word documents, and more. Coping with this content requires a tool with enough artificial intelligence capability to “read” these documents much like a human would.

In this post, we’ll walk you through each type of content and sample use cases to help illustrate which tool is most appropriate for each. 


Solutions for structured content

RPA tools and those that take a templated process automation approach work well when they know what’s coming. If you’ve got a series of documents that are all formatted the same – like W-2s and other IRS forms, statements from the same bank, or a website “Contact Us” form – then a templated approach should serve you well.

Such an approach often involves using optical character recognition (OCR) technology to identify the text within an image. A template then indicates precisely where the data you care about is located on the page. Together, they can find and extract the data.
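To make the OCR-plus-template idea concrete, here is a minimal sketch in Python. It assumes the OCR step has already produced a list of words with bounding boxes; the `Word` and `Region` types, the field names, and all coordinates are hypothetical and chosen only for illustration, not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word as an OCR engine might report it (hypothetical shape)."""
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    w: float  # box width
    h: float  # box height


@dataclass(frozen=True)
class Region:
    """A rectangle on the page where the template expects a field."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the region if its center point falls inside it.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def extract_fields(words, template):
    """Return {field name: text found inside that field's region}."""
    result = {}
    for name, region in template.items():
        hits = [w for w in words if region.contains(w)]
        hits.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
        result[name] = " ".join(w.text for w in hits)
    return result


# Hypothetical template for one fixed-layout form.
TEMPLATE = {
    "employee_name": Region(50, 100, 300, 130),
    "wages": Region(350, 100, 500, 130),
}

# Hypothetical OCR output for a page that matches the template.
OCR_WORDS = [
    Word("Form", 50, 20, 40, 15),
    Word("Jane", 60, 105, 40, 15),
    Word("Doe", 105, 105, 35, 15),
    Word("52000.00", 360, 105, 80, 15),
]

fields = extract_fields(OCR_WORDS, TEMPLATE)
print(fields)
```

The sketch also shows why this approach is brittle: if the same words appear even slightly lower on the page, their centers fall outside the regions and every field comes back empty.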

An RPA tool can take the resulting data and put it into some other downstream system for processing, relieving a human from performing these same tedious steps over and over. So long as there’s no variation anywhere in the process, whether in the documents or the steps required to get the job done, it should work fine.

A mixed bag: semi-structured content 

Next on the content spectrum is semi-structured content. This type of content takes many forms, but consider again a “Contact Us” form. A form that includes only fields for name, email address, and phone number is structured content – you know what data is in each field.

Now consider the same form with another field that invites visitors to offer more information, such as “Tell us about your issue.” Such a field enables the visitor to enter free-form text and perhaps include an attachment. While the name, email address, and phone number fields are structured and can be handled by an OCR/templated automation tool, that free-form text is in a different category.

Dealing with semi-structured content like this requires a mix of templated or RPA tools plus an automation solution that includes more intelligence. It requires an intelligent document processing tool that can “read” the free-form text, grasp what the visitor wants, and do something with it based on that determination. In this case, that may mean forwarding it to an appropriate customer service representative depending on the subject, such as a repair, financial issue, complaint, or suggestion. (For more on this topic, see our previous post on digital mailrooms.)
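The split between the structured fields and the free-form field can be sketched as follows. This is only an illustration: the queue names and keyword lists are hypothetical, and the keyword matching in `classify_message` is a deliberately simple stand-in for the trained language model an intelligent document processing tool would actually use to “read” the text.

```python
import re

# Hypothetical queues and trigger words; a real tool would learn these
# distinctions from labeled examples rather than use a fixed keyword list.
ROUTES = {
    "repair": {"broken", "repair", "fix", "replace"},
    "billing": {"invoice", "refund", "charge", "payment"},
    "complaint": {"unhappy", "complaint", "disappointed"},
}
DEFAULT_QUEUE = "general"


def classify_message(text: str) -> str:
    """Pick the queue whose keywords overlap most with the message."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    scores = {queue: len(tokens & keywords) for queue, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_QUEUE


def route_submission(form: dict) -> dict:
    """Copy the structured fields as-is and classify only the free text."""
    return {
        "name": form["name"],
        "email": form["email"],
        "queue": classify_message(form.get("message", "")),
    }


ticket = route_submission({
    "name": "Jane Doe",
    "email": "jane@example.com",
    "message": "I was charged twice and would like a refund.",
})
print(ticket)
```

The structured fields pass straight through, which is the part a templated or RPA tool handles well. Only the message needs interpretation, and a keyword list breaks down quickly once visitors phrase the same issue in unexpected ways.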


Unstructured content requires intelligent automation

Finally, we have unstructured content: documents with no pre-defined fields, such as pure text or a mix of text and images. It could also be documents such as invoices. An individual invoice is considered structured content. But most organizations must deal with invoices from many different companies, and they are not all the same. In that case, if you’re trying to automate invoice processing, you’re effectively dealing with unstructured content – because you’d be hard-pressed to create templates for every invoice you receive.
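As a rough illustration of the invoice problem, the sketch below looks for an invoice total by searching for a label anywhere in the text instead of reading a fixed position. The label variants and both sample invoices are made up, and a regular expression is a much cruder technique than the learned models discussed in this post; the point is only that the same value can sit in a different place in every vendor’s layout.

```python
import re

# Hypothetical label variants; every new vendor may introduce another one.
TOTAL_LABELS = r"(?:total due|amount due|invoice total|balance due)"
TOTAL_PATTERN = re.compile(
    TOTAL_LABELS + r"\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})",
    re.IGNORECASE,
)


def find_total(invoice_text: str):
    """Return the first labeled total as a float, or None if no label matches."""
    match = TOTAL_PATTERN.search(invoice_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Two made-up invoices: the total is on the last line in one, the first in the other.
VENDOR_A = """ACME Corp
Invoice #1001
Widgets        2 x 50.00
Total Due: $100.00
"""

VENDOR_B = """Invoice total 1,250.50
Globex Ltd
Consulting services
"""

totals = [find_total(VENDOR_A), find_total(VENDOR_B)]
print(totals)
```

A position-based template would need a separate definition for each of these layouts. The label search copes with both, but it still fails as soon as a vendor uses wording that is not in the list, which is the gap a tool that actually “reads” the document is meant to close.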

Another classic example of unstructured content is financial documents, such as the various SEC documents that public companies must file. Financial analysts must pore over these documents and pull out the essential bits of data to assess the company’s performance. It’s painstaking work.

An intelligent process automation (IPA) solution can do that work for them. By employing technologies including natural language processing and deep learning, an effective IPA tool “reads” such documents and extracts the data that’s important to financial analysts. It can even work in conjunction with an RPA tool to paste the data into a spreadsheet or whatever downstream tool the analyst desires.

There’s no one-size-fits-all approach nor any single tool that can handle all of your intelligent document processing requirements. But there is a solution for each use case, even if it involves highly unstructured content.

To learn more about the benefits of intelligent automation tools and how they help automate processes that include unstructured content, download this free white paper from the Everest Group, “Unstructured Data Process Automation.”
