Structured to Unstructured Content: Finding the Right Process Automation Tool
June 11, 2020 / artificial intelligence, Intelligent Process Automation
Companies exploring intelligent document processing tools often struggle to work out which tool is best for the use case they have in mind. It’s understandable, given that the documents users deal with range from highly structured W-2 forms to ungainly, inherently unstructured financial reports.
The potential solutions likewise run the gamut. They include robotic process automation (RPA) tools and others that use a templated approach, both of which work well with highly structured content. Things get dicier when at least some of the content is unstructured, meaning you don’t know ahead of time exactly where the information you want is located within a given document. Most of the content companies deal with – some 85% – is of the unstructured variety, including emails, reports, images, Word documents and more. Dealing with this content requires a tool with enough artificial intelligence capability to “read” these documents much like a human would.
In this post, we’ll walk you through each type of content and sample use cases to help illustrate which tool is most appropriate for each.
Solutions for structured content
RPA tools as well as those that take a templated process automation approach work well when they know what’s coming. If you’ve got a series of documents that are all formatted exactly the same – like W-2s and other IRS forms, statements from the same bank, or a website “Contact Us” form – then a templated approach should serve you well.
Such an approach often involves using optical character recognition (OCR) technology to identify the text within an image. Then a template is used to indicate exactly where in the document the data you care about is located. Together, they can be used to find and extract the data.
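To make the templated approach concrete, here is a minimal Python sketch. It assumes the OCR step has already produced a list of words with page coordinates; the `Word` and `Box` classes, the field names and the coordinates are all hypothetical and chosen only for illustration, not taken from any particular OCR or RPA product.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word returned by a (hypothetical) OCR step."""
    text: str
    x: float  # left edge of the word, in page units
    y: float  # top edge of the word, in page units


@dataclass
class Box:
    """A fixed region of the page where a template expects a field."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def extract_fields(words, template):
    """Collect the OCR words that fall inside each template region."""
    result = {}
    for field_name, box in template.items():
        inside = [w for w in words if box.contains(w)]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order
        result[field_name] = " ".join(w.text for w in inside)
    return result


# Assumed OCR output for one form (made-up values).
ocr_words = [
    Word("Jane", 100, 50),
    Word("Doe", 140, 50),
    Word("52,300.00", 400, 120),
]

# Assumed template: where each field sits on this exact form layout.
template = {
    "employee_name": Box(90, 40, 300, 60),
    "wages": Box(380, 110, 500, 130),
}

fields = extract_fields(ocr_words, template)
```

The sketch only works because every word lands where the template expects it. Shift the same words elsewhere on the page and the fields come back empty, which is why this approach suits documents that are all formatted exactly the same.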
An RPA tool may then be employed to take the resulting data and put it into some other downstream system for processing, relieving a human from performing these same tedious steps over and over. So long as there’s no variation anywhere in the process, whether in the documents or in the steps required to get the job done, it should work fine.
A mixed bag: Semi-structured content
Next on the content spectrum is semi-structured content. This can take many forms but consider again a website “Contact Us” form. A form that includes only fields for name, email address and phone number would be considered completely structured – you know what data is in each field.
Now consider the same form with another field that invites the visitor to offer more information, such as “Tell us about your issue.” Such a field enables the visitor to enter free-form text and perhaps include an attachment. While the name, email address and phone number fields are still structured and could be handled by an OCR/templated automation tool, that free-form text is in a different category.
This is an example of semi-structured content, and dealing with it requires a mix of templated or RPA tools plus an automation solution that includes more intelligence. It requires an intelligent document processing tool that can “read” the free-form text, grasp what the visitor wants and, based on that determination, do something with it. In this case, that may mean forwarding it to an appropriate customer service representative depending on the subject, such as a repair, financial issue, complaint or suggestion. (For more on this topic, see our previous post on digital mailrooms.)
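The split between the structured fields and the free-form field can be sketched in a few lines of Python. This is only an illustration: the keyword lists below are a crude stand-in for the trained language model an intelligent document processing tool would use, and the queue names, keywords and `route_submission` function are hypothetical.

```python
# Hypothetical routing queues and trigger words. A real tool would classify
# the text with a trained model rather than a keyword list.
ROUTES = {
    "repair": ["broken", "repair", "fix", "not working"],
    "billing": ["invoice", "refund", "charge", "payment"],
    "complaint": ["complaint", "unhappy", "disappointed"],
}


def route_submission(form):
    """Pass structured fields through as-is and pick a queue for the free text."""
    text = form.get("message", "").lower()
    # Count how many trigger words for each queue appear in the message.
    scores = {
        queue: sum(keyword in text for keyword in keywords)
        for queue, keywords in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    queue = best if scores[best] > 0 else "general"
    return {"name": form["name"], "email": form["email"], "queue": queue}


submission = {
    "name": "Ana",
    "email": "ana@example.com",
    "message": "I was charged twice and would like a refund.",
}
ticket = route_submission(submission)
```

The name and email fields need no interpretation, while the message has to be understood before anything can be done with it. Keyword matching breaks down quickly on real messages, which is the gap the more intelligent tooling is meant to fill.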
Unstructured content requires intelligent automation
Finally, we’ve got completely unstructured content. This includes documents with no pre-defined fields, such as pure text, or a mix of text and images. Or, it could be documents such as invoices. An individual invoice would be considered structured. But most organizations must deal with invoices from many different companies and they are not all the same. In that case, if you’re trying to automate invoice processing, you’re effectively dealing with unstructured content – because you’d be hard-pressed to create templates for each invoice you may receive.
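To see why fixed templates struggle here, consider the invoice total. A minimal Python sketch, assuming the invoices have already been converted to plain text, might search for the total by its label instead of its position. The label list, the sample invoices and the `find_total` function are all made up for illustration.

```python
import re

# Assumed label variants for the invoice total; real invoices use many more.
TOTAL_LABELS = r"(?:total due|amount payable|balance due)"

# A label, up to 10 non-digit characters (colon, spaces, currency sign),
# then an amount with two decimal places.
TOTAL_PATTERN = re.compile(
    TOTAL_LABELS + r"\D{0,10}?([\d,]+\.\d{2})", re.IGNORECASE
)


def find_total(invoice_text):
    """Return the invoice total as a float, or None if no known label matches."""
    match = TOTAL_PATTERN.search(invoice_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Two made-up invoices from different vendors with different layouts.
vendor_a = "Invoice 17\nTotal Due: $1,200.00\n"
vendor_b = "ACME Corp\nWidgets x 3\nAMOUNT PAYABLE 980.50 USD"
# A third vendor whose wording is not in the label list.
vendor_c = "Please remit 45.00 by Friday"

totals = [find_total(t) for t in (vendor_a, vendor_b, vendor_c)]
```

The first two layouts are handled, but the third returns nothing because its wording was never anticipated. Every new vendor can bring a new variation, so hand-written rules end up with the same maintenance problem as per-vendor templates.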
Another classic example of unstructured content is financial documents, such as the various SEC documents that public companies must file. Financial analysts must pore over these documents and pull out the bits of data that are important to them in assessing the company’s performance and prospects. It’s painstaking work.
An intelligent process automation (IPA) tool, however, can be trained to do that work for them. By employing technologies including natural language processing and deep learning, an effective IPA tool such as Indico’s can be trained to “read” such documents and extract the data that’s important to financial analysts. It can even work in conjunction with an RPA tool to then paste the data into a spreadsheet or whatever downstream tool the analyst desires.
As you can see, there’s no one-size-fits-all approach nor any single tool that can handle all of your intelligent document processing requirements. But there is a solution for each use case, even if it involves highly unstructured content.
To learn more about the benefits of intelligent automation tools and how they help automate processes that include unstructured content, download this free white paper from the Everest Group, “Unstructured Data Process Automation.”