Structured to Unstructured Content: Finding the Right Process Automation Tool
June 11, 2020 / artificial intelligence, Intelligent Process Automation
Companies exploring intelligent document processing tools often struggle to determine which one best fits the use case they have in mind. That’s understandable, given that the documents users deal with range from highly structured W-2 forms to unwieldy, inherently unstructured financial reports.
The potential solutions likewise run the gamut. They include robotic process automation (RPA) tools and others that use a templated approach, both of which work well with highly structured content. Things get dicier when at least some of the content is unstructured, where you don’t know ahead of time exactly where the information is within a given document. Most of the content companies deal with – some 85% – is of the unstructured variety, including emails, reports, images, Word documents, and more. Coping with this content requires a tool with enough artificial intelligence capability to “read” these documents much like a human would.
In this post, we’ll walk you through each type of content and sample use cases to help illustrate which tool is most appropriate for each.
Solutions for structured content
RPA tools and those that take a templated process automation approach work well when they know what’s coming. If you’ve got a series of documents that are all formatted the same – like W-2s and other IRS forms, statements from the same bank, or a website “Contact Us” form – then a templated approach should serve you well.
Such an approach often involves using optical character recognition (OCR) technology to identify the text within an image. A template then specifies precisely where on the page the data you care about is located. Together, they can find and extract the data.
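As a rough illustration of how OCR output and a template can combine, here is a minimal sketch. It assumes the OCR step has already produced words with pixel coordinates; the Word and Region types, the field names, and every coordinate are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR token with the top-left corner of its bounding box, in pixels."""
    text: str
    x: int
    y: int


@dataclass(frozen=True)
class Region:
    """A rectangle on the page where the template expects a field's value."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        return self.left <= word.x < self.right and self.top <= word.y < self.bottom


# Hypothetical template: field names and coordinates are invented for illustration.
FORM_TEMPLATE = {
    "employee_name": Region(50, 100, 400, 130),
    "wages": Region(450, 100, 600, 130),
}


def extract_fields(words, template):
    """Collect the OCR words that fall inside each template region."""
    fields = {}
    for name, region in template.items():
        inside = [w for w in words if region.contains(w)]
        inside.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Stand-in for OCR output; a real OCR engine would supply these tokens.
ocr_words = [
    Word("Jane", 60, 110),
    Word("Doe", 120, 110),
    Word("52,000.00", 460, 112),
    Word("Form", 50, 20),
]

result = extract_fields(ocr_words, FORM_TEMPLATE)
print(result)

# The same words shifted 200 pixels down the page no longer land in any region.
shifted = [Word(w.text, w.x, w.y + 200) for w in ocr_words]
print(extract_fields(shifted, FORM_TEMPLATE))
```

The second call returns empty strings for both fields, which is why this kind of approach depends on every document arriving in exactly the same layout.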
An RPA tool can take the resulting data and put it into some other downstream system for processing, relieving a human from performing these same tedious steps over and over. So long as there’s no variation anywhere in the process, whether in the documents or the steps required to get the job done, it should work fine.
A mixed bag: Semi-structured content
Next on the content spectrum is semi-structured content. This type of content takes many forms, but consider again a “Contact Us” form. A form that includes only fields for name, email address, and phone number is structured content – you know what data is in each field.
Now consider the same form with another field that invites visitors to offer more information, such as “Tell us about your issue.” Such a field enables the visitor to enter free-form text and perhaps include an attachment. While the name, email address, and phone number fields are structured and can be handled by an OCR/templated automation tool, that free-form text is in a different category.
Dealing with semi-structured content like this requires a mix of templated or RPA tools plus an automation solution that includes more intelligence. It requires an intelligent document processing tool that can “read” the free-form text, grasp what the visitor wants, and do something with it based on that determination. In this case, that may mean forwarding it to an appropriate customer service representative depending on the subject, such as a repair, financial issue, complaint, or suggestion. (For more on this topic, see our previous post on digital mailrooms.)
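To make the routing step concrete, here is a deliberately simple sketch. It uses keyword counting as a stand-in for the language understanding an intelligent document processing tool would provide; a real tool would rely on a trained model rather than a word list. The queue names, keywords, and sample submission are all hypothetical.

```python
import re

# Hypothetical queues and trigger words, invented for illustration.
ROUTING_KEYWORDS = {
    "repair": {"broken", "repair", "fix", "leaking"},
    "billing": {"invoice", "refund", "charge", "payment"},
    "complaint": {"unhappy", "complaint", "disappointed"},
}
DEFAULT_QUEUE = "general"


def route_message(text: str) -> str:
    """Pick the queue whose keywords appear most often in the message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = {
        queue: sum(1 for token in tokens if token in keywords)
        for queue, keywords in ROUTING_KEYWORDS.items()
    }
    best_queue = max(scores, key=scores.get)
    # Fall back to a general queue when no keyword matched at all.
    return best_queue if scores[best_queue] > 0 else DEFAULT_QUEUE


# The structured fields pass straight through; only the free-form text is analyzed.
submission = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "message": "My dishwasher is leaking and I need someone to fix it.",
}
print(route_message(submission["message"]))
```

For this sample the sketch selects the repair queue. A word list like this misses synonyms, misspellings, and context, which is the gap that more capable language models are meant to close.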
Unstructured content requires intelligent automation
Finally, we have unstructured content – documents with no pre-defined fields, such as pure text or a mix of text and images. It can also include documents such as invoices. An individual invoice is considered structured content. But most organizations must deal with invoices from many different companies, and no two vendors lay them out the same way. In that case, if you’re trying to automate invoice processing, you’re effectively dealing with unstructured content – because you’d be hard-pressed to create templates for every invoice you receive.
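The invoice problem can be sketched in a few lines. The example below looks for the total by its label rather than by its position on the page, so it tolerates some layout differences between vendors. It is a simplistic stand-in, not how a trained extraction model works: the label list and sample invoices are hypothetical, and any invoice using wording outside the list is missed.

```python
import re

# Hypothetical label variants, invented for illustration.
TOTAL_PATTERN = re.compile(
    r"(?:total due|amount due|balance due|invoice total)\s*:?\s*\$?\s*([\d,]+\.\d{2})",
    re.IGNORECASE,
)


def find_total(invoice_text: str):
    """Return the invoice total as a float, or None if no known label is found."""
    match = TOTAL_PATTERN.search(invoice_text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


# Three made-up invoices with the total in different places and wordings.
invoice_a = "ACME Corp\nInvoice 1001\nTotal Due: $1,250.00\nThank you"
invoice_b = "Amount due 980.50\nGlobex Inc.\nInvoice no. 77"
invoice_c = "Please remit 300.00 within 30 days"

for invoice in (invoice_a, invoice_b, invoice_c):
    print(find_total(invoice))
```

The first two invoices yield a total even though it appears in different places, while the third returns None because its wording is not on the list. Each new vendor format means another rule to write and maintain, which is the scaling problem described above.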
Another classic example of unstructured content is financial documents, such as the various SEC documents that public companies must file. Financial analysts must pore over these documents and pull out the essential bits of data to assess the company’s performance. It’s painstaking work.
An intelligent process automation solution can do that work for them. By employing technologies including natural language processing and deep learning, an effective IPA tool “reads” such documents and extracts the data that’s important to financial analysts. It can even work in conjunction with an RPA tool to paste the data into a spreadsheet or whatever downstream tool the analyst desires.
There’s no one-size-fits-all approach nor any single tool that can handle all of your intelligent document processing requirements. But there is a solution for each use case, even if it involves highly unstructured content.
To learn more about the benefits of intelligent automation tools and how they help automate processes that include unstructured content, download this free white paper from the Everest Group, “Unstructured Data Process Automation.”