3 Reasons Why Template-less Automation is Key to Unstructured Data Extraction
May 18, 2021 / artificial intelligence, Intelligent Process Automation, Machine Learning
As companies seek to automate document processing, a first step is often to use a robotic process automation (RPA) tool or a templated approach, which may deliver quick gains for simple processes involving structured content. But to automate the processing of unstructured content, you’ll find three big reasons why a template-less approach is required.
Structured content includes things like spreadsheets and databases, where data is neatly laid out and it’s relatively simple to create a template to automate data extraction. The problem is that 80% or more of the content in most companies is unstructured. That includes Word documents, PDFs, emails, images and more. (For a deeper dive on the various content types, check out this previous post.)
AI is required for variable content
For RPA or templated automation approaches to be effective, you need to know exactly where the data you want to extract from a given document will be. Because unstructured content is highly variable, it’s virtually impossible to create enough templates to effectively automate data extraction.
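To make the point concrete, here is a minimal Python sketch of a position-based template. It is an illustration only, not how any particular RPA product or the Indico platform works; the template format, the field names and the sample texts are all hypothetical.

```python
# Hypothetical fixed-position template: field name -> (line index, expected label).
# This is an illustrative assumption, not the format of any real RPA tool.
CLAIM_FORM_TEMPLATE = {
    "name": (0, "Name: "),
    "account": (1, "Account: "),
}


def extract_with_template(text, template):
    """Return field values found at the exact positions the template expects."""
    lines = text.splitlines()
    result = {}
    for field, (line_no, label) in template.items():
        if line_no < len(lines) and lines[line_no].startswith(label):
            result[field] = lines[line_no][len(label):].strip()
        else:
            # The data moved or is phrased differently, so the template misses it.
            result[field] = None
    return result


# A standardized form matches the template exactly.
standard_form = "Name: Jane Doe\nAccount: 12345"

# An email carries the same facts, but not where the template looks for them.
free_form_email = "Hi, this is Jane Doe.\nMy account number is 12345."

print(extract_with_template(standard_form, CLAIM_FORM_TEMPLATE))
print(extract_with_template(free_form_email, CLAIM_FORM_TEMPLATE))
```

The template recovers both fields from the standardized form and neither from the email, even though the email contains the same information. Covering the email would take a second template, and every further layout would need its own.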
Automating data extraction from unstructured content requires a tool that can understand the context of documents and find the target data no matter where it may be located. Such tools make use of artificial intelligence technologies including deep learning, machine learning and transfer learning.
That ability to understand context in different documents requires the tool to be trained on a massive number of data points. The Indico Intelligent Process Automation (IPA) platform, for example, is trained on more than 500 million labeled data points, enabling it to “read” and understand everything from images to PDFs and emails. Transfer learning, which enables a model trained on one task to take on other, similar tasks, makes it possible for users to train models for their specific use cases using simple tools to label documents.
Templates don’t scale
Scalability is another issue that demands a template-less approach to data extraction.
A templated approach may be appropriate for simple, low-volume use cases that do not involve variation in terms of document type. Think about automating the auto insurance claims process. Perhaps an insurance company could use a templated approach to extract certain data from its standardized claim form, such as name, address, account number and the like.
But a claim typically involves far more information than that, perhaps including photos, estimates from body shops, and a claims adjuster’s own notes. A company would need thousands of templates to cover all the possible permutations, and the templated approach would still fail as soon as a new document type showed up.
Intelligent document processing systems like Indico’s can handle complex, high-volume use cases. Documents containing hundreds or even thousands of pages are no problem. Such systems can also automate processes that involve varied sorts of documents, like the auto claim example.
Intelligent automation delivers cost savings
The fact that a single model can handle all the key variables involved in each process also means the intelligent automation approach delivers significant cost savings vs. RPA or a templated approach.
As noted above, automating a process that involves numerous types of documents with a templated approach requires creating a template for each one. That takes many hours and lots of money, even if you handle it in-house. But many companies wind up hiring consultants to write the templates for them, at lofty prices.
With the Indico “citizen data scientist” approach, the business people who know the processes best actually use the IPA platform to create models. A simple interface makes it easy for them to label the sorts of data they want to extract from each document. In an afternoon, they can label 200 documents and have a working model that’s around 95% accurate.
It’s not uncommon for our clients to see a 4x increase in process capacity and an 80% reduction in the human resources required after automating processes involving unstructured content. That amounts to substantial cost savings while also freeing up employees for more strategic and rewarding work.
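As a rough illustration of what those two figures imply together, the Python sketch below applies a 4x capacity increase and an 80% staffing reduction to a baseline team. The baseline of 1,000 documents per day and 10 people is a made-up assumption for the arithmetic, not client data, and the function name is hypothetical.

```python
def after_automation(docs_per_day, staff, capacity_multiplier=4, staff_reduction_pct=80):
    """Apply the capacity and staffing figures quoted above to a baseline."""
    new_docs_per_day = docs_per_day * capacity_multiplier
    new_staff = staff * (100 - staff_reduction_pct) / 100
    return new_docs_per_day, new_staff


# Hypothetical baseline, chosen only to make the arithmetic easy to follow.
baseline_docs, baseline_staff = 1000, 10

docs, staff = after_automation(baseline_docs, baseline_staff)

# Documents handled per person, before and after automation.
per_person_before = baseline_docs / baseline_staff
per_person_after = docs / staff

print(docs, staff)
print(per_person_after / per_person_before)
```

Under these assumptions the team goes from 1,000 documents a day with 10 people to 4,000 a day with 2, which works out to a 20x increase in documents handled per person. The ratio does not depend on the baseline chosen, since it is simply 4 divided by 0.2.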