Making Deep Learning Practical with Smaller Datasets

At the recent VB Summit in Berkeley, Jeff Dean, head of Google Brain, discussed a common challenge in making deep learning a practical solution inside the enterprise: the amount of data it requires. “I would say pretty much any business that has tens or hundreds of thousands of customer interactions has enough scale to start thinking about using these sorts of things,” he said. “If you only have 10 examples of something, it’s going to be hard to make deep learning work. If you have 100,000 things you care about, records or whatever, that’s the kind of scale where you should really start thinking about these kinds of techniques.”
Training a deep learning model from scratch typically requires a large dataset, on the order of tens to hundreds of thousands of examples. This is a significant barrier to integrating deep learning into a business’s workflow. Either the data isn’t available in that quantity or, with supervised learning techniques, labeling such a large dataset costs too much time and money.

What is Transfer Learning?

What if we didn’t have to start from scratch? What if, instead of building every deep learning model from zero, you could start with a model that already understood the basics of language? That’s the promise of an area of machine learning known as transfer learning. By first learning from a massive corpus of language (typically hundreds of millions of labeled examples), a transfer learning model develops a basic understanding of language. From that starting point, enterprises can put deep learning to work even when their training datasets are orders of magnitude smaller than what is typically required.
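One way to see how a model can pick up basic language structure just by reading a lot of text is a classic count-based technique. It represents each word by the words that appear near it, so words used in similar contexts end up with similar vectors. This is a swapped-in illustration, not Indico’s method. Unlike the labeled data mentioned above, this particular technique needs no labels at all. The six-sentence CORPUS is a hypothetical stand-in for a massive corpus, and production systems learn dense embeddings from far more data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for a massive body of text.
CORPUS = [
    "the movie was great",
    "the movie was excellent",
    "the food was great",
    "the food was excellent",
    "the service was terrible",
    "the service was awful",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vecs = cooccurrence_vectors(CORPUS)
print(round(cosine(vecs["great"], vecs["excellent"]), 3))  # 1.0
print(round(cosine(vecs["great"], vecs["terrible"]), 3))   # 0.577
```

Nobody told the model that “great” and “excellent” are synonyms. They come out identical here simply because they appear in the same contexts, which is the kind of knowledge a pretrained model brings to a new task.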

At a low level, transfer learning “teaches” the model basic concepts of language, such as synonyms, grammar, and common syntactic structures. The model still needs help to understand specific kinds of documents, whether news reports, legal documents, or social media posts. However, because it already understands the basic structure of language, it can learn domain-specific concepts much more quickly. You’re not going to get a lawyer model off the shelf, but you can at least get a model that speaks English.
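The sketch below shows why a pretrained starting point lets a model learn quickly. A frozen embedding function turns text into vectors, and only a tiny logistic-regression “head” is trained on four labeled examples. The PRETRAINED table is hypothetical: its numbers are invented so that synonyms sit close together, standing in for what a real pretrained model would supply. The logistic-regression head is one common choice, not necessarily what Indico uses.

```python
import math

# Hypothetical "pretrained" word vectors (made-up numbers): synonyms sit close together.
PRETRAINED = {
    "great": [0.9, 0.1], "excellent": [0.85, 0.15], "good": [0.8, 0.2],
    "terrible": [0.1, 0.9], "awful": [0.15, 0.85], "bad": [0.2, 0.8],
}
DIM = 2


def embed(text):
    """Frozen feature extractor: average the vectors of known words (unknown words are skipped)."""
    vecs = [PRETRAINED[tok] for tok in text.lower().split() if tok in PRETRAINED]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=500, lr=0.5):
    """Fit a logistic-regression head with SGD; the embeddings themselves never change."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = embed(text)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - label  # gradient of the log loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, text):
    """Probability that `text` belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, embed(text))) + b)


# Only four labeled examples; "excellent" and "awful" never appear in them.
LABELED = [("great movie", 1), ("good food", 1), ("terrible service", 0), ("bad movie", 0)]
w, b = train_head(LABELED)
print(predict(w, b, "excellent service") > 0.5)  # True
print(predict(w, b, "awful food") < 0.5)         # True
```

The head never saw “excellent” or “awful”, yet it classifies them correctly because the frozen embeddings already place them near words it did see.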

Applying Transfer Learning

If you’re working with unstructured text or image data, Indico’s Custom Collection API lets you build custom models with just a few hundred (or at most a few thousand) labeled examples by taking advantage of our high-quality feature embeddings. For ideas on how to use Custom Collection for the task you’re trying to solve, explore some of our use case tutorials:

And before you ask… yes, we needed hundreds of millions of examples to build the transfer learning model that lets Indico customers use deep learning with only a tiny fraction of that data for their own use case!
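Building a custom model on top of feature embeddings doesn’t require much machinery. One simple illustration is a nearest-centroid classifier, which averages the embedding vectors for each label and assigns a new document to the label with the most similar average. This is not Indico’s Custom Collection API, and none of these names come from it. The three-dimensional vectors and the legal and news labels are hypothetical placeholders for what a real embedding service would return.

```python
import math
from collections import defaultdict


def fit_centroids(examples):
    """Average the embedding vectors for each label."""
    sums, counts = {}, defaultdict(int)
    for vec, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(centroids, vec):
    """Return the label whose centroid is most similar to `vec`."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


# Hypothetical embeddings; a real service would return much longer vectors, one per document.
TRAIN = [
    ([0.9, 0.1, 0.0], "legal"),
    ([0.8, 0.2, 0.1], "legal"),
    ([0.1, 0.9, 0.2], "news"),
    ([0.0, 0.8, 0.3], "news"),
]
model = fit_centroids(TRAIN)
print(classify(model, [0.85, 0.15, 0.05]))  # legal
print(classify(model, [0.05, 0.9, 0.2]))    # news
```

Because fitting is just averaging, there is no training loop to tune, which makes this a reasonable first baseline when only a handful of labeled examples per class are available.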