How does the ELMo machine learning model work?
June 12, 2019 / Ask Slater, Machine Learning

At its core, ELMo is an RNN model with an LSTM gating setup that’s trained as a bidirectional language model.
But that’s not really what you asked. You asked how it works. It’s often really tough to explain how a machine learning model works. Importantly, you have to understand that nothing I said above is unique to ELMo. To understand why it “works”, you have to understand what the authors changed from previous approaches, get pretty far into the weeds on exactly how they implemented it, and look at benchmarks that show how well it works in which contexts.
Man, that sounds like a lot of work though. Don’t worry, someone already did it for you in the ELMo paper: “Deep contextualized word representations”.
You see, the nice thing about machine learning is that it’s arguably the most open field of study that has ever existed. This is partially because it’s driven in large part by industry labs that have a large vested interest in keeping this research open. This field is driven by arXiv and conferences, not Elsevier. That means that basically any important advance in the field is public (including ELMo).
Not only have they made all of their research public, but they’ve also released code and pretrained models along with it. Everything you could ask for. Unfortunately, machine learning isn’t a trivial space to learn. If the paper I linked above doesn’t make sense, then start with my first sentence. Break it down and learn the component parts (recurrent neural networks, LSTMs, gating, language modeling, bidirectional language modeling). Even that, though, requires a pretty significant math background.
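To make those component parts a little more concrete, here is a minimal sketch in plain Python of an LSTM step (the gating) run over a sequence in both directions (the bidirectional part). This is not the ELMo implementation: the function names, toy sizes, and random initialization are all assumptions for illustration, and the prediction layer and training loop that would turn this into an actual language model are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, rng):
    # Hypothetical toy initialization: one weight matrix and bias per gate.
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: {"W": mat(), "b": [0.0] * hidden_size}
            for name in ("input", "forget", "output", "candidate")}


def lstm_step(params, x, h_prev, c_prev):
    # Each gate sees the current input concatenated with the previous hidden state.
    z = x + h_prev

    def affine(name):
        p = params[name]
        return [a + b for a, b in zip(matvec(p["W"], z), p["b"])]

    i = [sigmoid(a) for a in affine("input")]        # how much new content to write
    f = [sigmoid(a) for a in affine("forget")]       # how much old memory to keep
    o = [sigmoid(a) for a in affine("output")]       # how much memory to expose
    g = [math.tanh(a) for a in affine("candidate")]  # the proposed new content

    c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


def run_lstm(params, inputs, hidden_size):
    # Plain recurrence: carry (h, c) from one token to the next.
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


def bidirectional_states(fwd_params, bwd_params, inputs, hidden_size):
    # One LSTM reads left to right, a second reads right to left.
    fwd = run_lstm(fwd_params, inputs, hidden_size)
    bwd = run_lstm(bwd_params, inputs[::-1], hidden_size)[::-1]
    # Concatenate so each token's vector reflects context from both sides.
    return [f + b for f, b in zip(fwd, bwd)]
```

In a bidirectional language model, the forward states would be trained to predict the next token and the backward states to predict the previous one. The per-token vectors returned here are the kind of context-dependent representation that training would shape.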
If you don’t understand the paper and you don’t have the math needed to understand the terms I’ve laid out above, you’ve got two options:
- Learn the math. Once you’ve gotten through linear algebra, partial differential equations, and basic graduate-level statistics, this should be approachable.
- Tell everyone that deep learning is an evil black box that should never be used and that it’s impossible to understand.
Most people in this spot pick #2. Hopefully you pick #1.
Author
Slater Victoroff