Sequence Modeling With Neural Networks (Part 1): Language & Seq2Seq

Training and generation process of a language model.

This blog post is the first in a two part series covering sequence modeling using neural networks. Sequence to sequence problems address areas such as machine translation, where an input sequence in one language is converted into a sequence in another language. In this post we will learn the foundations behind sequence to sequence models and how neural networks can be used to build powerful models capable of analyzing data that varies over time.

Language Models

The foundation of sequence modeling tasks, such as machine translation, is language modelling. At a high level, a language model takes in a sequence of inputs, looks at each element of the sequence and tries to predict the next element of the sequence. We can describe this process by the following equation.

Y_t = f (Y_t-1)

Where Y_t is the sequence element at time t, Y_t-1 is the sequence element at the prior time step, and f is a function mapping the previous element of the sequence to the next element of the sequence. Since we’re discussing sequence to sequence models using neural networks, f represents a neural network which predicts the next element of a sequence given the current element of the sequence.

Language models are generative; once trained they can be used to generate sequences of information by feeding their previous outputs back into the model. Below are diagrams showing the training and generation process of a language model.

Figure 1: The sequence is ABCD. The input sequence is a slice of the whole sequence up to the last element. The target sequence is a slice of the whole sequence starting from t=2.

diagram showing the training and generation process of a language model (#2)

Figure 2: (Left) During training, the model tries to predict the next element of the target sequence given the current element of the target sequence. (Right) During generation, the model passes the result of a single prediction as the input at the next time step.

There are many different ways to build a language model, but in this post we’ll focus on training recurrent neural network-based language models. Recurrent neural networks (RNNs) are like standard neural networks, but they operate on sequences of data. Basic RNNs take each element of a sequence, multiply the element by a matrix, and then sum the result with the previous output from the network. This is described by the following equation.

h_t = activation(X_tW_x + h_t-1W_h )

An in-depth explanation of RNNs is out of scope for this post, but if you want to learn more I’d recommend reading Andrej Karpathy’s “The Unreasonable Effectiveness of Recurrent Neural Networks” and Christopher Olah’s “Understanding LSTM Networks”.

To use an RNN for a language model, we take the input sequence from t=1 to t=sequence_length – 1 and try to predict the same sequence from t=2 to t=sequence_length. Since the RNN’s output is based on all previous inputs of the sequence, its output can be be expressed as Y_t = g ( f (Y_t-1, Y_t-2, …Y_t1 )). The function f gives the next state of the RNN, while the function g maps the state of the RNN to a value in our target vocabulary. Put simply, f gives a hidden state of the network, while g gives the output of the network – much like a softmax layer in a simple neural network.

Unlike simple language models that just try to predict a probability for the next word given the current word, RNN models capture the entire context of the input sequence. Therefore, RNNs predict the probability of generating the next word based on the current word, as well as all previous words.

Sequence to Sequence Models

RNNs can be used as language models for predicting future elements of a sequence given prior elements of the sequence. However, we are still missing the components necessary for building translation models since we can only operate on a single sequence, while translation operates on two sequences – the input sequence and the translated sequence.

Sequence to sequence models build on top of language models by adding an encoder step and a decoder step. In the encoder step, a model converts an input sequence (such as an English sentence) into a fixed representation. In the decoder step, a language model is trained on both the output sequence (such as the translated sentence) as well as the fixed representation from the encoder. Since the decoder model sees an encoded representation of the input sequence as well as the translation sequence, it can make more intelligent predictions about future words based on the current word. For example, in a standard language model, we might see the word “crane” and not be sure if the next word should be about the bird or heavy machinery. However, if we also pass an encoder context, the decoder might realize that the input sequence was about construction, not flying animals. Given the context, the decoder can choose the appropriate next word and provide more accurate translations.

Now that we understand the basics of sequence to sequence modeling, we can consider how to build a simple neural translation model. For our encoder, we will use an RNN. The RNN will process the input sequence, then pass its final output to the decoder sequence as a context variable. The decoder will also be an RNN. Its task is to look at the translated input sequence, and then try to predict the next word in the decoder sequence – given the current word in the decoder sequence, as well as the context from the encoder sequence. After training, we can produce translations by encoding the sentence we want to translate and then running the network in generation mode. The sequence to sequence model can be viewed graphically in the diagrams below.

Figure 3: Sequence to Sequence Model – the encoder outputs a sequence of states. The decoder is a language model with an additional parameter for the last state of the encoder.

This concludes Part 1 of our series on sequence to sequence modeling. We saw how simple language models allow us to model simple sequences by predicting the next word in a sequence, given a previous word in the sequence. Additionally, we saw how we can build a more complex model by having a separate step which encodes an input sequence into a context, and by generating an output sequence using a separate neural network. In Part 2, we discuss attention models and see how such a mechanism can be added to the sequence to sequence model to improve its sequence modeling capacity. If you have any questions feel free to reach out to us at contact@indico.io.

[addtoany]

Increase intake capacity. Drive top line revenue growth.

Schedule Demo

Unstructured Unlocked podcast

April 24, 2024 | E45

Unstructured Unlocked episode 45 with Daniel Faggella, Head of Research, CEO at Emerj Artificial Intelligence Research

Listen Now

April 10, 2024 | E44

Unstructured Unlocked episode 44 with Tom Wilde, Indico Data CEO, and Robin Merttens, Executive Chairman of InsTech

Listen Now

March 27, 2024 | E43

Unstructured Unlocked episode 43 with Sunil Rao, Chief Executive Officer at Tribble

Listen Now

View All

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Get Started

Industry

Use Cases

Get Started

Resources

Documentation

Customer Stories

Get Started

Get Started

Get Started

Indico Named as Major Contender and Star Performer in Everest Group's PEAK Matrix® for Intelligent Document Processing (IDP)

BLOG

Sequence Modeling With Neural Networks (Part 1): Language & Seq2Seq

Language Models

Sequence to Sequence Models

Increase intake capacity. Drive top line revenue growth.

Related Posts

Announcements, Machine Learning

Understanding Indico’s Staggered Loop

Machine Learning, Release Notes

Release Notes – Indico Unstructured Data Platform v5.3

Citizen Developer, Machine Learning

Overcome the complexity of machine learning: get to know machine teaching

Unstructured Unlocked podcast

Unstructured Unlocked episode 45 with Daniel Faggella, Head of Research, CEO at Emerj Artificial Intelligence Research

Unstructured Unlocked episode 44 with Tom Wilde, Indico Data CEO, and Robin Merttens, Executive Chairman of InsTech

Unstructured Unlocked episode 43 with Sunil Rao, Chief Executive Officer at Tribble

Get started with Indico

Schedule1-1 Demo

Resources

Blog

Gain insights from experts in automation, data, machine learning, and digital transformation.

Unstructured Unlocked

Enterprise leaders discuss how to unlock value from unstructured data.

YouTube Channel

Check out our YouTube channel to see clips from our podcast and more.

Get our best content on intelligent automation sent to your inbox weekly!

Schedule
1-1 Demo