Recognizing Emotion in Text with Machine Learning (No Code Required)
June 15, 2016 / Business, Text Data Use Case, Tutorials
Hey guys! I’m Julie, Director of Ops, back for a quick tutorial of two awesome new offerings we’ve built for you. I’m on the non-technical / cat loving side of things so I break it down a bit. I also run the Machine Learning Without a PhD group on LinkedIn where jargon isn’t allowed. Feel free to join if you’d like 🙂 #machinelearning4everyone
So today you’re getting a 2-for-1 (or as my dad likes to call it a “TooFer”).
Today’s topic will be the 2016 Tony Awards that aired this past Sunday night.
After logging in, you’ll see the Toolkit underneath your dashboard on the left-hand side of the screen, or you can just go to https://indico.io/dashboard/analyze… and here we are.
The toolkit is an awesome place to parse text for quick insight into your data using the following text APIs:
- Sentiment Analysis
- Text Tags
- Language Detection
- Political Analysis
Check out our Product page if you’d like further explanation of each API.
Anyway, as I mentioned earlier, we’ll be checking out our new Emotion API. We took the classic problem of Sentiment Analysis and kicked it up a notch to offer more depth and understanding to text in terms of five major emotions: anger, fear, joy, sadness, and surprise.
So let’s talk about Hamilton. You know, that musical that was nominated for a record-setting 16 Tony Awards. (16!) I haven’t been lucky enough to see this show and you’re a unicorn if you’ve managed to get tickets to watch it. In case you didn’t know – you have to pay to even get into the lottery for a chance at purchasing a ticket. Yep.
I grabbed some data in preparation for the whirlwind on Twitter for #Hamilton at the #TonyAwards – around 100 random tweets from Sunday afternoon (before the awards) that I then popped into the indico Toolkit. Paste it in, click on the API you want to use, and voilà! Machine learning.
The Toolkit will return a CSV file with four columns: authored text, predicted emotion, a linguistic assessment of confidence (e.g. “Very Confident”), and a numerical confidence score (e.g. “4”). So, what did we find? Joy. All the joy. Here’s a sample of the machine learning algorithm’s most confident assessments of joy:
I then took a quick pulse after the Tony Awards were over to see whether anything had changed. Not surprisingly, we still have a lot of happy people.
There were some results that the algorithm really wasn’t confident about though, so it’s a good idea to do a bit of post-processing to remove any results that are marked as “less confident” or completely “not confident”. Here’s an example of a tweet that the model thought was expressing sadness, but flagged as “not confident”:
AND TOTALLY WELL DESERVED!!! ðŸ˜ðŸ˜ðŸ˜ðŸ˜ https://t.co/abCmw8QRVN
You can see why it would be confused by that. I’m confused by that.
We’re working on adding a feature to the Toolkit that lets you set a threshold so that the less/not confident results are automatically removed, but for now I just sorted everything in my spreadsheet based on the confidence score.
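If you'd rather script that clean-up than sort a spreadsheet, here's a minimal Python sketch of the same idea: keep only the rows at or above a confidence cutoff. The column names (`text`, `emotion`, `confidence_label`, `confidence_score`), the cutoff of 4, and the sample rows are all assumptions made for illustration, so check the headers in your own download before using it.

```python
import csv
import io

# Made-up sample rows standing in for a downloaded Toolkit CSV.
# The header names are hypothetical; the real file may label columns differently.
SAMPLE_CSV = """text,emotion,confidence_label,confidence_score
example tweet A,joy,Very Confident,4
example tweet B,sadness,Not Confident,1
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def keep_confident(rows, min_score=4):
    """Return only the rows whose numeric confidence score is >= min_score.

    min_score=4 is an assumed cutoff, not a documented Toolkit value.
    """
    kept = []
    for row in rows:
        try:
            score = float(row["confidence_score"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score >= min_score:
            kept.append(row)
    return kept


confident_rows = keep_confident(load_rows(SAMPLE_CSV))
print(len(confident_rows), "row(s) kept")
```

To run it on a real file, read the file's contents into a string and pass that to `load_rows` instead of `SAMPLE_CSV`.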
Let’s take an aggregate look at all the emotions expressed after Hamilton walked away with 11 wins. Yes, I said 11. You read that right. Anyway – I created this handy-dandy spreadsheet designed to autofill the results into a bar chart, and pasted everything from my downloaded CSV file right in.
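For anyone who wants the same tally without the spreadsheet template, here's a rough Python sketch that counts how often each of the five emotions shows up and prints a plain-text bar chart. The function names and the sample labels are made up for illustration; this is not how the template itself works.

```python
from collections import Counter

# The five emotions the Emotion API predicts, per this post.
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise"]


def count_emotions(labels):
    """Count predicted labels, reporting zero for emotions that never appear."""
    counts = Counter(label.strip().lower() for label in labels)
    return {emotion: counts.get(emotion, 0) for emotion in EMOTIONS}


def text_bar_chart(counts, width=40):
    """Render counts as rows of '#', scaled so the largest count fills `width`."""
    biggest = max(counts.values()) or 1  # avoid dividing by zero when all counts are 0
    lines = []
    for emotion, n in counts.items():
        bar = "#" * round(width * n / biggest)
        lines.append(f"{emotion:<9} {bar} {n}")
    return "\n".join(lines)


# Made-up labels standing in for the "predicted emotion" column of a download.
sample_labels = ["joy", "Joy ", "surprise"]
print(text_bar_chart(count_emotions(sample_labels)))
```

Emotions with no tweets still get a row, which makes gaps easy to spot at a glance.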
Here’s a closer look at the results, which I’ve post-processed to highlight the ones that the algorithm felt most confident about:
Shocker – joy and surprise are the overwhelming leaders. I thought it was interesting that there were any instances of fear and sadness though, so I took a closer look. Here’s an example of what the algorithm thought felt fearful:
What a time to be alive @HamiltonMusical
Apparently that references a song from the Hamilton soundtrack. Just reading this tweet without having heard the song and not really knowing the context surrounding it…I could see it leaning towards either joy or fear. Maybe even sadness. My guess is that without an exclamation mark (i.e., What a time to be alive! Woohoo!) the algorithm figured (quite reasonably) that fear was the best choice.
Let’s take a look at what it thought felt sad:
.@HamiltonMusical literally saved my life last year. Got through one of the toughest times of my life thanks to this show. #TonyAwards
Perhaps that’s more hopeful than sad, but there’s definitely a sad element to it. Given the constraints of just five emotions for the algorithm to pick and choose from, “sad” is a pretty reasonable prediction. Perhaps we should try to add more categories — what do you think? Reach out to me at firstname.lastname@example.org with your thoughts!
The one emotion that wasn’t represented in that chart above was anger. In fact, there was one tweet that was marked as angry, but I didn’t include it as it didn’t meet the “very confident” threshold. However, I’m going to show it to you anyway because I thought it was pretty cool that the model picked up on it (?):
a bunch of Tonys could be the thing that finally gets the mainstream press to pay attention to #HamiltonMusical #TonyAwards
So, why is this awesome?
Think about how this algorithm scales. I only looked at 100 tweets here, but if you have 1,000, or 10,000 – or even 100,000 tweets, you can have an immediate gut check on your brand or campaign in seconds, without having to read through every single one until you want to cry. Get the information you need quickly, and then get on to what really matters to you, whether it’s writing a report on the success of your recent ad campaign, or figuring out who your best target audience is. Looking at demographics? That’s old news. Now you can understand – really understand – the people you’re trying to talk to.
And there you have it. Unstructured text data in… emotions out. Email me at email@example.com if you want the template (the one that produced that bar chart) for your own use.
And in closing…