In knowledge-driven economies, it’s increasingly common for jobs to revolve around the consumption and analysis of information. This may entail, for example, reading dozens of articles to draw conclusions about emerging trends. Doable, right? Imagine, though, that you’re a financial analyst who has to read through hundreds of articles and reports each day in order to make informed decisions. Suddenly, the task becomes much more daunting. It’s difficult to keep track of what’s important when you’re overloaded with information and your time is limited.
For text analytics problems like this, machine learning can come to the rescue. The new indico Summarization API is designed to read through an article and pick out key, “big idea” sentences so you can quickly determine whether the article’s content will be useful to you. In so doing, it enables you to better allocate your time towards analysis of the information that is most interesting and relevant.
If you want to test the API without messing around with code, feel free to play with our Summarization demo. Copy + paste an article of your choosing, set the threshold for how short or long you want the summary to be, and it will present you with key sentences from the article. Keep in mind that this demo only allows you to analyze one article at a time — in the brief code walkthrough below we’ll show you how to pull summaries for several articles in one go.
Let’s take a look at some of the results. We ran a couple of articles through the demo and compared the summaries it produced to what humans thought were the key themes of the article.
Article #1: Apple: Down, But Not Out
(word count = 1,510)
Summarization API (word count = 196):
“Just a week ago, investors worldwide were stirring in anticipation of Apple’s (NASDAQ:AAPL) fiscal Q4 earnings release. The stock climbed above $120 in after-hours, but ended the week down 2.47%. Last month I wrote an article on Apple, unveiling my new long position in the stock. The fact that Apple is still above its downtrend and above its 40-week moving average is all I need to know. Long investors want to see a weekly close above $110 on Apple to keep momentum in the stock short term. Zooming into a daily of Apple, we can begin to look at where the trend is invalidated. This uptrend will remain in place as long as Apple holds above the $111.00 level. I would not turn bearish if Apple broke $111.00, this would simply be the first red flag for this new bull market in Apple. Apple’s simple 200-day moving average currently sits at the $103.48 level. In my opinion a close below this level of more than 1% would not be healthy for the stock. As long as Apple remains above $102.44 on a closing basis, I remain behind my thesis that Apple should see $140.00 in 2017.”
Human (Taylor Dart, the article’s author):
“Apple beat both revenues and earnings estimates in its fiscal Q4 earnings release. Despite this beat, the stock has since corrected 6% from its highs above $118.00. I don’t see any reason for investors to panic at current levels as the stock’s uptrend remains intact.”
(word count = 837)
Summarization API (word count = 106):
“When Tesla (NASDAQ: TSLA) announced in June it would buy SolarCity (NASDAQ: SCTY), I got really excited. For months, I’d been studying SolarCity’s financials and I had been stunned by the many misrepresentations of its value in the company’s publications. Unfortunately, the 3rd parties brought to the table by Tesla and SolarCity did not offer a lot of joy. They stuck to the numbers given to them by the people who paid them, so the story of deception just continued. Maintenance costs and non-recourse debt. If you plan to buy panels from SolarCity yourself, make sure you’re getting new ones. The money is in the briefcase.”
Human (robiniv, the article’s author):
“Tesla’s latest effort to persuade investors to buy SolarCity misrepresented the company’s NPV of future cash flows. With only 1.1 billion of such NPV, SolarCity on its own would not be able to pay off its recourse debt. If you plan to buy panels from SolarCity yourself, make sure you’re getting new ones.”
In order for our algorithm to complete this summarization task effectively, we had to overcome several challenges. For instance, many summarization algorithms tend to pick similar sentences to form the synopsis, resulting in redundancies. This happens because most algorithms are trained to choose sentences that are “central”, or similar to a large number of other sentences in the article. We still want our algorithm to pick out sentences based on repeated concepts or themes, but we also reward it for ensuring the sentences it extracts differ from each other.
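To make the trade-off between centrality and redundancy concrete, here's a minimal sketch of one common way to balance the two: a greedy selection in the spirit of maximal marginal relevance. This is an illustration only, not the algorithm behind the Summarization API. The bag-of-words cosine similarity, the `diversity` weight, and the function names are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, k=3, diversity=0.5):
    """Greedily pick k sentences that are central but not redundant.

    `diversity` (hypothetical knob) weights the redundancy penalty:
    0.0 means pure centrality, higher values push picks apart.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Centrality: average similarity to every other sentence.
    centrality = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    chosen = []
    while len(chosen) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in chosen:
                continue
            # Redundancy: similarity to the closest sentence already picked.
            redundancy = max((cosine(vectors[i], vectors[j]) for j in chosen), default=0.0)
            score = (1 - diversity) * centrality[i] - diversity * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    # Return the picks in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

With `diversity=0.0` the sketch behaves like a purely centrality-based summarizer and will happily pick two near-identical sentences; raising it makes the second pick pay a price for overlapping with the first.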
Another issue we had to overcome was the tendency to select sentences that are similar to other sentences, but aren’t necessarily worded like a summary. For example:
“We voted to leave the European Union and become a fully independent, sovereign country,” the Conservative leader told party members this weekend. “We are not leaving the European Union today to give up control of immigration again.”
We solved this issue with a scoring system that judges how summary-like each sentence reads, using indicators such as concision, quotation marks, and the use of pronouns. These are just a couple of the concerns we had to address before we were satisfied with the Summarization API’s results.
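As a rough illustration of what a scoring system like this could look like, here's a toy heuristic. It is not the scoring used by the Summarization API: the penalty weights, the 30-word limit, the pronoun list, and the assumption that quotes and pronouns should lower a sentence's score are all made up for the example.

```python
import re

# Hypothetical indicator settings, chosen only for illustration.
PRONOUNS = {"i", "we", "he", "she", "they", "you", "it"}
QUOTE_CHARS = '"\u201c\u201d'  # straight and curly double quotes


def summary_likeness(sentence, max_words=30):
    """Score from 0.0 to 1.0 for how much a sentence reads like a summary line."""
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    if not words:
        return 0.0
    score = 1.0
    # Concision: long sentences are penalized.
    if len(words) > max_words:
        score -= 0.3
    # Quotation marks: quoted speech rarely stands alone as a summary.
    if any(ch in sentence for ch in QUOTE_CHARS):
        score -= 0.4
    # Pronouns: sentences with personal pronouns often depend on context.
    if any(w.lower() in PRONOUNS for w in words):
        score -= 0.3
    return max(score, 0.0)
```

Under these assumptions, a quote like the one above loses points for both its quotation marks and its leading "We", while a plain declarative sentence keeps the full score. A score like this could then be combined with a centrality measure when ranking candidates.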
Using Summarization (Code)
Cool! So you know the sorts of results Summarization produces. What about implementation?
First things first: you’ll need your API key in order to run the following code and use the API — if you don’t have one, you can sign up for a free indico account. We’ll be working in Python. We will be releasing Ruby, Java, PHP, node.js, and R libraries via our docs in the coming weeks.
Now, if you haven’t already installed the indicoio PyPI package, you can easily do so using pip.
pip install indicoio
If you have any trouble installing the indicoio package, here are some solutions to common installation issues.
Running the API is as simple as:
import indicoio
indicoio.config.api_key = 'YOUR_API_KEY'
indicoio.summarization("This is the text from the article that I want analyzed.")
Yeah, you really could do it in just three lines of code. You probably want to analyze more than one article though, and that requires some standard data preprocessing. Check out our SuperCell Summarization GitHub repo for a simple, ready-made program that will take a CSV of articles, preprocess and analyze them, and then print out their summaries. Go ahead and tweak it to your preference! I doubt all of you have your text data neatly stored in CSV files; chances are they’re all jumbled in a monstrous database or directory somewhere…sadly not a problem we can solve here!
Using that little program, we analyzed these five articles:
- Brazil hit by more punches amid historic recession
- Brexit fears send British pound to new 31-year low
- Does India’s booming economy really need a rate cut?
- Theranos cutting hundreds of jobs as it shutters labs
- What Note 7 crisis? Samsung stock hits new high
Run the program on the CSV file (containing the five articles) you downloaded from the repo to see what you get back! If you want to combine Summarization with other machine learning tools tailored to your needs, check out our Custom Collection API to build models of your own.
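If you'd like a feel for the CSV workflow described above before opening the repo, here's a minimal sketch of the general shape. It is not the program from the SuperCell repo. The `text` column name and the `articles.csv` filename are hypothetical, and the summarizer is passed in as a function so you can try the loop without an API key.

```python
import csv


def summarize_csv(path, summarize, text_column="text"):
    """Read articles from a CSV file and summarize each non-empty one.

    `summarize` is any function that takes an article string and returns
    its summary. `text_column` is an assumed column name; change it to
    match your own file.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    results = []
    for row in rows:
        # Basic preprocessing: skip rows with missing or blank text.
        text = (row.get(text_column) or "").strip()
        if text:
            results.append((text, summarize(text)))
    return results


# Example usage with the API call shown earlier (requires an API key):
#
#   import indicoio
#   indicoio.config.api_key = 'YOUR_API_KEY'
#   for article, summary in summarize_csv("articles.csv", indicoio.summarization):
#       print(summary)
```

Passing the summarizer in as an argument also makes it easy to swap in a stub while you sort out your preprocessing, and only hit the API once the data looks right.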
Questions about indico’s Summarization API or feedback on how to improve it? Reach out to us at firstname.lastname@example.org.