Happy Chinese New Year! Going from Tweets to Trends in Minutes

January 31, 2016 | Business, Developers, Text Data Use Case, Tutorials

Happy Chinese New Year!

With the lunar new year around the corner, we can’t help but wonder what the latest trends are during this special time of the year. These trends are not only interesting but also crucial for providers of goods and services who want to capitalize on the spirit of celebration. Here at indico, we want to demonstrate how quickly and easily these trends can be found, so that businesses can learn what people buy and what they care about directly from potential customers.

In this demonstration, we will look at Twitter activity related to Chinese New Year in the run-up to the occasion. We will be using Python (2.7+), TwitterSearch (an unofficial, third-party Twitter client library), and indico’s Python client library. Essentially, we will get tweets from Twitter, extract the text and any images, and feed them to indico’s Keywords API and Image Recognition API.

To skip the technical bits, go ahead and jump to the bottom of the post.

Getting Started

Authentication is always rough, especially for those new to using online APIs. For Twitter, you will need the following:

Consumer Key
Consumer Secret
Access Token
Access Token Secret

To obtain these keys, you will need to:

1. Sign into a Twitter Account
2. Navigate to https://apps.twitter.com/
3. Click Create New App
4. Fill in the details. Feel free to put any placeholder website under Website and leave Callback URL empty.
5. Read and agree to the Developer Agreement
6. You should be taken to your newly created Application page. Go to Keys and Access Tokens
7. Click Generate My Access Token and Token Secret
8. Now, on this page, you should have access to all 4 keys.

For indico, simply sign into https://indicodata.ai and find your API Key at the top of your dashboard. If you have trouble, chat with us straight from the chat window.
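Once you have the keys, one way to keep them out of your source code is to read them from environment variables. The following is a minimal sketch of that idea, not part of the original walkthrough: the helper name `load_twitter_credentials` and the environment variable names are assumptions made for illustration.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
TWITTER_ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=None):
    """Return the four Twitter keys as a dict, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    # Collect the names of variables that are unset or empty
    missing = [var for var in TWITTER_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(sorted(missing)))
    return dict((name, environ[var]) for name, var in TWITTER_ENV_VARS.items())
```

The returned dictionary uses the same names as the four keys listed above, so it can be unpacked into any client constructor that takes them as keyword arguments. The same approach would work for the indico API key.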

Installation

Assuming you have a development environment (access to a terminal or command prompt), go ahead and install the following:

1. Python 2.7+
2. PIP (a Python package manager)
3. Install TwitterSearch via PIP
4. Install indicoio via PIP

The Code

Here are our steps: get tweets, give them to indico, and get results. Don’t worry, the code is simple and straightforward. We promise.

    from TwitterSearch import TwitterSearch, TwitterSearchOrder, TwitterSearchException

    try:
        # Construct Search Query
        # `tags` is the list of keywords to search for
        tso = TwitterSearchOrder()
        tso.set_keywords(tags)
        tso.set_include_entities(True)
        # Authorize the Client (fill in the four keys from Getting Started)
        CLIENT = TwitterSearch(
            consumer_key="",
            consumer_secret="",
            access_token="",
            access_token_secret=""
        )
        tweets = CLIENT.search_tweets_iterable(tso)
    except TwitterSearchException as e:
        # Catch Potential Search Exceptions
        print(e)

Extracting the Text and Images

Twitter will give us loads of information about the tweets. That is extremely nice of them, but we only need certain parts.

    # Extract Data
    extracted_data = {}
    for tweet in tweets:
        #extracting the entities / media for images and hashtags
        entities = tweet.get("entities", {})
        media = entities.get("media", [])
        #info will hold our tweet information
        info = {}
        info["id"] = tweet.get("id_str")
        info["text"] = tweet.get("text", "")
        info["hashtags"] = [tag["text"] for tag in entities.get("hashtags", [])]
        info["photos"] = []
        for medium in media:
            if str(medium["type"]) == "photo":
                info["photos"].append(medium["media_url"])
        #Saving via the ID as a key to update tweets and prevent duplicates
        extracted_data[info["id"]] = info

Additionally, indico’s APIs accept URLs as inputs! indico will handle downloading the images and downsizing them for you.
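To see what the extraction step keeps, here is a small sketch that applies the same logic to a single hand-written tweet dictionary. The function name `extract_info` and every value in the sample are made up for illustration. Real tweets from the Twitter API carry many more fields.

```python
def extract_info(tweet):
    """Keep only the id, text, hashtags, and photo URLs of one tweet dict."""
    entities = tweet.get("entities", {})
    return {
        "id": tweet.get("id_str"),
        "text": tweet.get("text", ""),
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
        # Only media entries of type "photo" are kept; videos are skipped
        "photos": [
            medium["media_url"]
            for medium in entities.get("media", [])
            if medium.get("type") == "photo"
        ],
    }


# Hand-written sample; every value here is made up for illustration.
sample_tweet = {
    "id_str": "1",
    "text": "Happy #ChineseNewYear!",
    "entities": {
        "hashtags": [{"text": "ChineseNewYear"}],
        "media": [
            {"type": "photo", "media_url": "http://example.com/lantern.jpg"},
            {"type": "video", "media_url": "http://example.com/parade.mp4"},
        ],
    },
}

info = extract_info(sample_tweet)
print(info["hashtags"])
print(info["photos"])
```

Only the photo URL survives in `info["photos"]`, and that list of URLs is what gets handed to the Image Recognition API.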

Adding Indico Data

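    import indicoio
    indicoio.config.api_key = ""  # your indico API key

    # Adding Indico
    # The analysis runs on the dictionary built in the extraction step
    tweets = extracted_data
    for tweet in tweets.values():
        try:
            tweet["keywords"] = indicoio.keywords(tweet["text"], top_n=3)
        except indicoio.IndicoError as e:
            print(e)
            # Fall back to no keywords so later steps can still read this field
            tweet["keywords"] = {}
        photo_info = {}
        for photo in tweet["photos"]:
            try:
                photo_info[photo] = indicoio.image_recognition(photo, top_n=3)
            except indicoio.IndicoError as e:
                print(e)
        tweet["image_tags"] = photo_info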

Now our tweets object contains all the information about the tweets that we will need, including the additional analysis by indico! We can perform any kind of analysis on this data. A simple and impactful one is a frequency analysis of the tags we found. Additional tip: if you want to analyze data over time or play with the data in a sandbox environment, we recommend caching the results from Twitter using cPickle (or a similar library) to avoid being rate-limited by the API.
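The caching tip can be sketched with the standard-library `pickle` module (`cPickle` is its faster Python 2 counterpart). The helper name `load_or_fetch` and the file name in the usage note are assumptions for illustration, not part of the original code.

```python
import os
import pickle


def load_or_fetch(cache_path, fetch):
    """Return cached data from cache_path, calling fetch() only on a cache miss."""
    if os.path.exists(cache_path):
        # Cache hit: read the stored object back without touching the API
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    # Cache miss: fetch once, then store the result for next time
    data = fetch()
    with open(cache_path, "wb") as handle:
        pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

A call such as `load_or_fetch("tweets.pkl", fetch_tweets)` would then hit Twitter only on the first run, where `fetch_tweets` stands for whatever function returns your extracted dictionary. Pickle a plain dictionary or list rather than the search iterable itself, and only load pickle files that you created yourself.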

Analysis: Frequency

    from collections import defaultdict

    # Count how often each keyword and image tag appears across all tweets
    keywords = defaultdict(int)
    imagetags = defaultdict(int)
    for tweet in tweets.values():
        for keyword in tweet["keywords"]:
            keywords[keyword] += 1
        for tags in tweet["image_tags"].values():
            for tag in tags:
                imagetags[tag] += 1
    # Sort by count, highest first (items() works on both Python 2 and 3)
    sorted_keywords = sorted(keywords.items(), key=lambda x: -x[1])
    sorted_imagetags = sorted(imagetags.items(), key=lambda x: -x[1])
    keywords_top_30 = sorted_keywords[:30]
    imagetags_top_30 = sorted_imagetags[:30]

Now we have the top thirty keywords and image tags of the Twitter statuses. Taking a quick look at some of these tags, we see several keywords such as “cny”, “chinese”, and “celebrate(ing)”. These are obvious keywords given the occasion, so let’s ignore them. We can do this via a blacklist or simply skip over them when reading the results. There are also nonsense keywords like “https” from links that we can ignore as well. Here’s a little bit of code to do that. Disclaimer: It is slightly inefficient but easy on the eyes.

    IGNORED_LIST = ["https", "rt", "celebrates", "chinese", "celebrate", "celebrations", "celebrating"]
    sorted_keywords = sorted(keywords.items(), key=lambda x: -x[1])
    sorted_imagetags = sorted(imagetags.items(), key=lambda x: -x[1])
    # Each entry is a (term, count) pair, so compare the term against the list
    filtered_keywords = [pair for pair in sorted_keywords if pair[0] not in IGNORED_LIST]
    filtered_imagetags = [pair for pair in sorted_imagetags if pair[0] not in IGNORED_LIST]
    keywords_top_30 = filtered_keywords[:30]
    imagetags_top_30 = filtered_imagetags[:30]
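The counting and filtering steps can also be combined with `collections.Counter` from the standard library. This is an alternative sketch rather than the approach used above: the function name `top_keywords` and the sample data are made up, and the sample only mimics the shape of the tweets dictionary (each tweet holding a dictionary of keywords).

```python
from collections import Counter


def top_keywords(tweets, ignored=(), n=30):
    """Count keywords across all tweets, skip ignored ones, return the n most common."""
    ignored = set(ignored)
    counts = Counter(
        keyword
        for tweet in tweets.values()
        for keyword in tweet.get("keywords", {})
        if keyword not in ignored
    )
    return counts.most_common(n)


# Made-up sample in the same shape as the tweets dictionary.
sample = {
    "1": {"keywords": {"monkey": 0.9, "https": 0.4}},
    "2": {"keywords": {"monkey": 0.8, "envelope": 0.5}},
    "3": {"keywords": {"envelope": 0.7, "monkey": 0.6, "rt": 0.2}},
}
print(top_keywords(sample, ignored=["https", "rt"]))
# [('monkey', 3), ('envelope', 2)]
```

Using a set for the ignored terms keeps each membership check cheap, which addresses the inefficiency mentioned in the disclaimer.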

Results

The following is some code to look up the source tweets for certain keywords and image tags.

    def lookup_keyword(tweets, keyword):
        # Tweets whose extracted keywords include the given keyword
        return [t for t in tweets.values() if keyword in t["keywords"]]

    def lookup_imagetag(tweets, imagetag):
        # Tweets with at least one photo that was given this image tag
        return [t for t in tweets.values()
                if any(imagetag in tags for tags in t["image_tags"].values())]

Keywords

  • “monkey”: Well, it is the year of the monkey, so this makes plenty of sense.
  • “warriors”: Let’s say we don’t know anything about sports. Here is an example of a tweet it came from:
    'RT @warriors: #Warriors Chinese New Year gear is now available at the @warriors_store! Happy shopping » https://t.co/BuzWyDQ2VV https://t.c…'
    Looks like the Warriors have Chinese New Year gear and are letting fans know. Chinese New Year themed merchandise is a great way to increase revenue for brands whose followers are excited about the year of the monkey.
  • “manchester”: 'RT @CNY_MCR: Looking for the perfect city to celebrate #ChineseNewYear look no further! https://t.co/Qn0rB7vyyY #Manchester https://t.co/AE…'
    It looks like the celebration in Manchester’s Chinatown is getting a lot of attention. This is a great opportunity to increase tourism revenue.
  • “sesame”: 'New Post on my Blog: Chinese Laughing Sesame Balls (笑口棗) #ChineseNewYear Snack Recipe Link:… https://t.co/QUD3YtKljk'
    People are super into Chinese sesame recipes! It might be time for eateries to roll out their special Chinese New Year edition foods.
  • “hamper”: Huh, people must be all about those hampers, or this business found a large number of users to tweet about their giveaway. A little perplexed, but hey, whatever works.
    'RT @vegetarianexpre: To celebrate #ChineseNewYear we’re giving away 5 @wingyipstore hampers & 1 deluxe hamper! For a chance to win, follow …'
    'To celebrate #ChineseNewYear we’re giving away 5 @wingyipstore hampers & 1 deluxe hamper! For a chance to win, follow & RT.'
  • There are several, several more, but we’ll leave that as an activity for the reader!

Image Tags

For the most part, the images came with tweets that were captured in the keywords analysis, so these tags should serve to bolster the findings and find popular items that people enjoy sharing. These tags describe the most popular kinds of images that the Image Recognition API picked up on.

book jacket, dust cover, dust jacket, dust wrapper

These tags are quite lengthy, but in general they capture a specific group of items, such as posters, books, and other flat objects. In our case, they captured both Chinese New Year posters and banners. Several campaigns on Twitter include pictures of advertisements as images, and a lot of them will also be captured in this category. There could potentially be a space for new, innovative Chinese New Year banners.

carousel, carrousel, merry-go-round, roundabout, whirligig

The Chinese New Year Festivities are being recognized in this category, which kind of makes sense considering there is a limited pre-determined set of tags for image recognition.

envelope

Red Envelopes! This is clearly quite a popular trend. Perhaps, there is a space for long-distance or bulk red envelope giving (electronically or not). Red envelope designs are apparently also something to show off!

jersey, T-shirt, tee shirt

A great way to customize apparel for the Chinese New Year!

plate

Plating the Chinese New Year themed dishes, eat up! If you’re looking for some, The Food Network has some great recipes for you to try at home!

 
I hope that you’ve enjoyed this tutorial and know that you can use it not only for the Chinese New Year but for anything you want to dig a little further into. Knowledge is power and, in this case, an ultimate competitive advantage!

Feel free to drop me a line at chris@indico.io if you have any questions, need any help, or want to talk use cases for your particular project.  We’re all here to help.
