[Header image: jp-first-sighting.jpg]

Here at indico, we like to move quickly. Combine our passion for creating powerful tools with our left-brain impulse for GSD, and you have our summer in a nutshell. It’s been awesome. So what should you be looking out for, and why should you care? For those of you who are already familiar with our services, we’re filling out our offerings and making it easier to compare results (see the Intersections tool below), as well as to grab all the metadata you need in one call (in other words, to get batch results from multiple models). For those of you who are unfamiliar with our stuff, you’ve come at a good time! It should now be easier than ever to start exploring your data using indico. Now, let’s see what we’ve got.

Note: I’m going to be using Python for the examples shown below, but we also have language wrappers in Ruby, Node.js, Java, R, and PHP. If none of those are your jam, you can also make calls to one of our RESTful endpoints directly.
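If you’d rather skip the wrappers, a direct REST call is just a JSON POST. The sketch below only builds the request payload; the endpoint URL and the exact field names are assumptions, so check the docs for the real addresses before sending anything.

```python
import json

# Assumed endpoint URL -- consult the documentation for the exact address.
url = 'https://apiv2.indico.io/sentiment'

# A JSON body carrying the text to analyze and your API key.
payload = json.dumps({
    'data': 'indico is making it easier to explore your data',
    'api_key': 'YOUR_API_KEY',
})

# With the requests library you would then POST it, e.g.:
# requests.post(url, data=payload, headers={'Content-Type': 'application/json'})
print(payload)
```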

New Text Models


Keywords (And It’s Multilingual!)

This model extracts the less common words from a piece of text, giving you an idea of which parts are salient. Although different in its implementation, it’s like a low-level version of text_tags in that the words it pulls out may be indicative of a topic area; however, the only prediction we’re making is which words are likely to stand out — and therefore matter most — in context.

Also, it’s available in multiple languages! We’re starting to work on finding training data in various languages so that you’re not restricted to only analyzing text in English. Admittedly, it will take time, but we’re eager to support our models in as many languages as we can find good data for.

import indicoio

indicoio.keywords('Did you hear about that coffee shop that has kittens for you to play with?')

# returned results
{
  u'coffee': 0.16531870846662627,
  u'kittens': 0.2311858662209601,
  u'shop': 0.15917130269049332
}
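Since the result is just a dict of word-to-score pairs, ranking the keywords by salience is one line of plain Python. This small sketch reuses the scores returned above:

```python
# The keywords dict returned by the call above.
keywords = {
    u'coffee': 0.16531870846662627,
    u'kittens': 0.2311858662209601,
    u'shop': 0.15917130269049332,
}

# Sort the (word, score) pairs by score, highest first.
ranked = sorted(keywords.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # kittens first, then coffee, then shop
```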

Named Entity Recognition (NER)

Our NER model predicts which words and phrases in a piece of text refer to a specific person, place, or organization. This is helpful when you want to catch mentions of well-known proper nouns. If the model believes a word is a named entity but is unsure of its class, it assigns probability to the unknown category in proportion to that uncertainty. The confidence value describes its overall certainty that the word is a named entity at all.

import indicoio

indicoio.named_entities("I'm gonna head over to the AMC by Boston Common and see the movie on NWA")

# returned results
{u'AMC': {
   u'categories': {
     u'location': 0.22840071129086342,
     u'organization': 0.4190438981162218,
     u'person': 0.12222254114599808,
     u'unknown': 0.2303328494469167
   },
   u'confidence': 0.936886019151214
 },
 u'Boston': {
   u'categories': {
      u'location': 0.6753939885583797,
      u'organization': 0.12916238189767898,
      u'person': 0.043988239066485295,
      u'unknown': 0.15145539047745604
   },
   u'confidence': 0.9767122494980794
 },
 u'NWA': {
      u'categories': {
      u'location': 0.1446359844360483,
      u'organization': 0.46923487108690415,
      u'person': 0.16775268664381932,
      u'unknown': 0.21837645783322826
   },
   u'confidence': 0.986235929098924
 }
}
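A typical next step is to keep only the entities the model is confident about and label each with its most likely category. Here’s a minimal sketch using (rounded) values from the result above:

```python
# Rounded named_entities results from the call above.
results = {
    u'AMC': {u'categories': {u'location': 0.228, u'organization': 0.419,
                             u'person': 0.122, u'unknown': 0.230},
             u'confidence': 0.937},
    u'Boston': {u'categories': {u'location': 0.675, u'organization': 0.129,
                                u'person': 0.044, u'unknown': 0.151},
                u'confidence': 0.977},
}

# Keep confident entities; label each with its highest-probability class.
labels = {
    entity: max(info[u'categories'], key=info[u'categories'].get)
    for entity, info in results.items()
    if info[u'confidence'] > 0.9
}
print(labels)  # {'AMC': 'organization', 'Boston': 'location'}
```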

Twitter Engagement

Austin covered this in more depth in his post, but to recap: get an idea of how your tweet will be received! This model has learned the features of tweets that attract large numbers of retweets and favorites. All you need to do is throw in your tweet’s text.

import indicoio

indicoio.twitter_engagement('What time is it? fav and RT for a chance to win some sweet #adventuretime swag! #algebraic')

indicoio.twitter_engagement('Hello World!')

# returned results
# 1. 0.582484619046522
# 2. 0.30013532183097935
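Because the scores fall between 0 and 1, picking the strongest of several draft tweets is a one-liner. The drafts and scores below are stand-ins borrowed from the example above:

```python
# Candidate tweets and their predicted engagement scores (from above).
drafts = ['What time is it? ...', 'Hello World!']
scores = [0.582484619046522, 0.30013532183097935]

# Pair each draft with its score and keep the highest-scoring one.
best = max(zip(drafts, scores), key=lambda p: p[1])
print(best[0])  # the #adventuretime tweet wins
```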

Intersections

This tool isn’t actually a new model, but it’s a new way of interacting with our existing ones. All you need to do is provide some text data, and then choose two of our models you’d like to examine for a possible relationship. Intersections will then tell you the correlations among our results and how confident it is in these connections. Still not quite sure what it does? Here’s a quick example using the first chapter of The Scarlet Letter.

import indicoio

# prison_door is a list of the chapter's sentences; top_n limits output to the 3 most related topics
indicoio.intersections(prison_door, apis=['sentiment_hq', 'text_tags'], top_n=3)

# returned results
{u'sentiment_hq': 
  {
    u'buddhism': {
      u'confidence': 0.9915452711149384,
      u'correlation': 0.7454545454545454
    },
    u'diy': {
      u'confidence': 0.9927179589915709,
      u'correlation': -0.7545454545454546
    },
    u'left_politics': {
      u'confidence': 0.9902404640400831,
      u'correlation': 0.7363636363636363
    }
  }
}

The dictionary is nested according to the order in which you specify the two APIs (no more than two). Here we have sentiment_hq as our primary key, and the possible text tags topics at the second level of nesting. Finally, we have the correlation between the sets of values and the model’s confidence in its prediction.
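Walking that nested dictionary is straightforward: pull out each topic with its correlation, then sort by correlation strength (absolute value, since a strong negative relationship is just as interesting). This sketch reuses rounded values from the result above:

```python
# Rounded intersections result from the call above.
result = {
    u'sentiment_hq': {
        u'buddhism': {u'confidence': 0.99, u'correlation': 0.745},
        u'diy': {u'confidence': 0.99, u'correlation': -0.755},
        u'left_politics': {u'confidence': 0.99, u'correlation': 0.736},
    }
}

# Collect (topic, correlation) pairs from the second level of nesting.
pairs = [
    (topic, stats[u'correlation'])
    for topic, stats in result[u'sentiment_hq'].items()
]

# Sort by correlation strength, strongest relationship first.
pairs.sort(key=lambda p: abs(p[1]), reverse=True)
print(pairs[0])  # diy has the strongest (negative) relationship
```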


Analyze Text

This tool allows you to combine calls when you’d like to analyze the same set of text with multiple models. It’s a great baseline if you’d like to test indico out. However, we recommend that you use a subset of your total data, or store the results to avoid running out of calls when experimenting. This example uses just the first three sentences.

import indicoio

# prison_door is a list of sentences; using the first 3, keywords returns its top 3 terms by default
indicoio.analyze_text(prison_door[:3], apis=['sentiment_hq', 'keywords'])

# returned results
{
  u'keywords': [
    {
      u'bareheaded': 0.06061224599523786,
      u'steeple': 0.051910770347179246,
      u'timbered': 0.06061224599523786
    },
    {
      u'allot': 0.05567700702194627,
      u'cemetery': 0.055980665842299215,
      u'portion': 0.08275732345894946
    },
    {
      u'churchyard': 0.05946816424488492,
      u'seasonably': 0.06303141736695839,
      u'sepulchres': 0.06303141736695839
    }
  ],
  u'sentiment_hq': [
    0.5535060167312622,
    0.8725922107696533,
    0.35406574606895447
  ]
}
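Since analyze_text returns results in input order, you can zip the sentences back together with their scores. The short strings below are stand-ins for the actual prison_door sentences:

```python
# Stand-ins for the first three sentences and their sentiment_hq scores.
sentences = ['sentence one', 'sentence two', 'sentence three']
sentiments = [0.5535060167312622, 0.8725922107696533, 0.35406574606895447]

# Pair each sentence with its score, then find the most positive one.
paired = list(zip(sentences, sentiments))
most_positive = max(paired, key=lambda p: p[1])
print(most_positive[0])  # 'sentence two'
```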

New Image Models


Content Filtering

Automatically determine if an image is NSFW. Just toss it an image in a compatible format and you’ll get back a score from 0 to 1 reflecting how inappropriate the content is: 0 is appropriate, 1 is inappropriate.

[Image: poptarts.jpg]
import indicoio

# the filepath is relative to the current working directory
indicoio.content_filtering('poptarts.jpg')

# returned results
0.3174145519733429
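In practice you’ll usually turn that score into a yes/no decision. The 0.5 cutoff below is an assumption on my part, not an official recommendation; tune it to your own tolerance:

```python
# Turn the 0-to-1 content_filtering score into a safe/unsafe decision.
# The 0.5 threshold is an assumed default -- adjust to taste.
def is_safe(score, threshold=0.5):
    """Return True if the content_filtering score is below the cutoff."""
    return score < threshold

print(is_safe(0.3174145519733429))  # True: the poptarts are fine
```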

Facial Localization

Use this tool to find faces in an image — just give it a picture and you’ll receive a list of coordinates that bound each face.

Bonus Points! Our FER (Facial Emotion Recognition) model can now take the argument detect_faces=True to automatically analyze just the faces in a picture. Nice!

import indicoio

# relative filepath again, using the pic from the beginning of the post
indicoio.facial_localization('jp-first-sighting.jpg')

# returned results
[
  {
    u'bottom_right_corner': [1883, 556], 
    u'top_left_corner': [1371, 44]
  },
  {
    u'bottom_right_corner': [781, 722],
    u'top_left_corner': [222, 163]
  }
]
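The corner coordinates are (x, y) pixel pairs, so the size of each detected face falls out with a little subtraction. A quick sketch using the boxes returned above:

```python
# Bounding boxes from the facial_localization call above.
faces = [
    {u'top_left_corner': [1371, 44], u'bottom_right_corner': [1883, 556]},
    {u'top_left_corner': [222, 163], u'bottom_right_corner': [781, 722]},
]

# Width and height of each face box: bottom-right minus top-left.
sizes = [
    (f[u'bottom_right_corner'][0] - f[u'top_left_corner'][0],
     f[u'bottom_right_corner'][1] - f[u'top_left_corner'][1])
    for f in faces
]
print(sizes)  # [(512, 512), (559, 559)]
```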

Analyze Image

Finally, we have analyze_image, which is the same as analyze_text, but for images (bet you didn’t see that coming). Here I’ll do a quick mashup with image_features and content_filtering.

[Images: shaq-and-bill.jpg, carrie-and-harrison.jpg, louis-paul-duke.jpg]

import indicoio

indicoio.analyze_image(['shaq-and-bill.jpg', 'carrie-and-harrison.jpg', 'louis-paul-duke.jpg'], apis=['image_features', 'content_filtering'])

# returned results
{
  u'image_features': [
    [ list of 2048 floats ],
    [ list of 2048 floats ],
    [ list of 2048 floats ]
  ],
  u'content_filtering': [
    0.23042018711566925,
    0.14787468314170837,
    0.19382698833942413
  ],
}
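One common use for those image_features vectors is comparing images: cosine similarity between two feature vectors gives a rough measure of how alike the images are. The tiny 3-dimensional vectors below stand in for the real 2048-float lists:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0]))  # 1.0
```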

And the Finale…New Documentation!

We’re excited to announce that we’ve moved to self-hosted docs! They’ll be easier for us to edit, which means we can build new models for you much faster. Using self-hosted docs also means we’ll have an easier time adding nice little features, like jumping to specific pieces of documentation for a specific language, among many other benefits. If you see any issues with them, feel free to send a message via our chat. They’re still a work in progress and I’d be happy to help out.


Thanks for reading, and I hope you have fun with the new toys! Please feel free to contact us if you have any questions; we’re always down to work with you to figure things out.
