Create an Image Similarity Web App Using the indico API [Python and JavaScript]

April 10, 2015 | Developers, Tutorials

Introduction

This tutorial will teach you how to create a simple image similarity web application using the indico Image Features API with Python and JavaScript. The process is composed of two parts:

  1. Cache your dataset’s image features using Python.
  2. Create your web app using JavaScript.

This tutorial assumes that you’re already familiar with Python and JavaScript.
Where to get help:
If you’re having trouble going through this tutorial, please email us at contact@indico.io.

What does the Image Features API do?

The Image Features API is used as a building block for creating other machine learning models. For a given image, it returns a vector of feature information that can be used to compare the image with other images. For example, you’ll see in this tutorial that taking the cosine similarity between two feature vectors allows you to quantify how similar the images are in color, shape, texture, etc.

Getting Started

First, clone this GitHub repo to get the sample data used in this tutorial.

Next you’ll need to install the indicoio Python library. To do this just go to your terminal and install using pip:

$ pip install indicoio

Alternatively, install the indico Python wrapper directly from GitHub.

If you run into any problems, check the Installation section of the docs.

Once the client library is installed, get an indico API key using the Quickstart Guide. We recommend setting your API key in an environment variable, $INDICO_API_KEY, which the client library will automatically know to look for. However, you can also put your API key in your configuration file or pass it in directly when you call the API, if you prefer.

Step 1: Preparing Your Images [Python]

Before you can begin building your app, make sure that your images are in a standard image format such as PNG. One way to check that they load correctly is to display them with the matplotlib library in Python.

from matplotlib import pyplot as plt

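Then, use SciPy’s imread and imsave functions to load and save the images in a consistent format. Note that scipy.misc.imread and scipy.misc.imsave were removed in SciPy 1.2, so this import requires an older SciPy release; the SciPy documentation points to the imageio library as the replacement.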

from scipy.misc import imsave, imread

Then import the indico Image Features API and Python’s json module (JSON is a lightweight data-interchange format).

from indicoio import image_features
import json

Note: If possible, it’s better to use color images over grayscale because they are richer in information.

Step 2: Compute and Cache Your Images’ Features [Python]

The image features are fixed for each image, which means they can be cached for later use. Storing them saves you from querying the API every time: with a million images, pinging the server for each one takes a long time. Compute the features for all of your images and save them to a JSON file.

features = []
for i in range(250):
    img = imread('imgs/%s.png' % i)
    features.append(image_features(img))

# Write the cached features to disk as JSON (text mode)
with open('features.json', 'w') as f:
    json.dump(features, f)

Once you have run your images through the indico API with this Python script and stored the resulting features, you’re ready to start building the front-facing application in JavaScript.

Step 3: Set Up Basic App Structure [JavaScript]

Now, let’s move into the JavaScript component of this tutorial: the creation of the actual web app. First, write the functions that generate the basic HTML structure.

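var compare_idx = 0;

// Build one <img> tag; its id is the image index so it can be looked up later
function gen_html(i){
    var istr = i.toString();
    var html = '<img id="' + istr + '" src="imgs/' + istr;
    html += '.png" height="64" width="64" distance="';
    html += istr;
    html += '">';
    return html;
}

// Append all 250 images to the #container element
function load_images(){
    var html = '';
    for (var i = 0; i < 250; i++){
        html += gen_html(i);
    }
    $('#container').append(html);
}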

Step 4: Compute Distance Metrics [JavaScript]

Computing a distance metric between the cached feature vectors tells you how similar or different two vectors are, and this is the step that actually measures the similarity between one image and another. Depending on the problem you’re trying to solve, you can use either the cosine distance function or the Euclidean distance function. Both take two feature vectors and return a single number, with smaller values meaning more similar images.

For cosine, imagine images as being vectors in some n-dimensional space. Two vectors define two directions in this space and therefore there is an angle between them. The smaller the angle, the more similar they are to one another. One potential advantage of cosine similarity is that it is scale invariant, as it does not depend on the relative magnitudes of the vectors, but only on their directions. If you think cosine is more appropriate for your problem, you can use this function in JavaScript.
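As a rough illustration of the idea, here is a minimal sketch of a cosine distance function in JavaScript. It is not the function the original post points to; the names dot and cosineDistance are hypothetical, and the sketch assumes both vectors have the same length and that neither is all zeros.

```javascript
// Dot product of two equal-length numeric arrays
function dot(a, b) {
    var sum = 0;
    for (var i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Cosine distance: 0 when the vectors point the same way, growing as the angle widens.
// Assumption: neither vector is all zeros (a zero norm would make the result NaN).
function cosineDistance(a, b) {
    var similarity = dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    return 1 - similarity;
}
```

Because only the direction matters, doubling every value in one vector leaves the result essentially unchanged (up to floating-point rounding).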

However, the cosine function in JavaScript is less convenient to use because it relies on a separate helper function. Consequently, we prefer to use Euclidean distance instead.

function euclidean(a, b){
    //Euclidean distance between two vectors of equal length
    var d = 0;
    for (var i = 0; i < a.length; i++){
        d += Math.pow(a[i] - b[i], 2);
    }
    return Math.sqrt(d);
}

Euclidean distance treats vectors as points in the same n-dimensional space. The function calculates the straight-line distance between the two points. Again, the closer they are to each other, the more similar the images are.

Step 5: Query [JavaScript]

Finally, write the querying function. It computes the distance between the image you clicked on and every other image in the set, and then sorts the images by that distance.

Begin by loading your stored image features, then initialize Isotope, a library for animated filtering and sorting. After that, compute the distance from the selected image to every image, which Isotope then uses to sort them. Lastly, reorder the images.

$(window).load(function(){
    $.getJSON('features.json', function(features) {
        load_images(); //load images into html
        var $container = $('#container').isotope({
            getSortData: {
                //parse the attribute as a number so the sort is numeric
                distance: '[distance] parseFloat'
            }
        });
        $('img').click(function() {
            compare_idx = $(this).attr('id');
            for (var idx = 0; idx < 250; idx++){
                var distance = euclidean(features[compare_idx], features[idx]);
                $("#" + idx.toString()).attr('distance', distance);
            }
            $container.isotope('updateSortData', $container.children());
            //low distance in feature space is equivalent to high similarity
            //since sort from low to high, most similar results are first
            $container.isotope({ sortBy: 'distance' });
        });
    });
});
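To check the ordering logic without a browser or Isotope, the same ranking can be sketched as a plain function. This is an illustrative sketch, not part of the original app: rankBySimilarity, toyFeatures, and toyDistance are hypothetical names, and the toy 2-D vectors stand in for the real features in features.json.

```javascript
// Return the indices of all feature vectors, ordered from most to least
// similar to the vector at queryIdx. distanceFn is any function
// (a, b) -> number where a smaller value means more similar.
function rankBySimilarity(features, queryIdx, distanceFn) {
    var scored = features.map(function (vector, idx) {
        return { idx: idx, distance: distanceFn(features[queryIdx], vector) };
    });
    // Numeric comparator: the default Array sort would compare values as strings
    scored.sort(function (x, y) { return x.distance - y.distance; });
    return scored.map(function (entry) { return entry.idx; });
}

// Toy 2-D "features" (hypothetical data, for illustration only)
var toyFeatures = [[0, 0], [5, 5], [1, 0], [0, 3]];

// Straight-line distance between two 2-D points
var toyDistance = function (a, b) {
    return Math.sqrt(Math.pow(a[0] - b[0], 2) + Math.pow(a[1] - b[1], 2));
};

// Image 0 is closest to itself, then to images 2, 3, and 1
console.log(rankBySimilarity(toyFeatures, 0, toyDistance)); // indices 0, 2, 3, 1
```

Taking only the first few entries of the returned array gives a simple "most similar images" list, which is one possible starting point for a recommendation feature.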

Now you have an image similarity web app using JavaScript, Python and the indico Image Features API!

Going Further

What can you add on top of this framework? Try tweaking it into a simple recommendation system for clothes or cars, for example. There are also other applications for Image Features to explore, such as identifying objects within an image, logo detection, and artifact curation. If you hack something cool with our APIs, let the indico team know.
