Auto-tagging Interior Design Styles

March 9, 2017 | Business, Developers, Image Data Use Case, Tutorials

Sorting content into categories is a key task for recommendation systems, as well as for general data management. We’ve talked a lot about text data lately: for example, using topic tags to improve article suggestions for your readership, and building a custom text classifier for your specific industry or task. We didn’t even need large datasets to build those tools, and the same applies to images. Using our customizable machine learning API, Custom Collections, you can easily build a model to auto-tag images, streamlining and improving visual content recommendations and management.

The Task

Build a simple interior design style classifier using Custom Collections. More specifically, we’ll be testing to see how well it can distinguish three fairly similar styles: contemporary, minimalist, and industrial.

interior design styles

The Data

Our dataset consists of 21 images of rooms — 7 per style — that I grabbed from Google Images (all labeled for reuse). That’s right, just 21. How is this possible? Custom Collections uses a machine learning technique called transfer learning, which allows us to build models with very small datasets. In fact, depending on the difficulty of the task you’re trying to achieve, you’ll start seeing diminishing returns after the first few thousand data points.

Note, though, that it’s generally better to have 10 or more samples for each category (depending on the difficulty of the problem you’re trying to solve). Good free-for-use images are hard to find, so we’ll make do with 7 per style.

Training the Model

If you want to follow along, clone the dataset and skeleton code from the GitHub repo. We’ll be working in Python.

Before we go any further, have you set up your free indico account yet? If you haven’t, follow our Quickstart Guide, which walks you through getting your API key and installing the indicoio Python library. If you run into any problems, check the Installation section of the docs or reach out to us through that little chat bubble.

Assuming your account is all set up and you’ve installed everything, let’s get started.

Step 1: Labeling the Data

If you’re working with the dataset I provided (located in the images folder), you’ll see that each image is named after the style it represents. Open up main.py. In the generate_training_data function, you’ll see that we grabbed those filenames and used them as the labels for each image. If you decide to use your own unlabeled dataset, you can use our CrowdLabel tool. I’m no design expert, so I may have inaccurately labeled a few of these images. CrowdLabel allows multiple people on your team to separately label datasets, increasing labeling accuracy by only using examples that multiple people have assigned the same label. Using CrowdLabel also lets you skip all of the code in this tutorial 😛

Step 2: Training Your Collection

The generate_training_data function processed all our data and labels and prepared them so they can be passed into the Custom Collection API, which only takes in a list of items paired with a single label.
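As a rough sketch of what a function like generate_training_data might do, the snippet below derives each label from the image filename and returns a list of [item, label] pairs. The filename pattern (a style name followed by an underscore and a number), the helper names label_from_filename and build_pairs, and the use of file paths as items are all assumptions made for illustration; the version in the repo may differ.

```python
import os

# Assumed label names; they match the keys that appear in the tutorial's results.
STYLES = ("contemporary", "industrial", "minimalism")


def label_from_filename(filename):
    """Return the style that the filename starts with (assumed naming scheme)."""
    stem = os.path.splitext(os.path.basename(filename))[0].lower()
    for style in STYLES:
        if stem.startswith(style):
            return style
    raise ValueError("no known style in filename: %s" % filename)


def build_pairs(filenames, folder="images"):
    """Pair each image path with exactly one label, sorted by filename."""
    return [[os.path.join(folder, name), label_from_filename(name)]
            for name in sorted(filenames)]


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for pair in build_pairs(["industrial_2.jpg", "contemporary_1.jpg"]):
        print(pair)
```

Raising an error on an unrecognized filename, rather than skipping it, makes a mislabeled or stray file visible before anything is uploaded.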

Now we can train our model! It’s actually incredibly easy.

Go to the top of your file and import indicoio. Don’t forget to set your API key — there are a number of ways you can do it; I like to put mine in a configuration file.
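One possible way to keep the key in a configuration file is sketched below (written for Python 3). The file name indico.ini, the [auth] section, the api_key option, and the INDICO_API_KEY environment-variable fallback are hypothetical choices for this example, not a format the indicoio library is claimed to require.

```python
import configparser
import os


def load_api_key(path="indico.ini", env_var="INDICO_API_KEY"):
    """Return an API key from an INI file, else from an environment variable."""
    parser = configparser.ConfigParser()
    parser.read(path)  # files that do not exist are silently skipped
    if parser.has_option("auth", "api_key"):
        return parser.get("auth", "api_key").strip()
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError("no API key found in %s or $%s" % (path, env_var))
    return key


# Usage in main.py (assumed), which keeps the key out of source control:
#   indicoio.config.api_key = load_api_key()
```

Whichever approach you choose, the point is that the key does not end up hard-coded in a file you might share.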

import indicoio
from indicoio.custom import Collection
from tqdm import tqdm  # progress bar used in the training loop below

indicoio.config.api_key = 'YOUR_API_KEY'

Go back down to the bottom of your file and under if __name__ == "__main__", generate your training data, and define your empty Collection. Now, just add your data to the Collection and train!

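if __name__ == "__main__":
    train = generate_training_data()
    collection = Collection("interior_design")
    # Upload each labeled sample; tqdm shows upload progress.
    for sample in tqdm(train):
        collection.add_data(sample)
    collection.train()
    collection.wait()  # block until training finishes
    print(collection.info())  # show the Collection's status and metrics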

Just like that. tqdm is a progress bar that shows how much of the data has been uploaded, and .wait() blocks until training is complete. Since the dataset is so small, it should only take about a minute to train, depending on how fast your Internet connection is.

Calling collection.info() checks your Collection’s status and returns metrics that are useful indicators of the model’s performance. However, larger training sets are recommended for reliable precision and recall metrics, so we’ll use a more hands-on way to test our model instead.

Testing the Model

First, let’s run some test examples through our model to see how it performs for all the categories. I set aside some images in the test_images folder that weren’t in the training dataset. Comment out the code for training the model under if __name__ == "__main__", and then run the following code.

if __name__ == "__main__":
    # Use the same Collection name that the model was trained under.
    collection = Collection("interior_design")
    test_model()

Generally, we can assume that the highest probability result is the category that the model thinks the image most likely belongs to. Your results should appear as below (note that slight variations in the numbers are normal — they should just be roughly the same). The test images for each category appear above the results here.

contemporary style test images

Test results for CONTEMPORARY category:
{u'minimalism': 0.1399256475, u'industrial': 0.1386253738, u'contemporary': 0.7214489787}
{u'minimalism': 0.28305400940000003, u'industrial': 0.00672567, u'contemporary': 0.7102203205000001}
{u'minimalism': 0.3341266282, u'industrial': 0.0198866961, u'contemporary': 0.6459866757}
******

industrial style test images

Test results for INDUSTRIAL category:
{u'minimalism': 0.1785425491, u'industrial': 0.5101019466, u'contemporary': 0.31135550430000003}
{u'minimalism': 0.0526127544, u'industrial': 0.5505277214000001, u'contemporary': 0.39685952420000004}
{u'minimalism': 0.3112188771, u'industrial': 0.3916452771, u'contemporary': 0.2971358458}
******

minimalist style test images

Test results for MINIMALISM category:
{u'minimalism': 0.9069708701, u'industrial': 0.0880693949, u'contemporary': 0.004959735000000001}
{u'minimalism': 0.9149341994, u'industrial': 0.0394596009, u'contemporary': 0.0456061997}
{u'minimalism': 0.6548668879, u'industrial': 0.0572945461, u'contemporary': 0.287838566}
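To read results like these programmatically, a small helper can pick the highest-probability label from each dictionary. This is only a sketch: top_label is a hypothetical helper name, and it assumes each result is a plain dict mapping label to probability, as printed above.

```python
def top_label(result):
    """Return (label, probability) for the highest-scoring category."""
    label = max(result, key=result.get)
    return label, result[label]


if __name__ == "__main__":
    # First CONTEMPORARY result from the output above.
    result = {
        "minimalism": 0.1399256475,
        "industrial": 0.1386253738,
        "contemporary": 0.7214489787,
    }
    print(top_label(result))
```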

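Looks like the model did alright: the highest-probability label matches the true style for all nine test images, although the industrial scores (roughly 0.39 to 0.55) are noticeably less confident than the others. If the model had not performed satisfactorily, we could add more examples of the underperforming category to the Collection’s training dataset and retrain the model.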
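One simple way to decide which category would benefit most from extra training examples is to average the probability the model assigns to the true label within each category. The sketch below does that with the true-label probabilities from the test results, rounded to two decimals; the function names and this particular scoring rule are assumptions for illustration, not part of the indicoio API.

```python
# True-label probabilities per category, rounded from the test results above.
SAMPLE_RESULTS = {
    "contemporary": [{"contemporary": 0.72}, {"contemporary": 0.71}, {"contemporary": 0.65}],
    "industrial": [{"industrial": 0.51}, {"industrial": 0.55}, {"industrial": 0.39}],
    "minimalism": [{"minimalism": 0.91}, {"minimalism": 0.91}, {"minimalism": 0.65}],
}


def mean_true_label_probability(results_by_category):
    """Average the probability each result assigns to its own true category."""
    return {
        category: sum(result[category] for result in results) / len(results)
        for category, results in results_by_category.items()
    }


def weakest_category(results_by_category):
    """Return the category with the lowest average true-label probability."""
    means = mean_true_label_probability(results_by_category)
    return min(means, key=means.get)


if __name__ == "__main__":
    print(mean_true_label_probability(SAMPLE_RESULTS))
    print(weakest_category(SAMPLE_RESULTS))
```

With only three test images per category these averages are rough, so treat the result as a hint about where to add data rather than a reliable metric.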

Next Steps

Where to from here? Try expanding the system by adding more styles, or applying this tutorial to other categories, like clothes, food, or art. Or, go a step further — can you adapt our fashion matching tutorial, which also uses the structure of a classification problem, to build a model that matches pieces of furniture to the style of already existing rooms?


Effective January 1, 2020, Indico will be deprecating all public APIs and sunsetting our Pay as You Go Plan.

Why are we deprecating these APIs?

Over the past two years, our new product offering, Indico IPA, has gained a lot of traction. We’ve successfully allowed some of the world’s largest enterprises to automate their unstructured workflows with our award-winning technology. As we continue to build and support Indico IPA, we’ve reached the conclusion that, in order to provide the quality of service and product we strive for, the platform requires our utmost attention. As such, we will be focusing on the Indico IPA product offering.

 
