Auto-tagging Interior Design Styles

Sorting content into categories is a key task for recommendation systems, as well as for general data management. We’ve talked a lot about text data lately: for example, using topic tags to improve article suggestions for your readership, and building a custom text classifier for your specific industry or task. We didn’t even need large datasets to build those tools, and the same applies to images. Using our customizable machine learning API, Custom Collections, you can easily build a model to auto-tag images, streamlining and improving visual content recommendations and management.

The Task

Build a simple interior design style classifier using Custom Collections. More specifically, we’ll be testing to see how well it can distinguish three fairly similar styles: contemporary, minimalist, and industrial.

The Data

Our dataset consists of 21 images of rooms — 7 per style — that I grabbed from Google Images (all labeled for reuse). That’s right, just 21. How is this possible? Custom Collections uses a machine learning technique called transfer learning, which allows us to build models with very small datasets. In fact, depending on the difficulty of the task you’re trying to achieve, you’ll start seeing diminishing returns after the first few thousand data points.
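To make the idea concrete, here is a toy sketch of the feature-extraction flavor of transfer learning: a fixed feature extractor is reused as-is, and only a tiny classifier on top learns from a handful of labeled examples. This is not how Custom Collections is implemented (its internals aren’t described here). The pretrained_features function is a hypothetical stand-in for a real pretrained network, the nearest-centroid classifier is just one simple choice, and the code is written for Python 3.

```python
from collections import defaultdict


def pretrained_features(image):
    """Stand-in for a frozen, pretrained network (hypothetical).

    A real extractor would return many learned features; this toy version
    returns just the mean and the range of the pixel intensities.
    """
    pixels = [value for row in image for value in row]
    return (sum(pixels) / len(pixels), max(pixels) - min(pixels))


def fit_centroids(samples):
    """Learn one average feature vector per label from (image, label) pairs."""
    grouped = defaultdict(list)
    for image, label in samples:
        grouped[label].append(pretrained_features(image))
    return {
        label: tuple(sum(column) / len(column) for column in zip(*features))
        for label, features in grouped.items()
    }


def predict(centroids, image):
    """Return the label whose centroid is closest to the image's features."""
    features = pretrained_features(image)

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))

    return min(centroids, key=squared_distance)


# Toy 2x2 "images" with made-up labels, purely for illustration.
toy_samples = [
    ([[0.9, 0.8], [0.9, 1.0]], "bright"),
    ([[0.8, 0.9], [1.0, 0.9]], "bright"),
    ([[0.1, 0.2], [0.0, 0.1]], "dark"),
    ([[0.2, 0.1], [0.1, 0.0]], "dark"),
]
centroids = fit_centroids(toy_samples)
print(predict(centroids, [[0.7, 0.9], [0.8, 0.9]]))
```

Because the extractor is never retrained, the only thing learned from the small dataset is one centroid per label, which is why a few examples per category can be enough to get started.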

Note, though, that it’s generally better to have 10 or more samples for each category (depending on the difficulty of the problem you’re trying to solve). Good free-to-use images are hard to find, so we’ll make do with 7 per style.

Training the Model

If you want to follow along, clone the dataset and skeleton code from the GitHub repo. We’ll be working in Python.

Before we go any further, have you set up your free indico account yet? If not, follow our Quickstart Guide, which walks you through getting your API key and installing the indicoio Python library. If you run into any problems, check the Installation section of the docs, or reach out to us through that little chat bubble.

Assuming your account is all set up and you’ve installed everything, let’s get started.

Step 1: Labeling the Data

If you’re working with the dataset I provided (located in the images folder), you’ll see that each image is named after the style it represents. Open up main.py. In the generate_training_data function, you’ll see that we grabbed those filenames and used them as the labels for each image. If you decide to use your own unlabeled dataset, you can use our CrowdLabel tool. I’m no design expert, so I may have inaccurately labeled a few of these images. CrowdLabel allows multiple people on your team to separately label datasets, increasing labeling accuracy by only using examples that multiple people have assigned the same label. Using CrowdLabel also lets you skip all of the code in this tutorial 😛
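The agreement idea behind that kind of labeling can be sketched in a few lines. This is only an illustration of keeping examples that several people labeled the same way, not CrowdLabel’s actual logic. The function name, the min_votes threshold, the tie rule, and the sample votes are all assumptions made for the example.

```python
from collections import Counter


def consensus_labels(annotations, min_votes=2):
    """Keep items whose most common label has at least `min_votes` and no tie.

    `annotations` maps an item (e.g. a filename) to the list of labels that
    different people assigned to it.
    """
    agreed = {}
    for item, labels in annotations.items():
        if not labels:
            continue
        ranked = Counter(labels).most_common(2)
        top_label, top_votes = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_votes
        if top_votes >= min_votes and not tied:
            agreed[item] = top_label
    return agreed


# Made-up votes from three hypothetical annotators.
votes = {
    "room_a.jpg": ["industrial", "industrial", "contemporary"],
    "room_b.jpg": ["minimalism", "contemporary", "industrial"],
    "room_c.jpg": ["minimalism", "minimalism", "minimalism"],
}
print(consensus_labels(votes))
```

Here room_b.jpg is dropped because no two annotators agreed on it, so it never reaches the training set.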

Step 2: Training Your Collection

The generate_training_data function processes all our data and labels and prepares them for the Custom Collection API, which takes a list of items, each paired with a single label.
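If you’re curious what that kind of preparation might look like, here is a rough sketch. It is not the code from the repo. The function names, the [path, label] pair format, and the assumption that each filename starts with the style name (for example industrial_3.jpg) are all mine, so adjust them to match your own files.

```python
import os
import re


def label_from_filename(filename):
    """Return the leading run of letters in a filename, lowercased.

    Assumes names such as "industrial_3.jpg" or "minimalism2.png"; the
    real dataset's naming scheme may differ.
    """
    match = re.match(r"[A-Za-z]+", os.path.basename(filename))
    if match is None:
        raise ValueError("no label found in filename: %r" % filename)
    return match.group(0).lower()


def generate_training_pairs(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Build a list of [image_path, label] pairs from the image files in `folder`."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(extensions):
            pairs.append([os.path.join(folder, name), label_from_filename(name)])
    return pairs


print(label_from_filename("images/industrial_3.jpg"))
```

Calling generate_training_pairs("images") would then return one pair per image file found in that folder.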

Now we can train our model! It’s actually incredibly easy.

Go to the top of your file and import indicoio. Don’t forget to set your API key — there are a number of ways you can do it; I like to put mine in a configuration file.

import indicoio
from indicoio.custom import Collection
from tqdm import tqdm  # progress bar used in the upload loop

indicoio.config.api_key = 'YOUR_API_KEY'
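If you’d rather not hard-code the key, one option is a small loader that checks an environment variable first and then falls back to a configuration file. This is a generic Python 3 sketch, not a feature of the indicoio library. The function name, the config.ini file name, its [auth] section, and the environment variable name are illustrative assumptions.

```python
import configparser
import os


def load_api_key(config_path="config.ini", env_var="INDICO_API_KEY"):
    """Return an API key from the environment, falling back to an INI file.

    Expects the file to contain an [auth] section with an api_key entry.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    parser = configparser.ConfigParser()
    parser.read(config_path)  # missing files are skipped without an error
    if parser.has_option("auth", "api_key"):
        return parser.get("auth", "api_key")
    raise RuntimeError("No API key found in $%s or %s" % (env_var, config_path))
```

You would then write indicoio.config.api_key = load_api_key() in place of the hard-coded string, which keeps the key out of version control.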

Go back down to the bottom of your file and under if __name__ == "__main__", generate your training data, and define your empty Collection. Now, just add your data to the Collection and train!

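if __name__ == "__main__":
    train = generate_training_data()
    collection = Collection("interior_design")
    # Upload each labeled example; tqdm shows the upload progress
    for sample in tqdm(train):
        collection.add_data(sample)
    collection.train()
    collection.wait()  # block until training is complete
    print(collection.info())  # status and performance metrics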

Just like that. tqdm is a progress bar that will inform you about how much data has been uploaded, and .wait() will block until the training is complete. Since the dataset is so small, it should only take about a minute to train, depending on how fast your Internet connection is.

Calling collection.info() will check your Collection’s status, and return metrics that are useful indicators of the model’s performance. However, larger training set sizes are recommended for more reliable precision and recall metrics, so we’ll use a more hands-on way to test our model instead.
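As a reminder of what precision and recall measure, here is a small Python 3 sketch that computes both for one category from lists of true and predicted labels. It says nothing about how collection.info() calculates its metrics. The function name and the six example labels are made up for illustration.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for `target`, treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == target and p == target)
    false_pos = sum(1 for t, p in pairs if t != target and p == target)
    false_neg = sum(1 for t, p in pairs if t == target and p != target)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up labels for six images, purely to exercise the function.
truth = ["industrial", "industrial", "industrial",
         "contemporary", "contemporary", "minimalism"]
guess = ["industrial", "industrial", "contemporary",
         "contemporary", "industrial", "minimalism"]
print(precision_recall(truth, guess, "industrial"))
```

With only a handful of images per category, a single mistake moves these numbers by a large step (one miss out of three drops recall to two thirds), which is why they are more trustworthy on larger datasets.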

Testing the Model

First, let’s run some test examples through our model to see how it performs for all the categories. I set aside some images in the test_images folder that weren’t in the training dataset. Comment out the code for training the model under if __name__ == "__main__", and then run the following code.

if __name__ == "__main__":
    # Use the same name the Collection was trained under
    collection = Collection("interior_design")
    test_model()

Generally, we can assume that the highest probability result is the category that the model thinks the image most likely belongs to. Your results should appear as below (note that slight variations in the numbers are normal — they should just be roughly the same). The test images for each category appear above the results here.

Test results for CONTEMPORARY category:
{u'minimalism': 0.1399256475, u'industrial': 0.1386253738, u'contemporary': 0.7214489787}
{u'minimalism': 0.28305400940000003, u'industrial': 0.00672567, u'contemporary': 0.7102203205000001}
{u'minimalism': 0.3341266282, u'industrial': 0.0198866961, u'contemporary': 0.6459866757}
******

Test results for INDUSTRIAL category:
{u'minimalism': 0.1785425491, u'industrial': 0.5101019466, u'contemporary': 0.31135550430000003}
{u'minimalism': 0.0526127544, u'industrial': 0.5505277214000001, u'contemporary': 0.39685952420000004}
{u'minimalism': 0.3112188771, u'industrial': 0.3916452771, u'contemporary': 0.2971358458}
******

Test results for MINIMALISM category:
{u'minimalism': 0.9069708701, u'industrial': 0.0880693949, u'contemporary': 0.004959735000000001}
{u'minimalism': 0.9149341994, u'industrial': 0.0394596009, u'contemporary': 0.0456061997}
{u'minimalism': 0.6548668879, u'industrial': 0.0572945461, u'contemporary': 0.287838566}
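Picking the winning category out of one of these dictionaries is a one-liner, and the gap between the top two scores gives a rough sense of how close the call was. The helper names below are my own, and the sketch is written for Python 3.

```python
def top_label(probabilities):
    """Return (label, probability) for the highest-scoring category."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def margin(probabilities):
    """Gap between the best and second-best scores; small gaps mean a close call."""
    ranked = sorted(probabilities.values(), reverse=True)
    return ranked[0] - ranked[1]


# The third INDUSTRIAL result from the output above.
result = {
    "minimalism": 0.3112188771,
    "industrial": 0.3916452771,
    "contemporary": 0.2971358458,
}
label, score = top_label(result)
print(label, round(score, 3), round(margin(result), 3))
```

For this image the model still picks industrial, but only by about 0.08 over minimalism, so it is the least confident of the nine test predictions.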

Looks like the model did alright! If, however, the model had not performed satisfactorily, we could try adding more examples of the underperforming category to the Collection’s training dataset, and retrain the model.
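That retraining loop could look something like the sketch below. It relies only on the add_data(), train(), and wait() methods used in the training step. The helper name, the [path, label] sample format, the file paths, and the offline RecordingCollection stand-in are assumptions for illustration, so treat this as a starting point rather than tested client code.

```python
def add_examples_and_retrain(collection, new_samples):
    """Add extra samples to a collection, then retrain and wait for it to finish.

    `collection` is any object exposing add_data(), train(), and wait().
    Returns the number of samples that were added.
    """
    for sample in new_samples:
        collection.add_data(sample)
    collection.train()
    collection.wait()
    return len(new_samples)


class RecordingCollection(object):
    """Offline stand-in that records calls instead of contacting any service."""

    def __init__(self):
        self.calls = []

    def add_data(self, sample):
        self.calls.append(("add_data", sample))

    def train(self):
        self.calls.append(("train",))

    def wait(self):
        self.calls.append(("wait",))


# Hypothetical extra images for the weakest category.
extra = [
    ["more_images/industrial_8.jpg", "industrial"],
    ["more_images/industrial_9.jpg", "industrial"],
]
dry_run = RecordingCollection()
add_examples_and_retrain(dry_run, extra)
print([call[0] for call in dry_run.calls])
```

Swapping the stand-in for a real Collection object would send the same sequence of calls to the service, after which you can rerun the test images to see whether the weaker category improved.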

Next Steps

Where to from here? Try expanding the system by adding more styles, or applying this tutorial to other categories, like clothes, food, or art. Or, go a step further — can you adapt our fashion matching tutorial, which also uses the structure of a classification problem, to build a model that matches pieces of furniture to the style of already existing rooms?

Effective January 1, 2020, Indico will be deprecating all public APIs and sunsetting our Pay as You Go Plan.

Why are we deprecating these APIs?

Over the past two years, our new product offering, Indico IPA, has gained a lot of traction. We’ve successfully allowed some of the world’s largest enterprises to automate their unstructured workflows with our award-winning technology. As we continue to build and support Indico IPA, we’ve reached the conclusion that, in order to provide the quality of service and product we strive for, the platform requires our utmost attention. As such, we will be focusing on the Indico IPA product offering.

