Custom Collections

Not everyone’s needs align with what indico offers out of the box, so we also provide a generalized solution that learns to recognize patterns in your data. This means that you can apply our technology to a specific use case, as well as train your model on the data most relevant to you.

What is this?
It’s a way to automatically learn about new data based on the patterns in past data. For example, if you had a photo sharing app with user-specified scene tags, you could use 4,000 tagged photos to predict tags for the remaining 17,000 untagged ones. Right now we support one label per input, but there are ways to work around that. We currently support only text and image classification. Please reach out if you want to use our product on a regression problem.

Who is this for?
Anyone who currently has to label text or image data by hand, or who has ever shied away from building great things that might require continuous, manual labeling.

What do I need?
All you need is a well-defined problem. Say you want to see what kinds of posts your followers like. If you took the text from your posts and passed the number of likes for each one as a label (grouped into a few ranges, since Collections currently handle classification rather than regression), you could create a Collection that predicts how much your audience will like new posts. If you’re working with a completely unlabeled dataset, you and your team can use our CrowdLabel tool to upload, label, and process data quickly. You can then train a Custom Collection on your newly labeled dataset.
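Since only classification is supported right now, raw like counts have to be turned into categories before they can serve as labels. Here is a minimal sketch of one way to prepare that data in Python. The function names, thresholds, label names, and sample posts are illustrative assumptions and are not part of the indico API.

```python
def likes_to_label(likes, low=10, high=100):
    """Map a raw like count to a coarse engagement label.

    The thresholds are hypothetical; pick values that fit your audience.
    """
    if likes < 0:
        raise ValueError("likes must be non-negative")
    if likes < low:
        return "low"
    if likes < high:
        return "medium"
    return "high"


def build_training_pairs(posts):
    """Turn (text, likes) tuples into [text, label] pairs, one label per input."""
    return [[text, likes_to_label(likes)] for text, likes in posts]


# Hypothetical example data: (post text, number of likes).
posts = [
    ("New blog post is live", 4),
    ("We just shipped image support", 57),
    ("Behind the scenes at the office", 230),
]
training_pairs = build_training_pairs(posts)
```

Each resulting pair holds one input and exactly one label, which matches the one-label-per-input limit described earlier.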

How much do I need?
This one’s a bit of a gray area depending on your application. Having very little data is rarely helpful, but having huge amounts doesn’t guarantee accurate predictions either. For starters, you should aim for your data to communicate only relevant information. For example, having a border on all your nature photos could cause your image classifier to believe any bordered image represents nature. To give you a better idea, here’s a table that describes how many samples per label you ought to start with on tasks of varying difficulty:

Task      Text Samples    Image Samples
Easy      100             10
Medium    500             50
Hard      2000            200
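To check a dataset against these starting points before training, you could count the samples for each label and flag the ones that fall short. The sketch below uses only the Python standard library. The helper name `underrepresented_labels` and the sample labels are hypothetical; the thresholds are copied from the table above.

```python
from collections import Counter

# Suggested starting points per label, copied from the table above.
RECOMMENDED_MINIMUMS = {
    "text": {"easy": 100, "medium": 500, "hard": 2000},
    "image": {"easy": 10, "medium": 50, "hard": 200},
}


def underrepresented_labels(labels, data_type, difficulty):
    """Return {label: count} for every label below the suggested minimum."""
    minimum = RECOMMENDED_MINIMUMS[data_type][difficulty]
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}


# Hypothetical image dataset: 12 beach photos and 7 jungle photos.
labels = ["beach"] * 12 + ["jungle"] * 7
short = underrepresented_labels(labels, "image", "easy")
```

In this example only the jungle label is flagged, since 7 samples is below the suggested 10 for an easy image task.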

How many samples can I add?
Custom Collections is designed to provide high quality predictions with as few data points as possible. In general, you’ll see diminishing returns after the first few thousand data points. Because of this, the Custom Collections API is currently capped at 50,000 samples. If you want to train Collections with larger amounts of input data, reach out to us directly at contact@indico.io and we’ll help make your project possible.
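If your dataset is larger than the cap, one option is to draw a random subset before uploading. This is a minimal sketch, not an indico feature. The helper name `fit_under_cap` and the fixed seed are assumptions, and uniform random sampling is just one simple strategy.

```python
import random

MAX_SAMPLES = 50_000  # cap stated above


def fit_under_cap(samples, cap=MAX_SAMPLES, seed=0):
    """Return all samples if they fit under the cap, else a random subset.

    A fixed seed keeps the subset reproducible between runs.
    """
    samples = list(samples)
    if len(samples) <= cap:
        return samples
    rng = random.Random(seed)
    return rng.sample(samples, cap)


# Hypothetical oversized dataset of 60,000 [text, label] pairs.
dataset = [[f"post {i}", "low" if i % 2 else "high"] for i in range(60_000)]
trimmed = fit_under_cap(dataset)
```

Uniform sampling can thin out labels that were already rare, so it is worth re-checking the per-label counts afterwards.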

What makes a task hard for a computer?
As humans, we know that subtle variations can make a world of difference. We’re always working on teaching machine learning models to retain as much useful context as possible, but catching slight cues can be difficult to do in a generalized way. Here are some example categorization tasks for text and image Collections.

Task      Text                               Image
Easy      Wall Street Journal vs. BuzzFeed   Jungle vs. Beach
Medium    1 Star vs. 5 Star Review           Leather vs. Wood
Hard      Dieting vs. Fitness                Husky vs. Malamute