Benchmarks

How will my model’s accuracy compare to other common techniques?
Especially with small amounts of data, Custom Collections should generally give higher accuracy than common DIY algorithms. We benchmarked Custom Collections against some common algorithms on three typical machine learning tasks to give an idea of how it compares.

Task: Sentiment Detection (Text)
Dataset: Large Movie Review Dataset
Custom Collection Domain: “sentiment”
Benchmarked Against: TF-IDF vectors of the samples (with stop words removed) fed into a logistic regression model, with a grid search for the optimal regularization parameter (a sketch of this baseline follows the table below)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.89                         0.58
1,000     0.93                         0.82
10,000    0.94                         0.86
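
As a rough illustration, the DIY text baseline above can be reproduced with scikit-learn along these lines. This is a minimal sketch, assuming scikit-learn; the sample texts, parameter grid, and cross-validation settings are assumptions, not the exact benchmark setup.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical training data: raw review texts and sentiment labels
texts = [
    "a gripping, beautifully shot film",
    "what a fantastic performance",
    "dull and far too long",
    "a complete waste of time",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF vectors (stop words removed) fed into logistic regression
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression()),
])

# Grid search over the inverse regularization strength C
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10, 100]}, cv=2)
search.fit(texts, labels)
print(search.predict(["what a great movie"]))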

Task: Topic Classification (Text, 4 Categories)
Dataset: Aggregated News
Custom Collection Domain: “topics”
Benchmarked Against: TF-IDF vectors of the samples (with stop words removed) fed into a logistic regression model, with a grid search for the optimal regularization parameter (the same baseline sketched above)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.81                         0.60
1,000     0.86                         0.82
10,000    0.88                         0.88

Task: Classification (Image, 25 Categories)
Dataset: Caltech 101
Custom Collection Domain: Not set
Benchmarked Against: a logistic regression model trained on HOG (histogram of oriented gradients) features and color histograms extracted from each sample, with a grid search for the optimal regularization parameter (a sketch of this baseline follows the table below)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.82                         0.22
1,000     0.95                         0.55
10,000    0.94                         0.67
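
The image baseline can be sketched similarly with scikit-image and scikit-learn. Again a minimal sketch: the feature settings, sample data, and parameter grid here are assumptions, not the exact benchmark configuration.

import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def extract_features(image):
    # HOG features on the grayscale image plus a per-channel color histogram
    hog_features = hog(rgb2gray(image), pixels_per_cell=(16, 16))
    color_hist = np.concatenate(
        [np.histogram(image[..., c], bins=16, range=(0, 1))[0]
         for c in range(3)]
    )
    return np.concatenate([hog_features, color_hist])

# Hypothetical data: RGB images resized to a fixed shape, with made-up labels
rng = np.random.default_rng(0)
images = rng.random((20, 128, 128, 3))
labels = np.array([0] * 10 + [1] * 10)

X = np.array([extract_features(img) for img in images])

# Grid search over the inverse regularization strength C
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      {"C": [0.01, 0.1, 1, 10, 100]}, cv=2)
search.fit(X, labels)
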
For comparison, the full Custom Collections workflow takes only a few lines of code:

import indicoio
from indicoio.custom import Collection

indicoio.config.api_key = 'YOUR_API_KEY'

collection = Collection("collection_name")

# Add labeled training data as [sample, label] pairs
collection.add_data([["text1", "label1"], ["text2", "label2"], ...])

# Train the model
collection.train()

# Block until training is complete
collection.wait()

# Done! Start analyzing text
collection.predict("indico is so easy to use!")