Benchmarks

How will my model’s accuracy compare to other common techniques?
Especially with small amounts of data, Custom Collections should generally achieve higher accuracy than common DIY algorithms. We benchmarked Custom Collections against some common algorithms on three typical machine learning tasks to give a sense of how they compare.
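
To make the comparison concrete, here is a rough sketch of the evaluation loop in Ruby, using only the client calls shown at the end of this section. The collection name and sample data are placeholders, and the assumption that predict returns a hash of label => confidence is noted in the comments:

require 'indico'
Indico.api_key = 'YOUR_API_KEY'

# Hypothetical labeled splits: arrays of [text, label] pairs
train_set = [["loved every minute", "positive"], ["total waste of time", "negative"]]
test_set  = [["really enjoyed it", "positive"], ["fell asleep halfway", "negative"]]

collection = Indico::Collection.new("benchmark_demo")  # collection name is illustrative
collection.add_data(train_set)
collection.train()
collection.wait()

# Score the held-out split: count how often the top-scoring label
# matches the true label (assumes predict returns label => confidence)
correct = test_set.count do |text, label|
  scores = collection.predict(text)
  scores.max_by { |_, confidence| confidence }.first == label
end
puts "Accuracy: #{correct.to_f / test_set.size}"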

Task: Sentiment Detection (Text)
Dataset: Large Movie Review Dataset
Custom Collection Domain: “sentiment”
Benchmarked Against: TF-IDF vectors of the samples (stop words removed) fed into a logistic regression, with a grid search for the optimal regularization parameter

Samples    Custom Collection Accuracy    DIY Algorithm Accuracy
100        0.89                          0.58
1,000      0.93                          0.82
10,000     0.94                          0.86
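
The "sentiment" domain above is selected when the collection is created. A minimal sketch, assuming the Ruby client accepts a domain option on Indico::Collection.new (the exact keyword is an assumption; check the client documentation):

# Assumption: the domain is passed as an option at creation time
collection = Indico::Collection.new("movie_reviews", domain: "sentiment")
# The "topics" domain used in the next benchmark would be passed the same way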

Task: Topic Classification (Text, 4 Categories)
Dataset: Aggregated News
Custom Collection Domain: “topics”
Benchmarked Against: TF-IDF vectors of the samples (stop words removed) fed into a logistic regression, with a grid search for the optimal regularization parameter

Samples    Custom Collection Accuracy    DIY Algorithm Accuracy
100        0.81                          0.60
1,000      0.86                          0.82
10,000     0.88                          0.88

Task: Classification (Image, 25 Categories)
Dataset: Caltech 101
Custom Collection Domain: Not set
Benchmarked Against: Logistic regression trained on HOG (histogram of oriented gradients) features and color histograms extracted from each sample, with a grid search for the optimal regularization parameter

Samples    Custom Collection Accuracy    DIY Algorithm Accuracy
100        0.82                          0.22
1,000      0.95                          0.55
10,000     0.94                          0.67
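
Custom Collections handle images as well as text; no domain was set for this task. A minimal sketch, assuming image samples can be supplied as file paths in the same [sample, label] pairs used for text (the path handling is an assumption):

collection = Indico::Collection.new("image_demo")  # collection name is illustrative
# Assumption: image samples may be passed as file paths
collection.add_data([
  ["images/accordion_001.jpg", "accordion"],
  ["images/airplane_001.jpg", "airplane"]
])
collection.train()
collection.wait()
collection.predict("images/unseen_001.jpg")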

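Putting it all together, the basic Ruby workflow for creating, training, and querying a Custom Collection looks like this:
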
require 'indico'
Indico.api_key = 'YOUR_API_KEY'

collection = Indico::Collection.new("collection_name")

# Add data: an array of [text, label] pairs (pass as many as you have)
collection.add_data([["text1", "label1"], ["text2", "label2"]])

# Training
collection.train()

# Block until the collection is ready
collection.wait()

# Done! Start analyzing text
collection.predict("indico is so easy to use!")
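
predict returns per-label confidence scores (the exact shape is an assumption here); to reduce that to a single label, take the highest-scoring entry:

# Assumes predict returns a hash of label => confidence,
# e.g. { "label1" => 0.92, "label2" => 0.08 }
result = collection.predict("indico is so easy to use!")
puts result.max_by { |_, confidence| confidence }.first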