Benchmarks

How will my model’s accuracy compare to other common techniques?
Especially with small amounts of data, Custom Collections should generally give higher accuracy than common DIY algorithms. To give an idea of how it compares, we benchmarked Custom Collections against some common algorithms on three typical machine learning tasks.

Task: Sentiment Detection (Text)
Dataset: Large Movie Review Dataset
Custom Collection Domain: “sentiment”
Benchmarked Against: tf-idf vectors of the samples (with stop words removed) fed into logistic regression, with a grid search for an optimal regularization parameter (sketched in code below the results table)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.89                         0.58
1,000     0.93                         0.82
10,000    0.94                         0.86
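
To make the comparison concrete, here is a minimal sketch of that DIY text pipeline, assuming scikit-learn; it applies equally to the topic classification task below. load_samples is a hypothetical stand-in for loading the dataset, and the grid of C values is an assumption.

# Sketch of the DIY text baseline: tf-idf features (stop words removed)
# fed into logistic regression, grid-searching the regularization
# parameter C. The C grid and the loader are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

texts, labels = load_samples()  # hypothetical loader for the raw samples

X_train, X_test, y_train, y_test = train_test_split(texts, labels)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # tf-idf, stop words removed
    ("clf", LogisticRegression()),
])

# Grid search for an optimal regularization parameter
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1.0, 10.0, 100.0]})
search.fit(X_train, y_train)
print("test accuracy:", search.score(X_test, y_test))

Wrapping the vectorizer and classifier in a Pipeline keeps the grid search honest: the tf-idf vocabulary is re-fit inside each cross-validation fold rather than on the full training set.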

Task: Topic Classification (Text, 4 Categories)
Dataset: Aggregated News
Custom Collection Domain: “topics”
Benchmarked Against: tf-idf vectors of the samples (with stop words removed) fed into logistic regression, with a grid search for an optimal regularization parameter (the same DIY pipeline sketched above)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.81                         0.60
1,000     0.86                         0.82
10,000    0.88                         0.88

Task: Classification (Image, 25 Categories)
Dataset: Caltech 101
Custom Collection Domain: Not set
Benchmarked Against: Logistic regression trained on HoG features and color histograms extracted from each sample, with a grid search for an optimal regularization parameter (sketched in code below the results table)

Samples   Custom Collection Accuracy   DIY Algorithm Accuracy
100       0.82                         0.22
1,000     0.95                         0.55
10,000    0.94                         0.67
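
The image baseline follows the same pattern, with hand-engineered features in place of tf-idf. A minimal sketch, assuming scikit-image for the HoG features; load_images is a hypothetical loader, and the image size, histogram bins, and C grid are assumptions.

# Sketch of the DIY image baseline: HoG features plus per-channel color
# histograms, fed into logistic regression with a grid search over C.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

def featurize(image):
    # Fixed input size so every feature vector has the same length (size assumed)
    image = resize(image, (128, 128, 3))
    # HoG captures shape, computed on the grayscale version of the image
    hog_features = hog(rgb2gray(image))
    # Per-channel color histograms; the bin count is an assumption
    color_hist = np.concatenate(
        [np.histogram(image[..., c], bins=32, range=(0.0, 1.0))[0]
         for c in range(3)]
    )
    return np.concatenate([hog_features, color_hist])

images, labels = load_images()  # hypothetical loader for Caltech 101
X = np.array([featurize(img) for img in images])

X_train, X_test, y_train, y_test = train_test_split(X, labels)

# Grid search for an optimal regularization parameter
search = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1.0, 10.0]})
search.fit(X_train, y_train)
print("test accuracy:", search.score(X_test, y_test))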

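For comparison, training and querying a Custom Collection takes only a few calls. Assuming each Collection method in the Node.js client returns a promise, the chain below adds data, trains, waits for training to finish, and predicts:
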
var indico = require('indico.io');
indico.apiKey = 'YOUR_API_KEY';

var collection = indico.Collection('my_collection');

// Adding data as [sample, label] pairs
collection.addData([["text1", "label1"], ["text2", "label2"], ...])

  // Training
  .then(() => collection.train())

  // Waiting for the Collection to finish training
  .then(() => collection.wait())

  // Predicting once the model is ready!
  .then(() => collection.predict("This is awesome!"))

  // Viewing the results
  .then(console.log);