indico has released two open-source libraries for the community.
Finetune is an open-source Python library for language model finetuning. It’s our attempt to make the research done by Alec Radford at OpenAI more widely accessible by packaging it in an easy-to-use, scikit-learn style library. Radford demonstrated that a pretrained language model serves as a good base for solving downstream tasks like classification, entailment, similarity, and multiple-choice question answering. We’ve also added support for multilabel classification, sequence labeling tasks like named-entity recognition, and multi-field text inputs. Documentation is available at finetune.indico.io.
Enso is an open-source Python library for benchmarking document embeddings and transfer learning methods.
Enso was created in part to help measure and address industry-wide overfitting to a small number of academic datasets. There needs to be a simple way to separate generic progress from advances that take advantage of dataset-specific features. The former is “publication worthy”; the latter may not be. Often, though, it’s hard to distinguish between the two because papers fail to evaluate on a broad enough range of tasks. The number of classes, number of training examples, level of class imbalance, average document length, and other dataset attributes can have an enormous influence on the viability of an approach, and evaluation across a broad range of tasks helps to ascertain where a given method is appropriate and where it is not. Through Enso, we hope to make evaluating across a broad range of datasets painless, in an effort to make this practice more common.
In addition to providing a framework for benchmarking embedding quality, we’ve also included 24 open-source datasets for you to use in your own experiments.
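To make the idea of evaluating across a broad range of datasets concrete, here is a minimal sketch of such a benchmark loop in plain Python. It does not use Enso's actual API. The names `MajorityClassBaseline`, `make_toy_dataset`, and `benchmark` are hypothetical, the datasets are synthetic, and the "model" is a trivial baseline with a scikit-learn style `fit`/`predict` interface. The point is only to show how the same method can be scored across datasets that differ in number of classes, class imbalance, and number of training examples.

```python
import random
from collections import Counter


class MajorityClassBaseline:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def make_toy_dataset(n_examples, class_weights, seed):
    """Build a synthetic dataset; class_weights controls the class imbalance."""
    rng = random.Random(seed)
    labels = list(class_weights)
    weights = [class_weights[label] for label in labels]
    y = rng.choices(labels, weights=weights, k=n_examples)
    X = [f"document {i}" for i in range(n_examples)]
    return X, y


def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def benchmark(model_factory, datasets, train_sizes, test_size=200):
    """Score one method on every (dataset, training set size) combination."""
    results = []
    for name, (X, y) in datasets.items():
        # Hold out the last test_size examples; train only on earlier ones.
        X_test, y_test = X[-test_size:], y[-test_size:]
        for n in train_sizes:
            if n > len(X) - test_size:
                raise ValueError("training split would overlap the test split")
            model = model_factory().fit(X[:n], y[:n])
            score = accuracy(y_test, model.predict(X_test))
            results.append({"dataset": name, "train_size": n, "accuracy": score})
    return results


# Synthetic datasets that vary in number of classes and class imbalance.
datasets = {
    "balanced_2_class": make_toy_dataset(700, {"a": 1, "b": 1}, seed=0),
    "imbalanced_2_class": make_toy_dataset(700, {"a": 9, "b": 1}, seed=1),
    "balanced_5_class": make_toy_dataset(700, {c: 1 for c in "abcde"}, seed=2),
}

results = benchmark(MajorityClassBaseline, datasets, train_sizes=[50, 200, 500])
for row in results:
    print(f"{row['dataset']:<20} n={row['train_size']:<4} acc={row['accuracy']:.2f}")
```

Even this trivial baseline looks strong on the imbalanced dataset and weak on the five-class one, which is the kind of dataset-dependent behavior that a single-dataset evaluation would hide.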