Facebook researchers show random methods without any training can outperform modern sentence embeddings models for sentence classification

3 min read

A pair of researchers, John Wieting, Carnegie Mellon University, and Douwe Kiela, Facebook AI Research, published a paper titled “No training required: Exploring random encoders for sentence classification”, earlier this week.

Sentence embedding refers to a vector representation of the meaning of a sentence. It’s most often created by a transformation of word embeddings using a composition function, which is often nonlinear and recurrent in nature. Most of these word embeddings get initialized using pre-trained embeddings. These sentence embeddings are then used as features for a collection of downstream tasks (that receive data from the server).

The paper explores three different approaches for computing sentence representations from pre-trained word embeddings (that use nothing but random parameterizations). It considers two well-known examples of sentence embeddings including, SkipThought (mentioned in Advances in neural information processing systems by Ryan Kiros), and InferSent (mentioned in Supervised learning of universal sentence representations from natural language inference data by Alexis Conneau). As mentioned in the paper, SkipThought took around one month to train, while InferSent requires large amounts of annotated data.

“We examine to what extent we can match the performance of these systems by exploring different ways for combining nothing but the pre-trained word embeddings. Our goal is not to obtain a new state of the art but to put the current state of the art methods on more solid footing”, states the researchers.

Approaches used

The paper mentions three different approaches for computing sentence representation from pre-trained word embeddings as follows:

Bag of random embedding projections (BOREP): In this method, a single random projection is applied in a standard bag-of-words (or bag-of-embeddings) model. A matrix is randomly initialized consisting of a dimension of the projection and dimension of an input word embedding. The values for the matrix are then sampled uniformly.
Random LSTMs: This approach makes use of bidirectional LSTMs without any training involved. The LSTM weight matrices and their other corresponding biases get initialized uniformly at random.
Echo state networks (ESNs): Echo State Networks (ESNs) were primarily designed and used for sequence prediction problems, where given a sequence X, a label y is predicted for each step in the sequence. The main goal of using echo state networks is to minimize the error between the predicted yˆ and the target y at each timestep.

In the paper, the researchers have diverged from the typical per-timestep ESN setting, and have instead used ESN to produce a random representation of a sentence. A bidirectional ESN is used where the reservoir states get concatenated for both directions. These states are then pooled over to generate a sentence representation.

Results

For evaluation purposes, following set of downstream tasks are used: sentiment analysis (MR, SST), question-type (TREC), product reviews (CR), subjectivity (SUBJ), opinion polarity (MPQA), paraphrasing (MRPC), entailment (SICK-E, SNLI) and semantic relatedness (SICK-R, STSB). The three models are evaluated against random sentence encoders, InferSent and SkipThought models.

As per the results:

When comparing the random sentence encoders, ESNs outperformed BOREP and RandLSTM for all downstream tasks.
When compared to InferSent, it shows that the performance gains over random methods are not as phenomenal, despite the fact that InferSent requires annotated data and takes more time to train as opposed to the random sentence encoders that can be applied immediately.
For SkipThought, the gain over random methods (which do have better word embeddings) is even smaller. SkipThought took a very long time to train, and in the case of SICK-E, it’s better to use BOREP. ESN also outperforms SkipThought in most of the downstream tasks.

“The point of these results is not that random methods are better than these other encoders, but rather that we can get very close and sometimes even outperform those methods without any training at all, from just using the pre-trained word embeddings,” state the researchers.

For more information, check out the official research paper.

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Artificial Intelligence

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Servers

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Interviews

Clean Coding in Python with Mariano Anaya

Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Front-End Web Development

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Featured

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago

Facebook researchers show random methods without any training can outperform modern sentence embeddings models for sentence classification

Approaches used

Results

Read Next

Recent Posts

Top life hacks for prepping for your IT certification exam

Learn Transformers for Natural Language Processing with Denis Rothman

Learning Essential Linux Commands for Navigating the Shell Effectively

Clean Coding in Python with Mariano Anaya

Exploring Forms in Angular – types, benefits and differences

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Facebook researchers show random methods without any training can outperform modern sentence embeddings models for sentence classification

Approaches used

Results

Read Next

Related Post

Recent Posts

Top life hacks for prepping for your IT certification exam

Learn Transformers for Natural Language Processing with Denis Rothman

Learning Essential Linux Commands for Navigating the Shell Effectively

Clean Coding in Python with Mariano Anaya

Exploring Forms in Angular – types, benefits and differences

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Exploring Forms in Angular – types, benefits and differences