
I'm trying to obtain sentence embeddings from BERT, but I'm not quite sure I'm doing it properly. And yes, I'm aware that tools such as bert-as-service already exist, but I want to do it myself and understand how it works.

Let's say I want to extract a sentence embedding from the word embeddings of the following sentence: "I am.". As I understand it, BERT outputs a tensor of shape (12, seq_length, 768). I extracted each word embedding from the last encoder layer as a (1, 768) vector. My doubt now lies in building the sentence vector from these two word vectors. If I have (2, 768), should I sum over the two words (dim=0) to obtain a vector of shape (1, 768)? Or should I concatenate the two words into (1, 1536) and then apply (mean) pooling to get a sentence vector of shape (1, 768)? I'm not sure what the right approach is to obtain the sentence vector for this example.

I would either use bert-as-service, or just use the technique they use for pooling representations: github.com/hanxiao/bert-as-service - Sam H.

1 Answer


As far as I know, BERT has this comment in its source code:

For classification tasks, the first vector (corresponding to [CLS]) is used as the "sentence vector." Note that this only makes sense because the entire model is fine-tuned.

In other words, BERT provides the [CLS] vector as the sentence embedding, so you do not need to combine or otherwise process the individual word vectors in the sentence.
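
To make the shapes concrete, here is a minimal sketch of the options discussed: taking the [CLS] vector, and the sum and mean pooling the question asks about. It does not load BERT. Plain Python lists of random numbers stand in for the last encoder layer's output, and the token list for "I am." and the helper names (cls_pool, sum_pool, mean_pool) are my own assumptions for illustration, not part of any BERT API.

```python
import random

HIDDEN = 768  # hidden size of BERT-base, as in the question


def cls_pool(token_vectors):
    """Return the first vector, the one at the [CLS] position."""
    return token_vectors[0]


def sum_pool(token_vectors):
    """Element-wise sum over tokens: (seq_len, hidden) -> (hidden,)."""
    return [sum(column) for column in zip(*token_vectors)]


def mean_pool(token_vectors):
    """Element-wise average over tokens: (seq_len, hidden) -> (hidden,)."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


random.seed(0)

# Assumed tokenization of "I am." including the special tokens.
tokens = ["[CLS]", "I", "am", ".", "[SEP]"]

# Stand-in for the last encoder layer: one 768-dim vector per token.
# With a real model these values would come from BERT, not from random.gauss.
last_layer = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in tokens]

sentence_cls = cls_pool(last_layer)    # no combination of word vectors
sentence_sum = sum_pool(last_layer)    # reduce over the token dimension
sentence_mean = mean_pool(last_layer)  # same reduction, divided by seq_len

# Every strategy yields one 768-dim vector; no concatenation is needed.
print(len(sentence_cls), len(sentence_sum), len(sentence_mean))
```

Note that sum and mean both reduce over the token dimension, so the result already has 768 entries and there is no need to concatenate the word vectors first. As the quoted comment warns, the [CLS] option only makes sense as a sentence vector when the whole model has been fine-tuned.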

Hope it helps.