I'm trying to obtain sentence embeddings from BERT, but I'm not quite sure if I'm doing it properly. And yes, I'm aware that tools such as bert-as-service already exist, but I want to do it myself and understand how it works.
Let's say I want to extract a sentence embedding from the word embeddings of the sentence "I am.". As I understand it, BERT outputs hidden states in the form of (12, seq_length, 768). I extracted each word embedding from the last encoder layer in the form of (1, 768). My doubt now lies in combining these two word vectors into a sentence vector. If I have a (2, 768) tensor, should I sum over the token dimension (dim=0) and obtain a vector of (1, 768)? Or should I concatenate the two words into (1, 1536) and apply (mean) pooling to get the sentence vector in the shape of (1, 768)? I'm not sure what the right approach is to obtain the sentence vector for this example.
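
For reference, here is a minimal sketch of what I mean by mean pooling over the token dimension (assuming the Hugging Face transformers library and bert-base-uncased; the mask handling is my own addition, so I'm not sure if this is the right approach):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Tokenize the sentence; this also adds the [CLS] and [SEP] tokens.
inputs = tokenizer("I am.", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Last encoder layer, shape: (1, seq_length, 768)
token_embeddings = outputs.last_hidden_state

# Mean-pool over the token dimension, using the attention mask
# so that padding tokens (if any) don't contribute to the average.
mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_length, 1)
summed = (token_embeddings * mask).sum(dim=1)   # (1, 768)
sentence_embedding = summed / mask.sum(dim=1)   # (1, 768)

print(sentence_embedding.shape)  # torch.Size([1, 768])
```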