Word2Vec captures a distributed representation of a word, which essentially means that multiple neurons (cells) together capture a single concept (a concept can be word meaning, sentiment, part of speech, etc.), and a single neuron (cell) also contributes to multiple concepts.
These concepts are learnt automatically rather than being pre-defined, hence you can think of them as latent/hidden.
The more neurons (cells) you use, the greater the capacity of your neural network to represent these concepts, but the more data you will need to train these vectors (as they are initialised randomly).
The size of a word vector is typically much smaller than the vocabulary size, since we want a compressed representation of the word. Cosine similarity between two word vectors indicates how similar the two words are.
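As a rough sketch of how this looks in practice (using the gensim library; the toy corpus and parameter values below are only placeholders, not from the original answer):

```python
# Minimal Word2Vec sketch with gensim; corpus and hyperparameters are illustrative.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "bark", "at", "strangers"],
]

# vector_size controls how many "neurons" (dimensions) each word vector gets:
# larger values give more representational capacity but need more training data.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

# Cosine similarity between two word vectors indicates how similar the words are.
print(model.wv.similarity("king", "queen"))
```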
EDIT
For more clarity, think of each word as earlier being represented by a one-hot encoded vector whose length equals the vocabulary size, which is of the order of thousands or millions. The same word is now condensed into a 200- or 300-dimensional dense vector. To find the relation between two words, you calculate the cosine similarity between their vector representations.
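A small sketch of that contrast, with cosine similarity computed by hand (the vocabulary size, word index, and vector values here are made up for illustration):

```python
import numpy as np

vocab_size = 10_000                      # one-hot vectors are this long
one_hot_king = np.zeros(vocab_size)
one_hot_king[42] = 1.0                   # a single 1 at the word's index

# The same word condensed into a (hypothetical) 300-dimensional dense vector.
dense_king  = np.random.randn(300)
dense_queen = np.random.randn(300)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1 means very similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(dense_king, dense_queen))
```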