This question is probably not new, but I could not find an existing answer, so please forgive me if it is a duplicate.
Anyway, a content-based recommendation system requires us to create feature vectors for the items we are recommending. So there are two issues to solve up front: 1. which components are important enough to be included in the feature vector that represents an item? 2. once we have decided on the components, who is responsible for populating their values?
Using movies as the most popular example, we would probably decide to use actors, director(s) and genre as the components of the vector. Now, for every movie from the past many years (there are lots of movies out there), how do we populate all these components to prepare the raw data for the vectors? Manually? Automatically (and if so, how)?
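To make the question concrete, here is a minimal sketch of how I imagine the vectors being built once the raw metadata is available. The records, the field names and the helpers build_vocabulary and to_vector are all made up for illustration; the hard part I am asking about, obtaining that metadata for every movie, is exactly what the sketch assumes away.

```python
# Hypothetical metadata records; in practice these would have to come
# from some catalog or database, which is the part I am unsure about.
movies = [
    {"title": "Movie A", "actors": ["Actor 1", "Actor 2"],
     "directors": ["Director X"], "genres": ["Drama"]},
    {"title": "Movie B", "actors": ["Actor 2"],
     "directors": ["Director Y"], "genres": ["Comedy", "Drama"]},
]

# Assumed choice of components for the feature vector.
FIELDS = ("actors", "directors", "genres")


def build_vocabulary(records):
    """Collect every (field, value) pair seen and assign it a column index."""
    tokens = sorted({(f, v) for r in records for f in FIELDS for v in r[f]})
    return {tok: i for i, tok in enumerate(tokens)}


def to_vector(record, vocab):
    """Multi-hot encode one record: 1 where the movie has the attribute, else 0."""
    vec = [0] * len(vocab)
    for f in FIELDS:
        for v in record[f]:
            vec[vocab[(f, v)]] = 1
    return vec


vocab = build_vocabulary(movies)
vectors = {m["title"]: to_vector(m, vocab) for m in movies}
```

With only two movies this is trivial, but the vocabulary grows with every new actor or director, and every record still has to be filled in by someone or something.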
I may have missed something, but it seems that whenever we decide to build a content-based system, we have to solve these issues, and they are not easy to address. It almost seems that collaborative filtering is easier, since it only needs the utility matrix (user-item matrix) and does not require us to generate feature vectors. Of course, the utility matrix contains user ratings, which are another headache to obtain.
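For comparison, this is roughly what I mean by the utility matrix: it can be assembled from nothing but (user, item, rating) triples. The user names and ratings below are invented, and storing unrated pairs as None is just my assumption for the sketch.

```python
# Hypothetical (user, item, rating) triples; real ones would come from user activity.
ratings = [
    ("alice", "Movie A", 5),
    ("alice", "Movie B", 3),
    ("bob", "Movie B", 4),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

# Start with every cell unknown (None), then fill in the observed ratings.
utility = {u: {i: None for i in items} for u in users}
for user, item, rating in ratings:
    utility[user][item] = rating
```

No item metadata is needed here, but most cells stay empty until users actually rate something.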
Could someone share some thoughts on this? Many thanks!