1
votes

I have tried three different ways to score how well a resume matches a job description. Can anyone tell me which method is best, and why?

  1. NLTK for keyword extraction, then RAKE for keyword/keyphrase scoring, then cosine similarity.

  2. scikit-learn for keyword extraction, TF-IDF weighting, and cosine similarity calculation.

  3. The Gensim library with an LSA/LSI model to extract keywords and calculate cosine similarity between the documents and the query.
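To make the second approach concrete, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It is not the scikit-learn implementation; the tokenizer, the smoothed IDF formula, and the sample texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens (keeping + and # so 'c++' and 'c#' survive)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF (an assumption)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample texts, for illustration only.
job_description = "Looking for a Python developer with machine learning experience"
resumes = [
    "Python developer, five years of machine learning and data analysis",
    "iOS engineer experienced in Swift and Objective-C",
]

vectors = tfidf_vectors([job_description] + resumes)
scores = [cosine_similarity(vectors[0], vec) for vec in vectors[1:]]
print(scores)
```

The second resume shares no tokens with the job description ("experienced" and "experience" are different tokens without stemming), so its score is zero. That shows the limit of pure keyword overlap. In practice, scikit-learn's `TfidfVectorizer` together with `sklearn.metrics.pairwise.cosine_similarity` covers the same steps.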

3
I think you are going to need to test with your data. Since they are different documents, I think you would be better off testing with resumes that are a match for the job. - paparazzo
@Paparazzi They are giving different results, so I am a little confused about which to use. Have you already done related work? - Khalid Usman
If you have all three then let it be up to the user which to use. - paparazzo
Actually, I am an iOS expert and this is my first project in information retrieval and machine learning, so I am just doing R&D without any guidance. Can you guide me if you have already done related work? - Khalid Usman
That is my guidance. Give users the options. Since a job description is not the same as a resume this is not going to be perfect. - paparazzo

3 Answers

4
votes

Nobody here can give you the answer. The only way to decide which method works better is to have one or more humans independently match lots and lots of resumes and job descriptions, and compare what they do to what your algorithms do. Ideally you'd have a dataset of already matched resumes and job descriptions (companies must do this kind of thing when people apply), because it takes a lot of work to create a sufficiently large dataset.

Next time you take on this kind of project, start by considering how you are going to evaluate the performance of the solution you'll put together.
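As a rough sketch of the evaluation described above: once humans have labelled resume/job-description pairs as match or no match, each method's scores can be compared against those labels. The metric here, the fraction of (match, non-match) pairs that a method ranks correctly, is one possible choice and an assumption on my part. The method names, labels, and scores are invented placeholders, not measurements.

```python
def pairwise_auc(scores, labels):
    """Fraction of (match, non-match) pairs where the match scores higher; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one match and one non-match")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical human judgments for six resume/job pairs: 1 = match, 0 = no match.
human_labels = [1, 0, 1, 0, 0, 1]

# Hypothetical similarity scores from three methods on the same six pairs.
method_scores = {
    "method_a": [0.62, 0.40, 0.35, 0.30, 0.55, 0.70],
    "method_b": [0.81, 0.20, 0.64, 0.33, 0.41, 0.77],
    "method_c": [0.58, 0.61, 0.66, 0.25, 0.49, 0.73],
}

results = {name: pairwise_auc(s, human_labels) for name, s in method_scores.items()}
best_method = max(results, key=results.get)
print(results, best_method)
```

A ranking-based metric does not need a score threshold, which helps when the methods produce scores on different scales. With only a handful of labelled pairs the differences would not be reliable, which is why a large dataset matters.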

1
votes

As already mentioned in other answers, try to use Doc2Vec. Using Doc2Vec from Gensim on both corpora (CVs and job descriptions) separately and then taking the cosine similarity between the two vectors seems to be the easiest workflow. It works better than the other approaches on documents that differ in form and wording but are similar in context and semantics, so keywords alone would not help much here.

Then you can try to train a CNN on a corpus of matched CV and JD pairs with yes/no labels, if available, and use it to qualify CVs/resumes against job descriptions.

Basically, I'm going to try these approaches on pretty much the same task; please see https://datascience.stackexchange.com/questions/22421/is-there-an-algorithm-or-nn-to-match-two-documents-basically-not-closely-simila

0
votes

Since it's highly likely that the content of a job description and a resume will differ, you should think from a semantic point of view. One thing you can do is use some domain knowledge, but it's pretty difficult to gain domain knowledge for a variety of job types. Researchers sometimes use a dictionary to augment the similarity matching between documents.

Researchers are using deep neural networks to capture both the syntactic and the semantic structure of documents. You can use Doc2Vec to compare two documents, and Gensim can produce the Doc2Vec representation for you. I believe that will give better results than keyword extraction followed by similarity computation. You can also build your own neural network model and train it on job descriptions and resumes. I guess neural networks will be effective for your work.