I want to pull the top search result for a LinkedIn query.

In this fiddle: https://dotnetfiddle.net/Vtwi7g

I pass this link to the 'html' variable:
https://www.linkedin.com/search/results/index/?keywords=firstname%3Ajohn%20AND%20lastname%3Adoe%20AND%20company%3Amicrosoft%20AND%20title%3Aceo&origin=GLOBAL_SEARCH_HEADER

I want to get the first result: https://www.linkedin.com/in/john-doe-63803769/

  • I guess the program needs some credentials to log in to LinkedIn first - how do I pass them?

  • I used Inspect Element to find the result's location - how do I traverse the DOM to get the first result?

1 Answer

It is more complicated with LinkedIn search, because the search is closed to users who are not logged in.

First, log in with your browser and copy your session cookies, li_at and _lipt.

LinkedIn does not render the results list directly into the HTML markup. Instead, it embeds large JSON objects in <code> elements and then uses JavaScript to render them.

Your console app should look like this:

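using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;   // NuGet package: HtmlAgilityPack
using Newtonsoft.Json;   // NuGet package: Newtonsoft.Json

class Program
{
    static void Main(string[] args)
    {
        var html = @"https://www.linkedin.com/search/results/index/?keywords=firstname%3Ajohn%20AND%20lastname%3Adoe%20AND%20company%3Amicrosoft%20AND%20title%3Aceo&origin=GLOBAL_SEARCH_HEADER";

        // Attach the session cookies to the request before it is sent.
        HtmlWeb web = new HtmlWeb();
        web.PreRequest = new HtmlWeb.PreRequestHandler(OnPreRequest2);
        var htmlDoc = web.Load(html);

        // The result data is JSON embedded in <code id="bpr-guid-..."> elements; use the last one.
        var codeElement = htmlDoc.DocumentNode.SelectNodes("//code[starts-with(@id,'bpr-guid')][last()]");
        if (codeElement == null)
        {
            // SelectNodes returns null when nothing matches.
            Console.WriteLine("No matching <code> element found.");
            return;
        }
        var json = WebUtility.HtmlDecode(codeElement.Last().InnerText);
        var obj = JsonConvert.DeserializeObject<Rootobject>(json);

        // Keep only the entries that have a first name (the profile entries).
        var profiles = obj.included.Where(i => i.firstName != null);
        foreach (var profile in profiles)
        {
            Console.WriteLine("Profile Name: " + profile.firstName + ";" + profile.lastName + ";" + profile.occupation + ";https://www.linkedin.com/in/" + profile.publicIdentifier);
        }
        Console.ReadKey();
    }

    // Adds the LinkedIn session cookies copied from a logged-in browser.
    public static bool OnPreRequest2(HttpWebRequest request)
    {
        var cookies = "li_at={YOURCOOKIEHERE};" +
                      "_lipt={YOURCOOKIEHERE}";
        request.Headers.Add("Cookie", cookies);
        return true;
    }
}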


public class Rootobject
{
    public Included[] included { get; set; }
}


public class Included
{
    public string firstName { get; set; }
    public string lastName { get; set; }
    public string occupation { get; set; }
    public string objectUrn { get; set; }
    public string publicIdentifier { get; set; }
}

At the end it will print:

Profile Name: John;Doe;ceo at Microsoft;https://www.linkedin.com/in/john-doe-8102b868
Profile Name: John;Doe;Ceo at Microsoft;https://www.linkedin.com/in/john-doe-63803769
Profile Name: John;Doe;CEO at Microsoft;https://www.linkedin.com/in/john-doe-2151b69b
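
The question asks for only the first result, while the loop above prints every match. A minimal sketch of picking just one, assuming the same Included shape as above: FirstProfileUrl is a hypothetical helper (not part of the original code), and the sample data is copied from the printed output rather than fetched from LinkedIn. Note that the order of the included array may not match the ranking shown on the results page; in the output above, the profile the question expects appears second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Included
{
    public string firstName { get; set; }
    public string lastName { get; set; }
    public string occupation { get; set; }
    public string publicIdentifier { get; set; }
}

public static class FirstResult
{
    // Hypothetical helper: URL of the first entry that has a first name,
    // or null when no such entry exists.
    public static string FirstProfileUrl(IEnumerable<Included> included)
    {
        var first = included.FirstOrDefault(i => i.firstName != null);
        return first == null ? null : "https://www.linkedin.com/in/" + first.publicIdentifier;
    }

    public static void Main()
    {
        // Sample data taken from the output above; in the real app this would be obj.included.
        var included = new[]
        {
            new Included(), // stand-in for a non-profile entry (firstName is null)
            new Included { firstName = "John", lastName = "Doe", occupation = "ceo at Microsoft", publicIdentifier = "john-doe-8102b868" },
            new Included { firstName = "John", lastName = "Doe", occupation = "Ceo at Microsoft", publicIdentifier = "john-doe-63803769" },
        };

        // Prints: https://www.linkedin.com/in/john-doe-8102b868
        Console.WriteLine(FirstProfileUrl(included));
    }
}
```

In the console app above, the equivalent call would be FirstProfileUrl(obj.included) in place of the foreach loop.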