I want to extract the title of every news item displayed on the "http://pib.nic.in/newsite/erelease.aspx?relid=58313" page using Excel VBA. I have written code that uses getElementsByClassName("contentdiv"), but the debugger shows an error saying that the object doesn't support... I want to extract the news items for every relid, which also appears in the URL.

It would be great if you could write the code for one relid (58313); I will run a loop after that. - Sahil Oberoi

1 Answer

Cold scrapes like this are generally handled more efficiently with an XMLHTTP pull. This requires adding a few libraries through the VBE's Tools ► References. The code below needs Microsoft XML, v6.0, Microsoft HTML Object Library and Microsoft Internet Controls. You might not need the last one, but you probably will if you expand the code beyond what is supplied here.

'×ID× is a placeholder that is swapped for each relid at run time
Public Const csURL As String = "http://pib.nic.in/newsite/erelease.aspx?relid=×ID×"

Sub scrape_PIBNIC()
    Dim htmlBDY As HTMLDocument, xmlHTTP As MSXML2.ServerXMLHTTP60
    Dim i As Long, u As String, iDIV As Long

    'any run-time error jumps straight to the clean-up code
    On Error GoTo CleanUp

    Set xmlHTTP = New MSXML2.ServerXMLHTTP60
    Set htmlBDY = New HTMLDocument

    'widen this range to loop through more relids
    For i = 58313 To 58313
        'clear the previous page before loading the next one
        htmlBDY.body.innerHTML = vbNullString
        With xmlHTTP
            u = Replace(csURL, "×ID×", CStr(i))
            'Debug.Print u
            .Open "GET", u, False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
            .send
            'stop on anything other than a successful response
            If .Status <> 200 Then GoTo CleanUp

            htmlBDY.body.innerHTML = .responseText

            'write the first span of each contentdiv to the next empty row in column A
            For iDIV = 0 To (htmlBDY.getElementsByClassName("contentdiv").Length - 1)
                If CBool(htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span").Length) Then
                    Sheets("Sheet1").Cells(Rows.Count, 1).End(xlUp).Offset(1, 0) = _
                      htmlBDY.getElementsByClassName("contentdiv")(iDIV).getElementsByTagName("span")(0).innerText
                End If
            Next iDIV

        End With
    Next i

CleanUp:
    Set htmlBDY = Nothing
    Set xmlHTTP = Nothing
End Sub

That should be enough to get you started. The site you are targeting requires that charset=UTF-8 be added to the request header; I had no success without it. I strongly suspect that this may have been the source of your "object doesn't support" error.
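
If you want to check for yourself whether the charset header makes a difference, something along these lines might help. It is only a minimal sketch: FetchStatus and CompareCharsetHeader are hypothetical names that are not part of the code above, and the objects are late bound through CreateObject so the sketch runs without any extra references. What the site returns in each case is not something I can promise, so no expected output is shown.

```vba
Option Explicit

'Hypothetical helper (not part of the code above): requests one release page,
'optionally sending the charset header, and returns the HTTP status.
'The page text is handed back through the body argument.
Function FetchStatus(ByVal relid As Long, ByVal sendCharset As Boolean, _
                     ByRef body As String) As Long
    Dim req As Object

    'late binding, so no library references are needed for this sketch
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    req.Open "GET", "http://pib.nic.in/newsite/erelease.aspx?relid=" & relid, False
    If sendCharset Then
        req.setRequestHeader "Content-Type", _
            "application/x-www-form-urlencoded; charset=UTF-8"
    End If
    req.send

    FetchStatus = req.Status
    body = req.responseText
End Function

'Requests the same relid with and without the header and prints
'the status and the response length to the Immediate window.
Sub CompareCharsetHeader()
    Dim body As String, status As Long

    status = FetchStatus(58313, True, body)
    Debug.Print "with header:    status " & status & ", " & Len(body) & " characters"

    status = FetchStatus(58313, False, body)
    Debug.Print "without header: status " & status & ", " & Len(body) & " characters"
End Sub
```

Run CompareCharsetHeader and compare the two lines in the Immediate window (Ctrl+G). If the two responses differ noticeably, that points to the header as the deciding factor.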