I am gathering data about different universities, and I have a question about the error I get after executing the following code. The problem occurs when calling htmlParse().
Code:
library(RCurl)  # provides getURL()
library(XML)    # provides htmlParse()
url1 <- "http://nces.ed.gov/collegenavigator/?id=165015"
webpage1 <- getURL(url1)
doc1 <- htmlParse(webpage1)
Output:
Error in htmlParse(webpage1) : File
!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
html xmlns="http://www.w3.org/1999/xhtml" head id="ctl00_hd"meta http-equiv="Content-type" content="text/html;charset=UTF-8" /title
College Navigator - National Center for Education Statistics
/titlelink href="css/md0.css" type="text/css" rel="stylesheet" meta name="keywords" content="college navigator,college search,postsecondary education,postsecondary statistics,NCES,IPEDS,college locator"/meta meta name="description" content="College Navigator is a free consumer information tool designed to help students, parents, high school counselors, and others get information about over 7,000 postsecondary institutions in the United States - such as programs offered, retention and graduation rates, prices, aid available, degrees awarded, campus safety, and accreditation."meta>meta name="robots" content="index,nofollow"/metalink
I have scraped web pages with this package before and never had an issue. Does the name="robots" meta tag have anything to do with it? Any help would be greatly appreciated.