0
votes

I'm trying to teach myself R (just started). I decided to import 2 csv files to practice a join on them.

One file imported just fine, but the other one produces the warnings shown below.

Here is the csv file link:

https://data.world/jonathankkizer/occupation-computerization

I used the following statement

occupationforjoin <- read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", header=TRUE, sep=",")

Warning messages:
1: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 1 appears to contain embedded nulls
2: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 2 appears to contain embedded nulls
3: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 3 appears to contain embedded nulls
4: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 4 appears to contain embedded nulls
5: In read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : line 5 appears to contain embedded nulls
6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
7: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input

I found on StackOverflow that it could be due to encoding, so I used the suggested solution and executed the statement

occupationforjoin <- read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", header=TRUE, sep=",", fileEncoding="UTF-16LE")

It gave me a different error message:

Error in read.table("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/jonathankkizer-occupation-computerization/OccComp.csv", : more columns than column names

I also tried using the read.csv function to no avail.

How do I fix this problem and import the data set successfully? None of the solutions I found online (e.g., adding the skipNul = TRUE or comment.char = "" parameters) helped.

UPD: Here's the paste of the data set if you don't want to download the csv file from the data world: https://pastebin.com/SPEtWT6f

3
"More columns than column names" often means there is an extra comma floating around, making one column look like two. Can you check for that? - morgan121
I've added a link to paste bin in addition to the link to the actual csv file. I couldn't find any rogue commas, but I've literally just started learning R and might be missing something. I'm at the "Hello world" stage with R, so to speak. Would you kindly take a look please? - InfiniteLoop
The separator isn't a comma, so save it as a txt file and use: read.csv("document.txt", header=T, sep="\t"). I also had to indent the first column heading with a tab (which is the delimiter). - morgan121
Warning messages: 1: In read.table(file = file, header = header, sep = sep, quote = quote, : line 1 appears to contain embedded nulls 2: In read.table(file = file, header = header, sep = sep, quote = quote, : line 2 appears to contain embedded nulls 3: In read.table(file = file, header = header, sep = sep, quote = quote, : line 3 appears to contain embedded nulls 4: In read.table(file = file, header = header, sep = sep, quote = quote, : line 4 appears to contain embedded nulls - InfiniteLoop
5: In read.table(file = file, header = header, sep = sep, quote = quote, : line 5 appears to contain embedded nulls 6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input - InfiniteLoop

3 Answers

2
votes

I finally found the solution! I was going nuts; even my instructor didn't know how to fix it!

This statement works:

o<-read.csv("C:/Users/Admin/Desktop/-=Data Science=-/11-27-2018/Occ.txt", header=T, sep="\t", fileEncoding="UTF-16LE")

Like I said in my original question: I tried using fileEncoding="UTF-16LE" and it didn't help. After asking the question, I tried using sep="\t", and it didn't help. But using both of them did the trick!
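
For anyone wondering why both arguments are needed, here is a minimal sketch, assuming the file really is tab-separated text stored as UTF-16LE (which is what the working statement above suggests). In UTF-16LE every plain ASCII character is followed by a zero byte, which would explain the "embedded nulls" warnings, and the separator is a separate matter from the encoding, so fixing only one of the two still fails. The sample rows, column names, and temporary path below are made up for illustration and are not taken from the real data set.

```r
# Hypothetical sample data: two tab-separated rows plus a header.
sample_text <- "Occupation\tProbability\nTelemarketers\t0.99\nDentists\t0.004\n"

# Write the text to a temporary file as raw UTF-16LE bytes.
path <- tempfile(fileext = ".txt")
utf16_bytes <- iconv(sample_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(utf16_bytes, path)

# Inspect the first bytes: every second byte is 00, which base R reports
# as "embedded nulls" when the file is read without fileEncoding.
head_bytes <- readBin(path, what = "raw", n = 4)
print(head_bytes)

# Decode the bytes (fileEncoding) AND split on tabs (sep) in the same call.
occ <- read.csv(path, header = TRUE, sep = "\t", fileEncoding = "UTF-16LE")
str(occ)
```

Running readBin() on the real file in the same way is a quick check of whether the zero-byte pattern is actually there before settling on an encoding.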

1
votes

Try using the read_csv() function from the readr package.

1
votes

Use dataframe = read.csv("name_of_file.csv")

or

dataframe = read.csv(file.choose())

Hope this will work.