
I want to read the following .txt file into R. Its first rows look like this:

"Mark"  "Name des Unternehmens" "Ort"   "ID 1"  "ID 2"  "Straße und Hausnummer (*)" "Postleitzahl"  "ID 3"  "ID 4"  "Value of interest" "Value of interest 2"
"1" "VOLKSWAGEN AKTIENGESELLSCHAFT" "Wolfsburg" "2070000543"    "38100 HRB 100484"  ""  "38440" "03103" "031"   "2910"  "3361"
"2" "Daimler AG"    "Stuttgart" "7330530056"    "70190 HRB 19360"   ""  "70327" "08111" "081"   "2910"  "3361"
"3" "E.ON SE"   "Essen" "5050056484"    "40227 HRB 69043"   ""  "45131" "05113" "051"   "7010"  "5511"

That is, the first row gives the headers as usual, and the following rows, starting with the numbers "1", "2", "3", contain the individual observations.

Unfortunately, whenever I try to read this into R, using either the import tools or read.table, R does not recognise the columns as separate variables and just gives me one completely useless variable. In addition, I get the message "line x contains embedded nuls" for each line.

As a workaround, I imported the .txt into Excel and saved it as a CSV, which I could then import. This works, but isn't there a better way to do this directly in R? The .txt data does not look that outlandish. Thanks!

What did you use to read this? Can you post the exact code? – Orhan Yazar
`read.table("filename.txt", header = TRUE, sep = "")` – Florestan
This works for me: `df <- read.table(text = '"Mark" "Name des Unternehmens" "Ort" "ID 1" "ID 2" "Straße und Hausnummer (*)" "Postleitzahl" "ID 3" "ID 4" "Value of interest" "Value of interest 2"\n"1" "VOLKSWAGEN AKTIENGESELLSCHAFT" "Wolfsburg" "2070000543" "38100 HRB 100484" "" "38440" "03103" "031" "2910" "3361"\n"2" "Daimler AG" "Stuttgart" "7330530056" "70190 HRB 19360" "" "70327" "08111" "081" "2910" "3361"\n"3" "E.ON SE" "Essen" "5050056484" "40227 HRB 69043" "" "45131" "05113" "051" "7010" "5511"', header = TRUE, stringsAsFactors = FALSE)` – Aurèle
Keep the default separator, `sep = ""`, which splits on any whitespace. – Aurèle
Embedded nuls aren't a problem with R; they're a problem with the text file. It contains "invisible" characters, and you should clean those out (with a text editor, or even in R) before parsing it. – Nathan Werth
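Following the suggestion to clean the file in R, here is a minimal sketch that simply drops every nul byte. `strip_nuls` is a hypothetical helper, not a base R function, and the test file is built on the assumption that the nuls are interleaved with ASCII text the way UTF-16LE stores it (an ASCII-only excerpt of the sample is used). This is crude: characters such as "ß" would survive only as a single Latin-1 byte, so if the whole file is UTF-16, re-encoding it while reading is the cleaner route.

```r
# Hypothetical helper: copy a file while dropping every nul (0x00) byte.
strip_nuls <- function(in_path, out_path = tempfile(fileext = ".txt")) {
  bytes <- readBin(in_path, what = "raw", n = file.size(in_path))
  writeBin(bytes[bytes != as.raw(0)], out_path)  # keep only non-nul bytes
  out_path
}

# Build a test file whose ASCII bytes are each followed by a nul byte,
# mimicking UTF-16LE text without a byte-order mark (an assumption).
lines <- c('"Mark" "Ort"', '"1" "Wolfsburg"', '"2" "Stuttgart"')
txt_bytes <- charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))
padded <- as.vector(rbind(txt_bytes, as.raw(0)))  # byte, 00, byte, 00, ...
nul_path <- tempfile(fileext = ".txt")
writeBin(padded, nul_path)

# Clean the file, then parse it as usual
cleaned <- strip_nuls(nul_path)
clean_df <- read.table(cleaned, header = TRUE, stringsAsFactors = FALSE)
print(clean_df)
```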

1 Answer


Yes, your approach is correct:

read.table("data.txt", header = TRUE, stringsAsFactors = FALSE)
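One way to confirm that this call parses the layout correctly is to run it on a small temp file. The sketch writes an ASCII-only subset of the sample's columns. `colClasses = "character"` and `check.names = FALSE` are additions that are not in the answer; they are included on the assumption that you want zero-padded IDs such as "03103" kept as text and the original headers kept unchanged.

```r
# Write a subset of the sample (ASCII-only columns) to a temporary file
sample_lines <- c(
  '"Mark" "Name des Unternehmens" "Ort" "ID 1" "ID 3" "Value of interest"',
  '"1" "VOLKSWAGEN AKTIENGESELLSCHAFT" "Wolfsburg" "2070000543" "03103" "2910"',
  '"2" "Daimler AG" "Stuttgart" "7330530056" "08111" "2910"'
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# The default sep = "" splits on runs of spaces or tabs, while the double
# quotes keep multi-word fields such as "Daimler AG" together.
# colClasses = "character" (an addition) stops "03103" becoming 3103;
# check.names = FALSE (an addition) keeps headers like "ID 3" as-is.
firms <- read.table(path, header = TRUE, stringsAsFactors = FALSE,
                    colClasses = "character", check.names = FALSE)
str(firms)
```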

However, you likely have encoding issues that are causing you grief. The text in your example includes non-ASCII characters (such as the "ß" in "Straße"), so check both the encoding of the text file and the encoding of your R session.
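To check both encodings, a sketch along these lines may help. `sniff_encoding` is a hypothetical helper, not a base R function: it looks for a byte-order mark (BOM) and, failing that, treats nul bytes as a heuristic sign of UTF-16. The session side uses the base R functions `Sys.getlocale()` and `l10n_info()`.

```r
# Hypothetical helper: guess a file's encoding from its first bytes.
sniff_encoding <- function(path, n = 4096L) {
  bytes <- readBin(path, what = "raw", n = n)
  starts_with <- function(prefix) {
    length(bytes) >= length(prefix) &&
      identical(bytes[seq_along(prefix)], as.raw(prefix))
  }
  # A byte-order mark identifies the encoding directly
  if (starts_with(c(0xef, 0xbb, 0xbf))) return("UTF-8 (with BOM)")
  if (starts_with(c(0xff, 0xfe))) return("UTF-16LE")
  if (starts_with(c(0xfe, 0xff))) return("UTF-16BE")
  # No BOM: nul bytes in a text file usually mean UTF-16, where each ASCII
  # character takes two bytes, one of which is 0x00
  pos <- which(bytes == as.raw(0))
  if (length(pos) > 0) {
    # UTF-16LE puts the 0x00 after the ASCII byte (even positions), BE before it
    return(if (sum(pos %% 2 == 0) >= sum(pos %% 2 == 1)) "UTF-16LE" else "UTF-16BE")
  }
  "no nul bytes: ASCII, UTF-8 or a single-byte encoding such as latin1"
}

# Session side (base R): character-type locale and whether it is UTF-8
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]
```

Usage would be `sniff_encoding("data.txt")` on the file from the question.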

I'm guessing that the text is UTF-8 and you're importing it in a native, non-UTF-8 encoding. Try this:

read.table("data.txt", header = TRUE, stringsAsFactors = FALSE, encoding = "UTF-8")
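Note that `encoding = "UTF-8"` only marks the strings that are read in as UTF-8; it does not re-encode the file. If the embedded nuls come from a UTF-16 file (an assumption, though a common one for Windows "Unicode" text exports), `fileEncoding` is the argument that re-encodes the bytes while reading. The sketch below simulates such a file from an ASCII-only subset of the sample:

```r
# Simulate a UTF-16LE export (no BOM) of an ASCII-only subset of the sample
sample_lines <- c(
  '"Mark" "Name des Unternehmens" "Ort" "Postleitzahl"',
  '"1" "VOLKSWAGEN AKTIENGESELLSCHAFT" "Wolfsburg" "38440"',
  '"2" "Daimler AG" "Stuttgart" "70327"',
  '"3" "E.ON SE" "Essen" "45131"'
)
utf16_text <- paste0(paste(sample_lines, collapse = "\n"), "\n")
utf16_path <- tempfile(fileext = ".txt")
writeBin(iconv(utf16_text, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]],
         utf16_path)

# Without fileEncoding, R would see the raw 0x00 bytes (the likely source of
# the "embedded nul" messages); fileEncoding re-encodes them before parsing.
firms16 <- read.table(utf16_path, header = TRUE, stringsAsFactors = FALSE,
                      fileEncoding = "UTF-16LE")
print(firms16)
```

For a big-endian file, `fileEncoding = "UTF-16BE"` would be the analogue.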