I have an excel file which I have imported into R. The problem is that there are over 100,000 rows that have data that looks like the following in a single cell:
{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}
So what happens is that R also places this entire information into one box in my table. I would like R to break this single column into multiple columns such that 'gender', 'nationality', 'document_type' etc. each have their own column with entries 'Male', 'IRL', 'passport' etc. respectively for this row. The problem is that some information is missed out in other cells e.g. another cell may look like this:
{'gender': 'Female', 'document_type': 'driving_licence', 'date_of_expiry': '2023-02-28', 'issuing_country': 'GBR'}
and another like this:
{'document_type': 'driving_licence', 'date_of_expiry': '2044-12-14', 'issuing_country': 'GRC'}
How do I assign these entries to their own columns and have empty values where no information is provided. I am a beginner R user. I have tried using package "qdapRegex" because they have a handy tool where I can extract characters between two other characters/strings but to no avail. I think if I could figure out how to apply operators like '|' to strings I would be able to do this with my way.
After Input:
input <- read.csv(text=r"(
ID, Properties
1,"{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}"
2,"{'document_type': 'national_identity_card', 'date_of_expiry': '2027-01-23', 'issuing_country': 'CYP'}"
3,"{'date_of_expiry': '2019-01-28', 'issuing_country': 'JEY'}"
4,"{'document_type': 'driving_licence', 'date_of_expiry': '2006-06-14', 'issuing_country': 'POL'})")
Desired Output:
output <- read.csv(text="
ID,gender,nationality,document_type,date_of_expiry,issuing_country
1,Male,IRL,passport,2019-08-12,IRL
2,,,national_identity_card,2027-01-23,CYP
3,,,,2019-01-28,JEY
4,,,driving_licence,2006-06-14,POL
")