I'm trying to read in and sort data from a CSV file, and I'm having trouble with the long strings involved. I've attached screenshots of part of my output from SAS and of the original dataset. (The dataset is from Kaggle and is about TED Talks.)
I'm having trouble with the variable "tags". Basically, I want to read in the tags and sort the data by them (e.g., talks whose tags mention children or education will be put into an Education category). So far, though, I'm stuck on just reading them in. Any help would be appreciated!
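The categorization step described above could look roughly like the sketch below. It assumes TAGS has already been read in full as one character string such as ['children', 'creativity', ...]. The data set names TAGS_DEMO and TAGS_CATEGORIZED, the CATEGORY variable and the keyword list are hypothetical illustrations. The two TAGS values are copied from the sample rows in this question, so the example runs without the CSV file.

```sas
/* Sketch: assumes TAGS holds the whole tag list as one string.         */
/* TAGS_DEMO, TAGS_CATEGORIZED, CATEGORY and the keywords below are     */
/* hypothetical illustrations, not part of the original program.        */
data tags_demo;
  length tags $1000;
  /* Two TAGS values copied from the sample rows in the question */
  tags = "['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']";
  output;
  tags = "['computers', 'entertainment', 'interface design', 'media', 'music', 'performance', 'simplicity', 'software', 'technology']";
  output;
run;

data tags_categorized;
  set tags_demo;
  length category $20;
  /* Searching for the tag with its single quotes, e.g. 'education',   */
  /* matches whole tags only, not a tag such as 'higher education'.    */
  if index(lowcase(tags), "'children'") > 0
     or index(lowcase(tags), "'education'") > 0 then category = 'Education';
  else category = 'Other';
run;

proc print data=tags_categorized;
run;
```

Including the single quotes in the search string is a design choice. Drop them if a tag that merely contains the word (for example 'higher education') should also count.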
This is my code so far:
data tedtalks;
infile 'O:\ted_main1.csv' dlm = ',' firstobs = 2;
informat name $80.;
informat main_speaker $20.;
informat speaker_occupation $60.;
informat title $80.;
input comments duration event $ film_date languages
main_speaker $ name $ num_speaker published_date
speaker_occupation $ tags $ title $ views
;
run;
proc print data=tedtalks;
run;
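One way the tags might be read intact is sketched below. It assumes ted_main1.csv is a standard comma-separated file in which any field containing commas (such as tags) is wrapped in double quotes, as CSV writers normally do. The sketch makes two changes from the code above:

- It adds the DSD option, which treats a comma inside quotes as part of the value and removes the quotes.
- It adds a LENGTH statement before INPUT, so TAGS is not cut to the 8-character default that list input gives an undeclared character variable.

The lengths for EVENT and TAGS are guesses. The in-stream line is the first sample row rewritten in that assumed CSV form, so the sketch runs without the file.

```sas
/* Sketch: assumes fields containing commas (such as TAGS) are wrapped  */
/* in double quotes. For the real file, replace the INFILE statement:   */
/*   infile 'O:\ted_main1.csv' dsd firstobs=2 truncover;                */
data tedtalks;
  infile datalines dsd truncover;
  /* Declare lengths before INPUT; list input otherwise uses 8 bytes.   */
  /* EVENT and TAGS lengths are guesses; the rest match the original.   */
  length event $40 main_speaker $20 name $80 speaker_occupation $60
         tags $1000 title $80;
  input comments duration event $ film_date languages
        main_speaker $ name $ num_speaker published_date
        speaker_occupation $ tags $ title $ views;
  datalines;
4553,1164,TED2006,1140825600,60,Ken Robinson,Ken Robinson: Do schools kill creativity?,1,1151367060,Author/educator,"['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching']",Do schools kill creativity?,47227110
;

proc print data=tedtalks;
  var main_speaker tags title views;
run;
```

If the real file is not quoted this way, DSD alone will still split the tags at each comma. In that case, the way the file was exported would need checking first.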
First few lines of CSV data:
comments duration event film_date languages main_speaker name num_speaker published_date speaker_occupation tags title views
4553 1164 TED2006 1140825600 60 Ken Robinson Ken Robinson: Do schools kill creativity? 1 1151367060 Author/educator ['children', 'creativity', 'culture', 'dance', 'education', 'parenting', 'teaching'] Do schools kill creativity? 47227110
265 977 TED2006 1140825600 43 Al Gore Al Gore: Averting the climate crisis 1 1151367060 Climate advocate ['alternative energy', 'cars', 'climate change', 'culture', 'environment', 'global issues', 'science', 'sustainability', 'technology'] Averting the climate crisis 3200520
124 1286 TED2006 1140739200 26 David Pogue David Pogue: Simplicity sells 1 1151367060 Technology columnist ['computers', 'entertainment', 'interface design', 'media', 'music', 'performance', 'simplicity', 'software', 'technology'] Simplicity sells 1636292
200 1116 TED2006 1140912000 35 Majora Carter Majora Carter: Greening the ghetto 1 1151367060 Activist for environmental justice ['MacArthur grant', 'activism', 'business', 'cities', 'environment', 'green', 'inequality', 'politics', 'pollution'] Greening the ghetto 1697550