
Here are the first few rows of the column ac$Summary:

    1
    during a demonstration flight, a u.s. army flyer flown by orville wright nose-dived into the ground from a height of approximately 75 feet, killing lt. thomas e. selfridge who was a passenger. this was the first recorded airplane fatality in history. one of two propellers separated in flight, tearing loose the wires bracing the rudder and causing the loss of control of the aircraft. orville wright suffered broken ribs, pelvis and a leg. selfridge suffered a crushed skull and died a short time later.
    2
    first u.s. dirigible akron exploded just offshore at an altitude of 1,000 ft. during a test flight.
    3
    the first fatal airplane accident in canada occurred when american barnstormer, john m. bryant, california aviator was killed.
    4
    the airship flew into a thunderstorm and encountered a severe downdraft crashing 20 miles north of helgoland island into the sea. the ship broke in two and the control car immediately sank drowning its occupants.
    5
    hydrogen gas which was being vented was sucked into the forward engine and ignited causing the airship to explode and burn at 3,000 ft..
    6
    crashed into trees while attempting to land after being shot down by british and french aircraft.
    7
    exploded and burned near neuwerk island, when hydrogen gas, being vented, was ignited by lightning.
    8
    crashed near the black sea, cause unknown.
    9
    shot down by british aircraft crashing in flames.
    10
    shot down in flames by the british 39th home defence squadron.
    11
    crashed in a storm.
    12
    shot down by british anti-aircraft fire and aircraft and crashed into the north sea.
    13
    caught fire and crashed. 

I want to create a new column, ac$sumnew, based on ac$Summary.

I wrote the following code, but it does not return the desired output. I tried both & and | inside the patterns. With |, the results were irregular: sometimes right, sometimes wrong.

    ac$sumnew = ifelse(grepl("missing & crashed",ac$Summary),"missing and crashed",
        ifelse(grepl("shot | crashed",ac$Summary),"shot down and crashed",
        ifelse(grepl("struck | lightening",ac$Summary),"struck by lightening and crashed",
         ifelse(grepl("struck | bird & crashed",ac$Summary),"struck by bird and crashed",
         ifelse(grepl("exploded | crashed",ac$Summary),"exploded and crashed",
         ifelse(grepl("engine | failure",ac$Summary),"engine failure",
         ifelse(grepl("fog | crashed",ac$Summary),"crashed due to heavy fog",
         ifelse(grepl("fire | crashed",ac$Summary),"caught fire and crashed",
         ifelse(grepl("shot",ac$Summary),"shot down",             
         ifelse(grepl("crashed",ac$Summary),"Crashed",
         ifelse(grepl("shot",ac$Summary),"Shot down",
         ifelse(grepl("disappeared",ac$Summary),"Disappeared",
         ifelse(grepl("struck | obstacle | crashed ",ac$Summary),"struck by obstacle and Crashed",
         ifelse(grepl("crashed",ac$Summary),"crashed",
         ifelse(grepl("exploded",ac$Summary),"exploded",
         ifelse(grepl("fire",ac$Summary),"caught fire","others"))))))))))))))))

For example, if the plane was shot down, it should return "shot down".

If it just crashed, it should return "crashed".

If it is both missing and crashed, it should return "missing and crashed".

I cannot get this part right using either & or |.

The output I obtained is shown below:

     1  others
     2  exploded and crashed
     3  others
     4  others
     5  engine failure
     6  shot down and crashed
     7  exploded and crashed
     8  Crashed
     9  shot down and crashed
    10  shot down and crashed
    11  Crashed
    12  missing and crashed
    13  missing and crashed
    14  missing and crashed
    15  Crashed
    16  shot down and crashed
    17  shot down and crashed

1 Answer


I think you have a hierarchy problem. R evaluates the nested ifelse() conditions in order, and the first one that is TRUE wins, so the most specific patterns have to come first. Here is a link to help with that: https://www.programiz.com/r-programming/if-else-statement.
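It may also help to see why the original patterns behaved irregularly. Inside a regular expression, | means "or", & has no special meaning, and the spaces around them are matched literally. The sketch below uses a small vector x made of three of the summaries above (x and the three result names are my own, for illustration only) and combines separate grepl() calls with R's logical & instead:

```r
# x holds three of the summaries shown in the question, for illustration only
x <- c("shot down by british aircraft crashing in flames.",
       "crashed in a storm.",
       "shot down in flames by the british 39th home defence squadron.")

# "|" inside the pattern is regex alternation and the spaces are literal,
# so this asks for "shot " OR " crashed"
either <- grepl("shot | crashed", x)

# "&" inside a pattern is just the character "&"
literal_amp <- grepl("shot & crash", x)

# To require both words, combine two grepl() results with the logical operator &
both <- grepl("shot", x) & grepl("crash", x)

print(data.frame(either, literal_amp, both))
```

Here either is TRUE for the first and third strings but FALSE for "crashed in a storm.", because no space precedes "crashed"; literal_amp is FALSE everywhere; and both is TRUE only for the first string.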

    # Assumption: the summary text is in ac$Summary, as in the question.
    txt <- as.character(ac$Summary)

    # Each test is TRUE only when every word in c(...) is found in the row.
    ac$new <- ifelse(apply(sapply(c("struck", "bird", "crash"), grepl, txt), 1, all), "struck by bird and crashed",
              ifelse(apply(sapply(c("struck", "obstacle", "crash"), grepl, txt), 1, all), "struck by obstacle and Crashed",
              ifelse(apply(sapply(c("miss", "crash"), grepl, txt), 1, all), "missing and crashed",
              ifelse(apply(sapply(c("shot", "crash"), grepl, txt), 1, all), "shot down and crashed",
              ifelse(apply(sapply(c("struck", "lightening"), grepl, txt), 1, all), "struck by lightening and crashed",
              ifelse(apply(sapply(c("explode", "crash"), grepl, txt), 1, all), "exploded and crashed",
              ifelse(apply(sapply(c("engine|failure"), grepl, txt), 1, all), "engine failure",
              ifelse(apply(sapply(c("fog", "crash"), grepl, txt), 1, all), "crashed due to heavy fog",
              ifelse(apply(sapply(c("fire", "crash"), grepl, txt), 1, all), "caught fire and crashed",
              ifelse(apply(sapply("shot", grepl, txt), 1, all), "shot down",
              ifelse(apply(sapply("crash", grepl, txt), 1, all), "crashed",
              ifelse(apply(sapply("explode", grepl, txt), 1, all), "exploded",
              ifelse(apply(sapply("fire", grepl, txt), 1, all), "caught fire",
              ifelse(apply(sapply("disappear", grepl, txt), 1, all), "Disappeared", "others"))))))))))))))

This works by checking that every word in each c() is present in the row and then assigning the matching label to ac$new. The exception is "engine|failure", which is a single pattern that matches either word. Also, because we are matching words, use the shortest stem that covers all variations: for example, use "miss" instead of "missing".
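To unpack the apply(sapply(...), 1, all) idiom, here is a small sketch on three of the summaries (the names x, hits, and all_present are mine, for illustration only). sapply() builds a logical matrix with one row per summary and one column per word, and apply(..., 1, all) keeps only the rows where every column is TRUE:

```r
# x holds three of the summaries shown in the question, for illustration only
x <- c("crashed in a storm.",
       "shot down by british aircraft crashing in flames.",
       "shot down in flames by the british 39th home defence squadron.")

# One grepl() call per word; the results are bound into a logical matrix
# with one row per element of x and one column per word
hits <- sapply(c("shot", "crash"), grepl, x)
print(hits)

# A row passes only if every word was found in it
all_present <- apply(hits, 1, all)
print(all_present)
```

Only the second string contains both "shot" and "crash" (via "crashing"), so it is the only one for which all_present is TRUE.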

I got

1                   others
2                 exploded
3                   others
4                  crashed
5           engine failure
6    shot down and crashed
7                 exploded
8                  crashed
9    shot down and crashed
10               shot down
11                 crashed
12   shot down and crashed
13 caught fire and crashed

Some labels differ from yours because I required all of the listed words to be present. I did that because the single-word checks already appear in the latter part of your ifelse chain. I did an eyeball test, and I think my results are correct on that basis.

Btw, this is tedious, especially if you want to expand the list. You may want to use something like,

    library(dplyr)  # provides %>%, mutate, filter, group_by, summarise, full_join, arrange

    # `t` is the character vector of summary texts (it is not defined in this post).
    ac <- data.frame(s = as.character(t), word.que = seq(1, length(t), by = 1))

    # Split each summary into words once, so the count always matches the split.
    words <- strsplit(as.character(ac$s), split = " ")
    ac$word.count <- lengths(words)

    # Long format: one row per word, tagged with the summary it came from.
    new.mat <- data.frame(word.que = rep.int(ac$word.que, ac$word.count), word = unlist(words))
    words.of.interest <- "struck|bird|crash|obstacle|miss|shot|lightening|explode|engine|failure|fog|fire|disappear"
    new.mats <- new.mat %>%
               mutate(word = gsub("\\,", "", gsub("\\.", "", word))) %>%
               mutate(word.interest = ifelse(grepl(words.of.interest, as.character(word)), 1, 0)) %>%
               filter(word.interest == 1) %>%
               group_by(word.que) %>%
               summarise(word.list = paste0(unique(word), collapse = "; ")) %>%
               full_join(ac, by = "word.que") %>%
               arrange(word.que) %>%
               mutate(word.list = ifelse(is.na(word.list), 'other', word.list))

This gives you, for each row, the list of keywords that were found, which you can then use to build your categories. The result is

   word.que           word.list
1         1               other
2         2            exploded
3         3               other
4         4            crashing
5         5     engine; explode
6         6       crashed; shot
7         7            exploded
8         8             crashed
9         9      shot; crashing
10       10                shot
11       11             crashed
12       12 shot; fire; crashed
13       13       fire; crashed

The result also contains your text variable (s) and word.count. This may be more efficient in the long run as well.
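If you would rather keep the "most specific first" ordering but avoid the long nested ifelse(), one option is an ordered rule table. This is only a sketch: rules, classify, and x are hypothetical names of my own, the rule list is deliberately shortened, and the words are matched as fixed substrings rather than regular expressions.

```r
# Ordered rule table (hypothetical, shortened): most specific rules first
rules <- list(
  "shot down and crashed"   = c("shot", "crash"),
  "caught fire and crashed" = c("fire", "crash"),
  "shot down"               = "shot",
  "crashed"                 = "crash",
  "exploded"                = "explode"
)

# Assign each string the label of the first rule whose words are all present
classify <- function(x, rules, default = "others") {
  out <- rep(default, length(x))
  assigned <- rep(FALSE, length(x))
  for (label in names(rules)) {
    # TRUE where every word of this rule occurs in the string
    hit <- Reduce(`&`, lapply(rules[[label]], grepl, x = x, fixed = TRUE))
    out[hit & !assigned] <- label   # earlier rules keep priority
    assigned <- assigned | hit
  }
  out
}

# x holds a few of the summaries shown in the question, for illustration only
x <- c("crashed in a storm.",
       "shot down by british aircraft crashing in flames.",
       "shot down in flames by the british 39th home defence squadron.",
       "caught fire and crashed. ",
       "crashed near the black sea, cause unknown.")
result <- classify(x, rules)
print(result)
```

Adding a category then means adding one entry to rules at the right position, instead of another level of nesting.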