I have the following problem in R when calculating the median of a series of time values. Can someone explain why R behaves so strangely when something as simple as a median needs to be calculated?
- Task: calculate the median finishing time from a running race results dataset.
- Problem: when taking the median of the time values, R returns the warning "argument is not numeric or logical: returning NA".
- The data is read from the file "NEJ_21_km_results.csv", with strings kept as character values instead of factors. When I try to convert the time values from character to numeric, the message "NAs introduced by coercion" is returned (but there are no NA values in the data frame).
- In some other cases (with other files), the warning appears only when the data is filtered by gender (and sometimes only for one gender).
1) Read the data into the "all_runners" data frame
all_runners <- read.csv("NEJ_21_km_results.csv", stringsAsFactors=FALSE, strip.white = TRUE)
The "RESULT" column is of type "chr":
str(all_runners)
'data.frame': 100 obs. of 10 variables:
$ POS : int 1 2 3 4 5 6 7 8 9 10 ...
$ BIB : int 3 2 1 9 5 10 8 33 34 67 ...
$ NAME : chr "DOMINIC KIPTARUS" "TIIDREK NURME" "ROMAN FOSTI" "RAIDO MITT"...
$ YOB : int 1996 1985 1983 1991 1984 1982 1993 1992 1984 1996 ...
$ NAT : chr "KEN" "EST" "EST" "EST" ...
$ CITY : chr "" "" "" "" ...
$ RESULT : chr "01:03:55" "01:03:57" "01:06:18" "01:09:33" ...
$ BEHIND : chr "" "00:00:02" "00:02:23" "00:05:38" ...
$ NET.TIME: chr "01:03:55" "01:03:57" "01:06:18" "01:09:31"...
$ CAT : chr "MN" "M" "M" "M" ...
2) Calculate the median of all runners' results
> all_runners_median = median(all_runners$RESULT, na.rm = TRUE)
Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA
3) Convert the time values from character to numeric
> results_to_numeric <- as.numeric(all_runners$RESULT)
Warning message: NAs introduced by coercion
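A likely explanation for steps 2 and 3, sketched with a few values copied from the RESULT column above: a string such as "01:03:55" is not a number literal, so as.numeric() cannot parse it and returns NA, and median() has no numeric input to work with. One possible workaround is to split each string on ":" and convert it to seconds. The helper name to_seconds is hypothetical and not part of the original code.

```r
# Sample values copied from the RESULT column shown above
result <- c("01:03:55", "01:03:57", "01:06:18", "01:09:33")

# "HH:MM:SS" strings are not numeric literals, so every element becomes NA
coerced <- suppressWarnings(as.numeric(result))

# Hypothetical helper: split on ":" and weight hours, minutes, seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  sapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)))
}

secs <- to_seconds(result)

# The median is now computed on plain numbers (seconds)
median_secs <- median(secs)
print(median_secs)
```

The result is in seconds, so it would still need to be formatted back to HH:MM:SS for display.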
4) Calculate the median of all women's results ('N' => women, 'M' => men)
all_womens <- all_runners %>%
  filter(str_sub(CAT, 1, 1) == "N") %>%
  select(RESULT)
The 'RESULT' column is of type 'chr':
> str(all_womens)
'data.frame': 8 obs. of 1 variable:
$ RESULT: chr "01:18:36" "01:20:07" "01:22:52" "01:25:11" ...
Warning message: In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) : argument is not numeric or logical: returning NA
> all_womens
RESULT
1 01:18:36
2 01:20:07
3 01:22:52
4 01:25:11
5 01:26:04
6 01:26:09
7 01:26:42
8 01:26:55
Convert RESULT to a date class and you will be fine. Instead of select, use pull. Sorry, I'm working from my phone, otherwise I would be more helpful. – A. Suliman
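A minimal sketch of what the comment seems to suggest, under some assumptions: the small data frame below is an illustrative stand-in for all_runners (the RESULT values are copied from the post, the "N" category codes for the women are assumed), and as.POSIXct() is one possible choice of date-time class. pull() returns the column as a plain vector, whereas select() returns a one-column data frame.

```r
library(dplyr)
library(stringr)

# Illustrative stand-in for all_runners; CAT codes for the women are assumed
all_runners <- data.frame(
  RESULT = c("01:03:55", "01:03:57", "01:18:36", "01:20:07", "01:22:52",
             "01:25:11", "01:26:04", "01:26:09", "01:26:42", "01:26:55"),
  CAT = c("MN", "M", rep("N", 8)),
  stringsAsFactors = FALSE
)

# pull() extracts RESULT as a character vector, which is then parsed
# as a time of day (the current date is attached, in UTC)
womens_times <- all_runners %>%
  filter(str_sub(CAT, 1, 1) == "N") %>%
  pull(RESULT) %>%
  as.POSIXct(format = "%H:%M:%S", tz = "UTC")

# median() works on date-time vectors; format() shows only the time part
womens_median <- median(womens_times)
print(format(womens_median, "%H:%M:%S"))
```

With an even number of runners the median falls halfway between the two middle times, so format() may drop a fractional second.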