Summarize the question:
I am trying to organize a complex data string that is delivered from a sensor system. Unfortunately, because the order of the variables can change due to sensor errors, I cannot use a simple delimiter-based parsing scheme, as data can end up in the wrong column. My new concept is to pre-determine the columns to sort data into, read repeating segments of the string, and use those repeating variable names to parse the values into the correct column. Does anyone have an idea of a function/functions that will perform that task?
Why the simple solution doesn't work:
The sensor system is a wireless collection of soil moisture, temperature, pressure, and rain sensors that collect readings every five minutes and automatically upload the data to my server.
Unfortunately, I didn't get to decide how data is delivered. Each entry of this string includes 15 readings from ~10 different sensors. The variable names and values are listed together in the string with a ~ delimiter, like so:
vbat~3.63~A0~2821~A1~2931~A3~2917~A4~2963~A5~2984~T11/dP~2950.32~T11/t~12.40~p5/ms5803/p~942.31~p5/ms5803/t~13.43~p6/ms5803/p~942.91~p6/ms5803/t~8.64~p7/t~13.72~p7/h~59.06~Tip_Ct~0
The text characters ("vbat", "A0", "A1", ... "T11/dP", "T11/t", "p5/ms5803/p", ... "Tip_Ct")
are each variable names of the sensor/data type, and the numeric values (3.63, 2821, 2931, ... 2950.32, 12.40, 942.31,...0)
are the sensor readings.
Unfortunately, I realized that this won't consistently sort my variables, as the order of readings can change if the microprocessor malfunctions or one of the sensors fails. For example, when sensor "p6"
failed, the values for "p6/ms5803/p"
and "p6/ms5803/t"
are no longer included in the string, like so:
vbat~3.89~A0~2815~A1~2723~A3~2737~A4~2660~A5~2620~T11/dP~3024.08~T11/t~10.20~p5/ms5803/p~935.46~p5/ms5803/t~10.37~p7/t~10.28~p7/h~89.54~Tip_Ct~0
That means that a simple delimiter-based sorting mechanism puts sensor readings at the end of the string into the wrong column.
I am looking for a set of functions that will read my string, identify specific variable names (e.g. "A0"
), and then sort the next tilde-delimited value into a column with the same name. If anyone has an idea for a simpler workflow, I would be happy to hear that too!
What I intended:
My original plan was simple: parse by delimeter, remove repetitive columns, and re-name data columns according to variable name. My spreadsheet is available at https://drive.google.com/open?id=1USD_zTSj7rDL-P3yiRt6Aa5Mk00Li3Si
*#LOAD LIBRARIES AND DATA*
>library(splitstackshape)
>Harbor = read.csv(file="filepath/HARBOR2_190828.csv",header=T)
*#PARSE DATA*
>Harbor = concat.split(Harbor, split.col="full_data",sep="~",drop=T)
>Harbor = Harbor[-seq(from=4,to=32,by=2)]
>names(Harbor) = c("Date","Time","DeviceID","vbat","A0","A1","A3","A4","A5","T11/dP","T11/t","p5/ms5803/p","p5/ms5803/t","p6/ms5803/p","p6/ms5803/t","p7/t","p7/h","Tip_Ct")
I realized that my data string length was inconsistent when I received the following error after the "concat.split" function:
>Error in `[.data.table`(indt, , splitCols, with = FALSE) :
Column 2 ['full_data'] is length 4487 but column 1 is length 11; malformed data.table.
Side note: Row 1877
is the last row with readings from sensor p6/ms5803