I want to replace (nearly) all of the columns of a data.frame or tibble with columns in which each row's minimum has been subtracted from that row. For example, if X is a numerical matrix, then in base R I would write:
X = sweep(X, 1, apply(X, 1, min))
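Here is a quick sketch of what that line does, on a made-up 3 x 3 matrix. sweep() with MARGIN = 1 subtracts the i-th row minimum from every entry of row i. R stores matrices column by column and recycles a shorter vector in that same order, so X - row_min gives the same result whenever length(row_min) equals nrow(X).

```r
# Toy 3 x 3 numeric matrix (made-up values, for illustration only)
X <- matrix(c(5, 2, 9,
              4, 8, 1,
              7, 3, 6), nrow = 3, byrow = TRUE)

row_min <- apply(X, 1, min)      # one minimum per row: 2, 1, 3

# Base R idiom from the question: subtract row_min[i] from every entry of row i
X_sweep <- sweep(X, 1, row_min)

# Equivalent: a length-nrow(X) vector is recycled down each column,
# so row_min[i] always lines up with row i
X_recycled <- X - row_min

print(X_sweep)
all.equal(X_sweep, X_recycled)
```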
My current function for doing this with my data (I'll explain the format shortly) pulls the numerical columns out into a matrix, does the sweep, and then uses cbind to put the transformed data and the non-numerical data back together. That is:
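library(dplyr)  # for %>%, select() and starts_with()

subtractMin = function(data){
  # numerical columns (names starting with "X") as a matrix
  X = data %>%
    select(starts_with("X")) %>%
    as.matrix()
  # subtract each row's minimum from that row
  X = sweep(X, 1, apply(X, 1, min))
  # remaining (label) columns
  labels = data %>%
    select(-starts_with("X"))
  return(cbind(labels, X))
}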
This strikes me as inefficient, and I suspect there is a smarter way.
I don't think it matters in this context, but my data has 77 rows and 1133 columns. Four of the columns contain label information, and the remaining 1129 contain the numerical measurements for each observation (they're spectra, if you care). With this many numerical variables, writing an individual mutate for each column is not a way forward. Besides, the row minimum still has to be known before any row can be standardised.
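One possible base-R way around both the split-and-cbind step and per-column mutate calls is to compute the row minimum once and assign the result back into the same columns. Passing every selected column as a separate argument to pmin() via do.call() gives that minimum in a single call. This is a minimal sketch that assumes a plain data.frame whose measurement columns are exactly those whose names start with "X", as in the question. The helper name subtract_row_min and the tiny data frame are hypothetical.

```r
# Hypothetical helper (not from the question): subtract each row's minimum
# from the columns whose names start with `prefix`, leaving every other
# column untouched and in its original position.
subtract_row_min <- function(data, prefix = "X") {
  num <- startsWith(names(data), prefix)   # which columns are measurements
  # pmin() works element-wise across its arguments, so passing every
  # measurement column as a separate argument gives one minimum per row
  row_min <- do.call(pmin, unname(as.list(data[num])))
  # data frame minus a length-nrow(data) vector: the vector is recycled down
  # each column, so row_min[i] is subtracted from row i
  data[num] <- data[num] - row_min
  data
}

# Made-up example: one label column and three measurement columns
df <- data.frame(id = c("a", "b"),
                 X1 = c(3, 10), X2 = c(1, 12), X3 = c(5, 11))
res <- subtract_row_min(df)
res
# row a: min 1, so X1..X3 become 2, 0, 4
# row b: min 10, so X1..X3 become 0, 2, 1
```

Because the measurement columns are never split out and rebound, the label columns keep their original positions, and the number of measurement columns does not affect the code.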
I have been asked to add some data. The original data has over 1,000 columns, so here is a smaller data set:
> x.df
nm X1799.38928 X1798.01526 X1796.64124 source color rep
1 s001c1 13901.944 13889.056 13883.334 01 c 1
2 s001c2 17293.586 17279.375 17291.365 01 c 2
3 s001c3 8011.764 8028.584 8033.548 01 c 3
4 s001c4 7499.272 7510.719 7517.064 01 c 4
5 s001c5 20300.408 20293.604 20297.185 01 c 5
pmin can be leveraged in your case. It depends on your context, but you might be better off keeping your 1129 columns in a matrix, with the current nm values as row names, and a 4-column metadata data.frame/tibble on the side. On the matrix you can use sweep, apply with MARGIN = 1, etc., because that is what matrices are made for. – Moody_Mudskipper
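Here is a sketch of the layout suggested in this comment, using the first two sample rows from the question. The measurements become a numeric matrix with the nm values as row names. The label columns stay in a separate metadata data frame, with nm kept as the key that links the two. The object names spectra and meta are hypothetical, and storing source as character (to keep its leading zero) is an assumption about its type.

```r
# First two sample rows from the question
x.df <- data.frame(
  nm          = c("s001c1", "s001c2"),
  X1799.38928 = c(13901.944, 17293.586),
  X1798.01526 = c(13889.056, 17279.375),
  X1796.64124 = c(13883.334, 17291.365),
  source      = "01",   # assumed character, since it prints with a leading zero
  color       = "c",
  rep         = 1:2
)

num <- startsWith(names(x.df), "X")   # the spectral columns

# Measurements: numeric matrix, one row per observation, row names from nm
spectra <- as.matrix(x.df[num])
rownames(spectra) <- x.df$nm

# Labels: a small metadata table kept alongside the matrix
meta <- x.df[!num]

# Row-wise operations now act directly on the matrix (MARGIN = 1 means rows);
# sweep() keeps the row and column names of `spectra`
spectra <- sweep(spectra, 1, apply(spectra, 1, min))
spectra
```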