I'm exploring the acss package.

I want to know which strings of a given length in the acss_data data frame have been assigned the maximum K.i value.

tail(acss_data)
             K.2 K.4 K.5 K.6      K.9
012345678883  NA  NA  NA  NA 50.28906
012345678884  NA  NA  NA  NA 50.31291
012345678885  NA  NA  NA  NA 49.71200
012345678886  NA  NA  NA  NA 49.81041
012345678887  NA  NA  NA  NA 49.51936
012345678888  NA  NA  NA  NA 48.61247

The acss_data data frame contains K.2, K.4, K.5, K.6, and K.9 values associated with strings of lengths 1 to 12, and I want to know the maximum K.i for each string length, i.e., the max K.2 for strings of length 1, length 2, ..., length 12, then the max K.4 for strings of length 1, length 2, ..., length 12, and so on.

How can I query this in R?

What are these strings? - Roman Luštrik
They "are" nothing in particular; acss gives the algorithmic complexity of a string, just as an entropy function would give you its entropy. See complexitycalculator.com - andandandand
But do you not know what string length corresponds to each row? - Ernest A
Yes, acss_data["010101", ] gives the K.i values for the string "010101". I want a non-hideous and idiomatic way to get the max K.i values for strings of a given length. - andandandand

1 Answer

You can use aggregate to summarize the data:

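library(acss.data)
d <- acss_data
# The strings are stored as row names, so their lengths come from nchar()
d$len <- nchar(rownames(d))
# max() returns NA if any value is NA; K values are positive, so -1 never wins
d[is.na(d)] <- -1
# Columns 1:5 are K.2, K.4, K.5, K.6 and K.9; take the max within each length
s <- aggregate(d[, 1:5], list(d$len), max)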

The result is a data frame:

   Group.1       K.2       K.4       K.5       K.6       K.9
1        1  2.514277  3.547388  3.947032  4.268200  4.964344
2        2  3.327439  5.414104  6.108780  6.675197  7.927055
3        3  5.505383  8.520908  9.432003 10.189697 11.905392
4        4  8.406714 12.231447 13.284113 14.182866 16.280365
5        5 11.834019 16.230760 17.340010 18.329451 20.735158
6        6 15.366332 19.993828 21.291613 22.410022 25.170522
7        7 18.989162 23.816377 25.389206 26.615356 29.685526
8        8 22.679752 27.556472 29.379371 30.880603 34.243156
9        9 26.343527 31.187297 33.264487 35.097073 38.851463
10      10 29.427574 34.891807 37.282071 39.258235 43.506412
11      11 32.778797 39.506517 42.000889 43.657406 48.208571
12      12 37.064199 40.506517 42.263923 43.657406 52.897870
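
The table above gives the maximum values, while the question also asks which strings attain them. Below is a minimal sketch of one way to recover those strings. It runs on a small made-up data frame (`toy`) standing in for acss_data, so the numbers are illustrative only, and `argmax_strings` is a hypothetical helper name, not something provided by acss. It also passes `na.rm = TRUE` to `max` rather than recoding NAs to -1, which should give the same maxima as long as every length group has at least one non-NA value.

```r
# Toy stand-in for acss_data (made-up numbers): row names are the strings,
# columns are K.i values, NA marks strings not covered by that alphabet size.
toy <- data.frame(
  K.2 = c(2.5, 2.5, NA, 3.3, 3.2, NA),
  K.9 = c(4.9, 4.9, 4.9, 7.9, 7.7, 7.8),
  row.names = c("0", "1", "2", "00", "01", "02")
)

# Maximum of every column per string length, skipping NAs.
len <- nchar(rownames(toy))
max_by_len <- aggregate(toy, by = list(len = len), FUN = max, na.rm = TRUE)

# Hypothetical helper: for one column, list the strings that attain the
# maximum within their length group (ties are all returned).
argmax_strings <- function(df, col) {
  len <- nchar(rownames(df))
  # ave() gives every row the maximum of its own length group
  group_max <- ave(df[[col]], len, FUN = function(x) max(x, na.rm = TRUE))
  keep <- !is.na(df[[col]]) & df[[col]] == group_max
  # One list element per length, named by that length
  split(rownames(df)[keep], len[keep])
}

print(max_by_len)
print(argmax_strings(toy, "K.2"))
```

Assuming acss_data keeps the strings as row names, as the output in the question suggests, the same helper could be called as argmax_strings(acss_data, "K.2"), and likewise for the other K.i columns.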