I am working on a multi-label classification problem. The dataset is available at the following link: dataset
This dataset originally comes from the SIAM 2007 competition. It consists of aviation safety reports describing the problem(s) that occurred on certain flights. It is a high-dimensional multi-label classification problem with 21519 rows and 30438 columns.
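A rough calculation shows why a fully dense "regular" table of this size is awkward in R. It assumes every cell is stored as an 8-byte double and ignores R's per-object overhead, so it is only a lower-bound estimate:

```r
# Back-of-the-envelope memory cost of storing the data as a dense double matrix
n_rows <- 21519
n_cols <- 30438
bytes  <- n_rows * n_cols * 8   # 8 bytes per double
gib    <- bytes / 1024^3        # roughly 4.88 GiB
print(gib)
```

Since each report only uses a small fraction of the columns, a sparse matrix (for example from the `Matrix` package) is usually a more practical target than a dense data frame.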
The dataset is provided as an .svm file. I read it with `read.delim` in R and got the following output:
```
head(data[,1])
[1] 18 2:0.136082763488 6:0.136082763488 7:0.136082763488 12:0.136082763488 20:0.136082763488 23:0.136082763488 32:0.136082763488 37:0.136082763488 39:0.136082763488 43:0.136082763488 53:0.136082763488 57:0.136082763488 58:0.136082763488 59:0.136082763488 60:0.136082763488 61:0.136082763488 62:0.136082763488 63:0.136082763488 64:0.136082763488 65:0.136082763488 66:0.136082763488 67:0.136082763488 68:0.136082763488 69:0.136082763488 70:0.136082763488 71:0.136082763488 72:0.136082763488 73:0.136082763488 74:0.136082763488 75:0.136082763488 76:0.136082763488 77:0.136082763488 78:0.136082763488 79:0.136082763488 80:0.136082763488 81:0.136082763488 82:0.136082763488 83:0.136082763488 84:0.136082763488 85:0.136082763488 86:0.136082763488 87:0.136082763488 88:0.136082763488 89:0.136082763488 90:0.136082763488 91:0.136082763488 92:0.136082763488 93:0.136082763488 94:0.136082763488 95:0.136082763488 96:0.136082763488 97:0.136082763488 98:0.136082763488 99:0.136082763488
[2] 1,12,13,18,20 2:0.0916698497028 4:0.0916698497028 6:0.0916698497028 12:0.0916698497028 14:0.0916698497028 16:0.0916698497028 19:0.0916698497028 23:0.0916698497028 26:0.0916698497028 31:0.0916698497028 32:0.0916698497028 33:0.0916698497028 37:0.0916698497028 53:0.0916698497028 57:0.0916698497028 66:0.0916698497028 71:0.0916698497028 72:0.0916698497028 81:0.0916698497028 83:0.0916698497028 84:0.0916698497028 86:0.0916698497028 90:0.0916698497028 92:0.0916698497028 100:0.0916698497028 101:0.0916698497028 102:0.0916698497028 103:0.0916698497028 104:0.0916698497028 105:0.0916698497028 106:0.0916698497028 107:0.0916698497028 108:0.0916698497028 109:0.0916698497028 110:0.0916698497028 111:0.0916698497028 112:0.0916698497028 113:0.0916698497028 114:0.0916698497028 115:0.0916698497028 116:0.0916698497028 117:0.0916698497028 118:0.0916698497028 119:0.0916698497028 120:0.0916698497028 121:0.0916698497028 122:0.0916698497028 123:0.0916698497028 124:0.0916698497028 125:0.0916698497028 126:0.0916698497028 127:0.0916698497028 128:0.0916698497028 129:0.0916698497028 130:0.0916698497028 131:0.0916698497028 132:0.0916698497028 133:0.0916698497028 134:0.0916698497028 135:0.0916698497028 136:0.0916698497028 137:0.0916698497028 138:0.0916698497028 139:0.0916698497028 140:0.0916698497028 141:0.0916698497028 142:0.0916698497028 143:0.0916698497028 144:0.0916698497028 145:0.0916698497028 146:0.0916698497028 147:0.0916698497028 148:0.0916698497028 149:0.0916698497028 150:0.0916698497028 151:0.0916698497028 152:0.0916698497028 153:0.0916698497028 154:0.0916698497028 155:0.0916698497028 156:0.0916698497028 157:0.0916698497028 158:0.0916698497028 159:0.0916698497028 160:0.0916698497028 161:0.0916698497028 162:0.0916698497028 163:0.0916698497028 164:0.0916698497028 165:0.0916698497028 166:0.0916698497028 167:0.0916698497028 168:0.0916698497028 169:0.0916698497028 170:0.0916698497028 171:0.0916698497028 172:0.0916698497028 173:0.0916698497028 174:0.0916698497028 175:0.0916698497028 176:0.0916698497028 177:0.0916698497028 178:0.0916698497028 179:0.0916698497028 180:0.0916698497028 181:0.0916698497028 182:0.0916698497028 183:0.0916698497028 184:0.0916698497028 185:0.0916698497028 186:0.0916698497028 187:0.0916698497028 188:0.0916698497028 189:0.0916698497028 190:0.0916698497028 191:0.0916698497028 192:0.0916698497028 193:0.0916698497028 194:0.0916698497028
```
How can I convert this into a regular dataset, with one column per feature and the labels kept separate from the features?
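The output above suggests each line holds a comma-separated list of label ids followed by `index:value` feature pairs. A minimal sketch of turning such lines into a sparse feature matrix `X` and a 0/1 label matrix `Y` follows. It assumes that every line starts with at least one label, that label ids and feature indices are positive 1-based integers (add 1 first if the file turns out to be 0-based), and that the `Matrix` package (a recommended package shipped with R) is available. The function name `svm_lines_to_matrices` is hypothetical.

```r
library(Matrix)  # provides sparseMatrix()

# Hypothetical helper: parse lines like "1,12,13 2:0.09 4:0.09" into
# a sparse feature matrix X and a dense 0/1 label indicator matrix Y.
svm_lines_to_matrices <- function(lines, n_features = NULL, n_labels = NULL) {
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  n <- length(tokens)

  # First token of each line: comma-separated label ids
  label_lists <- lapply(tokens, function(t) as.integer(strsplit(t[1], ",")[[1]]))

  # Remaining tokens: "index:value" feature pairs
  feat_tokens <- lapply(tokens, function(t) t[-1])
  pairs <- strsplit(unlist(feat_tokens), ":", fixed = TRUE)
  j <- as.integer(vapply(pairs, `[`, character(1), 1))
  x <- as.numeric(vapply(pairs, `[`, character(1), 2))
  i <- rep(seq_len(n), lengths(feat_tokens))  # row number of each pair

  if (is.null(n_features)) n_features <- max(j)
  X <- sparseMatrix(i = i, j = j, x = x, dims = c(n, n_features))

  # One indicator entry per (row, label) pair
  li <- rep(seq_len(n), lengths(label_lists))
  lj <- unlist(label_lists)
  if (is.null(n_labels)) n_labels <- max(lj)
  Y <- sparseMatrix(i = li, j = lj, x = rep(1, length(li)), dims = c(n, n_labels))

  list(X = X, Y = as.matrix(Y))
}

# Usage on two shortened lines in the same format as the output above
lines <- c("18 2:0.136082763488 6:0.136082763488",
           "1,12,13,18,20 2:0.0916698497028 4:0.0916698497028 6:0.0916698497028")
res <- svm_lines_to_matrices(lines)
```

For the object in the question, something like `svm_lines_to_matrices(as.character(data[, 1]))` should work, ideally passing `n_features` explicitly so that separate train and test files get the same number of columns. Note that `read.delim` uses `header = TRUE` by default, so the file's first record may have been consumed as the column name. Keeping `X` sparse is deliberate: converting it with `as.matrix()` at the full size of this dataset would need several gigabytes.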
Any method other than `read.delim` for reading an ".svm" file in R would also be helpful.
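One base-R alternative is `readLines`, which returns exactly one string per record. `read.delim`, by contrast, splits on tabs (the SVM format uses spaces, so everything lands in one column) and treats the first line as a header by default. The sketch below writes a tiny hypothetical two-record file to a temporary path to compare the two; the file contents are illustrative, not the real dataset:

```r
# Tiny stand-in file in the same format (hypothetical contents)
path <- tempfile(fileext = ".svm")
writeLines(c("18 2:0.136082763488 6:0.136082763488",
             "1,12,13,18,20 2:0.0916698497028 4:0.0916698497028"), path)

# read.delim(): tab-separated, header = TRUE by default
via_delim      <- read.delim(path)                  # first record becomes the header
via_delim_nohd <- read.delim(path, header = FALSE)  # keeps both records, still 1 column

# readLines(): one character string per line, nothing consumed
raw_lines <- readLines(path)

# The label part is everything before the first space
labels <- sub(" .*$", "", raw_lines)
print(labels)
```

The resulting character vector can then be split into labels and `index:value` pairs with `strsplit`, as for the output of `read.delim`, but without the risk of losing the first record.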