I read this data set and I want to join the data for the training set and the test set (I should mention that this is part of a Coursera course exercise).
I have read both data sets and given all columns names. The training data has 7352 rows and 562 columns, and the test set has 2947 rows and 562 columns. The column names of both data sets are the same.
When I try to join the data with bind_rows I get a data set with 10299 rows but with 478 columns, not 562.
When I use rbind I get the correct result, but then I need to convert it again using tbl_df, so I would prefer to use bind_rows.
The following is the script I wrote. Running it from a folder containing the unzipped data from the above link (i.e. the folder "UCI HAR Dataset") reproduces the problem.
## Set the working directory to the folder containing this script
## (sys.frame(1)$ofile is only available when the script is run via source())
CurrentScriptDirectory <- dirname(sys.frame(1)$ofile)
setwd(CurrentScriptDirectory)
library(dplyr)
#Reading the data
train_x <- tbl_df(read.table("./UCI HAR Dataset/train/X_train.txt"))
train_y <- tbl_df(read.table("./UCI HAR Dataset/train/y_train.txt"))
test_x <- tbl_df(read.table("./UCI HAR Dataset/test/X_test.txt"))
test_y <- tbl_df(read.table("./UCI HAR Dataset/test/y_test.txt"))
#Giving the y's proper names
colnames(train_y) <- c("Activity Name")
colnames(test_y) <- c("Activity Name")
#Reading feature names (second column of features.txt)
featureNames <- read.table("./UCI HAR Dataset/features.txt")
featureNames <- featureNames[,2]
#Giving the training and test data proper names
colnames(train_x) <- featureNames
colnames(test_x) <- featureNames
labeledTrainingSet <- bind_cols(train_x,train_y)
labeledTestSet <- bind_cols(test_x,test_y)
labeledDataSet <- bind_rows(labeledTrainingSet,labeledTestSet)
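One check that might help narrow this down is whether the column names are actually unique, since a drop from 562 to 478 columns would be consistent with repeated names being collapsed. This is only a sketch, not a confirmed diagnosis: the toy_names vector below is hypothetical stand-in data (not the real features.txt), and it uses only base R's duplicated() and make.unique().

```r
# Hypothetical toy names standing in for the second column of features.txt
toy_names <- c("a-mean()", "b-energy()", "b-energy()", "c-std()")

# Count how many names repeat an earlier name
n_duplicated <- sum(duplicated(toy_names))

# make.unique() appends .1, .2, ... to repeated names so every name is distinct
unique_names <- make.unique(toy_names)

print(n_duplicated)
print(unique_names)
```

If the same check on the real feature names reports a non-zero count, assigning the make.unique() result as column names before calling bind_rows would be one thing to try.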
Can someone help me understand what I'm doing wrong?