
I have wide data that I want to transform to long format. But before doing this, I want to rename all the variable column names.

The first variables in my data frame are identifying (registry) variables (id, names, etc.), so the loop should run over all columns except the first 9.

Moreover, because the data are wide, I have repeated variables (which should get the same prefix) in the columns representing different years (10 years).

I was thinking of something like this:

for (i in seq(10:440)){   # note: seq(10:440) is 1:431, not 10:440
  names(mydata)[i:i+10]<- paste("varname", 1:10, sep="_")   # note: i:i+10 is (i:i)+10, a single position
}

Obviously, it doesn't work. But I need something like this, with "varname" also varying with i (I need to rename about 45 variables, each repeated for 10 years).

I hope this is clear.

Thanks to anyone who can help!

My data look like this:

id Operating_renvenue_last_yr Operating_renvenue_-1 Operating_renvenue-2 ... Fixed_assets_last_yr Fixed_assets-1 Fixed_assets-_2 
ESA08005449 1973859 1983692 2028124 ... 205824 205955 208695
ESA08000820 1044971 962639 912788 ... 100355 120558 135448
ESA17000852 1005575 1035578 1055304 ... 509555 520687 705777
ESA08800450 861971 812596 765714 ... 1120587 1130458 1145200

And I want to obtain:

id            OR_1    OR_2     OR_3 ... FA_1    FA_2   FA_3 
ESA08005449 1973859 1983692 2028124 ... 205824 205955 208695
ESA08000820 1044971 962639 912788 ... 100355 120558 135448
ESA17000852 1005575 1035578 1055304 ... 509555 520687 705777
ESA08800450 861971 812596 765714 ... 1120587 1130458 1145200
Kindly give a reproducible example; that way we will be able to help you better. - Mustufain
Show us your current column names and how you want them to look. It's a little unclear. - Sam
Post the output of this in your question: dput(names(YourData)[1:50]) - Andre Elrico
id Operating_renvenue_last_yr Operating_renvenue_-1 Operating_renvenue-2 .... Fixed_assets_last_yr Fixed_assets-1 Fixed_assets-_2 ESA08005449 1973859 1983692 2028124 ESA08000820 1044971 962639 912788 ESA17000852 1005575 1035578 1055304 ESA08800450 861971 812596 765714 - user9395105
Does it help? Otherwise, how can I give a reproducible example? Is there a command like dataex in Stata? Thank you - user9395105
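In base R, the usual way to share data in a question (roughly what dataex does in Stata) is dput(), as suggested in the comment above: it prints R code that recreates an object exactly. A minimal sketch, assuming a small extract built by hand from the first two rows and three columns of the question's data (check.names = FALSE keeps the non-syntactic name Operating_renvenue_-1 unchanged):

```r
# Hand-built extract of the question's data (first two rows, three columns)
mydata <- data.frame(
  id = c("ESA08005449", "ESA08000820"),
  Operating_renvenue_last_yr = c(1973859, 1044971),
  `Operating_renvenue_-1` = c(1983692, 962639),
  check.names = FALSE,      # keep names such as "Operating_renvenue_-1" as-is
  stringsAsFactors = FALSE
)

# Column names as R code that can be pasted into a question
dput(names(mydata))

# The first rows as R code that recreates them
dput(head(mydata, 2))
```

On the real data, dput(head(mydata[, 1:20])) or similar keeps the pasted output short.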

4 Answers


Hope this helps!

#sample data
set.seed(1)
df <- data.frame(id=1:4, replicate(5,sample(0:1,4,replace=TRUE)))

#define a list of varying "varname"
varname <- c('OR', 'FA')
#define how many times each "varname" above repeats
n <- c(2, 3) #let's say that 'OR' repeats 2 times and 'FA' 3 times

#replace column name
names(df)[2:ncol(df)] <- unlist(mapply(function(x,y) paste(x, seq(1,y), sep="_"), varname, n))

Output is:

  id OR_1 OR_2 FA_1 FA_2 FA_3
1  1    0    0    1    1    1
2  2    0    1    0    0    1
3  3    1    1    0    1    0
4  4    1    1    0    0    1
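The same names can also be built without mapply(): rep(varname, n) repeats each stem its own number of times, and base R's sequence(n) produces the matching counters 1..n[k] for each stem. A minimal sketch on the same made-up data as the answer above, with a length check added as a precaution (the check is an addition, not part of the original answer):

```r
# Same made-up data as in the answer above
set.seed(1)
df <- data.frame(id = 1:4, replicate(5, sample(0:1, 4, replace = TRUE)))

varname <- c("OR", "FA")
n <- c(2, 3)  # 'OR' repeats 2 times, 'FA' 3 times

# rep(varname, n) gives "OR" "OR" "FA" "FA" "FA"
# sequence(n)     gives   1    2    1    2    3
new_names <- paste(rep(varname, n), sequence(n), sep = "_")

# Fail loudly if the counts do not match the number of non-id columns
stopifnot(length(new_names) == ncol(df) - 1)
names(df)[-1] <- new_names
names(df)
```

Unlike a fixed each = 10, this works when variables have different numbers of repetitions.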

If "varnames" is a vector of your variable names, like

varnames<-c("OR", "FA", ..)

you might simply run

names(mydata)[10:ncol(mydata)]<- paste0(rep(varnames, each=10), "_",1:10)

which gives you

names(mydata)[10:ncol(mydata)]

 "OR_1"  "OR_2"  "OR_3"  "OR_4"  "OR_5"  "OR_6"  "OR_7"  "OR_8"  "OR_9"  "OR_10" "FA_1"  "FA_2"  "FA_3"  "FA_4"  "FA_5"  "FA_6"  "FA_7"  "FA_8" 
 "FA_9"  "FA_10" ...

But be aware that this is only correct if each variable has the same number of repetitions!
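One way to guard against that caveat is to check the column count before renaming. A minimal sketch, assuming the value columns are grouped by variable and every variable has exactly n_years columns; rename_blocks() is a hypothetical helper written for this example, and the toy data are made up:

```r
# Hypothetical helper: rename every column after the first n_id ones as
# <stem>_1 ... <stem>_<n_years>, stopping if the column count does not fit
rename_blocks <- function(data, stems, n_years, n_id) {
  n_values <- ncol(data) - n_id
  if (n_values != length(stems) * n_years) {
    stop("expected ", length(stems) * n_years, " value columns, found ", n_values)
  }
  # each stem repeated n_years times; 1:n_years is recycled along it
  names(data)[(n_id + 1):ncol(data)] <-
    paste0(rep(stems, each = n_years), "_", 1:n_years)
  data
}

# Toy data: 1 id column, then 2 variables x 3 years
toy <- data.frame(id = c("a", "b"), matrix(1:12, nrow = 2))
toy <- rename_blocks(toy, stems = c("OR", "FA"), n_years = 3, n_id = 1)
names(toy)
```

For the question's data this would be rename_blocks(mydata, varnames, 10, 9), with varnames holding the 45 short names.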


I would advise you to divide your work into small steps.

# Build input data: 9 identifier columns, then value columns 10 to 440
mydata <- data.frame(a=1,b=2,c=3,d=4,e=5,e=6,e=7,e=8,f=9)
for (i in 10:440) mydata[[i]] <- 10

# Make a copy of mydata's column names (after the data exist)
newnames <- names(mydata)

# A vector of variable names for the sake of the example
varnames <- paste0('var', 1:45)

# Set new variable names: each name repeated 10 times, suffixed 1 to 10
newnames[10:length(newnames)] <- paste(rep(varnames, each = 10)[1:(length(newnames)-9)], 1:10, sep = '_')

# Commit your changes
names(mydata) <- newnames

# Result
names(mydata)[1:20]
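Once the columns follow a stem_number pattern such as OR_1 ... FA_3, the long format that the question ultimately asks for can be produced with base R's reshape(): with sep = "_", it splits each name into a stem (the long-format variable) and a suffix (the time index). A minimal sketch, assuming toy data copied from the first two rows and two years of the question, and a hypothetical time column named year (1 = last year, 2 = year -1, following the question's mapping):

```r
# Toy wide data already renamed to the stem_number pattern
wide <- data.frame(
  id   = c("ESA08005449", "ESA08000820"),
  OR_1 = c(1973859, 1044971),
  OR_2 = c(1983692, 962639),
  FA_1 = c(205824, 100355),
  FA_2 = c(205955, 120558),
  stringsAsFactors = FALSE
)

# reshape() guesses v.names (OR, FA) and times (1, 2) from the names
long <- reshape(wide, direction = "long", varying = 2:ncol(wide),
                sep = "_", idvar = "id", timevar = "year")
long
```

Each id then appears once per year, with one column per variable.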

Simple case: if you have two data frames (or CSV files read into R) with the same number of columns but different column names, and you need to give them the same names in order to merge them, then:

names(df2) <- names(df1)

where df1 has the column names that you want df2 to have. This assumes the columns of the two data frames are in the same order.
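A minimal sketch of that renaming step with a safety check, assuming two small made-up data frames whose columns correspond one-to-one in order. Here the renamed frame is stacked with rbind(), which matches columns by name; the same renaming applies before merge() when key columns need matching names:

```r
# Two made-up data frames: same columns, different names
df1 <- data.frame(id = 1:2, value = c(10, 20))
df2 <- data.frame(ID = 3:4, Val = c(30, 40))

# Copying names is only safe if the columns line up one-to-one, in order
stopifnot(ncol(df1) == ncol(df2))
names(df2) <- names(df1)

# Now the frames can be combined by column name
combined <- rbind(df1, df2)
combined
```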