i have 10 different subdirectories with same file names in each directory ( 20 files per directory ) and column 0 is the index column in each file.
e.g
**strong text**DIRECTORY A
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY B**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
**DIRECTORY C**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
Each of the above files contains 6 columns and index_col = 0 with NO
column headers
**DIRECTORY FILES_MERGED**
- data_20170101_k.csv
- data_20170102_k.csv
- data_20170102_k.csv
- data_20170103_k.csv
- data_20170104_k.csv
- data_20170105_k.csv
.....
.....
- data_20170120_k.csv
I want to merge all the files with SAME NAME from EACH subdirectory into 1 file with SAME NAME and save the new file in a NEW subdirectory e.g DIRECTORY FILES_MERGED with INDEX = Column 0. The merged file has only one index column with columns 1,2,3,4,5 from each file with same name from each directory
i have read a csv file into a pandas dataframe
df= pd.read_csv(filename, sep=",", header = None, usecols=[0, 1, 2, 3, 4, 5])
Here is the format of dataframe
my initial original Dataframe:
0 1 2 3 4 5
0 1451606820 1.0862 1.08630 1.08578 1.08578 25
1 1451608800 1.0862 1.08630 1.08578 1.08610 10
2 1451608860 1.0862 1.08620 1.08578 1.08578 16
3 1451610180 1.0862 1.08630 1.08578 1.08578 27
4 1451610480 1.0858 1.08590 1.08560 1.08578 21
5 1451610540 1.0857 1.08578 1.08570 1.08578 2
6 1451610600 1.0857 1.08578 1.08570 1.08578 2
7 1451610720 1.0857 1.08578 1.08570 1.08578 2
8 1451610780 1.0857 1.08578 1.08570 1.08578 2
Column '0' = Datetime in Epoch time
Columns 1,2,3,4,5 are values
os.listdir()
oros.walk()
to loop over the directories and files, make adict
using the filenames as keys and lists of dataframes as values, and thenpd.concat()
to merge the lists into one dataframe for output. - Victor Chubukovpd.concat()
as you show is to append dfs a distinct operation than merge. If OP can clarify the intended result is a merge (column-bind) or append (row-bind/stack), we can help precisely. - Parfait