0
votes

I have 3 CSV files (separated by ',') with no headers and need to concat them into one file:

file1.csv

United Kingdom     John

file2.csv

France  Pierre

file3.csv

Italy   Marco

expected result:

United Kingdom    John
France            Pierre
Italy             Marco

my code:

import pandas as pd

df = pd.read_csv('path/to/file1.csv', sep=',')
df1 = pd.read_csv('path/to/file2.csv', sep=',')
df2 = pd.read_csv('path/to/file3.csv', sep=',')

df_combined = pd.concat([df,df1,df2])

df_combined.to_csv('path/to/output.csv')

the above gives me data merged but it added rows from my CSV files as new columns and rows, instead to add only new rows to existing two columns:

United Kingdom     John
                          France     Pierre
                                               Italy    Marco

Could someone please help with this? Thank you in advance!

2
Are you sure your field separators in the csv files are commas and your read is correct ? What is the shape of your df, df1 and df2 before concat ? It seems to me that you are reading the csv content as column names and not the data itself (use header = None if there are no column headers)predmod
Hi predmod, when I print each df in python it shows the same column count and checked that rows are divided by commas. Would there exist maybe another way than pandas to merge/concatenate rows?Baobab1988
the concat method should work fine and is the right one (you could use frame's append as well) but something is not binding correctly (indexes are off?) in your case - what are the column names of df, df1, df2 ? print(df.columns) ... you could also print df.shape tuples. I suspect you import first row as column names (you need to use header=None argument in read_csv)predmod

2 Answers

1
votes

Pandas usually infer the column name from the first row when reading CSV file. One thing you can do here is to check each data frame's header, which you should expect to see the sample data is treated as header.

In order to override this default behaviour, you can use names field to explicitly specify column names, like df1=pd.read_csv("file1.csv", names=['country','name']). Then pandas would be able to merge columns accordingly.

1
votes

Read csv as follows

df = pd.read_csv('path/to/file1.csv', sep=',', header=None)
df1 = pd.read_csv('path/to/file2.csv', sep=',', header=None)
df2 = pd.read_csv('path/to/file3.csv', sep=',', header=None)

You can concatenate as bellow

df.reset_index(inplace=True, drop=True)
df1.reset_index(inplace=True, drop=True)
df2.reset_index(inplace=True, drop=True)
pd.concat([df,df1,df2], axis=0)

output as expected

enter image description here