If you know your column names beforehand, you can pass them explicitly to pd.read_csv via the usecols parameter. If a name in that list is missing from a CSV file, pd.read_csv raises a ValueError automatically. Note that columns present in the file but not listed in usecols are dropped silently rather than reported.
To merge your CSV files, you can use pd.concat:
import os
import pandas as pd

# define your column names
column_names = ["Col A", "Col B", "Col C", "Col D"]
# set up file paths
base_path = os.path.join("E:/", "Datasets", "Dataset01")  # adapted from your example
file_names = ["file1.csv", "file2.csv", "file3.csv", "file4.csv"]
abs_paths = [os.path.join(base_path, file_name)
             for file_name in file_names]
# read each file (raises ValueError if an expected column is missing) and stack the rows
dfs = pd.concat([pd.read_csv(abs_path, usecols=column_names)
                 for abs_path in abs_paths])
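To see the mismatch behaviour without touching real files, here is a minimal sketch that uses in-memory strings in place of CSV files. The sample data, the "Col X" header and the helper name read_checked are made up for illustration; only pd.read_csv and its usecols parameter come from the answer above.

```python
import io
import pandas as pd

column_names = ["Col A", "Col B", "Col C", "Col D"]

# In-memory stand-ins for two CSV files (hypothetical data)
good_csv = "Col A,Col B,Col C,Col D\n1,2,3,4\n"
bad_csv = "Col A,Col B,Col C,Col X\n5,6,7,8\n"  # "Col D" is missing here


def read_checked(text):
    """Read CSV text, requiring every name in column_names to be present."""
    return pd.read_csv(io.StringIO(text), usecols=column_names)


good_df = read_checked(good_csv)

try:
    read_checked(bad_csv)
    mismatch_error = None
except ValueError as err:
    # pandas reports which expected columns were not found
    mismatch_error = str(err)

print(good_df)
print("Error for mismatching file:", mismatch_error)
```

Catching the ValueError per file like this lets you log the offending file and continue, instead of aborting the whole concat.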
If you want to check whether all columns are identical across your CSV files, you can load only the header of each file by passing nrows=0:
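# read only the header row of each file
cols = [pd.read_csv(abs_path, nrows=0).columns
        for abs_path in abs_paths]
# Index.equals compares names and order, and also copes with headers of
# different lengths (an element-wise == would raise a ValueError there)
cols_identical = [cols[0].equals(colx) for colx in cols[1:]]
all_cols_same = all(cols_identical)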
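If the check fails, you will usually want to know which file deviates. The following is a rough sketch of that idea, again using in-memory strings instead of real files. The function name find_header_mismatches and the sample headers are assumptions for illustration, not part of pandas.

```python
import io
import pandas as pd


def find_header_mismatches(sources):
    """Return {name: columns} for every source whose header differs from the first one."""
    # nrows=0 parses only the header row, so this stays cheap for large files
    headers = {name: list(pd.read_csv(src, nrows=0).columns)
               for name, src in sources.items()}
    if not headers:
        return {}
    reference = next(iter(headers.values()))
    return {name: cols for name, cols in headers.items() if cols != reference}


# Hypothetical stand-ins for CSV files; with real files, map names to paths instead
sources = {
    "file1.csv": io.StringIO("Col A,Col B\n1,2\n"),
    "file2.csv": io.StringIO("Col A,Col B\n3,4\n"),
    "file3.csv": io.StringIO("Col A,Col C,Col D\n5,6,7\n"),
}

mismatches = find_header_mismatches(sources)
print(mismatches)
```

The comparison is order-sensitive, like the check above. If column order does not matter for your data, compare sets of column names instead.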