6
votes

I have many .csv files to load into pandas DataFrames. They use at least two delimiters, comma and semicolon, and I am unsure which other delimiters may appear. I understand that the delimiter can be set using

dataRaw = pd.read_csv(name,sep=",")

and

dataRaw = pd.read_csv(name,sep=";")

Unfortunately, if I do not specify a delimiter, the default is comma, which results in a single-column DataFrame for files that use any other delimiter. Is there a dynamic way to choose the delimiter so that any CSV can be passed to pandas, such as trying comma and then semicolon? The pandas documentation doesn't mention using this kind of logic when reading a CSV.


2 Answers

7
votes

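If your files use different separators, you can use:

dataRaw = pd.read_csv(name, sep=";|,", engine="python")

Here ";|,", is a regular expression that matches either separator; the alternatives are joined by the OR (|) operator. A regular-expression separator requires the Python parsing engine, so it is requested explicitly to avoid a fallback warning.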
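One caveat with a regular-expression separator is that it pays no attention to quoting. The sketch below illustrates this with Python's own re module rather than pandas itself, on the assumption that a plain regex split is a fair approximation of what happens to each line; split_row and the sample rows are hypothetical names and data made up for the illustration.

```python
import re

# Same pattern as the sep argument above: split on ';' or ','.
SEPARATOR_PATTERN = re.compile(r";|,")


def split_row(line):
    """Split one raw line on the regex, with no awareness of quotes."""
    return SEPARATOR_PATTERN.split(line)


plain_row = "1;2,3"
quoted_row = '1;"Smith, John";3'

# Mixed separators in an unquoted row are handled as intended.
print(split_row(plain_row))

# The comma inside the quoted field is also treated as a separator,
# so the row ends up with one field too many.
print(split_row(quoted_row))
```

If the files may contain quoted fields with embedded commas or semicolons, a single-character separator (or automatic detection) is likely the safer choice.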

8
votes

There is actually an answer in the pandas documentation (at least for pandas 0.20.1):

sep : str, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used automatically. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'

This means you can read your files just with

dataRaw = pd.read_csv(name, sep=None, engine='python')

This should also work if your .csv files use separators other than ';' or ',' (for example, tabs).
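If you would rather see which delimiter was detected before loading, the standard library's csv.Sniffer can be called directly; later versions of the pandas documentation describe the sep=None detection as relying on this same tool. The following is only a minimal sketch: detect_delimiter, the candidate list and the sample strings are hypothetical and not part of pandas.

```python
import csv


def detect_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter of a CSV text sample; return None if undetermined."""
    try:
        # Restrict the guess to the candidate characters we consider plausible.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot settle on a delimiter.
        return None


comma_sample = "id,name,score\n1,ann,3.5\n2,bob,4.0\n"
semicolon_sample = "id;name;score\n1;ann;3.5\n2;bob;4.0\n"
tab_sample = "id\tname\tscore\n1\tann\t3.5\n2\tbob\t4.0\n"

for sample in (comma_sample, semicolon_sample, tab_sample):
    print(repr(detect_delimiter(sample)))
```

In practice the sample would be the first few kilobytes read from the file, and the detected value could then be passed as sep to pd.read_csv, falling back to sep=None with engine='python' when the function returns None.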