0
votes

I am parsing an Apache log file and loading it into a pandas DataFrame for further investigation.

The log file contains some malformed lines, which cause the following error:

ValueError: Expected 11 fields in line 4320, saw 27

To work around this, I passed error_bad_lines=False when reading the file. This doesn't help, because I then get the following error:

ValueError: The 'error_bad_lines' option is not supported with the 'python' engine

Note: I am explicitly using the python engine because my separator is a regular expression.

Code snippet:

data = pd.read_csv(
    log_file,
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',
    na_values='-',
    header=None,
    usecols=use_cols,
    skiprows=1,
    converters={
        time_taken_index[0]: parse_sec,
        time_index[0]: parse_datetime,
        req_index[0]: parse_str,
        status_index[0]: parse_str,
    },
    error_bad_lines=False,
)

I'd be grateful for any suggestions. Thank you.

1
Could you attach a part of the log file you're talking about?

1 Answer

1
votes

It seems that you are using an old version of Pandas (<= 0.19.0).

The parameter error_bad_lines=False works with the python engine in pandas 0.20.0 and later.

So, just update the Pandas library.
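
If upgrading is not an option, one possible workaround is to filter out the malformed lines yourself before building the DataFrame. The sketch below reuses the separator regex from the question with the standard re module and keeps only lines that split into the expected number of fields. The helper name split_good_lines, the sample log lines, and the field count of 7 are assumptions made for illustration; the real file in the question expects 11 fields.

```python
import re

# Separator regex copied from the question: split on whitespace that is
# neither inside double quotes nor inside square brackets.
SEP = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])')


def split_good_lines(lines, expected_fields):
    """Yield the field list of every line that has exactly expected_fields fields.

    Hypothetical helper: lines with any other field count are silently skipped.
    """
    for line in lines:
        fields = SEP.split(line.rstrip("\n"))
        if len(fields) == expected_fields:
            yield fields


# Made-up sample data: two well-formed lines and one with extra trailing fields.
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 10 extra junk here',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 404 512',
]

rows = list(split_good_lines(sample, expected_fields=7))
```

The resulting rows can be passed to pd.DataFrame(rows); the na_values and converters handling from the original read_csv call would then have to be applied separately. Also note that in pandas 1.3 and later error_bad_lines is deprecated in favor of on_bad_lines='skip', and it was removed in pandas 2.0.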