2
votes

I am reading a fixed-width file with read_fwf, and I need whitespace to be preserved because the form uses blanks as a response type. I have looked for a keyword/parameter that changes the default behaviour of stripping this whitespace, but I'm not sure one exists. I saw the exact same question on here, but it was only resolved when the asker worked around the issue without using pandas read_fwf.

After reading the file into a df with read_fwf, I tried padding the beginnings/ends of my strings with extra characters, but that didn't help: the information had already been lost when the whitespace was stripped.

df = pd.read_fwf(file, widths=widths)
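
A minimal reproduction of the stripping described above, as a sketch only. The in-memory data and the two 3-character widths are made up for illustration and are not from my real file:

```python
import io

import pandas as pd

# Hypothetical data: two fields of 3 characters each per row.
raw = "ab  cd\n    ef\n"

# Default behaviour: spaces count as filler and are stripped from every field.
default_df = pd.read_fwf(
    io.StringIO(raw),
    widths=[3, 3],
    header=None,
    dtype=str,  # keep fields as text instead of inferring numeric types
)

# The trailing space of "ab " and the leading space of " cd" are lost.
print(default_df.iloc[0].tolist())
```

The blank-only field in the second row no longer holds its three spaces either, which is exactly the information I need to keep.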

1
There's a section on Files with Fixed Width Columns in the pandas IO docs. Not sure if you saw that? - run-out
It says that the parser "takes care of white space", but I want it to leave excess white space if certain rows do not reach the width limits. I saw that this problem was discussed on the pandas git repo, github.com/pandas-dev/pandas/issues/16772, but I couldn't figure out if the commits made fixed the problem - dylanbking97
It looks like they just updated the documentation to describe the whitespace behaviour. It was suggested in the comments that an option be added to keep whitespace, but this was not completed, at least not in that GitHub thread. - run-out
Ah I see. Thanks for the help! - dylanbking97

1 Answer

0
votes

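This worked for me. read_fwf strips the characters given in delimiter from each field, so passing a delimiter that does not contain a space keeps the spaces. Here colspecs=[(0, 5000)] reads each whole line (up to 5000 characters) as a single column:

pd.read_fwf(file, header=None, colspecs=[(0, 5000)], delimiter="\n\t")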

The whitespace behaviour is discussed in pandas issue #16772; related commit: https://github.com/alanbato/pandas/commit/ad1d3a1688fd489404e91ecc0017c2abc1a322a4
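
The same idea should also work with real column widths rather than one wide column. The following is a minimal sketch, assuming a made-up layout of two 3-character fields; the sample data, the widths, and the choice of dtype=str and keep_default_na=False are my assumptions, not part of the original question:

```python
import io

import pandas as pd

# Hypothetical data: two fields of 3 characters each per row.
raw = "ab  cd\n    ef\n"

kept_df = pd.read_fwf(
    io.StringIO(raw),
    widths=[3, 3],
    header=None,
    dtype=str,              # keep fields as text, no numeric inference
    keep_default_na=False,  # do not turn default NA markers into NaN
    delimiter="\n\t",       # no space listed, so spaces are not stripped
)

# Each field keeps its padding, including the all-blank one.
print(kept_df.iloc[0].tolist())
print(kept_df.iloc[1].tolist())
```

Note that this only preserves spaces that are actually in the file. A row that ends before the declared width is not padded out, so short rows may still need something like str.ljust afterwards.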