Or to put the question another way: why is PapaParse's ParseResult.data an empty array when I trim all leading and trailing empty cells in the step callback passed to Papa.parse? EDIT: Please note that I can get the result I want by mapping over the parsed rows and trimming them afterwards, but I would rather not parse and then map. I'd like to do it all in one pass.

Example CSV:

Col 1,Col 2,Col 3
1-1,1-2,
,2-2,2-3
3-1,3-2,3-3

Note that row 1 contains headers (Col 1, Col 2, etc). Row 2 col 3 is empty, and row 3 col 1 is empty.

Given that CSV, I want to present this back to the user (as a nicely-formatted table):

|     |     |     |
|-----|-----|-----|
| 1-1 | 1-2 |     |
| 2-2 | 2-3 |     |
| 3-1 | 3-2 | 3-3 |

I want to push all rows as far to the left as they can go, and remove all empty cells from the end of each row.

In other words, I want to trim all empty cells from both the beginning and the end of each row. Below is the code I'm using. I have stepped through trimEmptyCells with a debugger and it does exactly what I expect. However, the ParseResult that parseAndTrim returns contains an empty data array.

export const parseAndTrim = (csv: string): Papa.ParseResult => {
    return Papa.parse(csv, {
        skipEmptyLines: true,
        step: trimEmptyCells,
    })
};

const trimEmptyCells = (results: Papa.ParseResult) => {
    // Note that `_.dropWhile` and `_.dropRightWhile` are [lodash
    // functions](https://lodash.com/docs/4.17.15#dropRight).
    const leftTrimmed = _.dropWhile(results.data, (r) => r === "");
    return _.dropRightWhile(leftTrimmed, (r) => r === "");
};

My first guess was that PapaParse was running into errors with rows of different lengths, but the errors array is also empty. So I tested what I could (without a step function) at https://www.papaparse.com/demo using the example below. Rows with missing cells (not merely empty ones) throw no errors, and the demo returns a proper data array.

Example test input at https://www.papaparse.com/demo

Col 1,Col 2,Col 3
1-1,1-2
,2-2,2-3
1

1 Answer

Based on this comment from pokoli (the #2 contributor to PapaParse and the #1 contributor since early 2017), I believe this is impossible. pokoli's proposed solution is

You should use Papa.parse to read records as array, filter them and then use Papa.Unparse to write the second file.

I wish I could mutate the data while parsing, since that would be faster, but PapaParse is already very fast. I was able to parse a 36,000-line CSV in under 300ms, and unparsing took about twice as long. Parsing a 2,000-line CSV took under 30ms, and unparsing again took about twice as long. My use case will involve CSVs under 2,000 lines 99% of the time, so parsing into a 2D array, filtering, unparsing back into CSV, and then parsing again into JSON won't take too long.
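
For anyone who wants to avoid the second pass, here is a minimal sketch of one possible workaround: instead of returning the trimmed row from the step callback, push it into an array that you own. It assumes, as the question's code does, that each step call receives one row in results.data, and it relies on the behavior observed in the question (ParseResult.data stays empty when step is set). The names StepResult and makeRowCollector are hypothetical and not part of PapaParse, and the trimming is written without lodash so the snippet stands on its own. I have not benchmarked it.

```typescript
// Assumed shape of the argument passed to a step callback:
// one parsed row per call, as in the question's code.
type StepResult = { data: string[] };

// Drop empty cells from both ends of a row; empty cells in the middle are kept.
const trimEmptyCells = (row: string[]): string[] => {
    let start = 0;
    let end = row.length;
    while (start < end && row[start] === "") start++;
    while (end > start && row[end - 1] === "") end--;
    return row.slice(start, end);
};

// Hypothetical helper: returns a step callback plus the array it fills.
const makeRowCollector = () => {
    const rows: string[][] = [];
    const step = (results: StepResult): void => {
        rows.push(trimEmptyCells(results.data));
    };
    return { rows, step };
};

// Intended wiring (not executed here, and an assumption about PapaParse):
//   const { rows, step } = makeRowCollector();
//   Papa.parse(csv, { skipEmptyLines: true, step });
//   // read the trimmed rows from `rows`, not from the returned ParseResult
```

The same trimEmptyCells function also works for the parse-then-map approach mentioned in the question's EDIT: map it over the data array of a normal parse without a step callback.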