I have read tables from a PDF with tabula-py using the following code:
import tabula
table = tabula.read_pdf(files[0], pages='all', multiple_tables=True, stream=True)
Sometimes the values from two columns are joined into a single column (separated by a single space). For example:
col0 | col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|---|
a1 | b1 c1 | d1 | e1 f1 | g1 | h1 | NA | NA |
a2 | b2 | c2 | d2 | e2 | f2 | g2 | h2 |
How can I move the values back into the correct columns, to get:
col0 | col1 | col2 | col3 | col4 | col5 | col6 | col7 |
---|---|---|---|---|---|---|---|
a1 | b1 | c1 | d1 | e1 | f1 | g1 | h1 |
a2 | b2 | c2 | d2 | e2 | f2 | g2 | h2 |
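
For reference, one possible way to do this realignment is sketched below. It rests on assumptions not stated above: that a genuine cell value never contains a space, and that missing values only show up as trailing NA caused by the shift. The helper names `realign_row` and `realign_table` are hypothetical, and the sample DataFrame just mirrors the example table rather than real tabula-py output.

```python
import pandas as pd


def realign_row(row, n_cols):
    """Split every cell on whitespace and refill the row from the left."""
    tokens = []
    for value in row:
        if pd.isna(value):
            continue  # skip the NA padding left behind by the shift
        tokens.extend(str(value).split())
    if len(tokens) > n_cols:
        # Assumption violated: some real value probably contains a space.
        raise ValueError(f"row has {len(tokens)} values, expected at most {n_cols}")
    return tokens + [None] * (n_cols - len(tokens))


def realign_table(df):
    """Return a copy of df with merged cells spread back over the columns."""
    n_cols = df.shape[1]
    rows = [realign_row(row, n_cols) for row in df.itertuples(index=False, name=None)]
    return pd.DataFrame(rows, columns=df.columns, index=df.index)


# Sample data shaped like the table in the question.
df = pd.DataFrame(
    [
        ["a1", "b1 c1", "d1", "e1 f1", "g1", "h1", None, None],
        ["a2", "b2", "c2", "d2", "e2", "f2", "g2", "h2"],
    ],
    columns=[f"col{i}" for i in range(8)],
)
fixed = realign_table(df)
print(fixed)
```

Since `read_pdf` with `multiple_tables=True` returns a list of DataFrames, the helper would be applied per table, for example `[realign_table(t) for t in table]`. If some columns legitimately contain spaces, this sketch will raise or misplace values, and those columns would need to be handled separately.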