0
votes

From a file I have parsed the fields that I need and stored them in variables and it looks something like below:

field_list = ['some_value','some_other_value']
raw_data = """sring1|0|2|N.S.|3|
sring2|0|2|N.S.|2|
sring3|0|2|3|5|"""

Now I need to create a df which looks like:

      Str           Measure   Value
0  sring1        some_value  N.S.
1  sring1  some_other_value     3
2  sring2        some_value  N.S.
3  sring2  some_other_value     2
4  sring3        some_value     3
5  sring3  some_other_value     5

The logic here is as following:

E.g. For the line in raw_data "sring1|0|2|N.S.|3|" the Str Column value would be sring1 and the value for Measure Column will be some_value(which is coming from the field_list) and the value for Value Column will be N.S

Now, again for the same string the value for the Str Column value would be sring1 and the value for Measure Column will be some_other_value and the value for Value Column will be 3.

The |2| in the string "sring1|0|2|N.S.|3|" tells us how many rows will be there and the last two are the values for the field_list

Currently I have the following code:

field_list = ['some_value','some_other_value']

db_columns = ['Str','Measure','Value']

raw_data = """sring1|0|2|N.S.|3|
sring2|0|2|N.S.|2|
sring3|0|2|3|5|"""

entry_list = raw_data.splitlines()
final_db_list =[]
for entries in entry_list:
    each_entry_list = entries.split('|')
    security = each_entry_list[0].strip()
    print(each_entry_list)
    no_of_fields = int(each_entry_list[2])
    db_list=[]
    upload_list=[]
    for i in range (0,no_of_fields):
        field = field_list[i]
        value = each_entry_list[3+i]
        db_list=[security,field,value]
        upload_list.append(db_list)
    final_db_list.append(upload_list)
flatList = [ item for elem in final_db_list for item in elem]

df = DataFrame(flatList,columns=db_columns)
print(df)

Can someone please help me with a better way of doing this. The one that I have works but is too messy. Need to pythonize it a bit and I am out of ideas. Please help!

1

1 Answers

0
votes

We can do it like this:

import pandas as  pd
from io import StringIO

field_list = ['some_value','some_other_value']
raw_data = """sring1|0|2|N.S.|3|
sring2|0|2|N.S.|2|
sring3|0|2|3|5|"""

df = pd.read_csv(StringIO(raw_data), sep='|', header=None)

df = df.drop(5, axis=1)

df = (df.set_index([0,1,2])
        .set_axis(field_list, axis=1)
        .reset_index(level=[1,2], drop=True)
        .stack()
        .rename('Value')
        .rename_axis(['Str', 'Measure'])
        .reset_index()
      )

print(df)

Output:

      Str           Measure Value
0  sring1        some_value  N.S.
1  sring1  some_other_value     3
2  sring2        some_value  N.S.
3  sring2  some_other_value     2
4  sring3        some_value     3
5  sring3  some_other_value     5