Thank you for taking the time to read through my question. I hope you can help.

I have a large DataFrame with loads of columns. One column is a unique ID, and I would like to group by that ID to calculate totals and other custom calculations based on the other columns.

The DataFrame columns look something like this:

| ID | CLASS | AREA | VAR1 | VAR2 | VAR3 |
|----|-------|------|------|------|------|

I would like to calculate the total AREA for each unique ID across all CLASSes. Then I need to calculate custom totals for the VAR columns using the values in the other columns. In the end I would like a table of grouped IDs that looks like this:
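A minimal sketch of such a per-ID summary, assuming the "custom totals" are AREA-weighted averages of each VAR column. The real formulas are not given, so that weighting, the `summary` name and the sample values are all hypothetical:

```python
import pandas as pd

# Hypothetical sample data with the columns from the question (values are made up)
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2],
    "CLASS": ["A", "B", "C", "A", "B"],
    "AREA":  [10.0, 20.0, 30.0, 5.0, 15.0],
    "VAR1":  [1.0, 2.0, 3.0, 4.0, 8.0],
    "VAR2":  [0.5, 0.5, 1.0, 2.0, 2.0],
    "VAR3":  [100, 200, 300, 400, 500],
})

var_cols = ["VAR1", "VAR2", "VAR3"]

# Assumed custom calculation: multiply each VAR value by its row's AREA
weighted = df[var_cols].mul(df["AREA"], axis=0)
weighted = weighted.assign(ID=df["ID"], AREA=df["AREA"])

# One groupby sum gives the total AREA and the weighted VAR sums per ID
sums = weighted.groupby("ID").sum()

# Divide the weighted sums by the total AREA -> AREA-weighted average per ID
summary = sums[var_cols].div(sums["AREA"], axis=0)
summary.insert(0, "TOTAL_AREA", sums["AREA"])
print(summary)
```

Doing the multiplication once on the whole frame and then a single `groupby().sum()` keeps everything vectorised. If a real formula cannot be written that way, `df.groupby("ID").apply(func)` with a `func` that returns a `pd.Series` is the slower but more flexible alternative.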

[Image: desired output table]

I hope this makes sense. My current approach uses the following code:

import pandas as pd

df = pd.read_csv('data.csv')

# Total AREA per ID; same result as df.groupby('ID').apply(lambda x: x['AREA'].sum())
df.groupby('ID')['AREA'].sum()

This just gives me a Series of summed areas (one per ID), which I can store in a variable and append back to the original DataFrame through the unique ID. However, I am not sure how to do the other calculations shown above. On top of that, I am not sure how to get the final DataFrame to match the table format above.
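For putting the per-ID total back onto every original row, and for reshaping to one row per ID, a sketch using `transform`, `map` and `pivot_table`. The sample values are hypothetical, and the one-column-per-CLASS layout is only a guess at the target table since the image is missing:

```python
import pandas as pd

# Hypothetical sample data (values are made up)
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2],
    "CLASS": ["A", "B", "C", "A", "B"],
    "AREA":  [10.0, 20.0, 30.0, 5.0, 15.0],
})

# transform("sum") returns a result aligned to the original rows,
# so the per-ID total can be stored directly as a new column
df["TOTAL_AREA"] = df.groupby("ID")["AREA"].transform("sum")

# Alternative: compute the per-ID Series once, then map it back via the ID column
area_by_id = df.groupby("ID")["AREA"].sum()
df["TOTAL_AREA_MAPPED"] = df["ID"].map(area_by_id)

# One row per ID, one AREA column per CLASS (missing combinations become 0)
wide = df.pivot_table(index="ID", columns="CLASS", values="AREA",
                      aggfunc="sum", fill_value=0)
# Row-wise total across all CLASS columns
wide["TOTAL"] = wide.sum(axis=1)
print(wide)
```

`transform` avoids a separate join step; `map` is handy when the per-ID Series already exists, as it does after the `groupby('ID')['AREA'].sum()` call above.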

I am just starting to learn pandas, so I am constantly teaching myself and asking for help when it gets rough.

Some guidance would be greatly appreciated. I am open to providing more information and clarity on the problem if this question is not sufficient. Thank you.

Is it correct that your "unique" ID has duplicates? It is much easier to answer your question if you provide an example DataFrame that can be copied easily. - Jacob
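One way to produce the copyable sample the comment asks for, sketched with a tiny hand-made frame standing in for the real `pd.read_csv('data.csv')` result (the values are hypothetical). Note that the ID repeats once per CLASS row, which is the duplication the comment refers to:

```python
import pandas as pd

# Stand-in for the real data; values are made up for illustration
df = pd.DataFrame({
    "ID":    [1, 1, 2],
    "CLASS": ["A", "B", "A"],
    "AREA":  [10.0, 20.0, 5.0],
    "VAR1":  [1.0, 2.0, 4.0],
})

# to_dict("list") gives a plain dict literal that can be pasted into a question
snippet = df.head(10).to_dict("list")
print(snippet)

# Anyone answering can rebuild the same rows from the pasted dict
rebuilt = pd.DataFrame(snippet)
```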