1
votes

Problem Statement:

I have a list of bank transactions with an amount and a category.

I have categories defined as a string which can contain any number of : characters. The : character indicates a sub category. For example expenses Household contains all household expenses and Household : Utilities indicates the subcatogy Utiltities within houshold.

I can groupby the category and take the sum to get expenses within a specific subcategory. For example I can get the total spent on Household : Utilities or Household : Rent. But doing a groupby sum will not give me a Household entry which is the sum of the Utilities and Rent Subcategories.

I wish to be able to expand these sub category and parent category sums in in a Hierarchical way so that I have a comprehensive list. E.g If I have a transaction with Household : Utilties : Water for $10, and a transaction with Household : Utilities : Electricity for $15 I want the result to be

  • Household .............................. $ 25
  • Household : Utilities .................. $ 25
  • Household : Utilities : Water .......... $ 10
  • Household : Utilities : Electricity .... $ 15

This is a link to an example spreadsheet.

https://docs.google.com/spreadsheets/d/14URPJ4fWl6id9z0-AI1hxNClo-10gotKdMSXnuehNVI/edit?usp=sharing

I can get simple summary of category vs sum using a query() and groupby but this will not expand to parent categories.

I can do this in python, But am having difficulty getting my head around how I would do this in a spreadsheet. Does anyone have any ideas?

1
could there be rent twice? and... how many subcategories/levels you will have ? - player0
There could be rent twice in the initial transaction list. But in the groupby, sum(amount) summary shown in the spreadsheet there should only be one rent. I'm looking for an algorithm that works for any number of levels as I have this in python. But Realistically this will probably always be < 10. - Pierce

1 Answers

1
votes

try:

=ARRAYFORMULA(QUERY({A5:B; 
 SPLIT(FLATTEN(IFERROR(REGEXREPLACE(REGEXREPLACE(IF(
 IFERROR(SPLIT(A5:A, ":"))="",,A5:A),
 IFERROR(SPLIT(A5:A, ":")), ), " :$| ::.+|^:.+", ))&"×"&B5:B), "×")}, 
 "select Col1,sum(Col2)
  where Col2 is not null
  group by Col1
  order by Col1
  label Col1'Category',sum(Col2)'Quantity'"))

enter image description here