0
votes

I have a fact table containing 50 million records named AccountLines partitioned by Posting_Date_New. When I'm filtering records on particular partitioned column then my query works fine and scan only limited data between given range. But when I made join with Dimension table on the basis of Posting_Date_New column and filter on Financial Year then it scans whole table. How can I solve this problem? I need to join my Fact Table with Dimension table and filter on Dimension table's columns without scanning whole table. Please help.

My queries are as follows.

--Query complete (5.542 sec elapsed, 244.37 MB processed)

select ah.ChargeGroup, sum(Amount) Amount from SSIS_STAGING.AccountLine acc
inner join SSIS_STAGING.Dim_Times_BI_Clustering dd on dd.Posting_Date_New = acc.Posting_Date_New
inner join SSIS_STAGING.BranchHierarchy br on br.CostCenterId = acc.BookingBranchID
inner join SSIS_STAGING.Accounts_Hierarchy ah on ah.Account = acc.G_L
where acc.Posting_Date_New between '2018-04-01' and '2019-03-31' and ZoneName = 'BU-North'
group by ah.ChargeGroup

--Query complete (16.530 sec elapsed, 5.51 GB processed)

select ah.ChargeGroup, sum(Amount) Amount from SSIS_STAGING.AccountLine acc
inner join SSIS_STAGING.Dim_Times_BI_Clustering dd on dd.Posting_Date_New = acc.Posting_Date_New
inner join SSIS_STAGING.BranchHierarchy br on br.CostCenterId = acc.BookingBranchID
inner join SSIS_STAGING.Accounts_Hierarchy ah on ah.Account = acc.G_L
where dd.FinancialYear = '2018-19' and ZoneName = 'BU-North'
group by ah.ChargeGroup
1
have you tried adding AND acc.Posting_Date_New between '2018-04-01' and '2019-03-31' into the WHERE clause - so you actually explicitly state which partitions to only scan? because now - your quesry has no idea that you need specific partitions! - Mikhail Berlyant

1 Answers

0
votes

To prune partitioned table you need to use the timestamp in a where clause:

standardSQL

SELECT t1.name, t2.category FROM table1 t1 INNER JOIN table2 t2 ON t1.id_field = t2 field2 WHERE t1.ts = CURRENT_TIMESTAMP()

This being said as BigQuery is performs better with denormalize data.