0
votes

I have a table that has appox. 450,000 records. I have to find the avg data usage per day per user for non gmail users. the application names such as gmail, facebook, yahoo.etc are captured in 'name' field. IF 'name' has value as 'gmail', another field called 'label' captures a value as 'D' saying it is capturing data usage and updating it in two other fields(Datausage1, datausage2). I cannot have 'name' and 'label' in the same where clause as there are other values that are captured in 'label'. Hence, I have joined the tables as below but the query is not efficient and it is running for a long time. Is there any other way to get the required results?

(SELECT
t1.network AS Network,
(((SUM (t2.datausage1) + SUM
(t2.datausage2))/1073741824)/EXACT_COUNT_DISTINCT(t1.user))/33 AS Avg_data_GigaBytes_Per_day_Per_User,
from
(SELECT user,StartTime,
network
FROM [mytable]
WHERE name NOT IN('gmail')
)t1
JOIN
(SELECT user,datausage1,datausage2
FROM [mytable]
WHERE label='T'
)t2
ON t1.user=t2.user
GROUP BY 1
)

2

2 Answers

1
votes

Try below

SELECT
  network,
  AVG(usage_per_day_by_user) AS usage_per_day_per_user_average
FROM (
  SELECT
    network,
    user,
    DATE(StartTime) AS usage_day,
    SUM(t2.datausage1 + t2.datausage2)/1073741824 AS usage_per_day_by_user
  FROM [mytable]
  WHERE NOT name IN ('gmail') 
  AND label = 'D' 
  GROUP BY 1, 2, 3
)  
GROUP BY network  

Below statement is not clear so I just ignored it for now

I cannot have 'name' and 'label' in the same where clause as there are other values that are captured in 'label'

Btw, the main issue I see with your query in question is that it produces cross join of entries for the same user. so essentially you 0.5M rows becomes MM rows and that is what then looks to you as slow

0
votes

Your query is missing some details, but it sounds like you want a conditional sum rather than a join. Something along the lines of:

SUM(IF(name='gmail' AND label='D',datausage1+datausage2,<however you calculate the non-gmail case>))