2
votes

I'm involved in a project where I need to build a histogram by dates. Before me, this was done in Java code by issuing tons of SQL queries to the DB, one for each rectangle (date subregion).

I tried another approach:

select sum(CNT), trunc(DATE, 'MM') from DATA
  where DATE >= TO_DATE('01-01-2012','DD-MM-YYYY')
  and INC_DATE <= TO_DATE('31-12-2012','DD-MM-YYYY')
  group by trunc(DATE, 'MM')
  order by trunc(DATE, 'MM');

and collect the data from the ResultSet in Java code. But if some month has no data, the histogram is missing a rectangle.

Is it possible to fix the SQL (or maybe PL/SQL) expression so that the missing dates are included in the result with a zero sum?

Or how can I build a more elegant date sequence generator in Java to find the missing dates (aligned to days/months/quarters/years)?

3 Answers

4
votes

Try something like this (simplified example):

with 
months_int as
(select trunc(min(inc_date), 'MM') min_month, trunc(max(inc_date), 'MM') max_month
 from data),
months as
(
  select add_months(min_month, level-1) mnth_date
  from months_int 
  connect by add_months(min_month, level-1)<= max_month
  )
select  mnth_date, sum(cnt) 
from data  right outer join months on trunc(inc_date, 'MM') = mnth_date
group by mnth_date
order by mnth_date

Here is a sqlfiddle example
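
One caveat with the query above: for a month that has no matching rows, sum(cnt) comes back as NULL rather than 0, which matters if the Java side expects the zero sum asked for in the question. Below is a minimal sketch of the same query with NVL wrapped around the sum. The inline sample_data rows are made up purely so the statement runs on its own; they stand in for the real data table.

```sql
with
sample_data as (
  -- hypothetical rows: January and March have data, February has none
  select to_date('15-01-2012','DD-MM-YYYY') inc_date, 5 cnt from dual union all
  select to_date('20-01-2012','DD-MM-YYYY'), 2 from dual union all
  select to_date('03-03-2012','DD-MM-YYYY'), 7 from dual
),
months_int as (
  select trunc(min(inc_date), 'MM') min_month, trunc(max(inc_date), 'MM') max_month
  from sample_data
),
months as (
  select add_months(min_month, level - 1) mnth_date
  from months_int
  connect by add_months(min_month, level - 1) <= max_month
)
-- nvl() turns the NULL sum of an unmatched month into 0
select mnth_date, nvl(sum(cnt), 0) total_cnt
from sample_data right outer join months on trunc(inc_date, 'MM') = mnth_date
group by mnth_date
order by mnth_date
```

With these sample rows, February should come back with a total of 0 instead of NULL.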

3
votes

You need to create your list of dates first, either by creating a calendar table or by using the CONNECT BY syntax.

select to_date('01-01-2012','DD-MM-YYYY') + level - 1
  from dual
connect by level <= to_date('31-12-2012','DD-MM-YYYY') 
                    - to_date('01-01-2012','DD-MM-YYYY') + 1

You can then LEFT OUTER JOIN this to your main query to ensure the gaps are populated:

with the_dates as (
  select to_date('01-01-2012','DD-MM-YYYY') + level - 1 as the_date
    from dual
 connect by level <= to_date('31-12-2012','DD-MM-YYYY') 
                      - to_date('01-01-2012','DD-MM-YYYY') + 1
         )
select sum(b.cnt), trunc(a.the_date, 'MM') 
  from the_dates a
  left outer join data b
    on a.the_date = b.date
 group by trunc(a.the_date, 'MM')
 order by trunc(a.the_date, 'MM')

You no longer need the WHERE clause, as the date range is taken care of by the JOIN. Note that I'm grouping on the date from the generated list rather than the DATE column from your main table, so dates with no data still produce a row. This works for any date range, not only ranges that end on a month boundary; if you only want it month-wise, you could truncate the date in the WITH clause instead. Be aware of your indexes before doing that, though: if your table is indexed on DATE and not on TRUNC(DATE, 'MM'), it is preferable to JOIN on DATE alone.
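
To make the month-wise variant concrete, here is a sketch in which the WITH clause generates one row per month start and the join uses a half-open range on the raw column, so a plain index on that column stays usable. The column name inc_date and the inline sample_data rows are assumptions for illustration only; NVL turns the NULL sum of an empty month into 0.

```sql
with sample_data as (
  -- hypothetical rows standing in for the real table
  select to_date('15-01-2012','DD-MM-YYYY') inc_date, 5 cnt from dual union all
  select to_date('03-03-2012','DD-MM-YYYY'), 7 from dual
),
the_months as (
  -- one row per month start for 2012
  select add_months(to_date('01-01-2012','DD-MM-YYYY'), level - 1) as month_start
    from dual
 connect by level <= 12
)
select nvl(sum(b.cnt), 0) as total_cnt, a.month_start
  from the_months a
  left outer join sample_data b
    -- no function is applied to b.inc_date, only a range comparison
    on b.inc_date >= a.month_start
   and b.inc_date <  add_months(a.month_start, 1)
 group by a.month_start
 order by a.month_start
```

The trade-off is that the generated list now has 12 rows instead of one per day, at the cost of a range predicate in the join rather than a simple equality.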

DATE is a bad name for a column as it's a reserved word; I suspect you're not using it but you should be aware.

If you were using a calendar table it would look something like this:

select sum(b.cnt), trunc(a.the_date, 'MM') 
  from calender_table a
  left outer join data b
    on a.the_date = b.date
 where a.the_date >= to_date('01-01-2012','DD-MM-YYYY') 
   and a.the_date <= to_date('31-12-2012','DD-MM-YYYY')
 group by trunc(a.the_date, 'MM')
 order by trunc(a.the_date, 'MM')
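
The query above assumes calender_table already exists. One possible way to populate it is sketched here with the same CONNECT BY generator. The ten-year span and the constraint name are arbitrary choices for illustration, not something taken from the original setup.

```sql
-- one row per day from 01-01-2010 through 31-12-2019
create table calender_table as
select to_date('01-01-2010','DD-MM-YYYY') + level - 1 as the_date
  from dual
connect by level <= to_date('31-12-2019','DD-MM-YYYY')
                    - to_date('01-01-2010','DD-MM-YYYY') + 1;

-- a primary key gives the join a unique index on the_date
alter table calender_table
  add constraint calender_table_pk primary key (the_date);
```

Because the table is built once and reused, the generator cost is paid a single time rather than on every histogram query.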

2
votes

My production code, based on advice from a local guru and @Ben's technique:

-- generate sequence 1..N:
SELECT level FROM dual CONNECT BY level <= 4;

-- generates days:
select to_date('01-01-2012','DD-MM-YYYY') + level - 1
  from dual
connect by level <= to_date('31-12-2012','DD-MM-YYYY') - to_date('01-01-2012','DD-MM-YYYY') + 1;

with dates as (
  select (to_date('01-01-2012','DD-MM-YYYY') + level - 1) as daterange
    from dual
    connect by level <= to_date('31-12-2012','DD-MM-YYYY') - to_date('01-01-2012','DD-MM-YYYY') + 1
  ) select sum(tbl.cnt) as summ, trunc(dates.daterange, 'DDD')
      from dates
           left outer join DATA_TBL tbl
        on trunc(tbl.inc_date, 'DDD') = trunc(dates.daterange, 'DDD')
      group by trunc(dates.daterange, 'DDD')
      order by trunc(dates.daterange, 'DDD');

-- generates months:
select ADD_MONTHS(to_date('01-01-2012','DD-MM-YYYY'), level - 1)
  from dual
connect by level <= months_between(to_date('31-12-2012','DD-MM-YYYY'), to_date('01-01-2012','DD-MM-YYYY')) + 1;

with dates as (
  select add_months(to_date('01-01-2012','DD-MM-YYYY'), level-1) as daterange
    from dual
    connect by level <= months_between(to_date('31-12-2012','DD-MM-YYYY'), to_date('01-01-2012','DD-MM-YYYY')) + 1
  ) select sum(tbl.cnt) as summ, trunc(dates.daterange, 'MM')
      from dates
           left outer join DATA_TBL tbl
        on trunc(tbl.inc_date, 'MM') = trunc(dates.daterange, 'MM')
      group by trunc(dates.daterange, 'MM')
      order by trunc(dates.daterange, 'MM');

-- generates quarters:
select ADD_MONTHS(to_date('01-01-2012','DD-MM-YYYY'), (level-1)*3)
  from dual
  connect by level <= months_between(to_date('31-12-2012','DD-MM-YYYY'), to_date('01-01-2012','DD-MM-YYYY'))/3 + 1;

with dates as (
  select add_months(to_date('01-01-2012','DD-MM-YYYY'), (level-1)*3) as daterange
    from dual
    connect by level <= months_between(to_date('31-12-2012','DD-MM-YYYY'), to_date('01-01-2012','DD-MM-YYYY'))/3 + 1
  ) select sum(tbl.cnt) as summ, trunc(dates.daterange, 'Q')
      from dates
           left outer join DATA_TBL tbl
        on trunc(tbl.inc_date, 'Q') = trunc(dates.daterange, 'Q')
      group by trunc(dates.daterange, 'Q')
      order by trunc(dates.daterange, 'Q');

-- generates years:
select add_months(to_date('01-01-2007','DD-MM-YYYY'), (level-1)*12)
  from dual
  connect by level <= months_between(to_date('31-01-2012','DD-MM-YYYY'), to_date('01-01-2007','DD-MM-YYYY'))/12 + 1;

with dates as (
  select add_months(to_date('01-01-2007','DD-MM-YYYY'), (level-1)*12) as daterange
    from dual
    connect by level <= months_between(to_date('31-01-2012','DD-MM-YYYY'), to_date('01-01-2007','DD-MM-YYYY'))/12 + 1
  ) select sum(tbl.cnt) as summ, trunc(dates.daterange, 'YYYY')
      from dates
           left outer join DATA_TBL tbl
        on trunc(tbl.inc_date, 'YYYY') = trunc(dates.daterange, 'YYYY')
      group by trunc(dates.daterange, 'YYYY')
      order by trunc(dates.daterange, 'YYYY');

But connect by level is a hack, according to: