3
votes

I have a table (in Oracle 12c) with 22 million records so far, plus about 10,000 records inserted every day. We need counts based on this table, such as:

select col1, col2, count(*) cnt from my_table group by col1, col2;

This query will return fewer than 30 rows, and each combination of col1 and col2 is unique.

Our application needs to check the CNT values frequently, but approximate values are good enough. That means we could create a materialized view and refresh it every 10-20 minutes.
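
Something like this is what I have in mind (just a sketch; the view name and the 15-minute interval are placeholders):

create materialized view my_table_counts_mv
build immediate
refresh complete
start with sysdate next sysdate + 15/1440  -- re-run the complete refresh roughly every 15 minutes
as
select col1, col2, count(*) cnt
from my_table
group by col1, col2;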

Is the materialized view a good choice for this requirement, or should I create a regular table for it?

Thanks in advance!!!

MView itself creates a table with the data returned from the view, so I don't see the difference. – Naresh
@ILLUMINATI7590, true, an MV is just a table based on a view. But with one simple create statement it creates the table, the view, and the scheduled job to populate and refresh the table; in short, it's one-stop shopping. In the right circumstances, MVs are very handy. – Wolf
Maybe create a BITMAP INDEX on col1 and col2 (one on each column); then the query should be much faster. – Wernfried Domscheit

2 Answers

1
votes

There are at least three different ways to achieve this:

  1. Fast Refresh Materialized View

    A fast refresh materialized view is probably the ideal solution. The 10,000 rows inserted each day add a small amount of overhead, but there is no need to rebuild anything; the new totals are available immediately after each commit (a quick demonstration follows this list), and retrieving them is extremely fast. The downside is that fast refresh materialized views are difficult to set up and have lots of weird gotchas. They work well with your sample schema but may not work in a more complicated scenario.

    Sample Schema

    drop table my_table;
    
    create table my_table(
        id number not null,
        col1 number not null,
        col2 number not null,
        constraint my_table_pk primary key (id)
    );
    
    insert into my_table
    select level, mod(level, 30), mod(level+1, 30)
    from dual
    connect by level <= 100000;
    
    begin
        dbms_stats.gather_table_stats(user, 'MY_TABLE');
    end;
    /
    

    Create materialized view log and materialized view

    create materialized view log on my_table with rowid(col1, col2) including new values;
    
    create materialized view my_table_mv
    refresh fast on commit
    enable query rewrite as
    select col1, col2, count(*) total
    from my_table
    group by col1, col2;
    

    Query rewrite

    The sample query is silently rewritten to use the small materialized view instead of the large table.

    explain plan for
    select col1, col2, count(*) cnt
    from my_table
    group by col1, col2;
    
    select * from table(dbms_xplan.display);
    
    Plan hash value: 786752524
    
    --------------------------------------------------------------------------------------------
    | Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |             |    30 |   300 |     3   (0)| 00:00:01 |
    |   1 |  MAT_VIEW REWRITE ACCESS FULL| MY_TABLE_MV |    30 |   300 |     3   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------
    
  2. Compressed B*Tree Index

    If there are only 30 unique combinations of col1 and col2, the index should compress well and not take up much space. The index can then be used in a fast full index scan and act like a skinny table. This method requires at least one of the columns to be not null; if both could be null, a function-based index could be useful here.

    create index my_table_idx on my_table(col1, col2) compress;
    
  3. Bitmap Index

    Bitmap indexes are small and fast when there are only a small number of distinct values. However, they can introduce disastrous locking problems for some types of DML.

    create bitmap index my_table_idx on my_table(col1, col2);
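
To see the on-commit refresh from option 1 in action, insert some rows into the sample schema and commit; the aggregate query picks up the new totals immediately, with no manual refresh. (A small sanity-check sketch; the number of new rows is arbitrary.)

    insert into my_table
    select 100000 + level, mod(level, 30), mod(level+1, 30)
    from dual
    connect by level <= 1000;

    commit;

    -- The totals already include the 1,000 new rows, and the query
    -- can still be rewritten to MY_TABLE_MV.
    select col1, col2, count(*) cnt
    from my_table
    group by col1, col2;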
    
0
votes

Depending on how approximate the counts can be, you could also try a SAMPLE clause:

select col1,
       col2,
       count(*) cnt
from   my_table sample(1)
group by col1, col2;

Depending on the data distribution, this might give a reasonable estimate. You can test how high a sample percentage you need to get a reasonable result, but it's rarely worth going over 4 or 5 unless you use the BLOCK clause:

select col1,
       col2,
       count(*) cnt
from   my_table sample block(10)
group by col1, col2;
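
Since SAMPLE(1) returns only about 1% of the rows, the counts it produces are themselves only about 1% of the true totals, so scale them back up to estimate the real figures. A rough sketch:

select col1,
       col2,
       count(*) * 100 cnt   -- scale the 1% sample counts back up
from   my_table sample(1)
group by col1, col2;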