what is skewed column in Oracle

Question

I found some bottleneck of my query which select data from only single table then require time and
i used non unique key index on two column and with column used in where clause.

select name ,isComplete from Student where year='2015' and isComplete='F'

Now i found some concept from internet like skewed column so what is it?
have an idea then plz help me?
and how to resolve problem of skewed column?
and how skewed column affect performance of the Query?

@GriffeyDog , see below physical table like Student (StudID,Name,year,isComplete,location) and index is on year and isComplete — Smith Magra

Matthew McPeak Matthew McPeak · Accepted Answer · 2016-02-19T22:51:07

Skewed columns are columns in which the data is not evenly distributed among the rows.

For example, suppose:

You have a table order_lines with 100,000,000 rows
The table has a column named customer_id
You have 1,000,000 distinct customers
Some (very large) customers can have hundreds of thousands or millions of order lines.

In the above example, the data in order_lines.customer_id is skewed. On average, you'd expect each distinct customer_id to have 100 order lines (100 million rows divided by 1 million distinct customers). But some large customers have many, many more than 100 order lines.

This hurts performance because Oracle bases its execution plan on statistics. So, statistically speaking, Oracle thinks it can access order_lines based on a non-unique index on customer_id and get only 100 records back, which it might then join to another table or whatever using a NESTED LOOP operation.

But, then when it actually gets 1,000,000 order lines for a particular customer, the index access and nested loop join are hideously slow. It would have been far better for Oracle to do a full table scan and hash join to the other table.

So, when there is skewed data, the optimal access plan depends on which particular customer you are selecting!

Oracle lets you avoid this problem by optionally gathering "histograms" on columns, so Oracle knows which values have lots of rows and which have only a few. That gives the Oracle optimizer the information it needs to generate the best plan in most cases.

what is skewed column in Oracle

2 Answers