SQL Performance

SQL performance is a concern for every DBA. This article introduces five ways to improve it.

1. When the argument is a subquery, use EXISTS instead of IN

When a predicate takes a subquery as its argument, prefer EXISTS to IN. The two statements below return the same result, but EXISTS is often the faster of the two.

  • Use IN

SELECT * FROM Class_A WHERE id IN (SELECT id FROM Class_B);

  • Use EXISTS

SELECT * FROM Class_A A WHERE EXISTS
(SELECT * FROM Class_B B WHERE A.id = B.id);

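EXISTS tends to be faster for two reasons:

  1. If the join key (id) is indexed, the subquery can be answered from the index on Class_B without reading the table itself.
  2. EXISTS stops as soon as it finds one matching row, whereas IN may first evaluate the whole subquery result. Many modern optimizers rewrite both forms into the same plan, so check the execution plan on your own DBMS.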
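The equivalence of the two forms can be checked with a small script. This is only a sketch using Python's built-in sqlite3 module; the name column and the sample rows are assumptions made for illustration, and the plan SQLite prints will differ from what other DBMSs produce.

```python
import sqlite3

# In-memory database; the "name" column and all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class_A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Class_B (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO Class_A VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cho');
    INSERT INTO Class_B VALUES (2, 'Bob'), (3, 'Cho'), (4, 'Dee');
""")

in_sql = "SELECT * FROM Class_A WHERE id IN (SELECT id FROM Class_B)"
exists_sql = (
    "SELECT * FROM Class_A A WHERE EXISTS "
    "(SELECT * FROM Class_B B WHERE A.id = B.id)"
)

# Sort in Python so the comparison does not depend on row order.
in_rows = sorted(conn.execute(in_sql).fetchall())
exists_rows = sorted(conn.execute(exists_sql).fetchall())
print(in_rows == exists_rows)  # True

# Print each plan; the last column of every plan row is the description.
for label, sql in (("IN", in_sql), ("EXISTS", exists_sql)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])
```

Comparing the two printed plans on your own data is the quickest way to see whether the rewrite actually changes anything on a given DBMS.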

2. Avoid sorting

The DBMS sorts data internally more often than you might expect, and it does so without being asked, so you need to know which operations trigger a sort.

  • Use the ALL option

Set operators such as UNION typically sort the rows in order to remove duplicates. If duplicates are acceptable, or cannot occur, use UNION ALL instead of UNION.

1. With UNION

SELECT * FROM Class_A
UNION
SELECT * FROM Class_B;

2. With UNION ALL

SELECT * FROM Class_A
UNION ALL
SELECT * FROM Class_B;
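The difference between the two operators shows up as soon as the tables share a row. The sketch below uses Python's sqlite3 module with invented sample data (the name column is an assumption); it only demonstrates the change in results, not the cost of the sort.

```python
import sqlite3

# In-memory database with made-up rows; (2, 'Bob') appears in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class_A (id INTEGER, name TEXT);
    CREATE TABLE Class_B (id INTEGER, name TEXT);
    INSERT INTO Class_A VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Class_B VALUES (2, 'Bob'), (3, 'Cho');
""")

# UNION removes the duplicate row; UNION ALL returns every row as-is.
union_rows = conn.execute(
    "SELECT * FROM Class_A UNION SELECT * FROM Class_B"
).fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM Class_A UNION ALL SELECT * FROM Class_B"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```

Because the results differ when duplicates exist, UNION ALL is a safe substitute only when duplicates are impossible or harmless.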

  • Use EXISTS rather than DISTINCT

DISTINCT also sorts in order to remove duplicates. When you join two tables only to check that a matching row exists, use EXISTS instead of DISTINCT to avoid the sort.

For example:

1. SELECT DISTINCT I.item_no FROM Items I INNER JOIN SalesHistory SH ON I.item_no = SH.item_no;

2. SELECT item_no FROM Items I WHERE EXISTS
(SELECT * FROM SalesHistory SH WHERE I.item_no = SH.item_no);
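To see why DISTINCT is needed in the join version and not in the EXISTS version, the following sketch runs both against a tiny data set with Python's sqlite3 module. The item and sale_date columns and all rows are assumptions chosen so that one item has two sales.

```python
import sqlite3

# In-memory database; columns other than item_no and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (item_no INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE SalesHistory (sale_date TEXT, item_no INTEGER, quantity INTEGER);
    INSERT INTO Items VALUES (10, 'ink'), (20, 'pen'), (30, 'pad');
    INSERT INTO SalesHistory VALUES
        ('2010-10-01', 10, 4), ('2010-10-02', 10, 5), ('2010-10-02', 20, 7);
""")

join_sql = (
    "SELECT I.item_no FROM Items I "
    "INNER JOIN SalesHistory SH ON I.item_no = SH.item_no"
)
distinct_sql = join_sql.replace("SELECT ", "SELECT DISTINCT ", 1)
exists_sql = (
    "SELECT item_no FROM Items I WHERE EXISTS "
    "(SELECT * FROM SalesHistory SH WHERE I.item_no = SH.item_no)"
)

# The plain join repeats item 10 once per sale, which is why DISTINCT is needed.
join_rows = sorted(conn.execute(join_sql).fetchall())
distinct_rows = sorted(conn.execute(distinct_sql).fetchall())
# EXISTS yields each item at most once, so no duplicate removal is required.
exists_rows = sorted(conn.execute(exists_sql).fetchall())

print(join_rows)                     # [(10,), (10,), (20,)]
print(distinct_rows == exists_rows)  # True
```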

3. Let MAX and MIN use an index

The functions MAX and MIN both require a sort. If the argument column is indexed, however, the DBMS can find the value with an index seek instead of scanning the whole table.
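As a rough illustration, the sketch below creates an index on the argument column and prints the plan SQLite chooses for each aggregate. It uses Python's sqlite3 module; the Items rows and the index name idx_items_item_no are hypothetical, and other DBMSs report their plans differently.

```python
import sqlite3

# In-memory database; the rows and the index name are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (item_no INTEGER, item TEXT);
    INSERT INTO Items VALUES (20, 'pen'), (10, 'ink'), (30, 'pad');
    -- Index on the column passed to MAX/MIN.
    CREATE INDEX idx_items_item_no ON Items (item_no);
""")

max_sql = "SELECT MAX(item_no) FROM Items"
min_sql = "SELECT MIN(item_no) FROM Items"

max_no = conn.execute(max_sql).fetchone()[0]
min_no = conn.execute(min_sql).fetchone()[0]
print(max_no, min_no)  # 30 10

# Print the plan for each query; the last column of a plan row is its description.
for sql in (max_sql, min_sql):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(sql, "->", [row[-1] for row in plan])
```

Dropping the CREATE INDEX statement and rerunning the script is an easy way to compare the two plans.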

4. Put conditions in WHERE rather than HAVING

1. Condition in HAVING

SELECT sale_date, SUM(quantity) FROM SalesHistory
GROUP BY sale_date
HAVING sale_date = '2010-10-2';

2. Condition in WHERE

SELECT sale_date, SUM(quantity) FROM SalesHistory
WHERE sale_date = '2010-10-2' GROUP BY sale_date;

The second query (WHERE) performs better, for two reasons. First, an index on sale_date can be used by the WHERE clause, but not by HAVING, which filters the already aggregated result. Second, GROUP BY sorts or hashes its input, so filtering rows with WHERE beforehand reduces the number of rows it has to process.
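Both queries can be run side by side to confirm they agree and to inspect their plans. This sketch uses Python's sqlite3 module; the sample rows and the index name idx_sales_date are assumptions, and dates are stored as ISO text ('2010-10-02') because SQLite compares them as plain strings.

```python
import sqlite3

# In-memory database; rows and the index name are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesHistory (sale_date TEXT, item_no INTEGER, quantity INTEGER);
    INSERT INTO SalesHistory VALUES
        ('2010-10-01', 10, 4), ('2010-10-02', 10, 5),
        ('2010-10-02', 20, 7), ('2010-10-03', 20, 1);
    CREATE INDEX idx_sales_date ON SalesHistory (sale_date);
""")

having_sql = (
    "SELECT sale_date, SUM(quantity) FROM SalesHistory "
    "GROUP BY sale_date HAVING sale_date = '2010-10-02'"
)
where_sql = (
    "SELECT sale_date, SUM(quantity) FROM SalesHistory "
    "WHERE sale_date = '2010-10-02' GROUP BY sale_date"
)

having_rows = conn.execute(having_sql).fetchall()
where_rows = conn.execute(where_sql).fetchall()
print(having_rows == where_rows, where_rows)  # True [('2010-10-02', 12)]

# Print both plans so the access paths can be compared.
for label, sql in (("HAVING", having_sql), ("WHERE", where_sql)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])
```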

5. Reduce intermediate tables

An intermediate table (such as the result of a subquery) consumes memory, and indexes on the original table are hard to use against it. Keeping intermediate tables to a minimum is therefore key to improving query performance.

  • With an intermediate table (subquery)

SELECT * FROM (SELECT sale_date, MAX(quantity) AS max_qty FROM SalesHistory
GROUP BY sale_date) TMP WHERE max_qty >= 10;

  • Without an intermediate table, using HAVING

SELECT sale_date, MAX(quantity) FROM SalesHistory
GROUP BY sale_date HAVING MAX(quantity) >= 10;
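The two statements should return the same rows, which the sketch below checks with Python's sqlite3 module. The sample rows are invented so that one date falls below the threshold of 10; nothing here measures memory use, it only confirms that the HAVING form is a drop-in replacement.

```python
import sqlite3

# In-memory database; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesHistory (sale_date TEXT, item_no INTEGER, quantity INTEGER);
    INSERT INTO SalesHistory VALUES
        ('2010-10-01', 10, 4), ('2010-10-02', 10, 5),
        ('2010-10-02', 20, 12), ('2010-10-03', 20, 10);
""")

# Version with an intermediate table (derived table TMP).
subquery_sql = (
    "SELECT * FROM (SELECT sale_date, MAX(quantity) AS max_qty "
    "FROM SalesHistory GROUP BY sale_date) TMP WHERE max_qty >= 10"
)
# Version that filters the groups directly with HAVING.
having_sql = (
    "SELECT sale_date, MAX(quantity) FROM SalesHistory "
    "GROUP BY sale_date HAVING MAX(quantity) >= 10"
)

# Sort in Python so the comparison does not depend on row order.
subquery_rows = sorted(conn.execute(subquery_sql).fetchall())
having_rows = sorted(conn.execute(having_sql).fetchall())
print(subquery_rows == having_rows)  # True
print(having_rows)  # [('2010-10-02', 12), ('2010-10-03', 10)]
```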

These five tips are key methods for improving overall SQL performance, but the most important step is to find the root cause of a slow query and fix that.