Simple SQL Optimization Tips and Tricks

Simple SQL Optimization Tricks

Here are some SQL optimizations that are meant to be simple to implement and not compromise the overall code readability while giving you a performance boost.

1. UNION ALL vs. UNION

Both operators are used to combine the result sets of two or more SELECT statements. The difference is that UNION removes the duplicate records by performing a DISTINCT operation on the result set, whereas UNION ALL does not remove duplicates and is, therefore, faster than UNION.

Solution: Use UNION ALL instead of UNION when unioned results are mutually exclusive or when duplicates are not a problem.

2. Using leading wildcards when filtering

When you use leading wildcards in a query, e.g., WHERE name like ‘%Miles%’, the query is not sargable – which is a fancy way of saying that a query cannot take advantage of an index to speed up its execution. 

If, on the other hand, you replace the leading wildcard with a trailing wildcard like the following: WHERE name like ‘Miles%’, your query will be able to leverage any of the indexes you have defined on the column when performing the search – the query will be sargable.

Solution: Replace ‘%string%’ with ‘string%’ where possible.  

3. Using OR in the JOIN predicate vs. UNION ALL

The OR operator is an expensive database used to chain multiple JOIN conditions—especially when used in the JOIN clause. One way to boost the query performance is to split the JOIN predicates into individual queries and use UNION to combine the result sets. 

Using OR:

SELECT p.name

FROM dim_date d

INNER JOIN fact_salesorder f ON p.dim_product = f.dim_productid OR p.market = f.store_market

Using UNION ALL:

SELECT p.name

FROM dim_product p

INNER JOIN fact_salesorder f ON p.dim_productid = f.dim_productid

UNION ALL

SELECT p.name

FROM dim_product p

INNER JOIN fact_salesorder f ON p.market = f.store_market AND p.dim_productid != f.dim_productid

Solution: Split conditions chained with OR in the JOIN predicate into multiple queries combined with UNION ALL.

4. Using ANY_VALUE(column) vs. GROUP BY column

When writing analytical queries, you will often use the GROUP BY clause to report metrics qualified by the selected attributes you want to report on. 

But sometimes, it doesn’t make sense to add all the attributes in the GROUP BY - they are not grouping data at a higher granularity but are only used for display purposes alongside the other attributes and metrics. 

For example, if you want to see the total sales by customer name and id, we might write the following query:

SELECT c.dim_customerid, c.name, sum(f.sales_amount)

FROM dim_customer c 

INNER JOIN fact_salesorder f ON c.dim_customerid = f.dim_customerid

GROUP BY c.dim_customerid, c.name;

Suppose we know that each dim_customerid corresponds to one single customer name. In that case, we can safely remove the name attribute from the GROUP BY and instead select one single value from the group using the ANY_VALUE aggregate function in the select clause:

SELECT c.dim_customerid, any(c.name), sum(f.sales_amount)

FROM dim_customer c 

INNER JOIN fact_salesorder f ON c.dim_customerid = f.dim_customerid

GROUP BY c.dim_customerid; 

5. Using indexes to speed up the query performance.

Indexes are the primary way for users to accelerate query performance. While indexes are common with Online Transaction Processing applications, they are used sparingly with big data. If you want to find out more on how to use indexes with a data warehouse, check out our blog on Indexes in Action.