Introduction

DRY (don’t repeat yourself) is a vital principle of professional programming practice In SQL queries, although it is common to see repeated expressions. This article investigates how we can use the SQL APPLY operator to minimize redundancy and make our queries DRYer and easier to maintain.

MCSE Training – Resources (Intense)

Repeated Expressions Example

A good example of repetition can be shown in the AdventureWorksDW2008R2 database. Consider the view [dbo].[USA]:

CREATE VIEW [dbo].[USA] 
AS
    SELECT 
        CASE [Model] 
            WHEN 'Mountain-100' THEN 'M200' 
            WHEN 'Road-150' THEN 'R250' 
            WHEN 'Road-650' THEN 'R750' 
            WHEN 'Touring-1000' THEN 'T1000' 
            ELSE Left([Model], 1) + Right([Model], 3) 
        END + ' ' + [Region] AS [ModelRegion] 
        ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] 
        ,Sum([Quantity]) AS [Quantity] 
        ,Sum([Amount]) AS [Amount]
		,CalendarYear
		,[Month]
		,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25)
		as ReportingDate
    FROM 
        [dbo].[vDMPrep] 
    WHERE 
        [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250', 
            'Road-650', 'Road-750', 'Touring-1000') 
    GROUP BY 
        CASE [Model] 
            WHEN 'Mountain-100' THEN 'M200' 
            WHEN 'Road-150' THEN 'R250' 
            WHEN 'Road-650' THEN 'R750' 
            WHEN 'Touring-1000' THEN 'T1000' 
            ELSE Left(Model,1) + Right(Model,3) 
        END + ' ' + [Region] 
        ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month])
		,CalendarYear
		,[Month]
		,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25);

GO

In the query above, three expressions were repeated. They are given the aliases [ModelRegion], [TimeIndex], and [ReportingDate], respectively. These expressions are first seen in the SELECT list and later on in the GROUP BY. We’re going to eliminate this repetition through APPLY operator usage.

Using APPLY to Encapsulate Expressions

When you read about the APPLY operator in the official documentation, it is not immediately obvious how it can help us reduce repetition. [Check out (Using Apply) at https://msdn.microsoft.com/en-us/library/ms177634.aspx.]

Lots of examples show how to use the APPLY operator to invoke a table valued function for each row in the rowset. Be that as it may, since it is conceivable to use SELECT queries in the APPLY context, it is also possible to use an expression operating on columns in each row in the rowset.

See the simple example below:

SELECT dt, dt_year
FROM (VALUES (GetDate())) cur(dt)
CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year)

Result:

dt                         dt_year
2015-11-18 10:26:27.753    2015

You can see that the APPLY operator only contains an expression that uses a column in the rowset and aliases the result. The alias is then utilized in the main query. As an aside, observe the use of “_” as the alias for the APPLY rowset. This is helpful when you don’t care about the rowset alias, as in this case. It is a typical example used in lots of programming languages as well.

We can also use the results of one APPLY operation in a subsequent “APPLY”. The operation goes from top to bottom and left to right. This means that the results of one APPLY are not accessible until the operator appears in sequence. Let’s take an in-depth look at the example:

SELECT dt, dt_year, LeapYear
FROM (VALUES (GetDate())) cur(dt)
CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year)
CROSS APPLY (
    SELECT
        CASE WHEN dt_year % 4 = 0 AND dt_year % 400 <> 0
            THEN 'Leap Year'
            ELSE 'Not Leap Year'
        END
) _1(LeapYear)

This query produces:

dt                         dt_year    LeapYear
2015-11-18 11:48:27.607     2015    Not Leap Year

If we decide to reverse the sequence of the two APPLY clauses, the SQL query would rightly throw an error that “dt_year” can’t be found.

Making [dbo].[USA] DRYer

Armed with these tools, how can we simplify the AdventureWorks view? Let’s have a look at the three repeated expressions and build an APPLY clause to contain them:

    ...
    CROSS APPLY
    (
        SELECT
               CASE [Model]
                   WHEN 'Mountain-100' THEN 'M200'
                   WHEN 'Road-150' THEN 'R250'
                   WHEN 'Road-650' THEN 'R750'
                   WHEN 'Touring-1000' THEN 'T1000'
                   ELSE Left([Model], 1) + Right([Model], 3)
               END + ' ' + [Region] AS [ModelRegion]
             , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex]
             , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate
    ) _
...

Much as in the example above, the APPLY clause just returns aliased expressions using columns in the rowset. In what way will this make the original query DRYer? Repetition wouldn’t be needed in these expressions!

SELECT
         [ModelRegion]
        ,[TimeIndex]
        ,Sum([Quantity]) AS [Quantity]
        ,Sum([Amount]) AS [Amount]
        ,[CalendarYear]
        ,[Month]
        ,[ReportingDate]
    FROM
        [dbo].[vDMPrep]

-- APPLY clause

...

    GROUP BY
         [ModelRegion]
        ,[TimeIndex]
        ,[CalendarYear]
        ,[Month]
        ,[ReportingDate]

Putting it all together, the view definition becomes:

CREATE VIEW [dbo].[USA]
AS
    SELECT
         [ModelRegion]
        ,[TimeIndex]
        ,Sum([Quantity]) AS [Quantity]
        ,Sum([Amount]) AS [Amount]
        ,[CalendarYear]
        ,[Month]
        ,[ReportingDate]
    FROM
        [dbo].[vDMPrep]
    CROSS APPLY
    (
        SELECT
               CASE [Model]
                   WHEN 'Mountain-100' THEN 'M200'
                   WHEN 'Road-150' THEN 'R250'
                   WHEN 'Road-650' THEN 'R750'
                   WHEN 'Touring-1000' THEN 'T1000'
                   ELSE Left([Model], 1) + Right([Model], 3)
               END + ' ' + [Region] AS [ModelRegion]
             , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex]
             , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate
    ) _
    WHERE
        [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250',
            'Road-650', 'Road-750', 'Touring-1000')
    GROUP BY
         [ModelRegion]
        ,[TimeIndex]
        ,[CalendarYear]
        ,[Month]
        ,[ReportingDate]

GO

Now, if we want to change the expression we only have one place to change them. The query has become DRYer.

Summary

The ”APPLY” operator can be used to encapsulate and alias expressions that are repeated in our queries. This permits us to apply the fundamental truth of “DRYer” (don’t repeat yourself) and make our queries easier to maintain.