Introduction
DRY (don’t repeat yourself) is a vital principle of professional programming practice In SQL queries, although it is common to see repeated expressions. This article investigates how we can use the SQL APPLY operator to minimize redundancy and make our queries DRYer and easier to maintain.
MCSE Training – Resources (Intense)
Repeated Expressions Example
A good example of repetition can be shown in the AdventureWorksDW2008R2 database. Consider the view [dbo].[USA]:
CREATE VIEW [dbo].[USA] AS SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,CalendarYear ,[Month] ,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) as ReportingDate FROM [dbo].[vDMPrep] WHERE [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250', 'Road-650', 'Road-750', 'Touring-1000') GROUP BY CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left(Model,1) + Right(Model,3) END + ' ' + [Region] ,(Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) ,CalendarYear ,[Month] ,[dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25); GO
In the query above, three expressions were repeated. They are given the aliases [ModelRegion], [TimeIndex], and [ReportingDate], respectively. These expressions are first seen in the SELECT list and later on in the GROUP BY. We’re going to eliminate this repetition through APPLY operator usage.
Using APPLY to Encapsulate Expressions
When you read about the APPLY operator in the official documentation, it is not immediately obvious how it can help us reduce repetition. [Check out (Using Apply) at https://msdn.microsoft.com/en-us/library/ms177634.aspx.]
Lots of examples show how to use the APPLY operator to invoke a table valued function for each row in the rowset. Be that as it may, since it is conceivable to use SELECT queries in the APPLY context, it is also possible to use an expression operating on columns in each row in the rowset.
See the simple example below:
SELECT dt, dt_year FROM (VALUES (GetDate())) cur(dt) CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year)
Result:
dt dt_year 2015-11-18 10:26:27.753 2015
You can see that the APPLY operator only contains an expression that uses a column in the rowset and aliases the result. The alias is then utilized in the main query. As an aside, observe the use of “_” as the alias for the APPLY rowset. This is helpful when you don’t care about the rowset alias, as in this case. It is a typical example used in lots of programming languages as well.
We can also use the results of one APPLY operation in a subsequent “APPLY”. The operation goes from top to bottom and left to right. This means that the results of one APPLY are not accessible until the operator appears in sequence. Let’s take an in-depth look at the example:
SELECT dt, dt_year, LeapYear FROM (VALUES (GetDate())) cur(dt) CROSS APPLY (SELECT YEAR(cur.dt)) _(dt_year) CROSS APPLY ( SELECT CASE WHEN dt_year % 4 = 0 AND dt_year % 400 <> 0 THEN 'Leap Year' ELSE 'Not Leap Year' END ) _1(LeapYear)
This query produces:
dt dt_year LeapYear 2015-11-18 11:48:27.607 2015 Not Leap Year
If we decide to reverse the sequence of the two APPLY clauses, the SQL query would rightly throw an error that “dt_year” can’t be found.
Making [dbo].[USA] DRYer
Armed with these tools, how can we simplify the AdventureWorks view? Let’s have a look at the three repeated expressions and build an APPLY clause to contain them:
... CROSS APPLY ( SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate ) _ ...
Much as in the example above, the APPLY clause just returns aliased expressions using columns in the rowset. In what way will this make the original query DRYer? Repetition wouldn’t be needed in these expressions!
SELECT [ModelRegion] ,[TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,[CalendarYear] ,[Month] ,[ReportingDate] FROM [dbo].[vDMPrep] -- APPLY clause ... GROUP BY [ModelRegion] ,[TimeIndex] ,[CalendarYear] ,[Month] ,[ReportingDate]
Putting it all together, the view definition becomes:
CREATE VIEW [dbo].[USA] AS SELECT [ModelRegion] ,[TimeIndex] ,Sum([Quantity]) AS [Quantity] ,Sum([Amount]) AS [Amount] ,[CalendarYear] ,[Month] ,[ReportingDate] FROM [dbo].[vDMPrep] CROSS APPLY ( SELECT CASE [Model] WHEN 'Mountain-100' THEN 'M200' WHEN 'Road-150' THEN 'R250' WHEN 'Road-650' THEN 'R750' WHEN 'Touring-1000' THEN 'T1000' ELSE Left([Model], 1) + Right([Model], 3) END + ' ' + [Region] AS [ModelRegion] , (Convert(Integer, [CalendarYear]) * 100) + Convert(Integer, [Month]) AS [TimeIndex] , [dbo].[udfBuildISO8601Date] ([CalendarYear], [Month], 25) AS ReportingDate ) _ WHERE [Model] IN ('Mountain-100', 'Mountain-200', 'Road-150', 'Road-250', 'Road-650', 'Road-750', 'Touring-1000') GROUP BY [ModelRegion] ,[TimeIndex] ,[CalendarYear] ,[Month] ,[ReportingDate] GO
Now, if we want to change the expression we only have one place to change them. The query has become DRYer.
Summary
The ”APPLY” operator can be used to encapsulate and alias expressions that are repeated in our queries. This permits us to apply the fundamental truth of “DRYer” (don’t repeat yourself) and make our queries easier to maintain.