A lot of books about integration services (SSIS), Microsoft’s ETL tool that comes along with SQL Server, deal with how the product works. What are the different components, how do you configure those tasks, how does logging work, and so on. However, most books don’t go in depth on how you can actually use the product for certain circumstances and what you can do with it. One book that was a bit of an exception was Microsoft SQL Server 2008 Integration Services: Problem, Design, Solution. Each chapter started with a certain problem, described how you could solve that problem and then showed the implementation with SSIS. This is a great book that does not describe how SSIS works, but how you can use it to solve problems. However, this book is already a few years old and it has not been updated to more recent releases of SSIS.
MCSE Training – Resources (Intense)
That’s why I was thrilled when I learnt that the book, SQL Server 2012 Integration Services Design Patterns, was released. This book introduces different design patterns in 20 chapters, where each pattern usually describes methods or best practices for dealing with a particular situation. In contrast with Problem, Design, Solution, this book isn’t all about solving problems but more about giving the SSIS developer a design toolbox for tackling data warehouse and BI projects. It explains how problems should be solved in certain ways.
About the Authors
Andy Leonard is a well-known SSIS author, teacher and consultant who frequently gives courses about SSIS design patterns. Matt Masson is the “face” of the SSIS development team at Microsoft and has given plenty of sessions about SSIS. Tim Mitchell, Jessica M. Moss and Michelle Ufford, are also SQL Server community enthusiasts and are frequent volunteers for PASS, the professional association for SQL Server. It suffices to say that an all-star team of SSIS experts is behind this book.
Let’s take a look what’s inside the book.
The first chapter presents a pattern using metadata collection. Here an SSIS package iterates over different servers and it collects various types of metadata from system views—such as data and log file sizes—and stores them in a central repository. This pattern is most likely useful for administrators who want to automate this type of information gathering. The chapter has ready-to-use scripts available to get you started.
The second pattern is about executing SSIS packages. The chapter gives an overview of all the different manners in which packages can be executed, from the debugging environment to executing stored procedures in the SSISDB catalog. Again, this chapter has a lot of useful code inside it, such as, for example, T-SQL code to execute packages in the catalog and how to add a data tap using stored procedures. The chapter goes a step further and provides an entire execution framework. This framework will use a generic master package to get execution details about child packages from a table and then it will execute those packages. Next some scheduling methods are discussed and the chapter ends with another execution framework, but this time using .NET. Again, all of the code is provided so you can easily try it out yourself.
The third chapter deals with scripting. Unlike the two previous chapters, this pattern doesn’t provide a lot of code to solve a specific problem, but rather deals with how you can script in SSIS and how it works and it provides a few how-to topics. It also gives a few recommendations and best practices. The fourth chapter, which is about source patterns, continues in the same trend and talks about a few of the different connection managers available. The chapter mainly deals with connecting to SQL Server and hands out a few tips—such as when to use T-SQL and when to use the data flow—and a few caveats, for example, on why you should only select the columns you need in the query.
Chapter 5 deals with Data Quality Services—a service introduced in SQL Server 2012—and the collaboration of DQS with SSIS. Most of this chapter is pretty much a how-to of DQS and how SSIS can leverage its services, but again there are some packages explained that might turn up useful. Given the fact that DQS is a new service, it is nice to have some reference material on it.
The next chapter was personally the least useful to me, as the pattern is about connecting to DB2 databases. I can imagine, though, that there is a lot of quality information and tips in this chapter for people who actually use DB2 and who want to use SSIS to get data out of it. Chapter 8 falls in the same boat, as it’s all about Parallel Data Warehouse. There are not a lot of people out there who have the opportunity to work with this product and I’m not one of them, so I just skipped this chapter. If I ever get the chance to work with PDW, I’m sure to revisit it, though.
Flat file sources are the subject of the seventh chapter and there is a ton of information to be found. The pattern deals with strongly typing data, how to deal with variable length rows, and how to get around headers and footers, and it ends with an archive pattern with a lot of example code.
Everyone’s favorite subject is treated in chapter 9: reading XML files! (Excuse me for the sarcasm.) The patterns explains a few interesting possibilities aside the built-in XML source (which is to be avoided for complex XML): XSLT is discussed and a few implementations in .NET are illustrated. These implementations use either XMLSerializer, XMLReader or LINQ to XML. As usual, there are a lot of code scripts available, so this pattern can be really useful for people taking their first steps with XML and SSIS.
The SSIS expression language is explored in chapter 10 and a variety of examples are given of where you can use SSIS expressions. An interesting chapter, but more seasoned SSIS developers should already have those tricks up their sleeve. For novices in SSIS, this chapter can be extremely useful because it offers some caveats and warns of shortcomings in the expression language.
Chapter 11 is quite interesting, as it deals with various data warehousing patterns. It introduces the incremental load pattern and discusses various options: native SSIS, SCD wizard, the MERGE statement, and CDC components. There are also some patterns about handling data errors and the chapter finishes off with a discussion of ETL workflows. Chapter 13 goes into more detail about slowly changing dimensions (I have written about them here and here). There is some overlap between the two articles, though.
Chapter 12 describes the options for logging in to SSIS 2012: logging through the package itself, using the built-in catalog logging, or building your own custom logging. Strangely enough, Chapter 15, which is titled “Logging and Reporting Patterns,” is again about logging but now just introduces some queries and reports on top of them. Those two chapters have thus some considerable overlap and should have been combined into one single chapter.
The cloud is everywhere nowadays, so Chapter 14 shortly describes how you can load data to SQL Azure using SSIS. Nothing groundbreaking here, you can just use the standard connection managers in SSIS. The pattern introduces a slightly changed incremental load pattern in order to minimize the number of rows read.
The dynamic between parent and child packages is treated in chapter 16. There are a few changes in SSIS 2012 in how you can execute child packages, so I recommend that people new to SSIS 2012 should read this chapter. The patterns here describe how you can pass information from a master package to a child package and vice versa. Also a generic master package is described, in which one master package loops over the different child packages.
Chapter 17 is the must-read chapter of this book. It introduces BIML—Business Intelligence Markup Language—and it is basically an XML language to generate SSIS packages using metadata. BIML can be extremely powerful and can shorten ETL development lifecycles enormously. The chapter provides some BIML scripts to get you started, but it doesn’t go into the depths of the capabilities of BIML. This is probably because BIML was quite new when the book was written. If you want to know more about BIML, be sure to check out BIMLScript.com.
Chapters 18 and 19 deal with the most powerful new feature of SSIS 2012: the project deployment model. Chapter 18 talks about configurations with parameters and chapter 19 about all the different deployment options. Recommended reading material for people new to SSIS 2012.
Chapter 20 is a bit of an odd chapter because it is not technical; it discusses the estimation of ETL projects. At the end of the book there is a rather large appendix explaining an extensive SSIS framework (for the die-hard fans).
Personally I liked this book, but I liked Problem, Design, Solution better. However, I read the PDS book when I was still starting out with SSIS, so it’s natural that book made a bigger impact on me. I read SSIS Design Patterns when I already had a couple of years’ experience with SSIS. This is a bit of a downside of the book: Readers with a lot of experience will probably know most of the patterns described in this book. Either they developed them over the years, or they read a blog post about or saw it in a technical presentation. It’s possible a lot of the scripts in this book are already in the toolbox of quite a lot of SSIS developers. On the other hand, this book is excellent for people starting out with SSIS or developers with just a bit of experience with SSIS. Also, I still picked up a few tips and tricks here and there and there are certainly some scripts from the book that will be useful to me.
If you read this book, you can pick out the chapters that interest you and read them in any order you want. This book is more a reference work meant to inspire you, instead of being a front-to-back book.
This book will certainly be an asset for starting SSIS developers. More experienced developers might think this book lacks depth, but even those people can find some interesting pieces in the book. My favorite chapters were Chapter 2 (execution patterns), Chapter 17 (BIML, absolutely recommended) and Chapter 19 (deployments). A second edition is planned for the end of 2014.