Five years ago from now – it is 2014 at the time of writing – I started my career in the fascinating world of business intelligence. I started using the Microsoft business intelligence stack of SQL Server 2005, which at that time basically entailed Integration Services (SSIS), Analysis Services (SSAS) and Reporting Services (SSRS). A lot has changed since then: the BI stack has grown considerably, not only in features but also in additional products, and there were a few hypes along the way, such as the cloud, big data and data science, which might change the way how we provide value to the business.
In this article, I’ll give my opinion on what it takes to be a successful BI developer. This opinion is formed by my personal experience of dealing with the Microsoft business intelligence stack, so of course your mileage may vary.
PMP Training – Resources (Intense)
If you are a BI developer in 2009, in 2014 or even in the near future, there are certain skills that you must have (or grow if you lack expertise in them) in order to succeed in your career. You won’t get far without them and I believe this will not go away.
Feeling of the Business
I believe this is one of the most important requirements of a BI developer: the ability to converse and understand business people. It is one of the reasons I like business intelligence so much: it is not all about technical skills, but also about soft skills. You need to be able to conduct interviews and gather the needs, motives and desired outcomes for the BI environment you are going to build. You cannot talk in technical terms, you need to be able to talk with them on their own level. It’s great if you can write a clever SQL script that can pump millions of rows into a data warehouse in a few minutes, but if you have no clue why you are doing this…
Of course it helps if you have a certain amount of analytical insight. When asking what a person wants in a report or dashboard you are going to build and what kind of information they want to retrieve from the data warehouse, most likely they will just want whatever they need to do their current tasks better. If you can think outside the box and give them more value than they ask have asked for, that’s when you can truly shine. For example, you can include additional dimensions attributes so that a more detailed analysis is possible, or you can set up conformed dimensions between different data marts to enable drilling across fact tables. This opens up analysis that transcends the boundaries of the different departments and brings your BI environment to the enterprise level.
Data Warehouse Design
I mentioned a few terms about data warehousing in the previous paragraph and in my opinion, a data warehouse is still the cornerstone of a successful BI project. There are different methodologies for building a data warehouse: the Corporate Information Factory by Bill Inmon, the dimensional data warehouse by Ralph Kimball, data vault modelling, agile data warehouse modelling and so on. Through my career so far, I most often come across data warehouses built according to the principles of Kimball. This approach is usually the easiest to understand and to explain to business users, it aligns closely how cubes are built in Analysis Services and it gives fairly quickly a good return on investment. I would recommend the following books to get a good grip of Kimball and dimensional modelling:
After you’ve walked a few miles in the BI realm, you can expand your expertise to other methodologies as well of course. Agile data warehouse methodologies have grown considerably in popularity since 2009, since they are more resilient to changes and allow better incremental development cycles. It might be useful to not leave them aside for too long.
Chances are very high your data warehouse will be stored in a database and I expect that this will be true for a very long time. So it might be beneficial if you know how to work with databases. You don’t have to be a world-class DBA (database administrator), but it helps that when the DBA asks you what kind of backup schedule and recovery model you want for your data warehouse, you are able to give them an answer. Furthermore, knowing the database engine allows you to leverage its strengths for your BI project. I’m talking for example about using the simple recovery model, using columnstore indexes, applying row/page compression, knowing how to create efficient indexes et cetera.
Setting up clusters or Availability Groups probably doesn’t belong to your job requirements, but having a notion of what is out there certainly won’t harm you. When you try to achieve the MCSE business intelligence certification, there is also an exam on administrating a database. If you are able to pass this exam, you know well enough about all the DBA topics that you might encounter.
I’ll also include writing SQL scripts in this section. This is even more important than knowing how to take a backup. It’s very likely that a lot of SQL is involved in the ETL process. Some even argue to do almost everything in SQL, resulting in an ELT process: extract the data, load in into the database, and transform it with SQL. Although SSIS has some extraordinary tasks and components, some transformations are more efficient when performed by the database engine, such as grouping, sorting and updating rows. In short, the ability to write efficient SQL scripts should be in the skill set of every BI developer.
Although not used in every project, Reporting Services reports can be integrated into SharePoint libraries. There is not much skill needed to do this, but experience has thought me that SharePoint admins who have never set-up a business intelligence site sometimes need assistance in the configuration. Especially setting up Power Pivot integration can be challenging. You don’t need to be a certified SharePoint admin or developer, but knowing your way around can certainly be helpful.
Last but not least, a business intelligence developer should know its own BI stack of course. I’m not saying you should know all three products in the greatest detail possible, but you should have at least a basic understanding and working knowledge of all three. This can be too overwhelming when you just start out in the field. In this case, I advice to start with a specialization in just one of them: some people start out as report developers, others as ETL developers. As you gain experience, you can deepen your understanding of the other products.
An additional skill that might be useful is being able to write some rudimentary .NET code. This allows you to go beyond the limits of SSIS in the script tasks and script components.
All Those Choices…
In 2009, life was simple. You just had to know the trinity of BI products – SSIS, SSAS and SSRS – and you were a rock solid Microsoft BI developer. Everything was black and white. ETL? Integration Services. OLAP cubes? Analysis Services. Reporting? SSRS or maybe Pivot Tables in Excel. Even SharePoint integration was just in its diapers. But over the years, a lot of choices have been introduced that complicate the entire BI project lifecycle. Although it is difficult to have great experience in all of the possible options, as a developer, you need to be informed about them. Let’s take a look.
On Premises versus Cloud
When starting with a fresh project, one of the things to be taken into account is the hardware: are you going to use your own server or if you don’t have them buy some server, or are you going to move everything into the cloud. With Windows Azure (soon to be called Microsoft Azure) you can for example easily set-up a virtual machine that is just the same as an on premise server. Recently Microsoft also launched Power BI, a self-service BI offering rooted in Excel 2013 and Office 365, which takes BI in the cloud to a new level. This brings us to the next topic.
Enterprise BI versus Self-Service BI
Enterprise BI, also called traditional BI or corporate BI, is usually understood as having a data warehouse with on top a reporting environment, consisting of either Excel with Pivot Tables and dashboards or of a set of Reporting Services Reports. In some cases, dashboards in PerformancePoint are also considered, although they require SharePoint integration.
Self-service BI is the cool new kid on the block. It comes in the form of Power BI, which consists of a few add-ons in Excel:
* Power Query (codename Data Explorer): an intuitive self-service ETL tool, although its capabilities are far bigger than just ETL.
* Power Pivot: a powerful data modelling tool that deals with tabular data. Because of its columnar structure and in-memory capabilities, this tool is able to deal with a large number of rows. This tool was already introduced in Excel 2010 and stood at the start of Microsoft’s self-service BI offering.
* Power View: an easy-to-use dashboarding tool, allowing end users to create interactive dashboards with just a few clicks. No scripts or code needs to be written, which is the strength of this tool.
* Power Map (codename GeoFlow): a visualization tool allowing you to tell stories with geographical data.
These add-ons can function perfectly in their native habitat in Excel, but they can be shared to the Power BI center in Office 365. There you have additional functionalities, such as the Q&A feature. Power Pivot and Power View also have good integration into the SharePoint libraries.
Report Builder can also be considered a self-service reporting tool, but since writing queries is sometimes still required, this tool is only fit for the real power users.
But actually it isn’t a black-and white choice between self-service and enterprise BI. Self-service BI has the benefits of rapid development and enterprise BI has more control and more predictable performance. In reality, both offerings can go hand in hand in a comprehensive BI offering. Usually self-service BI can really thrive if it lays on top of a well-defined enterprise data warehouse.
As a BI developer, you obviously need to be skilled in enterprise BI. You also need to be on top of self-service BI, although you might not actually do the development yourself. You might be asked to set up and configure the environment and train/coach the end users in the use of the different tools.
The duality of the BI stack
Inside the core Microsoft BI stack (SSIS, SSAS and SSRS), you are confronted with even more choices:
* SSIS: when using SSIS 2012 or higher, you need to choose between the project deployment model and the package deployment model. The last one is used in earlier versions of SSIS. Given the additional benefits and features of the project deployment model, this choice shouldn’t be too hard to make.
* SSAS: you either use the traditional Multidimensional model to develop cubes, or you go with the new Tabular model. Both have their own scripting language: MDX and DAX (which is also used in Power Pivot). For more information: LINK.
* SSRS: although developing reports hasn’t changed much since SQL Server 2008R2, you still need to decide if you are going to use Native integration (and display reports through Report Manager or through a custom framework) or if you will use SharePoint integration.
All those hypes…
Over the past 5 years, a number of “hypes” went viral. The first one was “the Cloud”, which I already discussed in a previous paragraph. Another one is Big Data, where the difficulty of the data is not necessarily defined by its volume (despite the name), but also by other factors such as variety and velocity. A more recent one is data science, which is in fact more of a rebranding of older subjects, such as machine learning, data mining, text analysis et cetera.
Big Data and Data Science can bring considerable value to the enterprise, if tackled correctly. However, they are more advanced subject areas which require significantly more expertise and skills. It’s possible the average BI developer will never come in contact with either of these subjects, but nonetheless they are very interesting to delve into.
Since 2009, the Microsoft BI stack has grown considerably in products and features. A direct consequence is the plethora of languages available: SQL for relational databases, MDX for querying Multidimensional cubes, DAX for Power Pivot and Tabular models, DMX for data mining in SSAS, VB.NET for custom code in Reporting Services, C# .NET for scripting inside SSIS packages, the SSIS expression language and for the advanced users there is also the language M inside Power Query.
A business intelligence developer starting in the field has a bigger learning curve to deal with if they wish to achieve deep expertise in the SQ Server BI stack. There are also more design decisions to be made during a typical BI project. And if you want to achieve the MCSE Business Intelligence certification, you need to know almost everything mentioned in this article…