Throughout many years of working with BI solutions and data warehouse solutions, we have strived to put all the data in one location so it could be easily consumed by reporting and analysis tools, from SQL Server Reporting Services to Microsoft Excel. We have followed the dimensional modeling processes promoted by Ralph Kimball and others. These techniques were developed to turn relational data platforms into viable and well-performing reporting platforms. They worked.
Throughout the years, I have built many star schemas, enterprise data warehouses, and reporting databases based on these techniques. However, they were not without their flaws. In the early days, it would take years to create the enterprise data warehouse. When it was done, the business had already moved on. So, we started creating data marts which were departmentally focused. This allowed us to shorten the development lifecycle to more quickly meet the needs of the business. But the speed of business continued to outpace IT’s ability to deliver effective BI solutions when they were needed.
During this same time, the world's most ubiquitous BI tool, Microsoft Excel, ran more and more businesses, both large and small. Why was Excel so popular? It allowed the user, not IT, to do the analysis and produce results when the business needed them. Self-service BI is not new; we in the industry just refuse to accept Excel as a "real" BI tool. Inevitably, no matter how good your BI or reporting tool is, users want to know, "How do I export this to Excel?" I found it humorous that Microsoft suffered the same problem with Power BI. The ability to extract the data is hugely important to users. Why? So they can do the analysis on their own. They feel empowered with Excel.
In today's world, we are also witnessing a shift to a more mobile, tech-savvy group of users. As my teenage and young adult children begin to enter the workforce, I still see a disconnect with enterprise BI solutions in most cases. They want the data at their fingertips and easily consumable to solve the question they have now, not in three weeks. That brings us to the architecture I have been promoting for a few years. It's not new, but I needed a way to talk about modern data and BI solutions that focused on one of the most significant needs in the business: consumable data.
The Consumption Based Architecture is based on the following key concepts:
Keep the data close to the source
Data interfaces should be easy to use
Modern, in-memory tools make this possible

Keep the Data Close to the Source

In the normal enterprise data warehouse solution, we process data to clean it up, reshape it, and generally make it "better". However, the reality is that the users don't see it as better if it does not match the source. The only "source of truth" that matters in practice is the one closest to the point of data entry. If that data is wrong, it needs to be corrected there. Why? Because users will always verify that the data warehouse or any BI solution is correct by checking the system of record.
Besides the source of truth issue, any time data is transformed or shaped differently than the source, documentation and maintenance are required. In most cases, documentation is lacking in data warehouse solutions. It is hard to keep current, and the data changes often. Furthermore, the amount of maintenance required to support a traditional data warehouse solution is a problem. When the business needs to change a field or the source changes, how do you measure how long it will take to get that change into the data warehouse? Days? Weeks? Months? Years? By the time we have the change in place, tested, and ready for use, the business has moved on.
In Consumption Based Architecture, the goal is to move the data as needed and transform it as little as possible. Operational Data Stores, which are replicas or copies of the data from the source systems, are the best mechanism to move the data to a reporting area. Only transform if absolutely necessary. In most cases, our transforms will involve data type cleanup or, in cases where a field's use has changed over time, perhaps adding a column to clarify the data. Each time you move or transform data, it should be to make the data more consumable.
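As a sketch of what "transform as little as possible" can look like, the view below sits over an ODS table that is a straight copy of the source. All object and column names are hypothetical; the only transforms are data type cleanup and a clarifying column:

-- Hypothetical example: ods.SalesOrder is an untransformed copy of the source table.
-- The view only cleans up data types and clarifies a code whose use changed over time.
CREATE VIEW consume.SalesOrder
AS
SELECT
    CAST(so.OrderId AS INT)              AS OrderId,
    TRY_CAST(so.OrderDate AS DATE)       AS OrderDate,   -- stored as text in the source
    CAST(so.OrderTotal AS DECIMAL(18,2)) AS OrderTotal,
    so.StatusCode,
    CASE so.StatusCode                                    -- clarifying column, no reshaping
        WHEN 'C' THEN 'Closed'
        WHEN 'O' THEN 'Open'
        ELSE 'Unknown'
    END                                  AS StatusDescription
FROM ods.SalesOrder AS so;

Consumers query consume.SalesOrder, and because the data underneath still matches the source row for row, it is easy to verify against the system of record.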
The primary exception to this rule is dimensional data. Dimensional data has a special place in the architecture. While it is not necessary to transform this data, the overall solution is improved by combining similar dimensions and using natural keys for relationships. For example, if you have a customer record in CRM and shipping data, you can create a “golden record” or conformed dimension which contains both natural keys. This will allow you to cross reference data easily in both solutions. Master data solutions help with this but are not required to be successful in this architecture.
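A minimal sketch of that "golden record" idea, assuming hypothetical CRM and shipping sources, is a conformed customer dimension that carries both natural keys on one row so either system can be joined back to it:

-- Hypothetical conformed dimension: one customer row carrying the natural keys
-- from both source systems so data can be cross-referenced in either direction.
CREATE TABLE dim.Customer (
    CustomerKey       INT IDENTITY(1,1) PRIMARY KEY,
    CrmCustomerId     NVARCHAR(50)  NOT NULL,  -- natural key from the CRM system
    ShippingAccountNo NVARCHAR(50)  NULL,      -- natural key from the shipping system
    CustomerName      NVARCHAR(200) NOT NULL
);

-- Reports can then join each source system on its own natural key.
SELECT c.CustomerName, s.ShipDate, s.TrackingNumber
FROM dim.Customer AS c
JOIN ods.Shipment AS s
    ON s.AccountNumber = c.ShippingAccountNo;

A master data management solution can populate and govern this table, but even a simple matching process gets you the cross-reference benefit.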
Data Interfaces Should Be Easy

Having done a lot of work in the application development field, where interfaces are used to simplify programming, I think we should have a similar concept in working with data. We have dabbled in this area for some time. I know that I have used views with schemas (or users in Oracle) that were specifically designed to support SQL Server Analysis Services. We used views to present a consistent data set to the cubes for processing and to allow changes in the back-end data as needed. The views operated as an interface between the data warehouse and the cube.
In a similar fashion, these interfaces need to be identified and used throughout the Consumption Based Architecture. If you are referencing a relational data structure, views continue to make sense. They allow consumers to interact with the data in a known fashion. They are also logical constructs which can be deprecated with a timeline for users to move off of them to the newer versions.
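As an illustration of the interface idea (the schemas and tables here are made up), a view in its own schema gives consumers a stable contract while the tables behind it are free to change, and a replacement version can be published alongside it with a deprecation timeline:

-- Hypothetical interface view: the cube (or Power BI, or reports) reads from this
-- view rather than from the warehouse tables, so the back end can be refactored freely.
CREATE VIEW cube.SalesByDay
AS
SELECT
    s.SaleDate,
    s.StoreId,
    SUM(s.SaleAmount) AS SaleAmount
FROM dw.Sales AS s
GROUP BY s.SaleDate, s.StoreId;
-- When a breaking change is needed, a cube.SalesByDay_v2 view can be published
-- alongside this one and the old version retired on a stated timeline.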
Analytic models, such as those found in Power BI, Qlik, and SQL Server Analysis Services, create a similar tool for consumers. Usually those models can be used in the tool itself or even in other tools to produce reports and dashboards. Depending on how the model is created, they are often a table-based view of the data. For instance, Power BI can turn a folder of files into a table structure for easy consumption in Power BI reports. Power Pivot models created in Excel can be shared in SharePoint and in Power BI. These are just a few examples of using analytic models as interfaces.
The third option is data virtualization. This tends to be fairly expensive. The two products that I am aware of are Cisco Data Virtualization and Red Hat JBoss Data Virtualization. The concept of data virtualization is perfect for Consumption Based Architecture; however, I have not seen it used much due to cost implications. Microsoft may be changing the data virtualization landscape. During PASS Summit 2016, they announced expanding the use of PolyBase to reference other data sources such as Oracle and Teradata. Currently, PolyBase supports Hadoop data, but this change could allow it to become a virtualization tool that is cost effective (included with SQL Server) and simple to use. I will be reviewing these tools in a later post as I get more information on them.
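To make that concrete, here is roughly what a PolyBase external table over Hadoop looks like in SQL Server 2016. The addresses, paths, and names are placeholders, and PolyBase must be installed and enabled on the instance; once defined, the external table is queried with ordinary T-SQL and can be joined to local tables:

-- Placeholder location: point PolyBase at the Hadoop cluster.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://10.10.10.10:8020');

-- Describe how the files are laid out.
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- Expose the files as a table without moving the data into SQL Server.
CREATE EXTERNAL TABLE dbo.WebClicks
(
    ClickDate  DATE,
    CustomerId INT,
    Url        NVARCHAR(400)
)
WITH (LOCATION = '/data/webclicks/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = PipeDelimitedText);

-- Queried like any other table.
SELECT TOP (10) * FROM dbo.WebClicks;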
Modern, In-Memory Tools

The most significant technological improvement in the past few years to make this architecture really possible is in-memory data tools. While this revolution has occurred across more technologies than Microsoft's, Microsoft tools are what I am most familiar with. In the Microsoft arena, this started with Power Pivot in Excel. The ability to mash up various data sources using in-memory models is awesome.
When Power Pivot came on the scene, I was doing a lot of work with SQL S