SQL Business Intelligence Tools

Source

(c) Kevin Languedoc (klanguedoc)

Microsoft in recent years has been on a technological buying spree and has invested heavily in Business Intelligence tools like SSAS, SSIS, SSRS, PowerPivot and Excel using its server SQL Server as a cornerstone of that strategy to gain a foothold on the business intelligence industry; one of the fastest growing segments of information technology business industry. To remain competitive as the industry has rapidly evolved at a meteoric rate going from simple SQL based reporting and data warehouses to predictive analytics, OLAP servers, statistical modelling and machine-learning data mining.

Business Intelligence is about gaining insight or intelligence in business, although the term is fairly new, the concept of extracting knowledge from data dates back to the early days of computing. In recent times there has been an explosion of new technology not to mention all the consolidation in the market as many vendors try to capture their own share of the market and either try surpass the competition or at least keep abreast with the rapid pace and thirst of our collective need for knowledge.

Microsoft has focused its Business Intelligence efforts around its SQL Server technology, its Visual Studio client framework, SQL Server Management Studio, Excel and Sharepoint Server.

SQL Server Analysis Services (SSAS)

SSAS is a technology that resides in SQL Server and must be installed when the server is installed or afterwards from the SQL Server Installation program. It is one of four services provided by SQL Server. The others being the Database Engine, SQL Server Integration Services and the SQL Server Reporting Services.

SSAS also provides technologies that use the Visual Studio framework (or shell) and along with the other mentioned services are collectively known as the Visual Studio Business Intelligence Studio. These project templates are also included in the standard Visual Studio if you have version professional or above (Ultimate, Premium or Architect).

SSAS services involves creating and managing cubes, which are three dimensional blocks of data. In a standard SQL query, data is returned based on the selected columns and the WHERE clause. In contrast, a cube query defines a hierarchical data model and can contain both rows and columns.

You define a cube using one of the SSAS templates in the Visual Studio Business Intelligence Studio. A screenshot is provided below. SSAS, in essence, takes as input a flat table, as opposed to a relational table once you have defined a Data Source.. A relational table can also be used except that your cube will incur a performance hit because there will be extra lookups to perform in order (using the keys) to get the required data. I prefer using a flat table, which contains the actual data and no primary or foreign keys. It is easier to build your hierarchies then using a relationship table.

Building a cube
Building a cube is fairly straightforward process. First you define your Data Connection to the data, which should reside in a data warehouse or a data mart. A warehouse contains data from multiple sources that can be from flat files (text files), spreadsheets, the Internet, other databases. A data mart contains data from a single source which is usually a database.

Once the connection to the data is defined, the next step would be to define your Data Sources. This consists of table and views, an aggregate table or data from a query. These Data Source Views can be arranged visually as the provided screenshot denotes below and new custom views can be created from other data.

Once this step is completed, you can define your Facts, Dimensions, Measures, Attributes and Metrics.

Term
Definition
FACT
The Fact represent the business process or event in the model. This entails the tables and views that were used to create the Star or Snowflake schema. Fact usually contain keys to the Dimension tables.
Dimension
The Dimensions is where you define or arrange your data. In fact the Dimensions are closely related to the tables and views that are used in the Fact.
Attributes
Dimensions also have Attributes that describe the data in the Dimensions.
Metric
Finally the Metrics is the measure or aggregates like the SUM or AVG. Metrics are numerical values.

These steps define the cube which must be stored in a multi-dimensional database which is accessible from the SQL Server Analysis Services in the SQL Server Management Studio, a separate SQL Server management tool that is also installed, if selected, when the server is setup.

SSAS Editor
SSAS Editor | Source

SSIS (SQL Server Integration Services)

These services are basically what is called an ETL (Extract-Transform-Load). This technology provides the tools to connect to heterogeneous data sources which can include other databases from other vendors, ERP and MRP systems, flat files, spreadsheets, web services or any other type of data repository or data service.

The Extract functionality involves connecting to external systems and transferring the data to the calling system. The Transform contains technologies to transform the data to suit the requirements of the data warehouse which also includes cleaning the data. Finally the Load actually writes the data to the data warehouse or DW.

The following screenshots provide examples of the two main screens in the Microsoft Business Intelligence Studio: Data Flows and Control Flows.

Data Flows
The Data Flows allow a developer to define the source and destination of the data. You build Data Flows by dragging objects from the toolbox onto the Designer where they are connected to one another, thus creating connections and flows. Many tasks are available being like able to do joins like in SQL or to merge data using the Merge Task.

Control Flows
On the other hand, Control Flows allow you to manipulate and manage the data flow process as it is being imported into the system. For example you can define an Execute task to start the flow process, or to manage and combine different packages which contain different data flows and data profiles. The Control flow also allows you to define the workflow and add notifications, like e-mails to indicate if there are issues with the data flow or to provide a status on the data flows.

Data Flow Editor
Data Flow Editor | Source
Control Flow
Control Flow | Source

SSRS (SQL Server Reporting Services)


SSRS, which stands for SQL Server Reporting Services, is a set of technologies to allow a developer to developer reports in a visually interactive designer. The process involves defining a Data Source, which is usually a database or data warehouse. Next you would define the data model consisting of tables, views, or a query to define the data to be reported on. Then you could create the style of the report like either a matrix or a tabular style using the integrated Report Builder where you can define the rows, headers, details.

In addition, a developer can add several windows based or ASP.net controls to the report, like charts and graphs to summarize the data or to add more visually appealing presentation of the data.

SQL Server includes a Reporting Server that must be configured during the installation or afterwards if the service wasn't installed during installation. Although you can run the installation program and add this service afterwards. Once the SQL Server Report Server is setup and is configured in the Report Designer which is part of the Visual Studio Business Intelligence Studio, you can publish the web based report to the server. Alternatively the reports can be publish to a Sharepoint server if it has been configured to store reports.

Report Editor
Report Editor | Source

MDX (Multidimensional Extensions)

MDX is a query language that has been developed by Microsoft but that has gained widespread acceptance with most leading database vendors like Oracle and IBM. MDX stands for MultiDimensional Expressions and was developed to be able to query OLAP databases like those defined using SSAS.

MDX has a syntax that is very similar to SQL (Transact-SQL) but can use hierarchical data and use rows as well as columns to perform queries. The results are displayed in a matrix style format. MDX is a very sophisticated language and would require very complex and lopsided SQL queries to emulate the elegant style of a few lines of a MDX query.

MDX acts on the dimensions, its members, attributes and measures of the data that is defined in the fact in a cube. Facts and its dimensions are usually modelled using a Star diagram or a Snowflake. There are so called because the model, when laid out on paper or in a modelling software, resembles these physical objects. Here is an example of a fact and dimensions modeled as a star.

Star Data Model
Star Data Model | Source

Furthermore, the data is organized into hierarchies and levels. To illustrate, the following diagram depicts a product hierarchy, which consists of Product Category, Product Sub-Category and Product. From this example, we can surmise that this hierarchy has three levels. Hierarchies are logical groupings of data in the data set. It is up to the Data Modeler to define those hierarchies based on the available data.

MDX Data  Hierarchy
MDX Data Hierarchy | Source

Here is sample code to give you an idea on what the syntax looks like:

MDX SELECT Query

SELECT {[Measures].[Net Invoiced Sales] ON COLUMNS},
{[Category].[Sub-Category].[Product]} ON ROWS}
FROM  [Reseller Sales]
WHERE [Territory].&[South-West]
AND [Date].[Calendar Year].&[2012]

MDX is a powerful query language that acts on three dimensional data models and is one of the more important tools, in my opinion, since the language allows an analyst of extract meaning and insight from the cubes of data that is stored in data warehouses.

DMX

DMX is similar to MDX, however it is used for Data Mining which consists of discovering patterns in large data sets. DMX means Data Mining Extensions. DMX is only found in SQL Server technology unlike MDX. One important use of DMX is statistical analysis and is becoming important in business intelligence as the practice evolves and data sets in data centers get larger and larger. DMX is also used to define and query decision trees, to predict outcomes based on a set of variables using large data sets.

DMX is organized into two broad types of operations:

  • Data Manipulation Language
  • Data Definition Language


Data Definition Language (DML)
The DML part of DMX can be used to copy DMX model, import and export models or create and delete models and structures. DMX syntax is similar to MDX and SQL but its API is completely different. For example the following code creates a mining structure that will be used to add models which can be used to make predictions from data sets afterwards using the Data Manipulation Language.

Basic Query Example

CREATE [SESSION] MINING STRUCTURE SalesStructure
(
    [([Product Name], [Sales Territory], Sales, [Quantity Ordered])]
)
[WITH HOLDOUT (10)]
[REPEATABLE(0)]
10::=  50 PERCENT | 500 CASES

ALTER MINING STRUCTURE [SalesStructure]
ADD MINING MODEL [Territory Sales]
(
    [Product Name],
    [Product Name], [Sales Territory], Sales, [Quantity Ordered],
) USING Using Microsoft_Decision_Trees
  WITH DRILLTHROUGH

Data Manipulation Language

DML in contrast works with existing models to make predictions and to query the models. Data Manipulation Language primarily uses SELECT and INSERT statements on model’s, content, cases, sample_cases or dimension_content.

The following SELECT is a simple DML query:

DML Select

SELECT StructureColumn('Product Name') AS Product, * 
FROM [Territory Sales].Cases
WHERE IsTrainingCase()
AND IsInNode('Book')

XMLA (XML for Analysis)

XMLA is not a query tool but it is a standard API (Application Programming Interface) to allow client applications, including custom built applications and web based applications to interface in a standard way with cubes that are stored in Multidimensional databases like SQL Server’s server Analysis Services.

Most of the leading BI vendors support this new open standard API that has been developed by Microsoft. XMLA is a web based technology that provides the API to interface with cubes using HTTP, SOAP, or RESTful and Internet Protocols. XMLA is an integral part of SQL Server Business Intelligence services.

PowerPivot

One of the latest additions to Microsoft Business Intelligence architecture. PowerPivot can be added to Excel as a free plugin to allow for OLAP queries and aggregated data. While it can create a pivot table from a single source, PowerPivot can aggregate the data from multiple sources, up to 100 million rows, thus adding graat analytical strength to Excel. PowerPivot can be built as an in-memory OLAP cube using a collection of databases tables, views and tables in Excel which be used create to a Pivot Table in Excel to slice and dice the data.

PowerPivot can also be enabled on the SQL Server if you are using SQL Server 2008 R2 or greater and can be used from Sharepoint for data hosting. PowerPivot will actually create cubes from its relational data source or other sources like another spreadsheet or other database. PowerPivot greatly increases Excel’s processing power, analytical and statistical capabilities.

PowerPivot can be downloaded for free from Microsoft and added to Excel 2007 or 2010 or greater. PowerPivot provides many tools for data manipulation like DAX.

Connect to data source
Connect to data source | Source
Build data model
Build data model | Source

Microsoft Office SharePoint Server

Sharepoint is actually a very large and complex web based framework for creating portals that works as a repository for digital content. Sharepoint is a vast technology but in regards to BI, it can publish reports (from SSRS) or elsewhere, it can publish dashboards and KPIs (Key Performance Indicators) and embed Excel spreadsheets and even has a PowerPivot hosting functionality all as web pages.

In essence Sharepoint acts as a portal to dispense BI content that is generated from the other SQL BI Tools. As a BI component, Sharepoint can host dashboards, scorecards and other analytical components.

Excel

Microsoft Excel is probably the most used BI tool besides Microsoft SQL Server. Excel is also the most recognized tool for data analysis that is used in business, government and academia.

One of the most prominent features in Excel is the calculation engine that can be used for data crunching using either tabular data and more importantly for business intelligence using the pivot table. Tabular data is formatted in tables using Excel worksheets that are stored in a workbook. Pivot tables on the other hand can use tabular data or data stored in databases like a SQL server database or from other sources to slice and dice the data to gain insight and also predictions or to discover patterns or trends.

The following screenshot depicts a pivot table with two data slicers that were introduced with Excel 2010. These slicers can also allow the developer, analyst or other end user quickly slice the data and drill down into large data sets that can extend to a 100 millions rows.

Excel Pivot Table & Slicers
Excel Pivot Table & Slicers | Source

Excel also offers a very powerful macro language: VBA (Visual Basic for Applications) which can be accessed from the Tools menu. The VBA code can be created using the macro recorder, which captures every step that a user performs in Excel and writes the code for you. It is a very powerful tool and a real time saver. It is also possible to write VBA code directly in one of the event methods that are precoded into each workbook and worksheet.

VBA Editor in Excel
VBA Editor in Excel | Source

Another powerful feature that is available in the VB Editor is the ability to create custom forms just like Visual Basic as the following screenshot demonstrates. Most of Microsoft Visual Basic controls are available and can be used to build powerful user screens can be used in conjunction with either a worksheet or the workbook.

Creating Visual Basic Forms in Excel
Creating Visual Basic Forms in Excel | Source

SQL Server Database Engine

Within the BI environment, the SQL Server Database Engine is the cornerstone of the whole architecture since it the main repository of information either as a data mart or as a data warehouse. The Database engine in the SQL Server server environment is used to create relational tables, views, stored procedures and functions amongst other advanced features like Messaging.

SQL server databases are great for storing all types of data, including native XML, binary data like images and videos. SQL Server is not a database, but a repository of databases that can be accessed from other RDBMS’s, custom applications, web applications and other end user clients applications.

Newer versions of the software, at the time of this writing, will be able to handle Big Data with the implementation of Apache Hadoop.

Creating Data Schemas in SQL Server
Creating Data Schemas in SQL Server | Source

Microsoft SQL Server is now one of the leading Business Intelligence platforms on the market.

More by this Author


Comments 4 comments

Kaili Bisson profile image

Kaili Bisson 4 years ago from Canada

Wow Kevin, this is fantastic information. I am quite familiar with SQL/SQL Server, but learned much from reading this (my BI experience is with Cognos). Well done!


klanguedoc profile image

klanguedoc 4 years ago from Canada Author

Hi Kaili. Thanks for the feedback. I primarily use SQL Server BI on a daily basis. My employer is talking about implementing Cognos and for now SQL BI is doing a great job at a fraction of the cost.


tillsontitan profile image

tillsontitan 4 years ago from New York

This is a very informative hub and very well written. I'm sure all the technology lovers will be thrilled! Voted up, useful, and interesting.


klanguedoc profile image

klanguedoc 4 years ago from Canada Author

Thanks tillsontitan, I really appreciate. It is actually my longest hub to date. My hubs are usually in and around 1500 words. Thanks for the vote.

    Sign in or sign up and post using a HubPages Network account.

    0 of 8192 characters used
    Post Comment

    No HTML is allowed in comments, but URLs will be hyperlinked. Comments are not for promoting your articles or other sites.


    Click to Rate This Article
    working