Power BI Desktop: Query Examples for Beginners
What is Microsoft Power BI
Power BI is Microsoft's answer to a Business Intelligence reporting tool. It's all online through Office 365, and allows for performing Extract, Transform, Load (ETL) and creating visualizations of data. (Charts, graphs, heat maps, geo, etc.)
Visit the Microsoft Power BI marketing site
What is Microsoft Power BI Desktop
Microsoft Power BI Desktop is a stand-alone application that is downloaded to a local machine, and allow for the creation of PBI Reports. The exact reasons as to WHY a desktop version was needed isn't clear. I've seen videos from 2015 where it was referred to as PBI Designer, but now it appears to have settled into Desktop. The presenter tried to answer the question as to why the functionality that is in Designer is also in Excel, and he stated that it is more difficult for people to upgrade Excel than Designer/Desktop, and he implied that there are regular updates.
I can definitely say that the ONLY reason that I had to use PBI Desktop is because I wanted to directly link SharePoint Lists to create a dashboard, and the ONLY way to do that is using Desktop. Other than that, I don't know the scope of duplication that exists between the cloud and the desktop version at this time. My guess is that other than the linking, everything that I'm about to discuss will apply to the cloud, too.
Linking to a SharePoint List in Microsoft Power BI Desktop
Since the action of linking a SharePoint List is what has brought me to this point, I figure I might as well share it with you. There are numerous places on the web where you can find these step-by-step instructions, but since I encountered an annoying error, for those that attempt to do this, you'll find this gotcha-workaround to be beneficial... hopefully.
Link to a SharePoint List from Power BI Desktop... Take 1
- Download PBI Desktop, and install as directed
- From the Home Ribbon, choose Get Data
- At the bottom, locate the tiny little word: "More...", and click on it.
- In the Get Data dialog, from the left pane, Click on "Other".
- Upon update, choose "SharePoint List" from the right pane
- Click the yellow Connect button.
- You will be prompted with a URL dialog box, but it does NOT have a browse option. (Copy and paste it from a browser window that is open to your SharePoint site.) Only enter the URL up to the site name where your list resides, not a link to the list itself.
https://myserver/sites/mysite/
https://mycompany.sharepoint.com/sites/mysite - Click the yellow Ok button
- The Navigator dialog will open and show you the lists available to be linked. Select all of the lists that you would like to link.
DIGRESSION:
At this point, PBID is expecting that you will pull in the data from the entire list. This isn't great because perhaps you only want a subset of the data, but there's IS NOT a way to partition the data, prior to linking to it. It will link to everything, but once it's there, you can slice and dice it.
DANGER, DANGER, WILL ROBINSON!!
As with all things Microsoft, there are times when worlds collide, and despite the fact that someone in Redmond should assign an intern from a foreign land to fix it, they never bother. It won't make them any money, so you just get to live with it. Anyway, here it goes:
In every SharePoint list, there is an inherent field called ID. You can't delete it, you can't change the fieldname, you can't edit the data, and you can't prevent it from being imported in when you link. You'll have nothing and like it.
Whenever data is brought into PBID, the first thing it does is ... can you guess? That's right, it creates a new field, called ID, and assigns a unique value to every row. Just like SharePoint.
Now, in an ideal world, SharePoint would create an ID field called ID_SHPT, and PBID would create an ID field called PBID_ID, and whenever the two crossed paths in a dark alley, nary a conflict would occur. NOT SO FAST!
So, here you are, the innocent victim, just taking your data out for a stroll, and upside your head comes an error message that states that the field named ID cannot be duplicated. You scratch your head, because you may not even know that SharePoint lists have a field called ID, and I'm 100% sure that you had no clue that PBID was going to slap one in there, and now you're left holding an ice bag on your blackeye trying to figure out what YOU did wrong. Here, take this with you. POW! SOCK! BAM!
Ok, so fear not. The error message box has an Edit Query button on it. Click that and then do either of the following:
1. Delete the incoming ID column. This is fine to do if you have no need for it, but be wary as that in the future, you may have linking data and instead wish that you had brought in the ID.
2. Change the field name of the ID column. This is probably a better practice, because, at some point, you may need that piece of data for troubleshooting the data between your reports, or linking this data to a related table.
Either way, just right-click on the offending ID field name, and choose Delete or Rename. I like to rename mine to something like, ID_Tasks, ID_Docs, or ID_NotTheDamnPBID_ID.
Please note that when tracing through the code below, this is covered again. Sorry for the duplication, but some people may skip this section, or maybe you'll skip the section below.
PBID Query Language
Learning the query language for PBID starts with understanding that it's based on a language called "M". That's right, M. No, just M. Well, that's the nickname, anyway, it's really called "Power Query" and Microsoft refers to it as a "mashup query language". If you haven't found anything about it, here's the place to start:
Introduction to Power Query (informally known as "M") Formula Language
I have yet to find a repository that satisfies my developer needs, but this is about as close as I'm going to get for now. Read on to help understand a little about this glorious new "mostly pure, higher-order, dynamically typed, partially lazy, functional language." (For whatever that means?!")
Example 1: Link to a SharePoint List
let Source = SharePoint.Tables("https://EasyBins.sharepoint.com/sites/advertising", [ApiVersion = 15]), #"3c1a1547-02ee-441a-a3ce-cca60043b5d5" = Source{[Id="3c1a1547-02ee-441a-a3ce-cca60043b5d5"]}[Items], #"Renamed Columns" = Table.RenameColumns(#"3c1a1547-02ee-441a-a3ce-cca60043b5d5",{{"Id", "Id_Env"}}), #"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns",{"Id_Env", "Code"}) in #"Removed Other Columns"
Example 1: Breakdown
Disclaimer: This example isn't great, because ALL of this code can be written for you with clicks on the UI. To try to write this code by hand would be fool hardy, as it will be fraught with error that are nearly impossible to debug.
Let Clause - In a query expression, it is sometimes useful to store the result of a sub-expression in order to use it in subsequent clauses. You can do this with the "let" keyword, which creates a new range variable and initializes it with the result of the expression you supply. Once initialized with a value, the range variable cannot be used to store another value. However, if the range variable holds a queryable type, it can be queried.
So, in this code, the Let clause starts by stating what the Source of the data link will be:
Source = SharePoint.Tables("https://easybins.sharepoint.com/sites/advertising", [ApiVersion = 15]),
This creates a variable called Source that points to an Office365 SharePoint site. (I don't know what the ApiVersion does, but assume I shouldn't mess with it.) Note that the line ends with a comma, but only because there are subsequent lines of code in the let clause. The last line in the let clause does not have a comma. (This will be your first source of errors, but fortunately, the parser catches it.)
#"3c1a1547-02ee-441a-a3ce-cca60043b5d5" = Source{[Id="3c1a1547-02ee-441a-a3ce-cca60043b5d5"]}[Items]
This next line starts with "#", a pound sign, hashtag, or tic-tac-toe board, depending what generation you are. For whatever reason, rather than using a variable like MyList, they opt to use the entire list GUID as a variable name. I guess it guarantees that it's unique, but that's some serious overkill. Ok, so, GUID = Source, and source refers to the Source variable in the first let clause line of code, and then it passes the Id parameter populated with the GUID. It finishes with [Items], which I assume means ALL Items. It would be great to be able to put an equasion right there instead, like: Status="Completed", but I couldn't find any reference to this.
#"Renamed Columns" = Table.RenameColumns(#"3c1a1547-02ee-441a-a3ce-cca60043b5d5",{{"Id", "Id_Env"}}),
This line of code, for me, was required. There is a disconnect between PBID and SharePoint, whereas when a new PBID datasource is made, it attempts to add it's own ID field. If you've used SharePoint for more than 15 minutes, then you understand that ALL SharePoint Lists already HAVE a field called ID. So, when attempting to link a SharePoint list to PBID, you are immediately shown an error that the ID field name cannot be duplicated. The left hand doesn't know what the right hand is doing, so this line of code is the result of using the UI to manually rename the incoming ID field to something unique. In this case, it has been renamed to "Id_Env".
Take note that this now creates a new variable called "Renamed Columns". It renames the column in the GUID-named variable from Id to Id_Env.
#"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns",{"Id_Env", "Code"})
This next line is the result of deleting unneeded columns. SharePoint adds many extra fields to a list that you may or may not see. So, the PBID UI was employed to pare them down. (I used Ctrl-Clicks to select the fields that I wanted to keep, then use the "Delete other columns" option from the ribbon.
This creates a new variable called "Removed Other Columns" and it starts with the data in the "Renamed Columns" variable, and then pares it down to just the two columns "Id_Env" and "Code".
in #"Removed Other Columns"
The "In" Clause - The in clause is then used to declare what is returned by this query. (At least this is the only description I could really discern.) It seems to me that the last variable in the let clause is always used in the in clause, so it tells me that I don't really know the true purpose of the in clause. I don't know how else it could be used. Maybe it's possible to generate a couple of sources in the Let, and then join them together in the In. (I'm thinking akin to Joins in SQL, but again, I can't find any documentation to support this.) That would be really cool, so stay tuned, maybe I'll get daring and try later.
Example 2: SharePoint Linked List Distribution Query
let Source = SharePoint.Tables("https://easybins.sharepoint.com/sites/advertising", [ApiVersion = 15]), #"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5" = Source{[Id="e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5"]}[Items], #"Renamed Columns" = Table.RenameColumns(#"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5",{{"Id", "Id_Tasks"}}), #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Status0] <> "Complete")), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Module", "EnvironmentId"}, {{"Count", each Table.RowCount(_), type number}}) in #"Grouped Rows"
Example 2: Code Breakdown
In this example, it start with all the code that you'll recognize from Example 1 for importing the data, and overcoming the ridicules duplicate ID field name error.
Next, I wanted to import data for only those task rows where the Status was not equal to "Complete". This code is NOT generated by the UI, and it was hand entered. (Well, copy and pasted from another blog, anyway.)
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Status0] <> "Complete")),
This creates a new variable called "Filtered Rows". It starts with the data found in the variable "Renamed Columns", and applies the filter. I'll do another example later that has a more complex filter, such that AND, OR, NOT, and a myriad of other operator are used.
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Module", "EnvironmentId"}, {{"Count", each Table.RowCount(_), type number}})
This bit was created from the UI, and it groups the data, by one or more fields, and then performs an aggregation, like SUM or COUNT. I will note that I had trouble getting the grouping dialog box to open again after this code was placed. So, if you mess up, just delete this piece of code and then opening the dialog box works. (I suppose you could try just altering the code here, but with all the parens and quotes and things, it's just too error prone for me.)
Example 3: Stacked Bar Graph from Task List
This 3rd example requires a little setup, and it will really help get your mind around the simplicity that is Power BI. The key word here is simple, in that PBI is very simple in it's abilities, not in the setup. It's quite the opposite of the word simple.
Ok, so I have a basic SharePoint Tasks list. Something like:
ID - SharePoint Task ID
Task Name - Task Name
Task Status - Status (Open, Working, Testing, Completed)
Module - Business Rule Category
For my Power BI Visualization, I would like to create a stacked bar chart that each bar represents a Module, where the count of Active (Open, Working, Testing) tasks are at the bottom of the bar, and the count of the Completed tasks are at the top. Together, they add up to the total number of tasks assigned to the module.
In order to do this, the Tasks List needs to be queried twice. Once for the Count(ID) per Module, where Status is NOT equal to "Completed", and again for a Count(ID) per Module where Status IS equal to "Completed".
Example 3a: Completed Tasks
let Source = SharePoint.Tables("https://easybins.sharepoint.com/sites/advertising", [ApiVersion = 15]), #"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5" = Source{[Id="e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5"]}[Items], #"Renamed Columns" = Table.RenameColumns(#"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5",{{"Id", "Id_Tasks"}}), #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Status0] = "Complete")), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Module", "EnvironmentId"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Status", each "Complete") in #"Added Custom"
Example 3a: Code Breakdown
Most of the code in this example mimics that of Example 2, with the inclusion of one custom field.
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Status", each "Complete")
To understand why this field is needed, you need to understand the data that is the result of the distribution query. The table below shows a data sample:
Distribution Query Result Example
Module
| TaskCount
|
---|---|
A
| 111
|
B
| 22
|
C
| 55
|
By adding the custom column to the end, this allow the designation of which types of tasks where counted. So, you end up with a result set like this:
Completed Tasks Distribution Query
Module
| Count
| Status
|
A
| 111
| Completed
|
B
| 22
| Completed
|
C
| 55
| Completed
|
Example 3b: Active Tasks with Transformations
let Source = SharePoint.Tables("https://easybins.sharepoint.com/sites/advertising", [ApiVersion = 15]), #"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5" = Source{[Id="e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5"]}[Items], #"Renamed Columns" = Table.RenameColumns(#"e2fd7e68-cdbe-42a3-b546-8f1cccec6aa5",{{"Id", "Id_Tasks"}}), #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([Status0] <> "Complete")), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"Module", "EnvironmentId"}, {{"Count", each Table.RowCount(_), type number}}), #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Status", each "Active"), #"Appended Query" = Table.Combine({#"Added Custom", #"Tasks Complete"}), #"Pivoted Column" = Table.Pivot(#"Appended Query", List.Distinct(#"Appended Query"[Status]), "Status", "Count") in #"Pivoted Column"
Example 3b: Code Breakdown
Now that we have a Count of Completed Tasks per Module in Example 3a, the code in 3b will now calculate the Count of Active Tasks per Module. So, that code is relatively the same, with the exception of the filter being <> "Completed", and the custom field containing "Active".
The next steps are to smash the two data results together, into one long list, then transpose the Status column, thus giving us a PivotTable or a CrossTab. (Again, depending on your generation.)
#"Appended Query" = Table.Combine({#"Added Custom", #"Tasks Complete"}),
In case I didn't mention it, Example 3a is it's own Query, called Tasks Completed. Example 3b is another query, which generates the Active Tasks, but then performs an append of the Active Tasks with the Completed tasks. So, a variable called "Appended Query" is made and it takes the Active Tasks and Appends the Completed Tasks. Note that it references the other query called "Tasks complete".
#"Pivoted Column" = Table.Pivot(#"Appended Query", List.Distinct(#"Appended Query"[Status]), "Status", "Count")
Next, the pivot is made on the variable "Appended Query", and that Transforms the data into the Crosstab. The end result puts the data into this shape:
Module
| Active
| Completed
|
---|---|---|
A
| 44
| 111
|
B
| 33
| 22
|
C
| 78
| 55
|
It's worth noting that both the Append Query and the Pivot Transformation are accomplished using the menu choices from the Ribbon.
Summary
Power BI is a useful tool, but only if you know how to gather the data that the desired chart expects. To me, make the chart is the icing, but the ETL is the making of the multi-tiered wedding cake. It requires the correct batter, the right pan shape, the correct baking, man am I hungry for some cake. You get the point.
Again, it may be that after the data is loaded, that I can do the rest from just PBI Cloud, but since I started doing things in PBI Desktop, I just kept going to see what it can do. In another post, I'll create a report full of integrated Visualizations and then you can taste the whole cake.