Microsoft Power BI Desktop contains a rich set of data source connectors and transformation capabilities that support the integration and enhancement of source data. These features are all driven by a powerful functional language and query engine, M, which leverages source system resources when possible and can greatly extend the scope and robustness of the data retrieval process beyond the possibilities of the standard query editor interface alone. As with almost all BI projects, the design and development of the data access and retrieval process has great implications for the analytical value, scalability, and sustainability of the overall Power BI solution.
[box type=”note” align=”” class=”” width=””]Our article is an excerpt from the book Microsoft Power BI Cookbook, written by Brett Powell. This book shows how to leverage Microsoft Power BI and the development tools to create better data driven analytics and visualizations. [/box]
In this article, we dive into Power BI Desktop’s Get Data experience and go through the process of establishing and managing data source connections and queries. Examples are provided of using the Query Editor interface and the M language directly to construct and refine queries to meet common data transformation and cleansing needs. In practice and as per the examples, a combination of both tools is recommended to aid the query development process.
Viewing and analyzing M functions
Every time you click on a button to connect to any of Power BI Desktop’s supported data sources or apply any transformation to a data source object, such as changing a column’s data type, one or multiple M expressions are created reflecting your choices. These M expressions are automatically written to dedicated M documents and, if saved, are stored within the Power BI Desktop file as Queries. M is a functional programming language like F#, and it’s important that Power BI developers become familiar with analyzing and later writing and enhancing the M code that supports their queries.
- Build a query through the user interface that connects to the AdventureWorksDW2016CTP3 SQL Server database on the ATLAS server and retrieves the DimGeography table, filtered by United States for English.
- Click on Get Data from the Home tab of the ribbon, select SQL Server from the list of database sources, and provide the server and database names. For the Data Connectivity mode, select Import.
A navigation window will appear, with the different objects and schemas of the database. Select the DimGeography table from the Navigation window and click on Edit.
- In the Query Editor window, select the EnglishCountryRegionName column and then filter on United States from its dropdown.
Figure 2: Filtering for United States only in the Query Editor
At this point, a preview of the filtered table is exposed in the Query Editor and the Query Settings pane displays the previous steps.
Figure 3: The Query Settings pane in the Query Editor
How to do it
- With the Formula Bar visible in the Query Editor, click on the Source step under Applied Steps in the Query Settings pane. You should see the following formula expression:
Figure 4: The SQL.Database() function created for the Source step
- Click on the Navigation step to expose the following expression:
Figure 5: The metadata record created for the Navigation step
- The navigation expression (2) references the source expression (1)
- The Formula Bar in the Query Editor displays individual query steps, which are technically individual M expressions
- It’s convenient and very often essential to view and edit all the expressions in a centralized window, and for this, there’s the Advanced Editor
M is a functional language, and it can be useful to think of query evaluation in M as similar to Excel spreadsheet formulas in which multiple formulas can reference each other. The M engine can determine which expressions are required by the final expression to return and evaluate only those expressions.
Configuring Power BI Development Tools, the display setting for both the Query Settings pane and the Formula bar should be enabled as GLOBAL | Query Editor options.
Figure 6: Global layout options for the Query Editor
- Alternatively, on a per file basis, you can control these settings and others from the View tab of the Query Editor toolbar.
Figure 7: Property settings of the View tab in the Query Editor
Advanced Editor window
Given its importance to the query development process, the Advanced Editor dialog is exposed on both the Home and View tabs of the Query Editor.
It’s recommended to use the Query Editor when getting started with a new query and when learning the M language. After several steps have been applied, use the Advanced Editor to review and optionally enhance or customize the M query. As a rich, functional programming language, there are many M functions and optional parameters not exposed via the Query Editor; going beyond the limits of the Query Editor enables more robust data retrieval and integration processes.
Figure 8: The Home tab of the Query Editor
- Click on Advanced Editor from either the View or Home tabs (Figure 8 and Figure 9, respectively). All M function expressions and any comments are exposed
Figure 9: The Advanced Editor view of the DimGeography query
When developing retrieval processes for Power BI models, consider these common ETL questions:
- How are our queries impacting the source systems?
- Can we make our retrieval queries more resilient to changes in source data such that they avoid failure?
- Is our retrieval process efficient and simple to follow and support or are there unnecessary steps and queries?
- Are our retrieval queries delivering sufficient performance to the BI application?
- Is our process flexible such that we can quickly apply changes to data sources and logic?
M queries are not intended as a substitute for the workloads typically handled by enterprise ETL tools such as SSIS or Informatica. However, just as BI professionals would carefully review the logic and test the performance of SQL stored procedures and ETL packages supporting their various cubes and reports environment, they should also review the M queries created to support Power BI models and reports.
How it works
Two of the top performance and scalability features of M’s engine are Query Folding and Lazy Evaluation. If possible, the M queries developed in Power BI Desktop are converted (folded) into SQL statements and passed to source systems for processing. M can also reduce the required resources for a given query by ignoring any unnecessary or redundant steps (variables).
M is a case-sensitive language. This includes referencing variables in M expressions (RenameColumns versus Renamecolumns) as well as the values in M queries. For example, the values “Apple” and “apple” are considered unique values in an M query; the Table.Distinct() function will not remove rows for one of the values. Variable names in M expressions cannot have spaces without a hash sign and double quotes. Per Figure 10, when the Query Editor graphical interface is used to create M queries this syntax is applied automatically, along with a name describing the M transformation applied. Applying short, descriptive variable names (with no spaces) improves the readability of M queries.
The query from this recipe was “folded” into the following SQL statement and sent to the ATLAS server for processing.
Figure 10: The SQL statement generated from the DimGeography M query
Right-click on the Filtered Rows step and select View Native Query to access the Native Query window from Figure 11:
Figure 11: View Native Query in Query Settings
Finding and revising queries that are not being folded to source systems is a top technique for enhancing large Power BI datasets. See the Pushing Query Processing Back to Source Systems recipe of Chapter 11, Enhancing and Optimizing Existing Power BI Solutions for an example of this process.
M query structure
- The great majority of queries created for Power BI will follow the let…in structure as per this recipe, as they contain multiple steps with dependencies among them.
- Individual expressions are separated by commas.
- The expression referred to following the in keyword is the expression returned by the query.
- The individual step expressions are technically “variables”, and if the identifiers for these variables (the names of the query steps) contain spaces then the step is placed in quotes, and prefixed with a # sign as per the Filtered Rows step in Figure 10.
- The M engine also has powerful “lazy evaluation” logic for ignoring any redundant or unnecessary variables, as well as short-circuiting evaluation (computation) once a result is determinate, such as when one side (operand) of an OR logical operator is computed as True. The order of evaluation of the expressions is determined at runtime; it doesn’t have to be sequential from top to bottom.
- In the following example, a step for retrieving Canada was added and the step for the United States was ignored. Since the CanadaOnly variable satisfies the overall let expression of the query, only the Canada query is issued to the server as if the United States row were commented out or didn’t exist.
Figure 12: Revised query that ignores Filtered Rows step to evaluate Canada only
View Native Query (Figure 12) is not available given this revision, but a SQL Profiler trace against the source database server (and a refresh of the M query) confirms that CanadaOnly was the only SQL query passed to the source database.
Figure 13: Capturing the SQL statement passed to the server via SQL Server Profiler trace
Partial query folding
- A query can be “partially folded”, in which a SQL statement is created resolving only part of an overall query
- The results of this SQL statement would be returned to Power BI Desktop (or the on-premises data gateway) and the remaining logic would be computed using M’s in-memory engine with local resources
- M queries can be designed to maximize the use of the source system resources, by using standard expressions supported by query folding early in the query process
- Minimizing the use of local or on-premises data gateway resources is a top consideration
Limitations of query folding
- No folding will take place once a native SQL query has been passed to the source system. For example, passing a SQL query directly through the Get Data dialog. The following query, specified in the Get Data dialog, is included in the Source Step:
Figure 14: Providing a user defined native SQL query
- Any transformations applied after this native query will use local system resources. Therefore, the general implication for query development with native or user-defined SQL queries is that if they’re used, try to include all required transformations (that is, joins and derived columns), or use them to utilize an important feature of the source database not being utilized by the folded query, such as an index.
- Not all data sources support query folding, such as text and Excel files.
- Not all transformations available in the Query Editor or via M functions directly are supported by some data sources.
- The privacy levels defined for the data sources will also impact whether folding is used or not.
- SQL statements are not parsed before they’re sent to the source system.
- The Table.Buffer() function can be used to avoid query folding. The table output of this function is loaded into local memory and transformations against it will remain local.
We have discussed effective techniques for accessing and retrieving data using Microsoft Power BI. Do check out this book Microsoft Power BI Cookbook for more information on using Microsoft power BI for data analysis and visualization.