Modern No-Code Cloud-First Enterprise Business Intelligence Solution

With so many new and ever-changing systems and solutions in the market, isn't it the right time to rethink the enterprise business intelligence solution? Is the same traditional design still the best approach, where:

  • You have SSIS packages, along with numerous console applications, extracting data from the source systems, manipulating it and storing it in a staging database.
  • Another layer of SSIS packages/stored procedures prepares the data and moves it into the data warehouse.
  • A rigid reporting tool connects to the data warehouse and to the multi-dimensional model.
  • You need a report developer to create new reports and an SSIS developer to work on new packages for any change.

Some organizations are considering moving to the cloud, thus avoiding the ever-increasing storage and computational power needed to retrieve data from terabytes of storage. So how does the above design fit into a cloud-first approach, or is there a better solution?

What about all the new online sources: social feeds, streams of data about the footfall of your consumers, surveys from SurveyMonkey, interactions with your bot and so on? Is it really the best approach to fetch these using the same traditional packages, purchasing connectors and adapters to connect to these systems and implementing what has already been done so many times, or can we utilize cloud-based services that already have these connectors and this logic available as part of the platform?

With all this in mind, we can go cloud-only (Azure only) with minimal development effort. Obviously, design and implementation effort for databases and SQL queries for data cleansing and preparation will still be needed.

Proposed Design

Data Warehouse

Azure SQL Data Warehouse (ADW) is a very good contender, with Massively Parallel Processing (MPP) capabilities and the option to elastically increase/decrease compute nodes.
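
As a minimal sketch (assuming a warehouse named MyDW; the DWU target is illustrative), compute can be scaled up or down with a single T-SQL statement run against the master database of the logical server hosting the warehouse:

-- Run in the master database of the logical SQL server
-- Scales the hypothetical warehouse MyDW to 400 DWUs; pick the service objective that fits your workload
ALTER DATABASE MyDW MODIFY (SERVICE_OBJECTIVE = 'DW400');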

Azure SQL database

Why do you need a SQL database when you have a data warehouse? You need landing and staging areas to clean, process and prepare the data before pushing it to the data warehouse. ADW is not the right contender for data manipulation; it is best for querying. Secondly, conceptually it does not make sense to use your data warehouse for landing and staging purposes.

ETL Technologies

Following are the ETL technologies that I propose:

  • Azure Data Factory (ADF) (preferred)
  • Microsoft Flow

Azure Data Factory (ADF)

With a growing list of supported data sources and the ability to develop a custom .NET activity or run SSIS packages (in v2), you can use it to connect to almost any data source. The list of supported data sources is available here.

To integrate on-premises sources, you will need to install the data management gateway on an on-premises server that has access to all internal sources and to the Azure SQL database.

ADF will also be used to move the data from staging SQL database to Azure Data Warehouse.

Microsoft Flow

If we can use ADF for all kinds of sources, why do we need Microsoft Flow? Although the list of sources supported by ADF is extensive, there are still many common sources that are not so easy to connect to with ADF and require you to write code: for example, Excel files on a network share, OneDrive, SharePoint, SFTP or Dropbox, or data within SharePoint Online.

Microsoft Flow has a much more extensive list of connectors. What I like most about Microsoft Flow is that it is event-triggered. You will not need to pull the data from your budget file every month; it can be pushed as soon as the updated file is available. Please note that Microsoft Flow is not meant to move large volumes of data; it is good for small chunks.

If you need to pull your budgets from OneDrive/SFTP or footfall data from Dropbox, then I recommend Microsoft Flow. It does not require you to write code, and you can even add approval/review steps.

Please note, you will need to install a gateway for Microsoft Flow within your network to connect to on-premises sources. Its gateway is different from ADF's, but the same gateway works for other services like PowerApps, Azure Analysis Services and Power BI.

Data Sources

In the above design, I divided all sources into four broad categories:

  1. Internal Sources
  2. Azure/Office 365 (Azure Event Hubs, Azure Stream Analytics, Dynamics 365, SharePoint Online, etc.)
  3. Other public sources
  4. Data entered by users (PowerApps) (e.g. metadata, external factors like public holidays etc.)

Azure Analysis Services (AAS)

This will be your semantic layer on top of ADW. It will provide the aggregations and performance that you need for your reporting.

AAS will be processed at regular intervals by an Azure Automation runbook.

Scheduler

Azure Data Factory and Microsoft Flow have their own schedulers, but we will not be using them to process the AAS database. For that, we will use the Azure Automation scheduler to trigger Azure Automation runbooks.

It is not recommended, but if you have to integrate with sources not supported by Azure Data Factory or Microsoft Flow and you don't want to write any .NET code or develop an SSIS package, you can utilize Azure Automation runbooks. They support Python as well as PowerShell, with an extensive list of modules.

PowerBI

Power BI is my preferred choice for reporting, with an extensive list of capabilities and ease of use. You can hook it up to Azure Analysis Services with a live connection and then be creative with your reports. Becoming a Power BI report designer is about as easy as being a power user of Excel.

Last Words

I made the overall design look very simple 🙂 but it is not as simple once you start implementing it. Let me know your thoughts and feedback.

Find Reporting Services (SSRS) Reports using Dynamics CRM (DB)

Recently we migrated a CRM 4.0 implementation to Dynamics CRM Online. As part of the migration, we revamped all scripts, integrations and reports connecting to CRM.

Once the system was live, it was discovered that the DBA had created some reports hosted on an SSRS server that was not connected to CRM; there was no CRM local or shared data source.

The DBA had decided to use data from CRM within joins in some of his reports, so he added the CRM database as a linked server to his SQL Server and then developed queries within the reports that used data from CRM as well as from his own SQL databases. These reports were not available in any of the lower environments or in the source code repository. The only way to identify the reports consuming data from CRM, according to him, was to export the reports from SSRS and open their definitions one by one, which was cumbersome for the roughly 60 reports hosted on the SSRS server.

Fortunately, with the help of some forums, I was able to come up with the following query, which lists every report on the server together with the SQL statements within it that consume data from the CRM database:

Declare @Namespace NVARCHAR(500)
Declare @SQL VARCHAR(max)

SELECT @Namespace = SUBSTRING(
           x.CatContent
          ,x.CIndex
          ,CHARINDEX('"', x.CatContent, x.CIndex + 7) - x.CIndex
       )
FROM
(
    SELECT CatContent = CONVERT(NVARCHAR(MAX), CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content)))
          ,CIndex     = CHARINDEX('xmlns="', CONVERT(NVARCHAR(MAX), CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content))))
    FROM ReportServer.dbo.Catalog C
    WHERE C.Content is not null
    AND C.Type = 2
) X

SELECT @Namespace = REPLACE(@Namespace, 'xmlns="', '') + '';

SELECT @SQL = 'WITH XMLNAMESPACES ( DEFAULT ''' + @Namespace + ''', ''http://schemas.microsoft.com/SQLServer/reporting/reportdesigner'' AS rd )
SELECT  ReportName      = name
       ,DataSetName     = x.value(''(@Name)[1]'', ''VARCHAR(250)'')
       ,DataSourceName  = x.value(''(Query/DataSourceName)[1]'', ''VARCHAR(250)'')
       ,CommandText     = x.value(''(Query/CommandText)[1]'', ''VARCHAR(500)'')
       ,Fields          = df.value(''(@Name)[1]'', ''VARCHAR(250)'')
       ,DataField       = df.value(''(DataField)[1]'', ''VARCHAR(250)'')
       ,DataType        = df.value(''(rd:TypeName)[1]'', ''VARCHAR(250)'')
FROM (  SELECT C.Name, CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content)) AS reportXML
        FROM ReportServer.dbo.Catalog C
        WHERE C.Content is not null
        AND C.Type = 2
     ) a
CROSS APPLY reportXML.nodes(''/Report/DataSets/DataSet'') r ( x )
CROSS APPLY x.nodes(''Fields/Field'') f(df)
WHERE x.value(''(Query/CommandText)[1]'', ''VARCHAR(500)'') LIKE ''%_MSCRM%''
ORDER BY name '

EXEC(@SQL)

SQL Server 2008 Hidden Gems – Audit Trail Feature

Let's say you have a requirement to record an audit trail of CRUD operations. You are using SQL Server 2008 with either Entity Framework or WCF Data Services. What will you do? First you will create a database design for recording the audit trail, then you will implement either triggers or interceptors to record all activities. You may want to keep it simple, spend 4-5 days developing a simple and efficient audit trail recording module, and feel proud like I did :). Wait a minute... you should not feel proud, because you have just built something that is available in SQL Server 2008 out of the box, waiting for you to simply enable it, and it is much more efficient and detailed than what you have developed.

How? Have you ever noticed that when you expand the Security node in SSMS, there is an "Audit" folder? What is its purpose? It is the SQL Server Audit feature that nobody explores before starting to implement their own.

Configuring it is as simple as running a few SQL statements. The snippet below shows how to enable an audit trail for all CRUD operations performed by members of the AD group 'TestG' (mapped to a database user of the same name) on the Production.Product table of AdventureWorks:

Use [master]
Go
Create Server Audit TestAudit
To Application_Log
With (QUEUE_DELAY = 1000, ON_FAILURE = Continue);
Go
-- The audit is created in a disabled state; enable it explicitly
Alter Server Audit TestAudit With (State = ON);
Go

Use [AdventureWorks]
Go
Create Database Audit Specification ProductsListAccess
For Server Audit TestAudit
ADD (Select, Insert, Update, Delete ON Production.Product by TestG)
With (State = ON);
Go

In the above example, we are logging activities to the Application log (Event Viewer). You can also write the audit trail to a log file and then use SQL Server's fn_get_audit_file function to read its contents, e.g.:

SELECT * FROM fn_get_audit_file('E:\SqlAudits\*', default, default)
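
For the file-based option, a minimal sketch of the audit definition (assuming the folder E:\SqlAudits already exists and is writable by the SQL Server service account; the audit name TestFileAudit is illustrative) could look like this:

Use [master]
Go
-- Write audit events to rolling files under E:\SqlAudits instead of the Application log
Create Server Audit TestFileAudit
To File (FILEPATH = 'E:\SqlAudits\', MAXSIZE = 100 MB, MAX_ROLLOVER_FILES = 10)
With (QUEUE_DELAY = 1000, ON_FAILURE = Continue);
Go
-- Enable the audit
Alter Server Audit TestFileAudit With (State = ON);
Go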

In the fn_get_audit_file example, we are reading all the audit records stored in E:\SqlAudits. Now, what is logged and in what format? Your CRUD operation is logged in the data field in key-value format, where each key is separated by a space, as shown below:

As shown above, the entire T-SQL statement is logged as part of the audit trail; the key for the SQL statement is "statement". You can check the audit logs by expanding Server -> Security -> Audits and then selecting "View Audit Logs" from the context menu of the server audit you created with the SQL statements above.
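
If you only want to see statements touching a particular object, you can filter on the statement column returned by fn_get_audit_file; a minimal sketch (assuming the file-based audit above) follows:

-- List who ran what against Production.Product, most recent first
SELECT event_time, server_principal_name, statement
FROM fn_get_audit_file('E:\SqlAudits\*', default, default)
WHERE statement LIKE '%Production.Product%'
ORDER BY event_time DESC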

No need to write code any more to implement an audit trail. Just configure it the way you want, including the events you want to log, the objects you want to monitor, and the storage for the log files.

Finally, to get the most out of SQL Server auditing, I recommend using Kerberos to pass the user's login through to SQL Server, then playing around with schemas to define permissions on objects and, if possible, letting SQL Server control your data security for easier reporting.

Thanks, everyone, for implementing all sorts of audit trail modules when this has been there for you to use all along... :)

How to configure SQL Server Performance Condition Alert for Log File Size

  1. In SQL Server Management Studio, below SQL Server Agent, right-click Alerts and select New Alert.
  2. Give your alert a name e.g. “Low Log Space Alert”, and from the Type drop-down list, select “SQL Server Performance Condition Alert”.
  3. From the Object drop-down list, select the SQLServer:Databases object.
  4. From the Counter drop-down list, select Percent Log Used.
  5. Select your database from the Instance drop-down list, and
  6. Set the alert for when the counter rises above 90 by selecting Rises Above from the Alert If Counter drop-down list and entering 90 into the Value text box.
  7. Select the Response page, select the Notify Operators check box, and select the check boxes for the notification options for your operator.
  8. Select the Options page and select the E-mail check box to include the alert error text. Click OK.
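
If you prefer to script the same alert rather than clicking through SSMS, a minimal sketch using msdb.dbo.sp_add_alert and sp_add_notification could look like this (the database name AdventureWorks and the operator name 'DBA Team' are placeholders; for a named instance the counter object is prefixed with MSSQL$InstanceName instead of SQLServer):

USE msdb;
GO
-- Fire when Percent Log Used for the AdventureWorks database rises above 90
EXEC dbo.sp_add_alert
     @name = N'Low Log Space Alert',
     @performance_condition = N'SQLServer:Databases|Percent Log Used|AdventureWorks|>|90',
     @enabled = 1,
     @delay_between_responses = 600,
     @include_event_description_in = 1; -- 1 = include the alert text in the e-mail
GO
-- Notify an existing operator by e-mail (notification method 1 = e-mail)
EXEC dbo.sp_add_notification
     @alert_name = N'Low Log Space Alert',
     @operator_name = N'DBA Team',
     @notification_method = 1;
GO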

Stored Procedure to Reindex Database

There are always many ways of reindexing all tables within a database, but what if you want to make it part of your maintenance plan? It is just configuration in the Maintenance Plan wizard, but what if SSIS is not installed or configured? You can achieve the same with SQL Agent jobs and use the following stored procedure to reindex the entire database:

CREATE PROCEDURE dbo.asp_reindex @database SYSNAME, @fragpercent INT
AS
DECLARE @cmd NVARCHAR(max),
        @table SYSNAME,
        @schema SYSNAME
--Using a cursor for demonstration purposes.
--Could also do this with a table variable and a WHILE loop
DECLARE curtable CURSOR FOR
    SELECT DISTINCT OBJECT_SCHEMA_NAME(object_id, database_id) SchemaName,
                    OBJECT_NAME(object_id, database_id) TableName
    FROM sys.dm_db_index_physical_stats (DB_ID(@database), NULL, NULL, NULL, 'SAMPLED')
    WHERE avg_fragmentation_in_percent >= @fragpercent
    FOR READ ONLY
OPEN curtable
FETCH curtable INTO @schema, @table
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.' + @table
               + ' REBUILD WITH (ONLINE = ON)'
    --Try ONLINE build first, if failure, change to OFFLINE build.
    BEGIN TRY
        EXEC sp_executesql @cmd
    END TRY
    BEGIN CATCH
        BEGIN
            SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.'
                       + @table + ' REBUILD WITH (ONLINE = OFF)'
            EXEC sp_executesql @cmd
        END
    END CATCH
    FETCH curtable INTO @schema, @table
END
CLOSE curtable
DEALLOCATE curtable
GO
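
As a usage sketch (the parameter values are illustrative), the procedure can then be called from a SQL Agent job step:

-- Rebuild all indexes in AdventureWorks that are at least 30% fragmented
EXEC dbo.asp_reindex @database = N'AdventureWorks', @fragpercent = 30;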

Find all Indexes for a database

If you want to list all indexes within a database, excluding primary key and unique indexes, use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.is_primary_key = 0
AND ind.is_unique = 0
AND ind.is_unique_constraint = 0
AND t.is_ms_shipped = 0
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all primary key indexes, you can use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.is_primary_key = 1
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all clustered indexes, you can use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.type = 1
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all non-clustered indexes, use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.type = 2
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

Hint: Don't forget to set NEWSEQUENTIALID() as the default expression for your primary key column if you plan to use uniqueidentifier as its data type; this reduces fragmentation of the data. Otherwise, it is better to use either an IDENTITY column or a creation datetime as the clustered index key for your table.
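
A minimal sketch of that hint (the table and column names are hypothetical):

-- Sequential GUIDs keep new rows at the end of the clustered index, reducing page splits
CREATE TABLE dbo.Orders
(
    OrderID uniqueidentifier NOT NULL
        CONSTRAINT DF_Orders_OrderID DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderDate datetime NOT NULL,
    CustomerName nvarchar(100) NOT NULL
);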

How to recover a suspect DB

Recently I faced an issue with my Reporting Services database. After a restart of the machine, it was marked as suspect and I was not able to bring it online. After some googling, I ended up running the following steps to recover the DB:

EXEC sp_resetstatus 'ReportServer';
ALTER DATABASE ReportServer SET EMERGENCY
DBCC CHECKDB('ReportServer')
ALTER DATABASE ReportServer SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DBCC CHECKDB('ReportServer', REPAIR_ALLOW_DATA_LOSS)
ALTER DATABASE ReportServer SET MULTI_USER

SSAS errors: Errors in the metadata manager. The dimension with ID of ‘xxx’ referenced by the ‘yyy’ cube, does not exist.

While working on an SSAS solution, my Visual Studio crashed. When I restarted it and tried to deploy the database, it started giving me the error mentioned in the title of this post.

After a bit of googling, I came across the following article: SSAS errors: Errors in the metadata manager. The dimension with ID of 'xxx' referenced by the 'yyy' cube, does not exist. Once I removed the old database from the data directory, which is "$:\Program Files\Microsoft SQL Server\MSAS10_50.MSSQLSERVER\OLAP\Data", and restarted the Analysis Services service on the target instance, I was able to deploy the solution.

Distributed Queries

Introduction

Many times we face a scenario where we have to execute a query against more than one database. One example: you have separate databases for each regional office and for the corporate office. At the end of the year, you want to compile accounts, and the figures from each regional office should be reflected in these accounts. So you have to query each regional database, but how?

Well, it is not so difficult. SQL Server provides system stored procedures and SQL functions that allow us to perform such operations.

Using the code

It is really easy to perform operations on different databases using distributed queries. Distributed queries provide SQL Server users with access to:

  • Distributed data stored in multiple instances of SQL Server.

  • Heterogeneous data stored in various relational and non-relational data sources accessed using an OLE DB provider.

Distributed queries allow users to access another data source (for example, files, non-relational data sources such as Active Directory, other SQL Servers and so on) using the security context of the Microsoft Windows account under which the SQL Server service is running. SQL Server 2000 impersonates the login appropriately for Windows NT logins; however, that is not possible for SQL Server logins.

Tables and views in external data sources can be referenced directly in SELECT, INSERT, UPDATE, and DELETE Transact-SQL statements. Because distributed queries use OLE DB as the underlying interface, distributed queries can access traditional relational DBMS systems with SQL query processors, as well as data managed by data sources of varying capabilities and sophistication. As long as the software owning the data exposes it in a tabular rowset through an OLE DB provider, the data can be used in distributed queries.

The simplest way of performing a select operation against a different data source is:

Exec sp_addlinkedserver @server='Test', @srvproduct='', @provider='SQLOLEDB', @datasrc='192.168.1.254', @catalog='LuckyERP_LAL_Live'
Exec sp_serveroption 'Test', 'data access', 'true'
Exec sp_addlinkedsrvlogin 'Test', 'true'
Select * From OPENQUERY(Test, 'SELECT * FROM dbo.SalesContractHeader')

Where

@server: The label for the data source that will be used to refer to it.

@srvproduct: The product name of the OLE DB data source to be added as the linked server. In the case of SQL Server, you can either set it to 'SQL Server', in which case you cannot specify the other parameters, or leave it empty and set the other parameters, such as the catalog name and the IP address or host name of the SQL Server.

@provider: The name of the OLE DB provider for the data source.

@datasrc: The name of the data source as interpreted by the OLE DB provider. It can be either the IP address of the server or its host name.

@catalog: The name of the catalog. In the case of SQL Server, it is the name of the database to link to.

Executing sp_serveroption is optional; it is used to configure the linked server (you can refer to MSDN for details). In our case, I enabled 'data access' on the linked server. The default behaviour also allows data access to the linked server, so doing this is not strictly necessary.

sp_addlinkedsrvlogin creates or updates a mapping between logins on the local instance of Microsoft SQL Server and remote logins on the linked server (for details, refer to MSDN). The first parameter of this stored procedure is the local name of the remote server as specified in sp_addlinkedserver. The second parameter is @useself: a value of 'true' specifies that SQL Server authenticated logins use their own credentials to connect to the remote server. If the credentials are different, use 'false' and specify the local login, remote login and remote password after the @useself parameter as the 3rd, 4th and 5th parameters.

Using this stored procedure is also optional, as the default behaviour is for SQL Server authenticated logins to use their own credentials to connect to the remote server.
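
As a minimal sketch (the login names and password are placeholders), an explicit credential mapping would look like this:

-- Map the local login LocalAppLogin to the remote login RemoteAppLogin on the linked server 'Test'
EXEC sp_addlinkedsrvlogin 'Test', 'false', 'LocalAppLogin', 'RemoteAppLogin', 'RemotePassword'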

Now, after everything is properly configured, we come to actually executing a distributed query. Will it be different from executing ordinary SELECT or DML queries? I don't think so; there is only a slight difference. Use OPENQUERY to execute the specified pass-through query on the given linked server, and we are done with executing a distributed query.
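
Alternatively (reusing the linked server and database names from the example above), you can reference the remote table directly with a four-part name instead of OPENQUERY:

-- linked_server.database.schema.object
SELECT * FROM Test.LuckyERP_LAL_Live.dbo.SalesContractHeader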

Points of Interest

Have you noticed that the linked server does not always have to be SQL Server? Yup, you can use any OLE DB source as the data source in distributed queries, including SQL Server, Oracle, MS Access, ODBC data sources, the file system via the Indexing Service, MS Excel sheets and IBM DB2 databases. What a long list, and you can also develop your own data source by implementing the OLE DB interfaces. Thanks to Microsoft SQL Server, we are able to perform operations on heterogeneous data stored in such diverse data sources.

Paging in SQL

Many times, it is required to implement paging in SQL. You can find many ways to accomplish it, but recently I came across a very simple way of doing it.
 
Suppose you want to find records 21-30 in order of their IDs; you can do this using the following query:
 
SELECT *
FROM   (SELECT TOP 10 *
        FROM   (SELECT TOP 30 *
                FROM   tableName
                ORDER BY ID) AS t1
        ORDER BY ID DESC) AS t2
ORDER BY ID
 
Yup it is a very easy and handy way to implement paging.
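
On SQL Server 2005 and later, a sketch using ROW_NUMBER() avoids the nested TOP queries (same illustrative table and column names as above):

SELECT *
FROM   (SELECT *, ROW_NUMBER() OVER (ORDER BY ID) AS RowNum
        FROM   tableName) AS numbered
WHERE  RowNum BETWEEN 21 AND 30
ORDER BY ID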