Modern No-Code Cloud-First Enterprise Business Intelligence Solution

With all the shiny, ever-changing systems and solutions on the market, isn't it time to rethink the enterprise business intelligence solution? Is the same traditional design still the best approach, where

  • SSIS packages, along with numerous console applications, extract data from the source systems, transform it, and store it in a staging database.
  • Another layer of SSIS packages and stored procedures prepares the data and moves it into the data warehouse.
  • A rigid reporting tool connects to the data warehouse and to the multi-dimensional model.
  • Every change requires a report developer to create new reports and an SSIS developer to build new packages.

Some organizations are considering moving to the cloud to avoid the ever-increasing storage and computational power needed to retrieve data from terabytes of storage. So how does the above design fit into a cloud-first approach, or is there a better solution?

What about all the new online sources: social feeds, streams of data about the footfall of your consumers, surveys from SurveyMonkey, interactions with your bot, and so on? Is it really best to fetch these using the same traditional packages, purchasing connectors and adapters to connect to these systems and re-implementing what has been done so many times before, or can we use cloud-based services that provide these connectors and this logic as part of the platform?

With all this in mind, we can go cloud-only (Azure only) with minimal development effort. Obviously, design and implementation effort for the databases and the SQL queries for data cleansing and preparation will still be needed.

Proposed Design

Data Warehouse

Azure SQL Data Warehouse (ADW) is a very good contender, with Massively Parallel Processing (MPP) capabilities and the option to elastically increase or decrease compute.
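That elastic scaling is itself just a T-SQL statement. A minimal sketch, assuming a warehouse named MyDW (the name and DWU level are illustrative):

```sql
-- Run against the logical server's master database.
-- Scale up before a heavy load window, then back down afterwards.
ALTER DATABASE MyDW MODIFY (SERVICE_OBJECTIVE = 'DW400');
```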

Azure SQL database

Why do you need a SQL database when you have a data warehouse? You need landing and staging areas to clean, process, and prepare the data before pushing it into the data warehouse. ADW is not the right tool for data manipulation; it is best at querying. Secondly, conceptually, it does not make sense to use your data warehouse for landing and staging purposes.
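As a minimal sketch of the landing-to-staging pattern described above (all table and column names are hypothetical):

```sql
-- Landing table: raw rows exactly as received from the source system.
CREATE TABLE landing.CustomerRaw (
    SourceId    NVARCHAR(50),
    FullName    NVARCHAR(200),
    Email       NVARCHAR(200),
    LoadedAtUtc DATETIME2 DEFAULT SYSUTCDATETIME()
);

-- Staging table: cleansed, de-duplicated rows ready for the warehouse.
CREATE TABLE staging.Customer (
    SourceId NVARCHAR(50) NOT NULL,
    FullName NVARCHAR(200) NOT NULL,
    Email    NVARCHAR(200) NULL
);

-- Cleanse and de-duplicate before the push to ADW.
INSERT INTO staging.Customer (SourceId, FullName, Email)
SELECT SourceId,
       LTRIM(RTRIM(FullName)),
       NULLIF(LTRIM(RTRIM(Email)), '')
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SourceId
                              ORDER BY LoadedAtUtc DESC) AS rn
    FROM landing.CustomerRaw
) d
WHERE rn = 1 AND SourceId IS NOT NULL;
```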

ETL Technologies

Following are the ETL technologies that I propose:

  • Azure Data Factory (ADF) (preferred)
  • Microsoft Flow

Azure Data Factory (ADF)

With a growing list of supported data sources and the ability to develop a custom .NET activity or run SSIS packages (in v2), you can use it to connect to almost any data source. The list of supported data sources is available here.

To integrate on-premises sources, you will need to install the data gateway on an on-premises server that has access to all internal sources and to the Azure SQL database.

ADF will also be used to move the data from staging SQL database to Azure Data Warehouse.
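The heart of that staging-to-warehouse move is a single copy activity in the ADF pipeline. A rough, illustrative sketch; the dataset names are hypothetical and the exact JSON shape depends on the ADF version:

```json
{
  "name": "CopyStagingToDW",
  "type": "Copy",
  "inputs": [ { "referenceName": "StagingSqlDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "DwSqlDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "SqlSource" },
    "sink": { "type": "SqlDWSink", "allowPolyBase": true }
  }
}
```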

Microsoft Flow

If we can use ADF for all kinds of sources, why do we need Microsoft Flow? Although the list of sources supported by ADF is extensive, there are still many common sources that are not easy to connect to with ADF and require you to write code: for example, Excel files on a network share, in OneDrive, SharePoint, SFTP, or Dropbox, or data within SharePoint Online.

Microsoft Flow has a much more extensive list of connectors. What I like most about Microsoft Flow is that it is event-triggered. You will not need to pull the data from your budget file every month; it can be pushed as soon as an updated file is available. Please note that Microsoft Flow is not meant to move large volumes of data; it is good for small chunks.

If you need to pull your budgets from OneDrive or SFTP, or footfall data from Dropbox, then I recommend Microsoft Flow. It will not require you to write code, and you can even add approval/review steps.

Please note, you will need to install a gateway for Microsoft Flow within your network to connect to on-premises sources. This gateway is different from ADF's, but the same gateway works for other services such as PowerApps, Azure Analysis Services, and Power BI.

Data Sources

In the design above, I divided all sources into four broad categories:

  1. Internal Sources
  2. Azure/Office 365 (Azure Event Hubs, Azure Stream Analytics, Dynamics 365, SharePoint Online, etc.)
  3. Other public sources
  4. Data entered by users (PowerApps) (e.g. metadata, external factors like public holidays etc.)

Azure Analysis Services (AAS)

This will be your semantic layer on top of ADW. It will provide the aggregations and performance that you need for your reporting.

AAS will be processed at regular intervals by an Azure Automation runbook.
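Under the hood, processing an AAS database amounts to sending a refresh command over XMLA; in TMSL form the command looks roughly like this (the model name is hypothetical):

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      { "database": "SalesModel" }
    ]
  }
}
```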

Scheduler

Azure Data Factory and Microsoft Flow have their own schedulers, but we will not be using them to process the AAS database. For that, you can use the Azure Automation scheduler with Azure Automation runbooks.

It is not recommended, but if you have to integrate with sources not supported by Azure Data Factory or Microsoft Flow, and you don't want to write any .NET code or develop an SSIS package, you can use Azure Automation runbooks. They support Python as well as PowerShell, with an extensive list of modules.

PowerBI

Power BI is my preferred choice for reporting, with an extensive list of capabilities and ease of use. You can hook it up to Azure Analysis Services with a live connection and then be creative with your reports. Becoming a Power BI report designer is about as easy as becoming a power user of Excel.

Last Words

I made the overall design look very simple 🙂 but it is not as simple once you start implementing it. Let me know your thoughts and feedback.


Find Reporting Services (SSRS) Reports using Dynamics CRM (DB)

Recently we migrated a CRM 4.0 implementation to Dynamics CRM Online. As part of the migration, we revamped all scripts, integrations, and reports connecting to CRM.

Once the system was live, it was discovered that the DBA had created some reports hosted on an SSRS server that was not connected to CRM: there was no CRM local or shared data source.

The DBA had decided to use CRM data within joins in some of his reports, so he added the CRM database as a linked server to his SQL Server and then built report queries that used data from CRM as well as from his own SQL databases. These reports were not available in any lower environment or in the source code repository. According to him, the only way to identify the reports consuming CRM data was to export the reports from SSRS and open their definitions one by one, which was cumbersome for the roughly 60 reports hosted on the server.

Fortunately, with the help of some forums, I was able to come up with the following query, which lists all reports and the SQL statements within them that consume data from the CRM database:

DECLARE @Namespace NVARCHAR(500)
DECLARE @SQL VARCHAR(MAX)

SELECT @Namespace = SUBSTRING(
        x.CatContent,
        x.CIndex,
        CHARINDEX('"', x.CatContent, x.CIndex + 7) - x.CIndex
    )
FROM
(
    SELECT CatContent = CONVERT(NVARCHAR(MAX), CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content))),
           CIndex     = CHARINDEX('xmlns="', CONVERT(NVARCHAR(MAX), CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content))))
    FROM ReportServer.dbo.Catalog C
    WHERE C.Content IS NOT NULL
      AND C.Type = 2
) X

SELECT @Namespace = REPLACE(@Namespace, 'xmlns="', '');

SELECT @SQL = 'WITH XMLNAMESPACES ( DEFAULT ''' + @Namespace + ''', ''http://schemas.microsoft.com/SQLServer/reporting/reportdesigner'' AS rd )
SELECT ReportName      = name
      ,DataSetName     = x.value(''(@Name)[1]'', ''VARCHAR(250)'')
      ,DataSourceName  = x.value(''(Query/DataSourceName)[1]'', ''VARCHAR(250)'')
      ,CommandText     = x.value(''(Query/CommandText)[1]'', ''VARCHAR(500)'')
      ,Fields          = df.value(''(@Name)[1]'', ''VARCHAR(250)'')
      ,DataField       = df.value(''(DataField)[1]'', ''VARCHAR(250)'')
      ,DataType        = df.value(''(rd:TypeName)[1]'', ''VARCHAR(250)'')
FROM (  SELECT C.Name, CONVERT(XML, CONVERT(VARBINARY(MAX), C.Content)) AS reportXML
        FROM ReportServer.dbo.Catalog C
        WHERE C.Content IS NOT NULL
          AND C.Type = 2
     ) a
CROSS APPLY reportXML.nodes(''/Report/DataSets/DataSet'') r ( x )
CROSS APPLY x.nodes(''Fields/Field'') f ( df )
WHERE x.value(''(Query/CommandText)[1]'', ''VARCHAR(500)'') LIKE ''%_MSCRM%''
ORDER BY name '

EXEC(@SQL)

Dynamics CRM Online – Azure Service Bus – Bad Gateway Error

After a gap of four years, I have finally convinced myself to start blogging again. There is so much to blog about. I will start with a simple problem that I faced last year while integrating Dynamics CRM Online with on-premises microservices.

If you are wondering how a cloud-based Dynamics CRM can send messages to an on-premises API, have a look at this MSDN article.

Once the Azure Service Bus was configured, the service bus listener was implemented and functional, the custom workflow activity was registered within CRM Online, and the service endpoint was configured, data started flowing to the back-end services. Great! At this point, our developer got stuck with a "50200: Bad Gateway" error in the custom workflow activity that was supposed to send data to the on-premises service via Azure Service Bus.

When I started troubleshooting, I noticed that the data flow was fine for all entities except Account. The Account entity, in our case, has a number of additional attributes and was the largest entity we had. All other custom and out-of-the-box entities were fine.

While googling, I came across this article, which proposed checking the system proxy, which makes sense. But in my case there was no system proxy on the dev box, so it was not a proxy issue. I felt it had something to do with the message size, as the Account entity has around 180 attributes, most of them populated.

Interestingly, this guess proved correct despite a confusing and misleading error message. It turned out that the message for the Account entity coming from CRM was bigger than the default WCF message size. Increasing the received message size for the Service Bus listener, by setting maxReceivedMessageSize in the WCF config file to a value higher than the default, fixed the issue:

<bindings>
  <ws2007HttpRelayBinding>
    <binding name="default" maxReceivedMessageSize="2147483647">
    </binding>
  </ws2007HttpRelayBinding>
</bindings>

Hoping this will save others some time.

SQL Server 2008 Hidden Gems – Audit Trail Feature

Let's say you have a requirement to record an audit trail of CRUD operations, and you are using SQL Server 2008 with either Entity Framework or WCF Data Services. What will you do? First you will create a database design for recording the audit trail; then you will implement either triggers or interceptors to record all activities. You may want to keep it simple, spend 4-5 days developing a simple, efficient audit trail module, and feel proud like I did :). Wait a minute... you should not feel proud, because you have just built something that is available in SQL Server 2008 out of the box, waiting for you to enable it, and it is much more efficient and detailed than what you have developed.

How? Have you ever noticed that when you expand the Security node in SSMS, there is a folder named "Audit"? What is its purpose? It is the SQL Server audit feature that nobody explores before starting to implement their own.

Configuring it is as simple as running a couple of SQL commands. The snippet below shows how to enable an audit trail for all CRUD operations by members of an AD group 'TestG' on the Production.Product table of AdventureWorks:

USE [master]
GO
CREATE SERVER AUDIT TestAudit
TO APPLICATION_LOG
WITH (QUEUE_DELAY = 1000, ON_FAILURE = CONTINUE);
GO
ALTER SERVER AUDIT TestAudit WITH (STATE = ON);
GO

USE [AdventureWorks]
GO
CREATE DATABASE AUDIT SPECIFICATION ProductsListAccess
FOR SERVER AUDIT TestAudit
ADD (SELECT, INSERT, UPDATE, DELETE ON Production.Product BY TestG)
WITH (STATE = ON);
GO

In the example above, we log activities to the Application Log (Event Viewer). You can also write the audit trail to a log file and then use SQL Server's fn_get_audit_file function to read its contents, e.g.:

SELECT * FROM fn_get_audit_file('E:\SqlAudits\*', default, default)

In the example above, we read all audit records stored in E:\SqlAudits. Now, what is logged, and in what format? When logging to the Application Log, your CRUD operation is recorded in the data field in key-value format, with each key separated by a space.

The entire T-SQL statement is logged as part of the audit record; the key for the SQL statement is "statement". You can browse the audit logs by expanding Server -> Security -> Audits and then selecting "View Audit Logs" from the context menu of the server audit you created with the statements above.
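When the file target is used, the same information comes back as ordinary columns, so you can filter it with plain SQL. A minimal sketch, reading from the path used above:

```sql
-- List who ran what against the audited objects, most recent first.
SELECT event_time,
       server_principal_name,
       database_name,
       object_name,
       statement
FROM fn_get_audit_file('E:\SqlAudits\*', DEFAULT, DEFAULT)
ORDER BY event_time DESC;
```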

There is no need to write code any more to implement an audit trail. Just configure it the way you want, including the events you want to log, the objects you want to monitor, and the storage of the log files.

Finally, to get the most out of SQL Server, I recommend using Kerberos to pass the user's login through to SQL Server, then playing around with schemas to define permissions on objects, and, if possible, having SQL Server control your data security for easier reporting.

Thanks, everyone, for implementing all sorts of audit trail modules when this has been there for you to use all along... :)

How to configure SQL Server Performance Condition Alert for Log File Size

  1. In SQL Server Management Studio, below SQL Server Agent, right-click Alerts and select New Alert.
  2. Give your alert a name e.g. “Low Log Space Alert”, and from the Type drop-down list, select “SQL Server Performance Condition Alert”.
  3. From the Object drop-down list, select the SQLServer:Databases object.
  4. From the Counter drop-down list, select Percent Log Used.
  5. Select your database from the Instance drop-down list.
  6. Set the alert to fire when the counter rises above 90 by selecting Rises Above from the Alert If Counter drop-down list and entering 90 in the Value text box.
  7. Select the Response page, select the Notify Operators check box, and select the check boxes for the notification options for your operator.
  8. Select the Options page and select the E-mail check box to include the alert error text. Click OK.
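The GUI steps above can also be scripted. A sketch using msdb.dbo.sp_add_alert, with the alert name and database from the steps above (the operator name is hypothetical):

```sql
USE msdb;
GO
-- Fires when Percent Log Used for the AdventureWorks database rises above 90.
EXEC dbo.sp_add_alert
     @name = N'Low Log Space Alert',
     @performance_condition = N'SQLServer:Databases|Percent Log Used|AdventureWorks|>|90';
GO
-- Attach an operator notification (1 = e-mail).
EXEC dbo.sp_add_notification
     @alert_name = N'Low Log Space Alert',
     @operator_name = N'DBA Team',
     @notification_method = 1;
GO
```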

Stored Procedure to Reindex Database

There are many ways to reindex all tables within a database, but what if you want to make it part of your maintenance plan? It is just configuration in the Maintenance Plan wizard, but what if SSIS is not installed or configured? You can achieve the same with SQL Agent jobs, using the following stored procedure to reindex the entire database:

CREATE PROCEDURE dbo.asp_reindex @database SYSNAME, @fragpercent INT
AS
DECLARE @cmd NVARCHAR(MAX),
        @table SYSNAME,
        @schema SYSNAME

--Using a cursor for demonstration purposes.
--Could also do this with a table variable and a WHILE loop.
DECLARE curtable CURSOR FOR
    SELECT DISTINCT OBJECT_SCHEMA_NAME(object_id, database_id) SchemaName,
           OBJECT_NAME(object_id, database_id) TableName
    FROM sys.dm_db_index_physical_stats (DB_ID(@database), NULL, NULL, NULL, 'SAMPLED')
    WHERE avg_fragmentation_in_percent >= @fragpercent
    FOR READ ONLY

OPEN curtable
FETCH curtable INTO @schema, @table
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.' + @table
             + ' REBUILD WITH (ONLINE = ON)'
    --Try an ONLINE rebuild first; on failure, fall back to an OFFLINE rebuild.
    BEGIN TRY
        EXEC sp_executesql @cmd
    END TRY
    BEGIN CATCH
        SET @cmd = 'ALTER INDEX ALL ON ' + @database + '.' + @schema + '.'
                 + @table + ' REBUILD WITH (ONLINE = OFF)'
        EXEC sp_executesql @cmd
    END CATCH
    FETCH curtable INTO @schema, @table
END
CLOSE curtable
DEALLOCATE curtable
GO
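Calling the procedure above for AdventureWorks with, say, a 30% fragmentation threshold looks like this (the threshold is an example value):

```sql
-- Rebuild all indexes on tables whose sampled fragmentation is 30% or more.
EXEC dbo.asp_reindex @database = N'AdventureWorks', @fragpercent = 30;
```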

Find all Indexes for a database

If you want to list all indexes within a database, excluding primary key and unique indexes, use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.is_primary_key = 0
  AND ind.is_unique = 0
  AND ind.is_unique_constraint = 0
  AND t.is_ms_shipped = 0
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all primary key indexes, you can use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.is_primary_key = 1
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all clustered indexes, you can use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.type = 1
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

To list all non-clustered indexes, use the following query:

SELECT ind.name, ind.index_id, ic.index_column_id, col.name, ind.*, ic.*, col.*
FROM sys.indexes ind
INNER JOIN sys.index_columns ic ON ind.object_id = ic.object_id AND ind.index_id = ic.index_id
INNER JOIN sys.columns col ON ic.object_id = col.object_id AND ic.column_id = col.column_id
INNER JOIN sys.tables t ON ind.object_id = t.object_id
WHERE ind.type = 2
ORDER BY t.name, ind.name, ind.index_id, ic.index_column_id

Hint: Don't forget to set NEWSEQUENTIALID() as the default expression for your primary key column if you plan to use uniqueidentifier as its data type; this reduces fragmentation of data. Otherwise, it is better to use either an IDENTITY column or a creation datetime as the clustered index key for your table.
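A minimal sketch of that hint (the table and column names are hypothetical):

```sql
-- Sequential GUIDs keep inserts appending near the end of the clustered index,
-- which reduces page splits and fragmentation compared to NEWID().
CREATE TABLE dbo.Customer (
    CustomerId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Customer_Id DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
    Name NVARCHAR(200) NOT NULL
);
```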