3
votes

I am building an SSIS package to capture Analysis Services DMV data records into persisted SQL tables for analyzing user sessions.

Here is the process I am trying to create:

The Analysis Services DMV [DISCOVER_SESSIONS] on server 'Server-A' is used as the data source. On 'Server-B' the SSIS package [SSISDB/IsolatedPackages/SSASUsageStats.dtsx] writes to custom DB table [DBA].[dbo].[UsageStatsLogOLAP].

The source is extracted and transformed with Excel Powerquery, loaded to Excel, and transferred to SQL Server table from Excel. SSIS executes the Excel refresh and data transfer to SQL Server. After the SSIS package is complete an Excel dataset is loaded into SQL Server to add the new records. Excel sample dataset is demonstrated below.(NOTE: Excel Powerquery was chosen as the data extract ETL tool because of its simplicity to extract the 'CubeCommand' field [substring analysis], and the 'ADUserNameDisplay' field [Active Directory lookup from ‘SESSION_USER_NAME’]).

This enables the analysis of user OLAP sessions overtime. The SSIS package will be run as a SQL Agent job every 15 minutes to add new sessions to the custom SQL Server table.

Help needed:

Is there a means in SSIS to execute the PowerShell script for SSIS to refresh Excel connections? Also, keeping in mind that credentials to Analysis Services instance needed to authenticate the package? Is there a better method to refresh Excel (instead of PowerShell in SSIS)?

Some refresh options I have looked into...

 SSIS PowerShell script
 SSIS VB script
 SSIS C# script
 3-party SSIS software (CozyRoc, PowerPack, TaskFactory)  

According to the the Task Factory website, the Pro Version ($2,495 - Per Server) includes a component "Excel Power Refresh Task". Does anyone have any experience with this addin? I have not tried downloading this extension.

Please see below for the PoSH scripts used so far. The package is functional but not yet automated. I need help regarding the automation of the Excel refresh to complete the package. Insights are appreciated... Thanks!


Method: Power Shell script command:

The issue with this command is that it opens a second power shell window. If I run this in 2 separate scripts the Excel workbook does update successfully. If I run this as a single script it uses my user credentials (instead of the supplied user credentials)-- the Excel file is saved successfully, but the data data refresh does not succeed.

##### STEP 1 ######
# Run powershell as another user account (DOMAIN\?????), for accessing the GCOP039 OLAP DMV 
cd C:
# REM: Define domain username and password 
$username = 'DOMAIN\bl0040ep'
$password = '!mySecretPwd'
# REM: Convert to a single set of credentials
$securePassword = ConvertTo-SecureString $password -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential $username, $securePassword
# REM: Launch PowerShell (runas) as another user
Start-Process powershell.exe -Credential $credential #-WindowStyle Maximized


##### STEP 2 ######
# Refresh the excel workbook connections and save the updated file
$file = 'C:\SVN\BusinessAnalysts\ExcelTools\DatabaseSSAS_Usage Stats.xlsx'
$x1 = New-Object -ComObject Excel.Application
$x1.Visible = $false
$x1.DisplayAlerts = $False
$enddate = (Get-Date).tostring("dd-MM-yy")
$filename = 'C:\SVN\BusinessAnalysts\ExcelTools\DatabaseSSAS_Usage Stats ' + $enddate + '.xlsx'
$wb = $x1.workbooks.Open($file)
$wb.refreshall()
# REM: Use SLEEP to eliminate the message: "This will cancel a pending data refresh. Continue?"
Start-Sleep -Second 20
$wb.SaveAs($filename)
$wb.Close()
$x1.Quit()
Remove-Variable wb,x1

References:


Method: custom SSIS Extensions (CozyRoc, PowerPack, TaskFactory):

CozyRoc

During development, we attempted using a custom SSIS Extension (‘Excel Source Plus’ task from CozyRoc) to leverage the ‘Recalculation’ setting to refresh the Excel workbook. This did not work as advertised and when I contacted CozyRoc support staff they also verified, stating: “We do not support refresh of external data connections”.

Task Factory

According to the the Task Factory website, the Pro Version ($2,495 - Per Server) includes a component "Excel Power Refresh Task". Does anyone have any experience with this addin? I have not tried downloading this extension.

References:


Screenshots

ps-script_excel-refresh-2-windows.png

ps-script_excel-refresh-2-windows.png

SSISDB-IsolatedPackages-SSASUsageStats.dtsx.png

SSISDB-IsolatedPackages-SSASUsageStats.dtsx.png

exceldataset-DBA.dbo.UsageStatsLogOLAP.png

exceldataset-DBA.dbo.UsageStatsLogOLAP.png

1
I would like to add some more information to this requirement. I found an open source custom SSIS task on code plex SSIS Excel Refresh Task. I downloaded the files, but I am not sure how to install the custom task to. Anyone know? There are quite a few files that comes with the download, including a 'sourceCode' folder and 'Microsoft Excel Refresh Task.sln' Visual Studio Solution file. I found instructions from The Data Queen website, but I think I need more info.SherlockSpreadsheets

1 Answers

1
votes

This sounds like a painful process. If it was me, I'd skip the Excel + Powerquery bits in the SSIS package. Just load the data from the DMV views (with a timestamp) directly into tables. Preferably SQL Server 2016 temporal tables.

Then, set up an Excel spreadsheet outside of SSIS process, for the end users, that pulls data from temporal table queries. Due to the nature of the temporal table queries, the default segment/window on them will always be "latest" (however it's defined for the temporal table...).

You could then parameterize the time segment sent back to the temporal queries to capture different windows - set up an Excel list of "quanta" to send to the temporal query, the user picks a value from the list, which then triggers the data refreshes for that "quantum".

If you can use SQL Server 2016+, do some research into temporal tables and queries. It sounds like they would facilitate what it seems like you're trying to do here.

In the Excel workbook (.xlsm), you can then have it refresh the PowerQuery data upon opening of the workbook, which should always get it to be "latest" data.

Excel isn't usually a very good tool to put into the middle of an automated data analysis process. It is primarily a user end-point tool.

There are plenty of Powershell scripts to "drive" Excel (thru COM Interop) if you need to do it this way. This leaves the Powershell process invoked from SSIS in charge of tearing down the COM objects when the Powershell process ends.

Automating Excel thru COM Interop on a server runs the risk of things not exiting Excel completely, which will result in lots of Excel zombie processes sucking up resources. So that leaves setting up another process that periodically kills of Excel zombie processes... (and there are examples on the nets for how to do this, too) if you can't completely clean up [sic] after Excel.

If you have to create Excel files from a server process (which SSIS would be...), I've had success in the past using the EPPlus .Net DLL with Powershell scripts in SSIS. Powershell gets the data, then invokes EPPlus write data directly to Excel files (.xlsx) w/o requiring or interacting with Excel.exe. (reading works, too).

The "NPOI.dll" library is a .Net port of the Java "POI" library, which can also write to "Excel.Old" (.xls) files as well as .xlsx files. It works similarly to EPPlus.

The big advantage of either of these (which I suspect you noticed that Powerquery doesn't do to data either...) is that you can sidestep the "help" that Excel will apply to the data being loaded into the workbook. (e.g., with EPPlus you can push zip codes or US SSNs into cells as text values - you are in control of cell styles and display format. Excel will "help" you by converting these to numeric data, thus losing any leading 0's... trying to overcome this anti-help involves any number of different kludges).