1
votes

I have been trying to pull data through a linked server in SSMS from an InterSystem Cache Database for a while, this is to enable us to join the data to other source systems in our Data Warehouse.

I have set up an ODBC connection and a linked server to the database and can execute queries through OPENQUERY in Management studio, but the data is huge (> 100million rows). So when I execute a SQL query with a WHERE clause the query just spins. It is like there are no indexes or PK's and the query is completing massive table scans.

If I SELECT TOP 100 with no where clause the query returns data, so I know that all the connections are correct. Any guidance would be greatly appreciated!

I have also tried using SSIS using a data flow from the ODBC connection to an OLE DB destination, but i cancelled the package after seven hours.

Example of the query below:

SELECT * 

FROM OPENQUERY (InterSystem_Cache ,
    'SELECT     pt.Column1,
                pt.Column2,
                pt.Column3,
                pt.Column4,
                pt.Column5,
                pt.Column6,
                pt.Column7,
                pt.Column8,
                pt.Column9,
                pt.Column10,
                pt.Column11,
                tr.Column12,
                tr.Column13,
                tr.Column14,
                te.Column15,
                te.Column16,
                te.Column17,
                te.Column18,
                te.Column19,
                te.Column20,
                rs.Column21,
                rs.Column22,
                rs.Column23,
                re.Column24,
                re.Column25,
                re.Column26,
                tr.Column27,
                tr.Column28,
                re.Column29
        FROM Database1.Table1 tr
            LEFT JOIN Database1.Table2 te on te.Column16 = tr.Column13
            LEFT JOIN Database1.Table3 rs on rs.Column23 = tr.Column28
            LEFT JOIN Database1.Table4 re on re.Column25 = rs.Column22
            LEFT JOIN Database1.Table5 pt on pt.Column6 = re.Column26
        WHERE tr.Column12 = ''2018-10-30'' ')
1
Take a look at the query execution on the second server. It looks for me that you have to use "INNER JOIN" instead of "LEFT". Also, look for waits on both servers to see what your actual problem is. - Slava Murygin
Unfortunately I can't access the cache server, I only have access via an ODBC connection so I am unable to look at the execution plan on the source server. I am wondering if the Cache Database is unable to easily read TSQL or it is just a case of really bad indexing. - shawkins1
We faced the same issue. We still extract data using openquery. I figured out this was Indexing in our case on the base server. - user9192401
I cannot allow the query to run for 7 hours, looks like I might have to ask for a daily flat file shudder! - shawkins1
I've run into this issue with Midas Plus (or whatever they are calling themselves now). Typically there is an index 'table' that you have to join to for optimization. It isn't always obvious or exposed by the vendor, but they should be able to provide you direction. Some vendors will create a customer specific 'index' in your environment for use. You just have to ask. - RollTide

1 Answers

0
votes

Well this looks like an indexing issue and if you don't have an access to the Cache system you will not be able to solve it. If you know that data comes into Cache in chronological order for tr.Column12, or if you are happy with having query results just once per row, you can save max tr.%ID value of last run and add tr.%ID>yourLastImported%ID into WHERE clause since %ID is usually incremental (however specific application can override it). Another thing to try is to run the same query without all JOINs on tr only and see if it improves query time -- there are some optimisations you can do if the problem is actually with accessing child tables.