0
votes

Hopefully you guys can help with what I'm hoping is quite a simple question for those in the know!

I live (well, work) in SAS Enterprise Guide and am trying to perform a simple left join against a table in Teradata.

The table is extremely large (700+ columns, 1.1bn rows) and so far I have been connecting via a LIBNAME statement at the top of my program, followed by the usual PROC SQL to read the data.
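
For illustration, my setup looks roughly like this (server, credentials, and table/column names are placeholders written from memory, not the real code):

libname td teradata server=tdserv user=myuser password=XXXXX;

proc sql;
   create table work.results as
   select a.idNum, b.col1, b.col2, b.col3, b.col4
      from work.smallTable a          /* the small SAS table */
      left join td.BigTable b         /* the 1.1bn-row Teradata table */
        on a.idNum = b.idNum;
quit;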

The issue I am having is that it is extremely slow. I performed the join successfully using 90 rows in the left table and it took 3 hours to complete. The real table I want to use has something like 15,000 rows.

I have tried to connect via the SQL Pass-Through method, but this throws a hosts file error, which I can't fix due to corporate security limitations.
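
For reference, the pass-through attempt looked roughly like this (again with placeholder connection details); it fails at the CONNECT statement with the hosts file error:

proc sql;
   connect to teradata (server=tdserv user=myuser password=XXXXX);
   create table work.results as
   select * from connection to teradata
      (select idNum, col1, col2, col3, col4 from BigTable);
   disconnect from teradata;
quit;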

Has anyone had any experience performing this kind of task?

I should mention that a simple select * query against the same table in Teradata SQL Assistant completes in just over 1 minute (16,666,666 obs/s!), so the limitation must be somewhere between SAS and Teradata, or in SAS itself.

I'm sorry I haven't posted the actual code from my work machine (the snippets above are just from-memory sketches), but this has been bugging me for ages, so I thought I'd see if I'm missing any tricks.

Thanks in advance for your help.

I should mention, I'm only calling the required 4 columns from Teradata, so it's not because I'm calling ~700 unnecessary columns! – iain
Are you allowed to create/load tables on Teradata? Most likely your problem is related to a full load of the table, as the left join is done on the SAS side. – Jetzler
Are both tables in Teradata? Or are you trying to join a SAS table with a Teradata table? – Tom
No Tom, the left table is in SAS and the large right one is in Teradata. – iain
Jetzler, you're probably right. I was just posing the question in case I was missing something obvious. The joys of 'big data'! I'll ask the sysadmins if I'm allowed to load anything on. Thanks for your help. – iain

2 Answers

1
votes

So you're joining a SAS data set to a Teradata table and want to return the matching records. You'll want to use SAS's DBMASTER= data set option. It designates which of the tables is larger. By telling SAS this, it knows which table to move.

Here I assume librefs have already been assigned and that the Teradata table is larger--more obs--than the SAS data set.

proc sql threads;
   select tdTable.*
      from sastables.sasTable1, td.tdTable (dbmaster=yes)
      where tdTable.idNum = sasTable1.idNum;
quit;

If by chance your SAS data set is larger, you'll want to use the MULTI_DATASRC_OPT= option. Either google these terms or look in the SAS/ACCESS to Relational Databases manual; it's pretty good.
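
As a rough sketch (library, server, and column names are placeholders, not from the original post), MULTI_DATASRC_OPT=IN_CLAUSE goes on the LIBNAME statement and has SAS build an IN (...) list from the join values of the smaller table and pass it to Teradata, so only matching rows come back:

libname td teradata server=tdserv user=myuser password=XXXXX
        multi_datasrc_opt=in_clause;

proc sql;
   select tdTable.*
      from sastables.sasTable1, td.tdTable
      where tdTable.idNum = sasTable1.idNum;
quit;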

Good luck.

0
votes

Have you considered creating a volatile table in Teradata? Since it is created in your spool allocation, you shouldn't need explicit permissions to create it. Once created, you can load the SAS data set into the volatile table and collect statistics on its join and filter columns. This helps the optimizer understand the demographics of your "small" table. The volatile table only persists for the duration of your session and is not accessible across multiple sessions.

Then rewrite your SAS code to push the SQL down to Teradata, joining the large table to your volatile table. The results can be returned to SAS and loaded into another data set.

CREATE VOLATILE TABLE MyTable, NO FALLBACK
( ColA SMALLINT NOT NULL,
  ColB VARCHAR(10) NOT NULL
) PRIMARY INDEX (ColA)
ON COMMIT PRESERVE ROWS /* This is important */
;

The primary index is how Teradata distributes and accesses the data. Tables distributed on the same column will join "AMP local" and will not require a redistribution. This is not always possible, as your primary index selection has to consider even distribution as well as access path. The primary index does not have to be unique, but it can be.
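
On the SAS side, one way to wire this up is sketched below, assuming SAS/ACCESS to Teradata with its CONNECTION=GLOBAL and DBMSTEMP=YES LIBNAME options; the server, schema, and column names here are placeholders:

libname td teradata server=tdserv user=myuser password=XXXXX
        connection=global dbmstemp=yes;   /* one shared session; tables created
                                             through this libref are volatile  */

data td.MyTable;           /* load the small SAS data set into the volatile table */
   set work.sasTable1;
run;

proc sql;
   connect using td;       /* pass-through over the same global session */
   execute (collect statistics on MyTable column (ColA)) by td;
   create table work.results as
      select * from connection to td
      ( select big.*
          from BigSchema.BigTable big
          join MyTable v
            on big.ColA = v.ColA );
   disconnect from td;
quit;

If the default primary index SAS picks when it creates the table isn't suitable, you can instead EXECUTE the CREATE VOLATILE TABLE statement above through the same connection and then load the table with PROC APPEND.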

Hope this helps.