I've also posted this question on runsubmit.com, a site outside the SE network for SAS-related questions.
At work I use two SAS servers. When I transfer a SAS dataset from one to the other via PROC UPLOAD, it goes at about 2.5MB/s. However, if I map the other server's drive as a network drive and copy and paste the file across, it runs much faster, at around 80MB/s (over the same gigabit connection).
Could anyone suggest what might be causing this and what I can do either to fix it or as a workaround?
There is also a third server I use that cannot map network drives on the other two. SAS is the only available means of transferring files from that one, so I need a SAS-based solution. Individual transfers from this server also run at 2.5MB/s, but I've found that I can run several transfers in parallel, each at 2.5MB/s.
Would SAS FTP via filenames and a data step be any faster than PROC UPLOAD? I might try that next, but I would prefer not to use it: we only have SAS 9.1.3, so SFTP isn't available and plain FTP isn't secure.
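As a rough check on how far parallelism has to go, assuming each session really does keep its 2.5MB/s and nothing else (CPU, disk, network) becomes the bottleneck, the aggregate rate of n parallel transfers and the number of sessions needed to match the mapped-drive copy work out as:

$$
R(n) = n \times 2.5\ \text{MB/s},
\qquad
n_{\text{match}} = \frac{80\ \text{MB/s}}{2.5\ \text{MB/s}} = 32
$$

So roughly 32 simultaneous uploads would be needed to equal one mapped-drive copy, before counting the CPU each session uses.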
Update - Further details:
- I'm connecting to a spawner, and I think it uses 'SAS proprietary encryption' (based on what I recall seeing in the logs).
- The uploads are Windows client -> Windows remote in the first case and Unix client -> Windows remote in the second case.
- The SAS datasets in question are compressed (i.e. by SAS, not some external compression utility).
- The transfer rate is similar when using proc upload to transfer external files (.bz2) in binary mode.
- All the servers have very fast disk arrays handled by enterprise-grade controllers (minimum 8 drives in RAID 10).
Potential solutions
- Parallel PROC UPLOAD - potentially fast enough, but extremely CPU-heavy
- PROC COPY - much faster than PROC UPLOAD, much less CPU overhead
- SAS FTP - not secure, unknown speed, unknown CPU overhead
Update - test results
- Parallel PROC UPLOAD: involves quite a lot of setup* and a lot of CPU, but works reasonably well.
- PROC COPY: exactly the same transfer rate per session as PROC UPLOAD, but far more CPU time.
- FTP: roughly 40x faster per transfer (100MB/s vs. 2.5MB/s per parallel PROC UPLOAD session), with minimal CPU.
*I initially tried the following:
local session -> remote session on source server -> n remote sessions on destination server -> Recombine n pieces on destination server
Although this resulted in n simultaneous transfers, they each ran at 1/n of the original rate, probably due to a CPU bottleneck on the source server. To get it to work with n times the bandwidth of a single transfer, I had to set it up as:
local session -> n remote sessions on source server -> 1 remote session each on destination server -> Recombine n pieces on destination server
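The difference between the two layouts can be put in numbers. Taking r as the single-transfer rate (about 2.5MB/s here) and assuming, as suspected above, that a single source-side session was the bottleneck in the first layout, the aggregate rates are:

$$
R_{\text{first}} = n \cdot \frac{r}{n} = r,
\qquad
R_{\text{second}} = n \cdot r
$$

The first layout never gets above one transfer's worth of bandwidth, while the second scales with n until some other resource runs out.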
SAS FTP code
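```sas
/* FTP source: a directory on the remote server; DIR allows member syntax below */
filename source ftp '\dir1\dir2'
  host='servername'
  binary dir
  user="&username" pass="&password";

/* Target: this session's WORK directory */
%let work = %sysfunc(pathname(work));
filename target "&work";

/* Copy the dataset file from the FTP directory into WORK.               */
/* This reads and writes variable-length records (the default RECFM=V); */
/* verify the copy is byte-identical to the source before relying on it. */
data _null_;
  infile source('dataset.sas7bdat') truncover;
  input;
  file target('dataset.sas7bdat');
  put _infile_;
run;
```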
…proc upload and started using FTP to transfer datasets. I just figured there's a lot of overhead associated with it, much like using a proc to duplicate a dataset. Doing it in SAS will take many times longer than just copying it via the OS. – Robert Penridge