This link says cuBLAS-XT routines provide out-of-core operation: the size of the operand data is limited only by system memory size, not by GPU on-board memory size. Does this mean that as long as the input data fits in CPU (host) memory, we can use cuBLAS-XT functions even when the data, including the output, is larger than GPU memory?
On the other hand, this link says "In the case of very large problems, the cublasXt API offers the possibility to offload some of the computation to the Host CPU" and "Currently, only the routine cublasXtgemm() supports this feature." Is that feature meant for problems whose input size is greater than CPU memory size?
I don't see the difference between these two statements. I'd appreciate it if someone could help me understand how they relate.
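For context, my mental model of the out-of-core behavior described in the first link is a tiled multiply, where only one tile of each operand needs to be resident in GPU memory at a time. This is just a plain NumPy sketch of that idea, not actual cublasXt calls (the tile size and loop structure are my assumptions, not the library's):

```python
import numpy as np

def tiled_gemm(A, B, tile=256):
    """Conceptual out-of-core GEMM: C = A @ B computed tile by tile.

    In my understanding of cublasXt, only the small tiles (not the
    full matrices) would need to fit in GPU memory at any moment;
    the full operands stay in host memory.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Only these sub-blocks would be staged onto the GPU.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
                )
    return C
```

If this picture is right, I don't understand what extra role the "offload some of the computation to the Host CPU" feature plays on top of it.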