CUDA arrays wrap NVIDIA's proprietary array layouts, which are optimized for 2D and 3D locality. The translation from coordinates to an address is deliberately hidden from developers, since it may change from one architecture to the next. NVIDIA appears to have implemented this translation differently on Kepler and Maxwell, with Kepler taking a more "RISC-like" approach. The SASS disassembly of the surf2Dmemset sample from the CUDA Handbook (https://github.com/ArchaeaSoftware/cudahandbook/blob/master/texturing/surf2Dmemset.cu) shows six instructions to write the output:
SUCLAMP PT, R8, R7, c[0x0][0x164], 0x0;
SUCLAMP.SD.R4 PT, R6, R6, c[0x0][0x15c], 0x0;
IMADSP.SD R9, R8, c[0x0][0x160], R6;
SUBFM P0, R8, R6, R8, R9;
SUEAU R9, R9, R8, c[0x0][0x154];
SUSTGA.B.32.TRAP.U8 [R8], c[0x0][0x158], R10, P0;
as compared to one for Maxwell:
SUST.D.BA.2D.TRAP [R2], R8, 0x55;
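For context, the source code that compiles down to these store sequences is a single surf2Dwrite intrinsic. A minimal sketch in the spirit of the linked surf2Dmemset sample, using the surface reference API of that era (identifiers and launch shape here are illustrative, not the sample's exact names):

```cuda
// Sketch of a 2D surface memset kernel, modeled loosely on the
// surf2Dmemset sample; names are illustrative assumptions.
surface<void, cudaSurfaceType2D> surf2D;  // bound to a CUDA array by the host

__global__ void surf2Dmemset_kernel(unsigned char value, int width, int height)
{
    for (int row = blockIdx.y * blockDim.y + threadIdx.y;
         row < height;
         row += blockDim.y * gridDim.y) {
        for (int col = blockIdx.x * blockDim.x + threadIdx.x;
             col < width;
             col += blockDim.x * gridDim.x) {
            // x is a byte offset, y an element index; this one intrinsic
            // is what the compiler lowers to the SASS shown above.
            surf2Dwrite(value, surf2D, col * (int)sizeof(unsigned char), row);
        }
    }
}
```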
The "EA" in the Kepler instruction names stands for "effective address"; these instructions are a more complicated variant of the LEA (load effective address) instruction found in CISC instruction sets.
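For comparison, the effective-address arithmetic for ordinary pitch-linear memory is simple enough to write in one line; this sketch (a hypothetical helper, not from the sample) shows the computation an LEA-style instruction folds together, which the SUEAU/SUBFM/IMADSP sequence generalizes for the undocumented block-linear layout of CUDA arrays:

```cuda
// Pitch-linear addressing: addr = base + row * pitch + col * elementSize.
// CUDA arrays use a hidden block-linear layout instead, so the Kepler
// SASS above computes a more involved mapping that never appears in
// source code.
__device__ unsigned char *pitchedAddress(unsigned char *base,
                                         size_t pitch,   // bytes per row
                                         int col, int row)
{
    return base + (size_t)row * pitch + col;  // element size = 1 byte here
}
```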
As for SURED/SUATOM, those appear to be the surface equivalents of GRED/GATOM. Both perform atomic operations, but the ATOM variants return the previous value of the memory location while the RED variants do not. They don't need separate intrinsics: the compiler emits the RED form when the intrinsic's return value is unused, and the ATOM form when it is consumed.
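The same pattern is visible with ordinary global atomics; a small sketch (assuming a counter and output buffer supplied by the host):

```cuda
__global__ void red_vs_atom(int *counter, int *out)
{
    // Return value discarded: the compiler is free to emit a
    // reduction-style (RED) instruction, which never reads back.
    atomicAdd(counter, 1);

    // Return value consumed: the compiler must emit an ATOM
    // instruction, which returns the previous memory contents.
    int previous = atomicAdd(counter, 1);
    out[threadIdx.x + blockIdx.x * blockDim.x] = previous;
}
```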