I've read this description of the OpenCL 2.x pipe API and leafed through the Pipe API pages at khronos.org. Working almost exclusively in CUDA, I felt kind of jealous of this nifty feature available only in OpenCL (and sorry that CUDA functionality has not been properly subsumed by OpenCL, but that's a different issue), so I thought I'd ask "How come CUDA doesn't have a pipe mechanism?" But then I realized I don't even know what that would mean exactly. So, instead, I'll ask:
How do OpenCL pipes work on AMD discrete GPUs / APUs? ...
- What info gets written where?
- How is the scheduling of kernel workgroups to cores affected by the use of pipes?
- Do piped kernels get compiled together (say, from their SPIR forms)?
- Does the use of pipes allow passing data between different kernels via the core-specific cache ("local memory" in OpenCL parlance, "shared memory" in CUDA parlance)? That would be awesome.
- Is there a way pipes are "supposed" to work on a GPU, generally? i.e. something the API authors envisioned or even put in writing?
- How do OpenCL pipes work in CPU-based OpenCL implementations?
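For concreteness, here is the kind of producer/consumer pipe usage I have in mind. This is an untested sketch based on my reading of the OpenCL 2.0 built-ins (`write_pipe`/`read_pipe`, with the pipe created on the host via `clCreatePipe` and passed as a kernel argument); the kernel and argument names are my own:

```
// Untested sketch of the OpenCL 2.0 pipe built-ins as I understand them.

// Producer kernel: each work-item tries to write one int packet into the pipe.
__kernel void producer(__write_only pipe int out_pipe)
{
    int value = get_global_id(0);
    // write_pipe returns 0 on success, non-zero if no packet could be reserved
    if (write_pipe(out_pipe, &value) != 0) {
        // pipe full: drop, retry, or record the failure
    }
}

// Consumer kernel: each work-item tries to read one packet, if available.
__kernel void consumer(__read_only pipe int in_pipe, __global int *out)
{
    int value;
    if (read_pipe(in_pipe, &value) == 0) {
        out[get_global_id(0)] = value;
    }
}
```

My questions above are about what actually happens, in hardware and in the runtime, when something like this executes.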