Is the memory space of value inside Kernel device (global), shared, or local?
It is in the logical local space. Kernel parameters start out in a particular bank of __constant__ memory as part of the kernel launch process. However, in most actual usage the parameter will first be copied to a thread-local register, which is part of the logical local space. Even for SASS instructions that are not LD but can refer to __constant__ memory as an operand, the usage is effectively local and per-thread, just as registers are local and per-thread.
If one thread modifies it, will that modification become visible to other threads?
Modifications in one thread will not be visible to other threads. If you modify it, the modification will be performed (first) on its value in a thread-local register.
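The per-thread behaviour described above can be checked with a small test. The following is only a minimal sketch: the kernel name modify_param, the helper run_modify_param, and the values 7 and 100 are made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: `value` is a by-value kernel parameter.
__global__ void modify_param(int value, int *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid == 0) {
        value = 100;   // only thread 0 changes its own copy
    }
    __syncthreads();   // a barrier does not make that change visible to others
    out[tid] = value;  // each thread reports the copy it sees
}

// Hypothetical host helper: launches one block of n threads and copies
// the per-thread results back into h_out.
void run_modify_param(int value, int *h_out, int n)
{
    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    modify_param<<<1, n>>>(value, d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main()
{
    const int n = 4;
    int h_out[n];
    run_modify_param(7, h_out, n);
    for (int i = 0; i < n; ++i) {
        printf("thread %d sees value = %d\n", i, h_out[i]);
    }
    // Thread 0 reports 100; threads 1..3 still report 7.
    return 0;
}
```

If a change needs to be visible to other threads, it has to be written to shared or global memory explicitly, as is done here with out.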
Or is the variable located on the stack of each thread, as with variables defined inside the function?
The stack is in the logical local space for a thread, so I'm not sure what the purpose of that question is. A stack from one thread is not shared with another thread. In my experience, the only way such a variable would show up on the stack is if it were used as part of a function call (i.e. not the thread itself as it is initially spawned by the kernel launch process, but a function call originating from that thread).
Also, variables defined inside a function (e.g. local variables) do not necessarily show up on the stack either. This is mostly a matter of compiler decisions. They could be in registers, they could appear (e.g. due to a spill) in actual device memory (but still in the logical local space), or they could be on the stack at some point, perhaps as part of a function call.
This should be mostly verifiable using the CUDA binary utilities (for example, by dumping the SASS with cuobjdump -sass).
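As a rough sketch of how one might check this, the small kernel below can be compiled to a cubin and disassembled. The file name param_space.cu and the names scale and run_scale are made up for illustration; the exact SASS, constant-bank offsets, and register counts depend on the GPU architecture and toolkit version, so no particular output is claimed here.

```cuda
// Hypothetical file name: param_space.cu
//
// One possible way to inspect the generated code:
//   nvcc -cubin -Xptxas -v -o param_space.cubin param_space.cu
//   cuobjdump -sass param_space.cubin
//
// -Xptxas -v reports register usage, cmem[0] usage, and any spill
// stores/loads. In the SASS, kernel parameters typically appear as
// constant-bank operands of the form c[0x0][...].
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float value, const float *in, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        value *= 2.0f;               // modifies this thread's copy only
        out[tid] = value * in[tid];  // result is written to global memory
    }
}

// Hypothetical host helper so the file can also be built and run normally.
void run_scale(float value, const float *h_in, float *h_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<1, n>>>(value, d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

int main()
{
    const int n = 4;
    float h_in[n] = {0.0f, 1.0f, 2.0f, 3.0f};
    float h_out[n];
    run_scale(1.5f, h_in, h_out, n);
    for (int i = 0; i < n; ++i) {
        printf("out[%d] = %f\n", i, h_out[i]);
    }
    return 0;
}
```

Looking for local-memory load/store instructions or a nonzero stack frame in the ptxas report is one way to tell whether anything actually left the registers.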
value is passed by value to the kernel, and generally it's stored in a register. If the kernel has too many registers in use, it may be spilled to local memory (which physically resides in device memory), but in any case, when passed by value, the value is local to each thread. - Ander Biguri