Pixel depth
When you don't have the Kinect set up to detect players, the depth data is simply an array of bytes, with two bytes representing a single depth measurement.
So, just like in a 16-bit color image, each 16 bits represent a depth rather than a color.
If the array were for a hypothetical 2x2-pixel depth image, you might see [0x12 0x34 0x56 0x78 0x91 0x23 0x45 0x67], which would represent the following four pixels (the first byte of each pair is the low byte):
AB
CD
A = (0x34 << 8) + 0x12
B = (0x78 << 8) + 0x56
C = (0x23 << 8) + 0x91
D = (0x67 << 8) + 0x45
The << 8 simply moves that byte into the upper 8 bits of a 16-bit number. It's the same as multiplying it by 256, so you could instead write A = 0x34 * 256 + 0x12. Keep the shift in parentheses, (0x34 << 8) + 0x12: in C-like languages + binds more tightly than <<, so 0x34 << 8 + 0x12 would actually mean 0x34 << (8 + 0x12). The whole 16-bit numbers become 0x3412, 0x7856, 0x2391 and 0x6745.
In simpler terms, it's like saying I have 329 items plus 456 thousands of items. To get the total, I multiply the 456 by 1,000 and add the 329, giving 456,329. The Kinect has broken the whole number up into two pieces, and you simply have to put them back together. I could also "shift" the 456 to the left by three zero digits, which is the same as multiplying by 1,000: it becomes 456,000. So in decimal, shifting and multiplying are the same thing for powers of 10. In computers, the same holds for powers of 2: 2^8 is 256, so multiplying by 256 is the same as shifting left by 8 bits.
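A quick check of the arithmetic above. The original uses C-like pseudocode; Python is used here only for illustration, and the variable names are made up for the example. It confirms that shifting left by 8 and multiplying by 256 give the same value, and that the decimal analogy works the same way with 1,000:

```python
# High and low bytes of pixel A from the example array.
high, low = 0x34, 0x12

# The parentheses matter: + binds tighter than <<,
# so "high << 8 + low" would mean high << (8 + low).
shifted = (high << 8) + low
multiplied = high * 256 + low

print(hex(shifted), hex(multiplied))  # 0x3412 0x3412

# The decimal analogy: 456 thousands plus 329 ones.
print(456 * 1000 + 329)  # 456329
```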
And that is your four-pixel depth image: each resulting 16-bit number is the depth at that pixel.
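The whole decoding step for the 2x2 example can be sketched as a small function. This is a minimal illustration, not Kinect SDK code: `decode_depth` is a hypothetical helper, and it assumes the buffer holds pixels in row-major order (A B on the first row, C D on the second), two bytes per pixel, low byte first, as in the example.

```python
def decode_depth(raw: bytes) -> list[int]:
    """Combine each (low, high) byte pair into one 16-bit depth value.

    Assumes row-major pixel order and low byte first, as in the example.
    """
    if len(raw) % 2 != 0:
        raise ValueError("depth buffer must have an even number of bytes")
    depths = []
    for i in range(0, len(raw), 2):
        low, high = raw[i], raw[i + 1]
        # | and + give the same result here because low is always < 256.
        depths.append((high << 8) | low)
    return depths


# The hypothetical 2x2 image from the text.
raw = bytes([0x12, 0x34, 0x56, 0x78, 0x91, 0x23, 0x45, 0x67])
a, b, c, d = decode_depth(raw)
print([hex(v) for v in (a, b, c, d)])  # ['0x3412', '0x7856', '0x2391', '0x6745']
```

The loop is written out to mirror the shift-and-add in the text; in Python, `struct.unpack("<4H", raw)` from the standard library would produce the same four values in one call.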
Player depth
When you enable player data, it becomes a little more interesting. The bottom three bits of the whole 16-bit number tell you which player that pixel belongs to.
To keep things simple, ignore the more complicated method they use to extract the remaining 13 bits of depth data: combine the bytes exactly as above, then peel off the lower three bits:
A = (0x34 << 8) + 0x12
B = (0x78 << 8) + 0x56
C = (0x23 << 8) + 0x91
D = (0x67 << 8) + 0x45
Ap = A % 8
Bp = B % 8
Cp = C % 8
Dp = D % 8
A = A / 8
B = B / 8
C = C / 8
D = D / 8
Now pixel A has player Ap and depth A. The % operator gets the remainder of a division: take A, divide it by 8, and the remainder is the player number while the quotient is the depth. Since A = A / 8 keeps only the quotient (this assumes integer division, as with integer types in C), A now contains just the depth with the player bits removed.
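Here is the remainder/quotient split described above as a runnable Python sketch. `split_player` is a hypothetical helper name; note that Python needs `//` for integer division, where the C-style pseudocode uses `/`. It reuses the same example bytes, which are arbitrary, so the player numbers it prints only demonstrate the arithmetic.

```python
def split_player(value: int) -> tuple[int, int]:
    """Split a 16-bit value into (depth, player) the way the text does:
    the remainder of dividing by 8 is the player, the quotient is the depth."""
    player = value % 8   # bottom three bits
    depth = value // 8   # remaining 13 bits; // is integer division in Python
    return depth, player


raw = bytes([0x12, 0x34, 0x56, 0x78, 0x91, 0x23, 0x45, 0x67])
for name, i in zip("ABCD", range(0, len(raw), 2)):
    value = raw[i] | (raw[i + 1] << 8)  # low byte first, as before
    depth, player = split_player(value)
    print(name, "depth", depth, "player", player)
```

For pixel A this gives 0x3412 = 13330, which is 1666 * 8 + 2, so depth 1666 and player 2.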
If you don't need player support, at least early in development, skip this and just use the first method. If you do need player support, this is one of several ways to get it. There are faster-looking methods that use bitwise operations directly, but for unsigned values the compiler usually turns the division and remainder (modulus) operations above into bitwise equivalents anyway (% 8 becomes & 7, and / 8 becomes a right shift by 3), so you generally don't need to worry about it.
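The bitwise form the paragraph above mentions can be sketched as follows: a mask with 7 (binary 111) keeps the player bits, and a right shift by 3 drops them. `split_player_bitwise` is a hypothetical name, and the check below runs over every possible 16-bit value. The equivalence relies on the values being non-negative; in C, using an unsigned type such as uint16_t keeps the two forms identical, since signed division and shifts round negative numbers differently.

```python
def split_player_bitwise(value: int) -> tuple[int, int]:
    """Same split as value % 8 and value // 8, using a mask and a shift.
    Valid for non-negative values such as unsigned 16-bit pixels."""
    player = value & 0b111  # keep only the bottom three bits (same as % 8)
    depth = value >> 3      # drop the bottom three bits (same as // 8)
    return depth, player


# Compare against the division/remainder form for every 16-bit value.
for value in range(1 << 16):
    assert split_player_bitwise(value) == (value // 8, value % 8)
print("bitwise and arithmetic splits agree for all 65536 values")
```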