3 votes

Writing a Compute Shader to be used in Unity 4. I'm attempting to generate 3D noise.

The goal is to get a multidimensional float3 array into my compute shader from my C# code. Is this possible in a straightforward manner (using some kind of declaration), or can it only be achieved using Texture3D objects?

I currently have an implementation of simplex noise working on individual float3 points, outputting a single float from -1 to 1. I ported the code found here for the compute shader.

I would like to extend this to work on a 3D array of float3's (I suppose the closest comparison in C# would be Vector3[,,]) by applying the noise operation to each float3 point in the array.

I've tried a few other things, but they feel bizarre and completely miss the point of using a parallel approach. The above is what I imagine it should look like.

I also managed to get Scrawk's implementation working as vertex shaders. Scrawk got a 3D float4 array into the shader using a Texture3D, but I wasn't able to extract the floats from the texture. Is that how Compute Shaders work as well, relying on textures? I've probably overlooked something about getting the values out of the texture. This also seems to be how this user was getting data in, in this post. Similar question to mine, but not quite what I'm looking for.

I'm new to shaders in general, and I feel like I'm missing something fairly fundamental about Compute Shaders and how they work. The goal is (as I'm sure you've guessed) to get noise generation and mesh computation using marching cubes onto the GPU using Compute Shaders (or whatever shader is best suited to this kind of work).

Constraints are the Free Trial Edition of Unity 4.

Here's a skeleton of the C# code I'm using:

    int volumeSize = 16; 
    compute.SetInt ("simplexSeed", 10); 

    // This will be a float[,,] array with our density values. 
    ComputeBuffer output = new ComputeBuffer (/* size goes here, no idea */, 16);
    compute.SetBuffer (compute.FindKernel ("CSMain"), "Output", output);  

    // Buffer filled with a float3[,,] equivalent, whatever that is in C#. Also, what is 'Stride'? 
    // Haven't found anything exactly clear. I think it's the size of the basic datatype we're using in the buffer?
    ComputeBuffer voxelPositions = new ComputeBuffer (/* size goes here, no idea */, 16); 
    compute.SetBuffer (compute.FindKernel ("CSMain"), "VoxelPos", voxelPositions);    


    compute.Dispatch(0,16,16,16);
    float[,,] res = new float[volumeSize, volumeSize, volumeSize];

    output.GetData(res); // <=== populated with float density values

    MarchingCubes.DoStuff(res); // <=== The goal (Obviously not implemented yet)

And here's the Compute Shader

#pragma kernel CSMain

uniform int simplexSeed;
RWStructuredBuffer<float3[,,]> VoxelPos;  // I know these won't work, but it's what I'm trying
RWStructuredBuffer<float[,,]> Output;     // to get in there. 

float simplexNoise(float3 input)
{
    /* ... A bunch of awesome stuff the pastebin guy did ...*/

    return noise;
}

/** A bunch of other awesome stuff to support the simplexNoise function **/
/* .... */

/* Here's the entry point, with my (supposedly) supplied input kicking things off */
[numthreads(16,16,16)] // <== Not sure if this thread count is correct? 
void CSMain (uint3 id : SV_DispatchThreadID)
{
    Output[id.xyz] = simplexNoise(VoxelPos.xyz); // Where the action starts.     
}

2 Answers

1 vote

Use a 1D buffer and index into it as though it were 3D, with the same indexing math on both the CPU and GPU.

There are only 1-dimensional buffers in HLSL. Use a function / formula to convert an N-dimensional (say 3D or 2D) index vector into a single 1D index, which you then use to index into your 1D array.

If we have a 3D array indexed [z][y][x] (see footnote #1 for why), created as array[Z_MAX][Y_MAX][X_MAX], we can turn [z][y][x] into a linear index [i].

Here's how it's done...

Imagine a block you have cut into slices from top to bottom (so it piles up like a stack of coins), where each xy layer / slice lies flat and the slices stack up along z, the vertical axis. For every increment in z (upwards), we know we have X_MAX (width) * Y_MAX (height) elements already accounted for. To that total, add how far we have walked in the current 2D slice: for every step in y (which counts whole rows, going left to right), we know we have X_MAX (width) elements already accounted for, so add those to the total. Finally, add the number of steps within the current row, which is x. You now have a 1D index.

i = z * (Y_MAX * X_MAX) + y * (X_MAX) + x; // strides use the full extents, not "*_MAX - 1"
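
A quick sketch of both directions in C# (the method names are mine, not from any code above):

```csharp
// [z][y][x] -> linear index, matching the formula above
static int To1D(int x, int y, int z, int xMax, int yMax)
{
    return z * (yMax * xMax) + y * xMax + x;
}

// linear index -> [z][y][x], the inverse
static (int x, int y, int z) To3D(int i, int xMax, int yMax)
{
    int z = i / (yMax * xMax);
    int rem = i % (yMax * xMax);
    return (rem % xMax, rem / xMax, z);
}
```

The same arithmetic works in HLSL on the GPU side, so both ends agree on the buffer layout.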

Footnote #1 I don't use Unity's coordinate system here because it is easier to explain by swapping y and z. In this case, [z][y][x] indexing prevents jumps all over memory; see this article. Unity would swap [z][y][x] for [y][z][x] (to operate primarily on slices laid out this same way).

Footnote #2 This principle is exactly what uint3 id : SV_DispatchThreadID does as compared with uint3 threadID : SV_GroupThreadID and uint3 groupID : SV_GroupID. See the docs:

SV_DispatchThreadID is the sum of SV_GroupID * numthreads and GroupThreadID.

...So use this instead where possible, given the structure of your program. For example, with numthreads(16,16,16), a group ID of (1,0,0) and a group-thread ID of (3,2,1) give a dispatch thread ID of (1*16+3, 0*16+2, 0*16+1) = (19, 2, 1).

Footnote #3 This is the same way that N-dimensional indexing is achieved in C, under the hood.

0 votes

Typically you would use noise to generate something like a heightmap ... is that your intention here? It looks to me like you are generating a value for every point in the array.

I have an image in my head here of you taking a chunk from a voxel engine (16 x 16 x 16 voxels) and generating noise values for all points.

Whereas what I think you should be doing is making this a 2D problem. Some pseudo CPU code might look something like this ...

for(x)
  for(z)
    fill all voxels below ( GenerateY(x,z) )
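
On the CPU, that idea might look something like this (a rough sketch; GenerateY is a stand-in for a 2D-noise-based height function, not real code from above):

```csharp
int size = 16;
float[,,] voxels = new float[size, size, size];

for (int x = 0; x < size; x++)
{
    for (int z = 0; z < size; z++)
    {
        // hypothetical: returns a height in 0..size for this column
        int height = GenerateY(x, z);

        for (int y = 0; y < size; y++)
            voxels[x, y, z] = (y < height) ? 1f : 0f; // solid below the surface, air above
    }
}
```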

Assuming my assumptions are correct, I would say you might have your shader wrong. For example:

[numthreads(16,16,16)] // <== Not sure if this thread count is correct? 

This will try to run 16 x 16 x 16 = 4096 threads per group, which is well above the 1024-thread limit for a group; you can have unlimited groups, but each group can have no more than 1024 threads.

What I think you need is something more like [numthreads(16,1,16)], to run the noise function on a 16 x 16 grid of points and raise each point up by noise x maxHeight to give you the point you want.

Your dispatch call would look something like this ...

compute.Dispatch(0, 1, 1, 1); // group counts must be at least 1 in each dimension

... this would result in a single thread group producing height map values for 16 x 16 points. Once you get that far you can scale up.

All this combined with your mention of marching cubes suggests you are doing exactly the same thing I am, building a voxel engine on the GPU where the raw voxel data is generated in GPU ram then a mesh generated from it.

I have this part of the process cracked; the hard part is the next stage, generating a mesh / scene object from the resulting voxel array. Depending on your approach, you'll probably want to be really comfortable with ray marching or AppendBuffers next.

Good luck!

Flat buffer usage:

Let's say I want an array of 128*128*128 voxels and a chunk is 32*32*32 voxels; then I do this ...

//cpu code 
var size = 128*128*128;
var stride = sizeof(float);
ComputeBuffer output = new ComputeBuffer (size, stride);
computeShader.SetBuffer (0, "voxels", output);
computeShader.Dispatch(0, 4, 4, 4);

//gpu code
#pragma kernel compute
RWStructuredBuffer<float> voxels;

static const uint size = 128; // must match the CPU-side volume size

[numthreads(32,1,32)] // group is your chunk index, thread is your voxel within the chunk
void compute (uint3 threadId : SV_GroupThreadID, uint3 groupId : SV_GroupID)
{
    // groups are 32 apart on every axis, so the 4x4x4 dispatch covers 128^3
    uint3 voxPos = groupId * uint3(32, 32, 32) + threadId;

    //TODO: implement any marching cubes / dual contouring functions in
    //      here somewhere

    float height = Noise(voxPos.xz); // your 2D noise function, scaled to 0..size

    // chunks are 32 * 32 blocks of columns; each group fills a 32-voxel slab of the height
    for (uint y = voxPos.y; y < voxPos.y + 32; y++)
    {
        uint voxelPos = voxPos.x + y * size + voxPos.z * size * size;
        if (y < height)
        {
            voxels[voxelPos] = 1; // fill this voxel
        }
        else
        {
            voxels[voxelPos] = 0; // don't fill this voxel
        }
    }
}

This should produce (although this is all from the ram in my head so it might not be spot on) a 128*128*128 voxel array in a buffer on the GPU that contains something "terrain like".
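
To sanity-check the result on the CPU, you can read the buffer back into a flat array and index it with the same formula the kernel used (a sketch, assuming the 128^3 buffer above; the x, y, z values are arbitrary examples):

```csharp
// hypothetical readback: 'output' is the ComputeBuffer filled by the kernel
int size = 128;
float[] data = new float[size * size * size];
output.GetData(data); // blocks until the GPU has finished writing

// look up one voxel with the same layout the kernel used
int x = 5, y = 10, z = 20;
float voxel = data[x + y * size + z * size * size];
```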

I guess you can take it from there to do what you need; you could probably drop the "if" in the compute shader if your noise function was passed the xyz values of the voxel position.

Let me know if you find a neat way of cracking this; it's something I'm still working on myself.

My code works (well, almost) something like this ...

component start ... call compute to generate the voxel buffer, then call compute to generate a vertex buffer from the voxel buffer.

draw (each frame) ... render the vertex buffer with a material.