0
votes

I'm fairly new to compute shaders. I've just started implementing one for an N-body simulation and have run into a problem that I can't solve on my own.

Below is everything in the compute file; the entry point is ParticleComputeShader. I am only dispatching 1 thread and creating 1024 in the shader. There are only 1024 particles while I debug and tweak it, so each thread has its own particle to work on.

The problem seems to be the distance != 0.0f check and the calculation that depends on the distance. Before I added the check, the position came back as 1.QNaN, so something in the code was dividing by 0. My guess is that I'm accessing the StructuredBuffer incorrectly with j and that this is corrupting the calculations that follow.

Another note: position.w is the mass of the particle.

struct ConstantParticleData
{
    float4 position;
    float4 velocity;
};

struct ParticleData
{
    float4 position;
    float4 velocity;
};

namespace Constants
{
    float BIG_G = 6.674e-11f;
    float SOFTEN = 0.01f;
}

StructuredBuffer<ConstantParticleData> inputConstantParticleData : register( t0 ); 
RWStructuredBuffer<ParticleData> outputParticleData : register( u0 );


[numthreads(1024, 1, 1)]
void ParticleComputeShader( int3 dispatchThreadID : SV_DispatchThreadID )
{
    float3 acceleration = float3(0.0f, 0.0f, 0.0f);

    for(int j = 0; j < 1024; j++)
    {
        float3 r_ij;
        r_ij.x = inputConstantParticleData[j].position.x - inputConstantParticleData[dispatchThreadID.x].position.x;
        r_ij.y = inputConstantParticleData[j].position.y - inputConstantParticleData[dispatchThreadID.x].position.y;
        r_ij.z = inputConstantParticleData[j].position.z - inputConstantParticleData[dispatchThreadID.x].position.z;

        float distance = 0.0f;
        distance = length(r_ij);



        if(distance != 0.0f)
        {
            float bottomLine = pow(distance, 2) + pow(Constants::SOFTEN, 2);

            acceleration += Constants::BIG_G * ((inputConstantParticleData[j].position.w * r_ij) / 
                            pow(bottomLine, 1.5));
        }
    }

    acceleration = acceleration / inputConstantParticleData[dispatchThreadID.x].position.w;

    outputParticleData[dispatchThreadID.x].velocity = inputConstantParticleData[dispatchThreadID.x].velocity +
                                                      float4(acceleration.x, acceleration.y, acceleration.z, 0.0f);

    outputParticleData[dispatchThreadID.x].position = inputConstantParticleData[dispatchThreadID.x].position +
                                               float4(outputParticleData[dispatchThreadID.x].velocity.x,
                                                      outputParticleData[dispatchThreadID.x].velocity.y,
                                                      outputParticleData[dispatchThreadID.x].velocity.z,
                                                      0.0f);


}

Any help will be appreciated. The shader works for simple input -> output and only started giving trouble when I tried to read more of the input buffer than inputConstantParticleData[dispatchThreadID.x] at any one time.

2
You said you're dispatching only one thread while debugging, but why is numthreads(1024,1,1) if that's the case? - Adam Miles
I thought you could dispatch 1 and have that create the 1024 threads within the shader itself. I thought those values were independent. I probably misrepresented what I meant; I dispatch only 1 instance (1, 1, 1) but I have the shader run 1024 (1024, 1, 1) threads. If that's not how it works, I'd love to hear more but that's how I interpreted the MSDN documentation. - DHudson
You're correct, though the correct terminology in the initial question would have been "I am only dispatching 1 thread group", rather than "1 thread". I understand now though. - Adam Miles
Thanks for the information, Adam. Does the example code in the question seem to have an obvious point of fault or is that not enough information in the main question to garner a proper overview? Thanks again! - DHudson
Have you tried adding a branch inside the loop to simply skip over the case where j == dispatchThreadID.x? That'll be the case where it calculates the distance between itself and itself and could perhaps result in a weird distance like negative 0.0 perhaps. If skipping the "don't compare against myself" case fixes it, then we can figure out a proper fix. - Adam Miles

2 Answers

1
votes

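You can't really initialize global variables in HLSL this way. A global that isn't declared static is treated as a shader constant and placed in the implicit $Globals constant buffer. The compiler accepts the initializer because, if you use the shader through the FX framework, FX sets the globals up for you through constant buffers. Without FX, nothing writes those values, so BIG_G and SOFTEN will typically read as 0 unless the application fills that buffer itself. Glad to see you solved it; I just wanted to post why defining the float as a local variable fixed the issue.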
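If you would rather keep the constants outside the function, one option is to declare them static const, which makes them compile-time constants rather than entries in a constant buffer. The following is only a minimal sketch of that idea, not a drop-in replacement: the helper name PairTerm is hypothetical, the constants are placed at global scope rather than in the Constants namespace (I'd expect static const to behave the same inside the namespace, but that is an assumption), and the layout with mass stored in position.w is taken from the question.

```hlsl
// 'static const' bakes the values into the compiled shader, so they no
// longer depend on anything filling the implicit $Globals constant buffer.
static const float BIG_G  = 6.674e-11f;
static const float SOFTEN = 0.01f;

// Hypothetical helper: softened contribution of particle j to the
// acceleration sum of particle i, using the same formula as the question.
// The mass of each particle is stored in position.w.
float3 PairTerm(float4 pos_i, float4 pos_j)
{
    float3 r_ij  = pos_j.xyz - pos_i.xyz;

    // distance^2 + SOFTEN^2; strictly positive while SOFTEN is non-zero,
    // even when i == j and r_ij is the zero vector.
    float  denom = dot(r_ij, r_ij) + SOFTEN * SOFTEN;

    // denom * sqrt(denom) is denom^1.5, avoiding the pow() calls.
    return BIG_G * pos_j.w * r_ij / (denom * sqrt(denom));
}
```

Inside the loop this would be used as acceleration += PairTerm(inputConstantParticleData[dispatchThreadID.x].position, inputConstantParticleData[j].position);. With a non-zero SOFTEN the self-interaction term evaluates to zero rather than 0/0, so the distance != 0.0f branch should no longer be needed.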

0
votes

The problem with this code was that the namespace variable Constants::BIG_G did not hold the value I had assigned to it. Moving it inside the function and declaring it simply as float BIG_G fixed the problems I was having.