
I have a Node server running on Google Cloud Run, and I want to enable Stackdriver tracing. When I run the service locally, the traces show up in GCP. However, when I run the service on Cloud Run, I get an error:

"@google-cloud/trace-agent ERROR TraceWriter#publish: Received error with status code 403 while publishing traces to cloudtrace.googleapis.com: Error: The request is missing a valid API key." 

I made sure that the service account has the Cloud Trace Agent role.

The first line in my app.js is:

require('@google-cloud/trace-agent').start();

When running locally, I use a .env file containing:

GOOGLE_APPLICATION_CREDENTIALS=<path to credentials.json>

According to https://github.com/googleapis/cloud-trace-nodejs, these values are auto-detected if the application is running on Google Cloud Platform, so I don't include the credentials in the GCP image.

I'm not seeing anything obviously wrong. Perhaps augment your question with a description of how you set up your Cloud Run service, and maybe provide a trivial sample that we could use to test or recreate the problem? - Kolban
What is the identity of your Cloud Run service, and what roles does that identity have? - guillaume blaquiere
@guillaumeblaquiere I am using the default service account, but I added the Cloud Trace Agent role to it. When I used my service again this morning, everything seemed to work fine. So I think I can close the issue, but I am just wondering: does it take time before a service uses new roles that are assigned to its account? - bluecitylights
Most of the time it's very quick, but it can take up to 5 minutes (the maximum that I have observed). I have never seen it take longer, but it's not impossible! - guillaume blaquiere

1 Answer


There are two challenges to using this library with Cloud Run:

  1. Despite the note about auto-detection, Cloud Run is an exception. It is not yet autodetected. This can be addressed for now with some explicit configuration.
  2. Because Cloud Run services only have resources until they respond to a request, queued-up trace data may not be sent before CPU resources are withdrawn. This can be addressed for now by configuring the trace agent to flush as soon as possible:

const tracer = require('@google-cloud/trace-agent').start({
  serviceContext: {
    service: process.env.K_SERVICE || "unknown-service",
    version: process.env.K_REVISION || "unknown-revision"
  },
  flushDelaySeconds: 1,
});

On a quick review I couldn't see how to trigger the trace flush, but the shorter timeout should help avoid some delays in seeing the trace data appear in Stackdriver.
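
If you want to tune the flush delay per deployment without editing code, the same options can be built from the environment. The following is only a sketch: `buildTraceConfig` and the `TRACE_FLUSH_DELAY_SECONDS` variable are hypothetical names I made up for illustration, while `K_SERVICE`, `K_REVISION`, `serviceContext` and `flushDelaySeconds` are the ones used in the configuration above.

```javascript
// Hypothetical helper (not part of @google-cloud/trace-agent): builds the
// options object passed to start() from an environment object.
function buildTraceConfig(env) {
  // TRACE_FLUSH_DELAY_SECONDS is an assumed, user-defined variable.
  // Fall back to 1 second when it is missing or not a positive integer.
  const parsed = Number.parseInt(env.TRACE_FLUSH_DELAY_SECONDS, 10);
  const flushDelaySeconds = Number.isInteger(parsed) && parsed > 0 ? parsed : 1;

  return {
    serviceContext: {
      // K_SERVICE and K_REVISION are set by Cloud Run.
      service: env.K_SERVICE || 'unknown-service',
      version: env.K_REVISION || 'unknown-revision',
    },
    flushDelaySeconds,
  };
}

// Usage, as the first statement in app.js:
// require('@google-cloud/trace-agent').start(buildTraceConfig(process.env));
```

Keeping the helper free of side effects makes it easy to check locally what would be sent to the agent before deploying. It does not change the CPU withdrawal behaviour described above.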

EDIT: While nice in theory, in practice there are still significant race conditions with CPU withdrawal. I filed https://github.com/googleapis/cloud-trace-nodejs/issues/1161 to see if we can find a more consistent solution.