1
votes

I am trying to start an ECS task with 2 containers, app and xray as sidecar. Following several blog posts and examples, I am still facing the error that no XRAY segments are visible in the Xray AWS console. And obviously the error is, that the application cannot connect to the xray-sidecar to send the xray segments. The line in the app container log showing this: [Error] write udp 127.0.0.1:35963->127.0.0.1:2000: write: connection refused

Overall ECS service/task setup:

  • ecsTaskRole has the policy AWSXRayDaemonWriteAccess attached
  • network mode in task definition is bridge => according to instructions, then I have to set env variable AWS_XRAY_DAEMON_ADDRESS in my app container, as well as link it to xray-daemon container
  • using dynamic port mapping: 2000/tcp and 2000/udp in the xray container, and 9000/tcp in the app

The taskdefinition you'll find below. I also tried to avoid dynamic port mapping, but the error was the same.

What do I miss , so that I can see my xray segments withing the AWS XRay console ?

  • is the IAM AWSXRayDaemonWriteAccess for the ecsTaskRole not sufficient ?!?!
  • app container config missing something, so that it can connect to xray-daemon container via the env property AWS_XRAY_DAEMON_ADDRESS ?!?! I already tried to set it to different values, like 0.0.0.0:2000 , 127.0.0.1:2000 .... which one is the way to go ?

=== app container log ===

2020-06-06 14:36:36rate: 0.050000
2020-06-06 14:36:361591446996579912116 [Trace] SamplingStrategy decided: true
2020-06-06 14:36:361591446996579934272 [Trace] Closing segment named jukebox-front
2020-06-06 14:36:361591446996579987354 [Error] write udp 127.0.0.1:35963->127.0.0.1:2000: write: connection refused
2020-06-06 14:36:362020/06/06 12:36:36 ping requested, reponding with HTTP 200
2020-06-06 14:36:362020/06/06 12:36:36 ping requested, reponding with HTTP 200
2020-06-06 14:36:361591446996564675990 [Trace] Beginning segment named jukebox-front
2020-06-06 14:36:361591446996564705526 [Trace] Determining ShouldTrace decision for:

=== xray container log ===

2020-06-06 13:27:212020-06-06T11:27:21Z [Info] HTTP Proxy server using X-Ray Endpoint : https://xray.eu-central-1.amazonaws.com
2020-06-06 13:27:212020-06-06T11:27:21Z [Info] Starting proxy http server on 0.0.0.0:2000
2020-06-06 13:27:012020-06-06T11:27:01Z [Info] Using region: eu-central-1
2020-06-06 13:26:412020-06-06T11:26:41Z [Info] Using buffer memory limit of 19 MB
2020-06-06 13:26:412020-06-06T11:26:41Z [Info] 304 segment buffers allocated
2020-06-06 13:26:412020-06-06T11:26:41Z [Info] Initializing AWS X-Ray daemon 3.2.0

The ECS task definition looks like:

{
  "executionRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskRole",
  "containerDefinitions": [
    {      
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "/ecs/td-jukebox",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 9000
        }
      ],
      "environment": [
        {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "0.0.0.0:2000"
        },
        {
          "name": "PORT",
          "value": "9000"
        }
      ],
      "memoryReservation": 128,
      "image": "<<app-image>>:latest",
      "essential": true,
      "links": [
        "xray-daemon"
      ],
      "name": "jukebox"
    },
    {
      "dnsSearchDomains": null,
      "environmentFiles": null,
      "logConfiguration": {
        "logDriver": "awslogs",
        "secretOptions": null,
        "options": {
          "awslogs-group": "/ecs/td-jukebox",
          "awslogs-region": "eu-central-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "portMappings": [
        {
          "hostPort": 0,
          "protocol": "udp",
          "containerPort": 2000
        },
        {
          "hostPort": 0,
          "protocol": "tcp",
          "containerPort": 2000
        }
      ],
      "command": [
        "\"-t\"",
        "\"0.0.0.0:2000\""
      ],
      "cpu": 32,
      "memoryReservation": 256,
      "image": "amazon/aws-xray-daemon",
      "essential": true,
      "name": "xray-daemon"
    }
  ],
  "compatibilities": [
    "EC2"
  ],
  "family": "td-jukebox",
  "requiresCompatibilities": [
    "EC2"
  ],
  "networkMode": "bridge"
 }

Any help to resolve the write: connection refused error highly appreciated ;)

===========

Update

I edited the task definition and added additional parameters for the XRay daemon to startup, to increase the verbosity in log output (as suggested by @shariqmaws ):

-l,dev,-f,/var/log/xray.log,-o,-n,eu-central-1

After redeploying the service and exec'ing into the xray-daemon container, the /var/log/xray.log contains:

tail -f /var/log/xray.log
2020-06-06T16:33:07Z [Debug] ARN of the AWS resource running the daemon:
2020-06-06T16:33:07Z [Debug] No Metadata set for telemetry records
2020-06-06T16:33:07Z [Debug] Using Endpoint: https://xray.eu-central-1.amazonaws.com
2020-06-06T16:33:07Z [Debug] Telemetry initiated
2020-06-06T16:33:07Z [Info] HTTP Proxy server using X-Ray Endpoint : https://xray.eu-central-1.amazonaws.com
2020-06-06T16:33:07Z [Debug] Using Endpoint: https://xray.eu-central-1.amazonaws.com
2020-06-06T16:33:07Z [Debug] Batch size: 50
2020-06-06T16:33:07Z [Info] Starting proxy http server on 0.0.0.0:2000
2020-06-06T16:34:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:35:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:36:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:37:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:38:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:39:07Z [Debug] Skipped telemetry data as no segments found
2020-06-06T16:40:07Z [Debug] Skipped telemetry data as no segments found

hmmmm, unfortunately not really an error happening here... This leads me to the assumption that maybe the app-container does something weird to connect to the xray-daemon ?!?!

Within the app-container, the XRAY related env variables look good:

 env | grep XRAY
XRAY_DAEMON_PORT_2000_UDP_PROTO=udp
XRAY_DAEMON_PORT_2000_TCP=tcp://172.17.0.7:2000
XRAY_DAEMON_PORT_2000_UDP_ADDR=172.17.0.7
XRAY_DAEMON_ENV_AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/0e9e6ca8-c6d1-4f72-975e-aa3d99fdb64e
XRAY_DAEMON_ENV_ECS_CONTAINER_METADATA_URI_V4=http://169.254.170.2/v4/8a6cd6e9-bda7-4579-8415-471a435d14b4
XRAY_DAEMON_PORT_2000_UDP_PORT=2000
AWS_XRAY_DAEMON_ADDRESS=xray-daemon:2000
XRAY_DAEMON_ENV_ECS_CONTAINER_METADATA_URI=http://169.254.170.2/v3/8a6cd6e9-bda7-4579-8415-471a435d14b4
XRAY_DAEMON_PORT_2000_TCP_ADDR=172.17.0.7
XRAY_DAEMON_PORT_2000_UDP=udp://172.17.0.7:2000
XRAY_DAEMON_PORT_2000_TCP_PORT=2000
XRAY_DAEMON_PORT=tcp://172.17.0.7:2000
XRAY_DAEMON_ENV_AWS_EXECUTION_ENV=AWS_ECS_EC2
XRAY_DAEMON_NAME=/ecs-td-jukebox-13-jukebox-d0ddb483ebf899aee801/xray-daemon
XRAY_DAEMON_PORT_2000_TCP_PROTO=tcp

The IP 172.17.0.7 is the correct one, matching the IP of the xray-daemon container.

===========

Update 2

I was testing a plain echo command within the app-container to send data via UDP 2000 to the xray-daemon, by executing echo "test data" > /dev/udp/172.17.0.7/2000 ...and it worked...at least it was successfull to send it, but of course the format of the xray-segment was wrong.
Based on that I received in the xray-daemon logfile:

2020-06-06T18:53:45Z [Warn] Missing header or segment: test data
2020-06-06T18:54:07Z [Debug] Send 1 telemetry record(s)
2020-06-06T18:55:07Z [Debug] Send 1 telemetry record(s)

Althoug it says Send 1 telemetry record, in the AWSXray console nothing appears...still no data. And after waiting a couple of minutes in the xray-daemon logfile appears:

2020-06-06T19:03:07Z [Debug] Failed to send telemetry 1 record(s). Re-queue records. SerializationError: failed to unmarshal response error
        status code: 400, request id:
caused by: UnmarshalError: failed decoding error message
        00000000  3c 68 74 6d 6c 3e 0d 0a  3c 68 65 61 64 3e 3c 74  |<html>..<head><t|
00000010  69 74 6c 65 3e 34 30 30  20 42 61 64 20 52 65 71  |itle>400 Bad Req|
00000020  75 65 73 74 3c 2f 74 69  74 6c 65 3e 3c 2f 68 65  |uest</title></he|
00000030  61 64 3e 0d 0a 3c 62 6f  64 79 20 62 67 63 6f 6c  |ad>..<body bgcol|
00000040  6f 72 3d 22 77 68 69 74  65 22 3e 0d 0a 3c 63 65  |or="white">..<ce|
00000050  6e 74 65 72 3e 3c 68 31  3e 34 30 30 20 42 61 64  |nter><h1>400 Bad|
00000060  20 52 65 71 75 65 73 74  3c 2f 68 31 3e 3c 2f 63  | Request</h1></c|
00000070  65 6e 74 65 72 3e 0d 0a  3c 2f 62 6f 64 79 3e 0d  |enter>..</body>.|
00000080  0a 3c 2f 68 74 6d 6c 3e  0d 0a                    |.</html>..|

caused by: invalid character '<' looking for beginning of value
2020-06-06T19:04:07Z [Debug] Send 2 telemetry record(s)
2020-06-06T19:05:07Z [Debug] Send 1 telemetry record(s)
2020-06-06T19:06:07Z [Debug] Send 1 telemetry record(s)
2020-06-06T19:07:07Z [Debug] Send 1 telemetry record(s)

I guess, this is based on the string I sent, which if of course not a valid xray segment.

===========

Update 3

Excerpt of the go code, to verify the instrumentalisation

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "strings"

    "github.com/aws/aws-xray-sdk-go/xray"
    "github.com/pkg/errors"
)

...

func getXRAYAppName() string {
    appName := os.Getenv("XRAY_APP_NAME")
    if appName != "" {
        return appName
    }
    return "front"
}


type pingHandler struct{}

func (h *pingHandler) ServeHTTP(writer http.ResponseWriter, request *http.Request) {
    log.Println("ping requested, responding with HTTP 200")
    writer.WriteHeader(http.StatusOK)
}

func main() {
    xraySegmentNamer := xray.NewFixedSegmentNamer(getXRAYAppName())
    http.Handle("/ping", xray.Handler(xraySegmentNamer, &pingHandler{}))
    log.Fatal(http.ListenAndServe(":"+getServerPort(), nil))
}

Any more hints to troubleshoot why my app cannot connect to the xray sidecar ?!?!

2

2 Answers

0
votes

Can you try changing:

  "environment": [
    {
      "name": "AWS_XRAY_DAEMON_ADDRESS",
      "value": "0.0.0.0:2000"
    },

to

  "environment": [
    {
      "name": "AWS_XRAY_DAEMON_ADDRESS",
      "value": "xray-daemon:2000"
    },
0
votes

many thanks @MartinLöper, @shariqmaws for your feedback. Finally, after some fiddling, the problem got solved. Most probably there was an error with the initial version of the image. I decided to increase the loglevel of the xray-daemon within the code to "debug". After recreating the docker image, pushing it to the registry and redeploying the ECS service I suddenly received only positive log messages.....and also the XRay console showed my service....hoooray.

To confirm, the following environment setting to talk to the xray-daemon is correct and works:

  "environment": [
    {
      "name": "AWS_XRAY_DAEMON_ADDRESS",
      "value": "xray-daemon:2000"
    },