7
votes

This is not so much about my current problem as it is a general question. Sometimes I have a problem that only happens in the production configuration, and I'd like to debug it there. What is the best way to approach that in Elixir? Production runs without a graphical environment (Docker).

In dev I can use IEx.pry, but since Mix is unavailable in production, that does not seem to be an option.

For Erlang, https://stackoverflow.com/a/21413344/1561489 mentions dbg and redbug, but even if they can be used, I would need help applying them to Elixir code.

3
Use specific logs; you can always replace the default log backend (console) with something suitable for your needs. – PatNowak
Add Logger.error everywhere you want to debug, re-deploy and watch the logs. I also want to hear about a better approach. – denis.peplin
You can also try to identify the exact differences between your production system and your debug system and eliminate them. As far as I know there's no such thing as a "debug" build in terms of BEAM code, so the code should be the same between the two environments. – Onorio Catenacci
How are you running the app in production? – webdeb

3 Answers

4
votes

First, start a local node running iex on your dev machine using iex -S mix. If you don't want the application that's running locally to cause breakpoints to be activated, you need to disable the app from starting locally. To do this, you can simply comment out the application function in mix.exs or run iex -S mix run --no-start.
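For example, and only as a sketch (MyApp.Application is a placeholder for your app's callback module), commenting out just the mod: entry in the application function is enough to keep the app from starting:

def application do
  [
    extra_applications: [:logger]
    # mod: {MyApp.Application, []}   # temporarily commented out so the app does not start locally
  ]
end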

Next, you need to connect to the remote node running on docker from iex on your dev node using Node.connect(:"remote@hostname"). In order to do this, you have to make sure both the epmd and the node ports on the remote machine are reachable from your local node.

Finally, once your nodes are connected, run :debugger.start() from the local iex; this opens the debugger with the GUI. Then, still in the local iex, run :int.ni(<Module you want to debug>) to make the module visible to the debugger, and you can go ahead and add breakpoints and start debugging.
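As a rough sketch of that whole session from the local iex (node name, cookie, module and line number are placeholders for your own setup):

# started locally with: iex --name debug@127.0.0.1 --cookie my_cookie -S mix run --no-start
Node.connect(:"remote@hostname")   # should return true; cookies must match
:debugger.start()                  # opens the graphical debugger on the local machine
:int.ni(MyApp.SomeModule)          # interpret the module so the debugger can see it
:int.break(MyApp.SomeModule, 42)   # optional: break at line 42 of that module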

You can find a tutorial with steps and screenshots here.

3
votes

If you are running production on AWS, then you should first and foremost leverage CloudWatch. In your Elixir code, configure your logger like this:

config :logger,
  handle_otp_reports: true,
  handle_sasl_reports: true,
  metadata: [:application, :module, :function, :file, :line]

config :logger,
  backends: [
    {LoggerFileBackend, :shared_error}
  ]

config :logger, :shared_error,
  path: "#{logging_dir}/verbose-error.log",
  level: :error
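Note that LoggerFileBackend comes from a separate Hex package, so the config above assumes something like the following in your deps (the version constraint here is just a guess; pin whatever is current):

# mix.exs
defp deps do
  [
    {:logger_file_backend, "~> 0.0.10"}
  ]
end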

Inside your Dockerfile, configure an environment variable for where exactly erl_crash.dump gets written to, such as: ERL_CRASH_DUMP=/opt/log/erl_crash.dump

Then configure awslogs inside a .config file under .ebextensions as follows:

files:
  "/etc/awslogs/config/stdout.conf":
    mode: "000755"
    owner: root
    group: root
    content: |
      [erl_crash.dump]
      log_group_name=/aws/elasticbeanstalk/your_app/erl_crash.dump
      log_stream_name={instance_id}
      file=/var/log/erl_crash.dump

      [verbose-error.log]
      log_group_name=/aws/elasticbeanstalk/your_app/verbose-error.log
      log_stream_name={instance_id}
      file=/var/log/verbose-error.log

And ensure that you map a volume for your Docker container in Dockerrun.aws.json:

  "Logging": "/var/log",
  "Volumes": [
    {
      "HostDirectory": "/var/log",
      "ContainerDirectory": "/opt/log"
    }
  ],

After that, you can inspect your error messages in CloudWatch. Now, if you are using ElasticBeanstalk (which my example above implicitly implies) with Docker deployment, as opposed to AWS ECS, then the container's standard output is redirected by default to /var/log/eb-docker/containers/eb-current-app/stdouterr.log, which also ends up in CloudWatch.

The main purpose of shipping erl_crash.dump is to at least know when your application crashed and took the container down with it. AWS EB will normally restart the container, which otherwise keeps you ignorant of the restart. The same information can also be obtained from other Docker-related logs, and you can configure alarms on them to be notified whenever your container had to restart. Another advantage of logging erl_crash.dump to CloudWatch is that, if need be, you can always export it to S3 later, download the file, and load it into the crash dump viewer (:crashdump_viewer, which ships with the observer application) to analyze what went wrong.
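For that last analysis step, a minimal sketch (the local path is a placeholder), using the crash dump viewer that ships with OTP:

# in a local iex session on a machine with a graphical environment
:crashdump_viewer.start()                      # the GUI prompts you for the dump file
:crashdump_viewer.start('/tmp/erl_crash.dump') # or point it at the file directly (charlist path)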

If, after consulting the logs, you still require a more intimate interaction with your production application, then you need to open a remote shell (remsh) to your node. If you use distillery, you would configure the cookie and the node name of your production application in your release like this:

Inside rel/config.exs, set the cookie:

environment :prod do
  set include_erts: false
  set include_src: false
  set cookie: :"my_cookie"
end

and under rel/templates/vm.args.eex you set variables:

-name <%= node_name %>
-setcookie <%= release.profile.cookie %>

and inside rel/config.exs, you define the release like this:

release :my_app do
  set version: "0.1.0"

  set overlays: [
    {:template, "rel/templates/vm.args.eex", "releases/<%= release_version %>/vm.args"}
  ]

  set overlay_vars: [
    node_name: "[email protected]"
  ]
end

Then you can directly connect to your production node running inside Docker by first SSH-ing into the EC2 instance that houses the Docker container and running the following:

CONTAINER_ID=$(sudo docker ps --format '{{.ID}}')
sudo docker exec -it $CONTAINER_ID bash -c "iex --name [email protected] --cookie my_cookie"

Once inside, you can poke around or, if need be and at your own peril, dynamically inject modified code for the module you would like to inspect. An easy way to do that would be to create a file inside the container and invoke Node.spawn_link(target_node, fn -> Code.eval_file(file_name, path) end).
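A minimal sketch of that injection, reusing the node name from above and assuming the patched code was written to /opt/log/patch.exs inside the container (both the directory and file name are placeholders):

target_node = :"[email protected]"

Node.spawn_link(target_node, fn ->
  # evaluate the file in the context of the production node, at your own peril
  Code.eval_file("patch.exs", "/opt/log")
end)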

If your production node is already running and you do not know the cookie, you can go inside the running container, run ps aux, and inspect the output to figure out which (random) cookie has been applied, and then use it accordingly.

Docker gets in the way of how epmd communicates with other nodes. The best option would therefore be to create your own AWS AMI image using Packer and do bare-metal deployments instead.

Amazon has recently released a new feature for AWS ECS, the awsvpc networking mode, which may facilitate inter-container epmd communication and thus allow connecting to your node directly. I have not tried it out yet, so I may be wrong.

If you are running on a provider other than AWS, then figuring out how to get easy access to your remote logs, whether with an SSM-like agent or some other service, is a must.

0
votes

I would recommend using some sort of exception-tracking tool; so far I have had a great experience with Sentry.
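A minimal sketch of what wiring in the sentry Hex package can look like (the version, DSN and options are placeholders; check the package docs for the current setup):

# mix.exs
{:sentry, "~> 8.0"}

# config/prod.exs
config :sentry,
  dsn: "https://public_key@sentry.example.com/1",
  environment_name: :prod

# also report Logger/crash messages (backend provided by the sentry package)
config :logger, backends: [:console, Sentry.LoggerBackend]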