19
votes

I'm looking at adding StatsD data collection to my Grails application, and looking around at existing libraries and code has left me a little confused about what would make a good, scalable solution. To put the question in context: I'm working on an online gaming project where I will be monitoring user interactions with the game engine. These cluster around particular moments in time: X users perform interactions within a window of a second or two, then repeat after a 10-20 second pause.

Here is my analysis of the options that are available today.

Etsy StatsD client example

https://github.com/etsy/statsd/blob/master/examples/StatsdClient.java

The "simplest thing that could possibly work" solution: I could pull this class into my project, instantiate a singleton instance as a Spring bean, and use it directly. However, after noticing that the grails-statsd plugin creates a pool of client instances, I started wondering about the scalability of this approach.

It seems that the doSend method could become a bottleneck if many threads are trying to send events at the same time. However, as I understand it, due to the fire and forget nature of sending UDP packets, this should happen quickly, avoiding the huge overhead that we usually associate with network connections.
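To make the fire-and-forget idea concrete, here is a minimal sketch of a single shared sender. It is not the Etsy code: the class name `UdpMetricSender` and its methods are hypothetical, and I'm assuming the usual StatsD counter line format (`key:value|c`). The host is resolved once in the constructor rather than on every send.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal client: one shared channel, one datagram per metric. */
final class UdpMetricSender implements AutoCloseable {
    private final DatagramChannel channel;
    private final InetSocketAddress target;

    UdpMetricSender(String host, int port) throws IOException {
        // Resolve the host once, up front, instead of on every send.
        this.target = new InetSocketAddress(host, port);
        if (target.isUnresolved()) {
            throw new IOException("cannot resolve " + host);
        }
        this.channel = DatagramChannel.open();
    }

    /** Formats a counter change in the StatsD line format, e.g. "game.moves:1|c". */
    static String counter(String key, long delta) {
        return key + ":" + delta + "|c";
    }

    /**
     * Sends one datagram without waiting for any reply.
     * DatagramChannel is documented as safe for use by multiple threads.
     * Returns false if the send failed; monitoring should not break the caller.
     */
    boolean send(String message) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
            return channel.send(buf, target) > 0;
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Usage would be something like `sender.send(UdpMetricSender.counter("game.moves", 1))` from any request thread. Whether a single channel is enough under a burst of concurrent senders is exactly the open question here; this sketch only shows that no connection setup or reply handling is involved.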

grails-statsd plugin

https://github.com/charliek/grails-statsd/

Someone has already created a StatsD plugin for Grails that includes some nice features, such as the annotations and the withTimer method. However, I see that the implementation is missing some bug fixes from the example implementation, such as specifying the locale on calls to String.format. I'm also not a huge fan of pulling in Apache commons-pool just for this when a standard Executor could achieve a similar effect.
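For anyone wondering why the locale matters: `String.format` without an explicit locale uses the JVM default, so a sample rate can be rendered with a decimal comma. A small illustration follows; the class and method names are hypothetical, and the format string is only modelled on the kind of message these clients build.

```java
import java.util.Locale;

final class SampleRateFormat {
    /** Builds a sampled counter message using the given locale for number formatting. */
    static String withLocale(Locale locale, String key, double rate) {
        return String.format(locale, "%s:1|c|@%f", key, rate);
    }

    public static void main(String[] args) {
        // A German default locale would produce a decimal comma.
        System.out.println(withLocale(Locale.GERMANY, "game.moves", 0.5)); // game.moves:1|c|@0,500000
        // Pinning the locale keeps the decimal point.
        System.out.println(withLocale(Locale.US, "game.moves", 0.5));      // game.moves:1|c|@0.500000
    }
}
```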

java-statsd-client

https://github.com/tim-group/java-statsd-client/

This is an alternative pure Java library that operates asynchronously by maintaining its own ExecutorService. It supports the entire StatsD API, including sets and sampling, but doesn't provide any hooks for configuring the thread pool and queue size. If problems do occur, then for non-critical things such as monitoring I think I would prefer a finite queue that drops events over having an infinite queue that fills up my heap.
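What I have in mind is roughly the following, using only `java.util.concurrent`. This is a sketch of the idea, not something java-statsd-client offers; the class name `BoundedMetricExecutor` and the thread name are my own inventions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedMetricExecutor {
    /**
     * Single daemon worker with a fixed-size queue. When the queue is full,
     * DiscardPolicy silently drops the new task instead of blocking the
     * caller or growing the heap.
     */
    static ThreadPoolExecutor create(int queueCapacity) {
        return new ThreadPoolExecutor(
                1, 1,                        // exactly one worker thread
                0L, TimeUnit.MILLISECONDS,   // no keep-alive needed for a fixed pool
                new ArrayBlockingQueue<>(queueCapacity),
                r -> {
                    Thread t = new Thread(r, "statsd-sender");
                    t.setDaemon(true);       // never keep the JVM alive for metrics
                    return t;
                },
                new ThreadPoolExecutor.DiscardPolicy());
    }
}
```

Each metric would then be submitted as a small task that sends one packet. The trade-off is that dropped events are invisible unless you swap `DiscardPolicy` for a custom handler that counts them.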

Play statsd plugin

https://github.com/vznet/play-statsd/

Now, I can't use this code directly in my Grails project, but I thought it was worth a look to see how things were implemented. Generally I love the way the code in StatsdClient.scala is built up: very clean and readable. It also appears to have the locale bug, but is otherwise feature-complete relative to the Etsy sample. Interestingly, unless there is some Scala magic that I've not understood, this appears to create a new socket for each data point that is sent to StatsD. While this approach nicely avoids the need for an object pool or executor thread, I can't imagine it's terribly efficient, and it potentially performs DNS lookups within the request thread, which should be returning to the user as soon as possible.

The questions

  1. Judging by the fact that all the other implementations appear to have implemented another strategy for handling concurrency, can I assume that the Etsy example is a little too naïve for production use?
  2. Does my analysis here appear to be correct?
  3. What are other people using for StatsD in Java/Groovy?

So far it looks like the best existing solution is the Grails plugin, as long as I can accept the commons-pool dependency, but right now I'm seriously considering spending Sunday writing my own version that combines the best parts of each implementation.


3 Answers

9
votes

Speaking as the primary committer of the java-statsd-client, as well as someone who uses this library in production, I'd like to attempt to allay your fears regarding "having an infinite queue that fills up my heap."

I think you pretty much nailed it with your analysis of the Etsy StatsD client example when you said "due to the fire and forget nature of sending UDP packets, this should happen quickly, avoiding the huge overhead that we usually associate with network connections."

It is my understanding that, as java-statsd-client is currently implemented, the only thing limiting how quickly the queue of outbound messages drains is the speed of fire-and-forget UDP packet sending. I'm not an expert in this area, but I'm unaware of any way in which this could block such that an infinite queue might build up.

When you originally did your evaluation, there were a number of outstanding issues with the java-statsd-client (e.g. Locale/character encoding ambiguities, and a lack of sampling support), but these have recently been addressed. What remains is the question of whether there is a genuine risk of filling up the heap. I'd be keen to hear thoughts from the community on this matter, and, if the consensus is that there is an issue, I would be delighted to explore the introduction of a limiting queue into the library.

1
votes

After sleeping on this for a week, I think I'm going to go ahead and use the existing Grails StatsD plugin. The rationale is that although I could achieve a similar effect using an Executor for handling concurrency, without an object pool this would still be bound to a single client/socket instance, which is in theory a rather obvious bottleneck in the application. Therefore, if I need a pool anyway, I may as well use one where someone else has done all the hard work :)

0
votes

I came across StatsD over SLF4J during a similar search for a pure Java StatsD client and compared it to Java StatsD Client, which you mentioned had several issues. Just based on reading the source, I came up with this breakdown relating to the issues.

EDIT: the table below has been updated for version 3.0.1 of java-statsd-client in which many of the original issues have been addressed.

                          |   java-statsd-client   |   statsd-over-slf4j
——————————————————————————+————————————————————————+————————————————————
messages support sampling |          yes           |        yes
——————————————————————————+————————————————————————+————————————————————
actual sampling performed |   no, left to caller   | yes, using java.util.Random
——————————————————————————+————————————————————————+————————————————————
nonblocking impl worker   |  single daemon thread  | single daemon thread
——————————————————————————+————————————————————————+————————————————————
nonblocking impl queue    |       unbounded        | caller-specified bound
——————————————————————————+————————————————————————+————————————————————
String.format locale      |         none*          |     Locale.US
——————————————————————————+————————————————————————+————————————————————
charset for message bytes |        UTF-8**         | default, can be overridden

* no localisation is applied
** this is the charset that StatsD reads with
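To illustrate the "actual sampling performed" row: when sampling is left to the caller, the caller has to decide per event whether to send at all, and tag the message with the rate when it does. Below is a rough sketch of that decision using java.util.Random. It is not taken from either library; the class name `SampledCounter` is hypothetical, and I'm assuming the usual `|@rate` suffix for sampled counters.

```java
import java.util.Locale;
import java.util.Random;

final class SampledCounter {
    private final Random random;

    SampledCounter(Random random) {
        this.random = random;
    }

    /** Returns the message to send, or null when this event is sampled out. */
    String increment(String key, double sampleRate) {
        if (sampleRate >= 1.0) {
            return key + ":1|c"; // unsampled: always send, no rate suffix
        }
        if (random.nextDouble() >= sampleRate) {
            return null;         // dropped on the client side
        }
        // Locale.US keeps the decimal point regardless of the JVM default.
        return String.format(Locale.US, "%s:1|c|@%f", key, sampleRate);
    }
}
```

With a rate of 0.1, roughly one call in ten returns a message such as `game.moves:1|c|@0.100000`, and the rest return null so that nothing is sent.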