
We have a massive amount of sub-minute stock price tick data stored in an InfluxDB instance on a server with 32 GB of memory and plenty of storage. Unfortunately, we are having memory issues. The following tuning has been done:

cache_snapshot_memory_size         => 6553600,
cache_snapshot_write_cold_duration => '1m',
max_series_per_database            => 10000000,
cluster_write_timeout              => '10s',

The number of series is about 650,000 and is nearly static.

Simplified, our schema currently stores bid and ask prices in a single measurement, orderbook, with (non-indexed) fields such as bid, ask, bid_volume, and ask_volume, plus a few (indexed) tags. All tags have small cardinality except one: the ticker tag.
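For illustration, the schema described above would produce InfluxDB line-protocol points along these lines (the exact field and tag names are assumed from the description; this is a sketch, not our actual ingest code):

```python
# Illustrative only: format one tick as an InfluxDB line-protocol string
# for the schema described above (measurement "orderbook", "ticker" as an
# indexed tag, bid/ask prices and volumes as non-indexed fields).

def orderbook_point(ticker, bid, ask, bid_volume, ask_volume, ts_ns):
    """Return a line-protocol string for a single order book tick."""
    tags = f"ticker={ticker}"
    fields = (
        f"bid={bid},ask={ask},"
        f"bid_volume={bid_volume}i,ask_volume={ask_volume}i"
    )
    return f"orderbook,{tags} {fields} {ts_ns}"

print(orderbook_point("AAPL", 189.12, 189.14, 300, 250, 1700000000000000000))
```

InfluxDB creates one series per distinct (measurement, tag set) combination, so series cardinality here is driven almost entirely by the ticker tag.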

Would we see a lower memory footprint if we had one orderbook measurement per ticker: orderbook.aapl, orderbook.googl, orderbook.abc, etc.?

At the moment we have about 300 tickers, but this could grow to as many as 10,000 within a few years.

When retrieving data, we always filter on the ticker.
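Because every read filters on the ticker, the indexed tag turns that filter into an index lookup rather than a scan of all series. A hedged sketch of such a query, built as an InfluxQL string (tag values are compared as single-quoted strings; the helper name is hypothetical):

```python
# Sketch only: compose an InfluxQL query that filters on the indexed
# "ticker" tag over a time range. In InfluxQL, tag values and RFC3339
# timestamps are written in single quotes.

def orderbook_query(ticker, start_rfc3339, stop_rfc3339):
    """Build an InfluxQL query for one ticker over a time range."""
    return (
        "SELECT bid, ask, bid_volume, ask_volume FROM orderbook "
        f"WHERE ticker = '{ticker}' "
        f"AND time >= '{start_rfc3339}' AND time < '{stop_rfc3339}'"
    )

print(orderbook_query("AAPL", "2024-01-02T00:00:00Z", "2024-01-03T00:00:00Z"))
```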

1 Answer


Answers from #influxdb at gophers.slack.com:

  • The method you are proposing there is non-performant. We HIGHLY advise the use of tags. That is the way the database assumes users will model their data. Adding metadata to measurement names is an antipattern.

  • Splitting up the unique tags into unique measurements shouldn't help your memory consumption significantly.
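The second answer can be checked with back-of-the-envelope arithmetic: series cardinality is the number of distinct (measurement, tag-set) pairs, so moving the ticker from a tag into the measurement name only relabels the same series rather than reducing their count. A rough sketch, with the low-cardinality tag count chosen purely for illustration:

```python
# Rough arithmetic: series cardinality before and after splitting the
# ticker tag into per-ticker measurements. "other_tag_combos" is an
# assumed illustrative value for the remaining low-cardinality tags.

tickers = 10_000        # projected ticker count
other_tag_combos = 2    # assumed combinations of the other tags

# One measurement, ticker as a tag:
tag_based = 1 * tickers * other_tag_combos

# One measurement per ticker, no ticker tag:
per_measurement = tickers * 1 * other_tag_combos

print(tag_based, per_measurement)  # identical: the split changes nothing
```

Either way the database tracks the same number of series in its index, which is why the split should not meaningfully lower the memory footprint.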