
I am building a dashboard using InfluxDB. I have a source which generates approx. 2000 points per minute. Each point has 5 tags, 6 fields. There is only one measurement.

Everything works fine for about 24 hours, but as the data size grows I can no longer run any queries against InfluxDB. For example, I currently have approximately 48 hours of data, and even a basic select brings down InfluxDB:

select count(field1) from measurementname

It times out with the error:

ERR: Get http://localhost:8086/query?db=dbname&q=select+count%28field1%29+from+measuementname: EOF


Configuration:

  • InfluxDB version: 0.10.1, default configuration
  • OS version: Ubuntu 14.04.2 LTS
  • Hardware: 30GB RAM, 4 vCPUs, 150GB HDD

Some Background:

I have a dashboard and a web app querying InfluxDB. The web app lets a user query the database by tag1 or tag2.

Tags:

  • tag1 - unique for each record. Used in a where clause in the web app to get the record based on this field.
  • tag2 - unique for each record. Used in a where clause in the web app to get the record based on this field.
  • tag3 - used in group by. Think of it as departmentid tying a bunch of employees.
  • tag4 - used in group by. Think of it as departmentid tying a bunch of employees.
  • tag5 - used in group by. Values 0 or 1 or 2.

1 Answer


Pasting the answer from the InfluxDB Google Groups mailing list: https://groups.google.com/d/msgid/influxdb/b4fb503e-18a5-4bd5-84b1-632dc4950747%40googlegroups.com?utm_medium=email&utm_source=footer

tag1 - unique for each record.
tag2 - unique for each record.

This is a poor schema. You are creating a new series for every record, which puts a punishing load on the database. Each series must be indexed, and the entire index currently must reside in RAM. I suspect you are running out of memory after 48 hours because of series cardinality, and the query is just the last straw, not the actual cause of the low RAM situation.
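
To put a rough number on that, here is a small back-of-the-envelope sketch in Python. It assumes the 2000 points per minute from the question and that every point carries a never-before-seen tag1/tag2 value, so each point creates one new series. The function name is just for illustration.

```python
POINTS_PER_MINUTE = 2000  # write rate stated in the question


def series_after(hours: int) -> int:
    """Series created when every point has a unique tag value (one new series per point)."""
    return POINTS_PER_MINUTE * 60 * hours


for hours in (24, 48):
    # Each of these series needs an index entry, and the index is held in RAM.
    print(f"{hours}h -> {series_after(hours):,} series")
```

Under those assumptions the index grows by about 2.88 million series per day and never levels off, which is consistent with things falling over somewhere between 24 and 48 hours.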

It is very bad practice to use a unique value in tags. You can still filter on fields in the WHERE clause. Fields are not indexed, so those queries are slower, but the damage to your system is much less than having a unique series for every point.
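
As a minimal sketch of what that schema change could look like on the write side, the Python below builds a line protocol string with only the bounded values (tag3, tag4, tag5) as tags and the per-record identifiers stored as string fields. The field names `id1` and `id2`, the sample values, and the helper functions are all hypothetical. The escaping is simplified and only covers the common cases.

```python
def escape_tag(value: str) -> str:
    # Commas, spaces and equals signs must be backslash-escaped in tag values.
    return value.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")


def format_field(value) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    tag_part = ",".join(f"{k}={escape_tag(str(v))}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "measurementname",
    # Only low-cardinality values stay as tags (used in GROUP BY).
    tags={"tag3": "dept7", "tag4": "grp2", "tag5": 1},
    # The unique per-record identifiers (formerly tag1/tag2) become fields.
    fields={"id1": "rec-000123", "id2": "abc", "field1": 42.5},
    timestamp_ns=1456790400000000000,
)
print(line)
```

With data written this way, the web app lookup would become something like `SELECT * FROM measurementname WHERE id1 = 'rec-000123'`, ideally with a time range added so the unindexed field filter scans less data. The number of series is then bounded by the distinct tag3/tag4/tag5 combinations rather than by the number of points.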

https://docs.influxdata.com/influxdb/v0.10/concepts/schema_and_data_layout/
https://docs.influxdata.com/influxdb/v0.10/guides/hardware_sizing/#when-do-i-need-more-ram