Can SQLite handle 90 million records?

Question

Or should I use a different hammer for this problem?

I've got a very simple use case for storing data, effectively a sparse matrix, which I've attempted to store in a SQLite database. I've created a table:

create TABLE data ( id1 INTEGER KEY, timet INTEGER KEY, value REAL )

into which I insert a lot of data (800 elements every 10 minutes, 45 times a day) on most days of the year. The tuple (id1, timet) will always be unique.

The timet value is seconds since the epoch and will always be increasing. The id1 is, for all practical purposes, a random integer, although there are probably only about 20,000 unique ids.

I'd then like to access all values where id1 == someid, or all elements where timet == sometime. In my tests, using the latest SQLite through the C interface on Linux, a lookup for one of these (or any variant of it) takes approximately 30 seconds, which is not fast enough for my use case.

I tried defining an index on the table, but this slowed insertion down to completely unworkable speeds (I might have done this incorrectly, though).

With the table above, access to any of the data is very slow. My questions are:

Is SQLite completely the wrong tool for this?
Can I define indices to speed things up significantly?
Should I be using something like HDF5 instead of SQL for this?

Please excuse my very basic understanding of SQL!

Thanks

Below is a code sample that shows how insertion slows to a crawl when indices are used. With the 'create index' statements in place, the code takes 19 minutes to complete; without them, it runs in 18 seconds.

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <sqlite3.h>

// Abort with a message if an SQLite call did not return the expected result code.
void checkdbres( int res, int expected, const std::string &msg ) 
{
  if (res != expected) { std::cerr << msg << " (sqlite code " << res << ")" << std::endl; exit(1); } 
}

int main()
{
  const size_t nRecords = 800*45*30;  // 800 ids x 45 batches a day x 30 days

  sqlite3      *dbhandle = NULL;
  sqlite3_stmt *pStmt = NULL;
  char statement[512];

  checkdbres( sqlite3_open("/tmp/junk.db", &dbhandle ), SQLITE_OK, "Failed to open db");

  // Note: to SQLite, "INTEGER KEY" is only a declared type; it does not create a key or an index.
  checkdbres( sqlite3_prepare_v2( dbhandle, "create table if not exists data ( issueid INTEGER KEY, time INTEGER KEY, value REAL);", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create table statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create table statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create table");

  // These two indices are what slow the run from 18 seconds to 19 minutes.
  checkdbres( sqlite3_prepare_v2( dbhandle, "create index if not exists issueidindex on data (issueid);", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create index statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create index statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create index");
  checkdbres( sqlite3_prepare_v2( dbhandle, "create index if not exists timeindex on data (time);", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create index statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute create index statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize create index");

  for ( size_t idx=0; idx < nRecords; ++idx)
  {
    // Open a transaction at the start of each 800-row batch.
    if (idx%800==0)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "BEGIN TRANSACTION", -1, & pStmt, NULL ), SQLITE_OK, "Failed to begin transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute begin transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize begin transaction");
      std::cout << "idx " << idx << " of " << nRecords << std::endl;
    }

    const size_t time = idx/800;
    const size_t issueid = idx % 800;
    const float value = static_cast<float>(rand()) / RAND_MAX;
    // Cast to int to match %d; passing a size_t for %d is undefined behavior.
    snprintf( statement, sizeof statement, "insert into data values (%d,%d,%f);", (int)issueid, (int)time, value );
    checkdbres( sqlite3_prepare_v2( dbhandle, statement, -1, &pStmt, NULL ), SQLITE_OK, "Failed to build insert statement");
    checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
    checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");

    // Commit once the last row of the batch has been written.
    if (idx%800==799)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "END TRANSACTION", -1, & pStmt, NULL ), SQLITE_OK, "Failed to end transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute end transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize end transaction");
    }
  }

  checkdbres( sqlite3_close( dbhandle ), SQLITE_OK, "Failed to close db" ); 
}

How much historical data do you need to access at any given time? You could archive older data into another table for persistence and save time on querying "relevant" data. — Babak Naffas
I ideally need to access all the historical data. If need be, I can split this into a dataset per year, but I hoped SQLite might save me from having to manage such details. — Brian O'Kennedy
Could you please post the entire code where you use sqlite3* functions (omitting the other parts)? If the process "crawled to a complete halt", something is definitely not right. — sereda
Do you need the same granularity for historical data? If not, RRDB may be interesting. — ninjalj
Why do you need to look up id1==someid? Is this so that you can SELECT "random" data at a later point? From your scenario, it seems that you do not require someid to be unique, so record #3 at 2010-01-01 and record #79832759385 at 2010-06-06 might have the same id1 value. Likewise, you have collisions on HHMMSS when multiple records are inserted per second. So, in case 1, you select id==someId and get zero, one, or many sparse records; in case 2, you get zero, one, or many adjacent records. If these are your needs, you may benefit from different runtime query methods. — maxwellb

Robert Harvey · Accepted Answer · 2010-07-01T19:22:06

Are you inserting all of the 800 elements at once? If you are, doing the inserts within a transaction will speed up the process dramatically.

See http://www.sqlite.org/faq.html#q19

SQLite can handle very large databases. See http://www.sqlite.org/limits.html
