1
votes

I'm investigating whether ClickHouse is a good option for OLAP purposes. To do so, I replicated some queries I have running on PostgreSQL, using ClickHouse's sintax.

All the queries I have ran are much faster than Postgres', but the ones that perform text search run out of memory. Below is the error code and the stack trace.

clickhouse_driver.errors.ServerException: Code: 241. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB.

The script for the query is:

SELECT COUNT(*)
FROM ObserverNodeOccurrence as occ 
LEFT JOIN
    ObserverNodeOccurrence_NodeElements as occ_ne
    ON occ._id = occ_ne.occurrenceId
WHERE
    occ_ne.snippet LIKE '<img>'

The query above counts the number of entries of the column snippet which contain an HTML image (<img>). This column contains HTML snippets, hence searching text becomes quite expensive. A close/mid term goal is to parse this column and convert it into a set of other columns (e.g. contains_img, contains_script, etc.). But, for now, I would like to be able to run such query without running out of memory.

My question(s) is(are):

  • how can I successfully execute text-search queries on such column without running out of memory?
    • Is there a way to force the query planner to use disk as soon as it runs out of memory?
    • I am using MergeTree engine. Is there another engine that's able to split the load between ram and disk?

Full stack trace:

clickhouse_driver.errors.ServerException: Code: 241.
DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 524288 bytes), maximum: 9.31 GiB. Stack trace:

0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x22) [0x781c272]
1. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0x8ba) [0x71bbb4a]
2. /usr/bin/clickhouse-server(MemoryTracker::alloc(long)+0xc5) [0x71bb355]
3. /usr/bin/clickhouse-server() [0x67aeb4e]
4. /usr/bin/clickhouse-server() [0x67af010]
5. /usr/bin/clickhouse-server() [0x67e5af4]
6. /usr/bin/clickhouse-server(void DB::Join::joinBlockImpl<(DB::ASTTableJoin::Kind)1, (DB::ASTTableJoin::Strictness)2, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > >(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&, DB::Block const&, DB::Join::MapsTemplate<DB::JoinStuff::WithFlags<DB::RowRefList, false> > const&) const+0xe1c) [0x68020dc]
7. /usr/bin/clickhouse-server(DB::Join::joinBlock(DB::Block&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, DB::NamesAndTypesList const&) const+0x1a5) [0x67bc415]
8. /usr/bin/clickhouse-server(DB::ExpressionAction::execute(DB::Block&, bool) const+0xa5d) [0x6d961dd]
9. /usr/bin/clickhouse-server(DB::ExpressionActions::execute(DB::Block&, bool) const+0x45) [0x6d97545]
10. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x48) [0x6c52888]
11. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
12. /usr/bin/clickhouse-server(DB::FilterBlockInputStream::readImpl()+0xd9) [0x6c538b9]
13. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
14. /usr/bin/clickhouse-server(DB::ExpressionBlockInputStream::readImpl()+0x2d) [0x6c5286d]
15. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6635628]
16. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::loop(unsigned long)+0x139) [0x6c7f409]
17. /usr/bin/clickhouse-server(DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long)+0x209) [0x6c7fc79]
18. /usr/bin/clickhouse-server(ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::ParallelAggregatingBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const+0x7f) [0x6c801cf]
19. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1af) [0x71c778f]
20. /usr/bin/clickhouse-server() [0xb2ac5bf]
21. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fc5b50826db]
22. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fc5b480988f]
1
I think the only way is to increase the memory limit. But in your case you can rewrite query and it will be work without out of memory. - Danila Ganchar

1 Answers

0
votes

Run Clickhouse-Client in terminal

set max_bytes_before_external_group_by=20000000000;  --20 GB for external group by
set max_memory_usage=40000000000; --40GB for memory limit