1
votes

I'm a beginner in C++. I wrote a program that extracts data from one DB and stores it in another DB. I want to add multiple threads to speed up the process. I hope to do this in two steps.

  1. Extract data from the first DB and store it in memory. (In this case, I need to hold the data in two std::vector objects.)
  2. While extracting data from the database, once the vector size exceeds 10000, two threads need to start, read data from the two vectors (one thread per vector), and store that data in the second database.

Consider the example below. It is simple code that demonstrates the scenario above: a for-loop with a huge number of iterations. Once i reaches 10000, I need to start two threads that read data from the dataOne and dataTwo vectors (a separate thread for each) and store it in the dataThree and dataFour vectors.

#include <vector>

using namespace std;

int main(){

   std::vector<std::vector<int>> dataOne;
   std::vector<std::vector<int>> dataTwo;

   std::vector<std::vector<int>> dataThree;
   std::vector<std::vector<int>> dataFour;

   for(int i=0; i < 10000000; i++){
       std::vector<int> temp = {1,2,3};
       dataOne.push_back(temp);          //store data in vector-one 

       std::vector<int> temp2 = {3,4,5};
       dataTwo.push_back(temp2);        //store data in vector-two      
   }
}

When i = 10000, there should be three threads running:

  • Thread one - reads data from the dataOne vector and stores it in dataThree

  • Thread two - reads data from the dataTwo vector and stores it in dataFour

  • Main thread - keeps processing the for-loop in the main function

Can anyone help me solve this?

3
Unrelated, but you don't need using namespace std; since you use the std:: prefix everywhere. - Killzone Kid
@KillzoneKid and in fact using namespace std is a bad idea to start with. - Paul
@Paul tell this to Stroustrup :))) - Killzone Kid
@KillzoneKid I guess he's used it for convenience, just like everyone else. But sooner or later that little convenience will haunt you. - Paul

3 Answers

0
votes

Just use std::thread; see the cplusplus.com reference page for std::thread.

Here is the example from that page:

// thread example
#include <iostream>       // std::cout
#include <thread>         // std::thread

void foo() 
{
  // do stuff...
}

void bar(int x)
{
  // do stuff...
}

int main() 
{
  std::thread first (foo);     // spawn new thread that calls foo()
  std::thread second (bar,0);  // spawn new thread that calls bar(0)

  std::cout << "main, foo and bar now execute concurrently...\n";

  // synchronize threads:
  first.join();                // pauses until first finishes
  second.join();               // pauses until second finishes

  std::cout << "foo and bar completed.\n";

  return 0;
}
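
Applied to the scenario in the question, the same std::thread calls could be combined with a mutex and a condition variable so that the main loop hands off a batch every 10000 rows while two workers copy the rows into dataThree and dataFour. This is only a minimal sketch under stated assumptions: BatchQueue, consume and run_pipeline are hypothetical names made up for this illustration (they are not standard library facilities), and the copy into a vector stands in for the real write to the second database.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Row = std::vector<int>;
using Batch = std::vector<Row>;

// Hypothetical helper: a small thread-safe queue of batches.
class BatchQueue {
public:
    void push(Batch batch) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batches_.push(std::move(batch));
        }
        ready_.notify_one();
    }

    // Signals that no more batches will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a batch is available; returns false once closed and drained.
    bool pop(Batch& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !batches_.empty(); });
        if (batches_.empty()) return false;
        out = std::move(batches_.front());
        batches_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Batch> batches_;
    bool closed_ = false;
};

// Worker: drains its queue and appends every row to its own destination.
// Each destination is touched by exactly one worker, so it needs no lock.
void consume(BatchQueue& queue, Batch& destination) {
    Batch batch;
    while (queue.pop(batch)) {
        for (Row& row : batch) destination.push_back(std::move(row));
    }
}

// The loop from the question, handing off a batch every batch_size rows.
void run_pipeline(std::size_t total_rows, std::size_t batch_size,
                  Batch& dataThree, Batch& dataFour) {
    BatchQueue queueOne, queueTwo;
    std::thread workerOne(consume, std::ref(queueOne), std::ref(dataThree));
    std::thread workerTwo(consume, std::ref(queueTwo), std::ref(dataFour));

    Batch dataOne, dataTwo;
    for (std::size_t i = 0; i < total_rows; ++i) {
        dataOne.push_back({1, 2, 3});
        dataTwo.push_back({3, 4, 5});
        if (dataOne.size() >= batch_size) {
            queueOne.push(std::move(dataOne));
            queueTwo.push(std::move(dataTwo));
            dataOne.clear();  // reset the moved-from vectors before reuse
            dataTwo.clear();
        }
    }
    // Hand off whatever is left, then tell the workers to finish.
    if (!dataOne.empty()) queueOne.push(std::move(dataOne));
    if (!dataTwo.empty()) queueTwo.push(std::move(dataTwo));
    queueOne.close();
    queueTwo.close();
    workerOne.join();
    workerTwo.join();
}
```

Calling run_pipeline(10000000, 10000, dataThree, dataFour) from main would mirror the loop in the question. The workers start before the loop and sleep on the condition variable until the first batch arrives, which avoids creating new threads every 10000 iterations.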
0
votes

The other answer already answers your question directly. (And I wonder if this has not already been answered elsewhere.)

However, for your particular problem (connecting to DBs), depending on the exact details, you might want to consider other options. For example, a lot of time will be "wasted" waiting for data transfer to and from the DB. An alternative is to use an asynchronous API, if one is available. Then a single thread can handle many connections.
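
To illustrate the shape of that idea, here is a rough sketch in which one thread starts several requests before waiting on any of them, so the waits overlap. It rests on assumptions: start_fetch and fetch_all are hypothetical names, and the "asynchronous API" is only simulated with std::async, which runs each request on a background thread. A real asynchronous DB driver would normally complete requests without dedicating a thread to each one.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for an asynchronous DB call: returns at once with a future.
// Simulated here with std::async; the sleep plays the role of network latency.
std::future<int> start_fetch(int query_id) {
    return std::async(std::launch::async, [query_id] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        return query_id * 10;  // simulated query result
    });
}

// The calling thread issues every request first, then collects the results.
std::vector<int> fetch_all(std::size_t request_count) {
    std::vector<std::future<int>> pending;
    pending.reserve(request_count);
    for (std::size_t id = 0; id < request_count; ++id) {
        pending.push_back(start_fetch(static_cast<int>(id)));
    }

    std::vector<int> results;
    results.reserve(request_count);
    for (std::future<int>& request : pending) {
        results.push_back(request.get());  // blocks only until this one is done
    }
    return results;
}
```

Because all requests are in flight before the first get(), the total wait is close to the latency of the slowest request instead of the sum of all of them.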

(If many threads are just moving data, that might saturate memory or bus bandwidth and end up as slow as a single thread...)

0
votes

I'm a beginner in C++. I wrote a program that extracts data from one DB and stores it in another DB. I want to add multiple threads to speed up the process. I hope to do this in two steps.

You are likely doing this very inefficiently, whether you use one thread or many. Assuming that by "extracting data" you mean using a DBMS's native protocol or ODBC, this is extremely slow: it requires serialization, packetization, travel through various buffers, then the client side of the protocol and deserialization. That ignores the potential overhead of data layout changes on the server, and it is only one part of the job; you go through the whole thing again for the second DBMS.

You should really try to use the first DBMS's native/internal export functionality, and then the other DBMS's import / bulk-load functionality. Alternatively, if the second DBMS supports the first one's native storage format, you might be able to avoid the export step altogether.