Please look at this code.
Single-threaded version: http://pastebin.com/KAx4RmSJ. Compiled with:
g++ -lrt -O2 main.cpp -o nnlv2
Multi-threaded version with OpenMP: http://pastebin.com/fbe4gZSn. Compiled with:
g++ -lrt -fopenmp -O2 main_openmp.cpp -o nnlv2_openmp
I tested it on a dual-core system, so two threads run in parallel. However, the multi-threaded version is slower than the single-threaded one (by roughly 11% to 33% in the tests below), and its timings are unstable: they vary noticeably if you run it a few times. What is wrong? Where did I make a mistake?
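For readers who cannot open the pastebin links, here is a minimal sketch of the kind of comparison I mean. It is not the pastebin code: the names forward_layer and time_forward_ns, the row-major weight layout, and the use of std::chrono::steady_clock for wall-clock timing are all assumptions made for illustration. Without -fopenmp the pragma is ignored and the loop runs on a single thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical layer: weights are stored row-major as neurons x inputs,
// and each neuron outputs the plain weighted sum of its inputs.
std::vector<double> forward_layer(const std::vector<double>& weights,
                                  const std::vector<double>& inputs,
                                  std::size_t neurons) {
    const std::size_t n_inputs = inputs.size();
    assert(weights.size() == neurons * n_inputs);
    std::vector<double> outputs(neurons, 0.0);

    // Each iteration writes only outputs[n], so neurons are independent.
    // The pragma takes effect only when compiled with -fopenmp.
    #pragma omp parallel for
    for (long n = 0; n < static_cast<long>(neurons); ++n) {
        const std::size_t row = static_cast<std::size_t>(n) * n_inputs;
        double sum = 0.0;
        for (std::size_t i = 0; i < n_inputs; ++i)
            sum += weights[row + i] * inputs[i];
        outputs[static_cast<std::size_t>(n)] = sum;
    }
    return outputs;
}

// Wall-clock time in nanoseconds for one pass through `layers` layers.
// Weights are built before the clock starts, so only the pass is timed.
long long time_forward_ns(std::size_t layers, std::size_t neurons,
                          std::size_t n_inputs) {
    std::vector<double> signal(n_inputs, 1.0);
    std::vector<std::vector<double>> weights;
    std::size_t fan_in = n_inputs;
    for (std::size_t l = 0; l < layers; ++l) {
        weights.emplace_back(neurons * fan_in, 1.0 / fan_in);
        fan_in = neurons;  // later layers read the previous layer's outputs
    }

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t l = 0; l < layers; ++l)
        signal = forward_layer(weights[l], signal, neurons);
    const auto stop = std::chrono::steady_clock::now();

    // Keep the result observable so the pass is not optimized away.
    volatile double sink = signal.empty() ? 0.0 : signal[0];
    (void)sink;

    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

Calling time_forward_ns(10, 200, 200) would correspond to the first row of the tables, once from a binary built with -fopenmp and once from a binary built without it.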
Some tests:

Single-threaded:
Layers  Neurons  Inputs  ---  Time (ns)
10      200      200     ---  1898983
10      500      500     ---  11009094
10      1000     1000    ---  48116913

Multi-threaded:
Layers  Neurons  Inputs  ---  Time (ns)
10      200      200     ---  2518262
10      500      500     ---  13861504
10      1000     1000    ---  53446849
I don't understand what is wrong.