
I am trying to plot the execution times of three datasets (1, 2 and 3), each run with 1 to 12 threads. Based on hints from other collaborators, I've managed to plot something, but it is still raw and needs a few modifications.

The test.dat file looks like this:

7.979446
7.979782
7.980070
7.980209
7.980716
7.981428
7.982284
7.986656
7.987722
8.001152
56.394068
56.411380
56.417835
56.425779
56.426430
56.442289
56.447586
56.453845
56.480448
56.500974
89.251694
89.278013
89.281708
89.299754
89.300965
89.307755
89.342808
89.348788
89.374555
89.443212
4.002836
4.003149
4.003460
4.003638
4.003821
4.004005
4.004230
4.005415
4.005717
4.006178
28.228176
28.239830
28.240788
28.249284
28.256000
28.258078
28.262026
28.264375
28.268416
28.273207
44.659865
44.668623
44.671320
44.681847
44.686959
44.694352
44.699392
44.708551
44.709311
44.731599
2.672576
2.673274
2.673376
2.673550
2.673601
2.673844
2.674989
2.675926
2.676808
2.677280
18.832229
18.836510
18.836564
18.839250
18.846318
18.847400
18.849118
18.850751
18.859785
18.867247
29.795157
29.798674
29.803137
29.804717
29.806593
29.808391
29.823173
29.825312
29.830610
29.834788
2.006869
2.006896
2.007402
2.007455
2.007571
2.007576
2.007797
2.008056
2.008327
2.010466
14.123902
14.137294
14.143547
14.145535
14.145802
14.149070
14.150863
14.153041
14.158910
14.163958
22.366110
22.377555
22.381285
22.382443
22.397755
22.402830
22.405512
22.407027
22.408854
22.428611
1.607613
1.608379
1.608383
1.608388
1.608727
1.608875
1.608951
1.609643
1.609970
1.610055
11.317486
11.324436
11.326964
11.327802
11.328852
11.329350
11.331660
11.333145
11.344123
11.347258
17.916997
17.924367
17.927473
17.929957
17.931164
17.941486
17.946694
17.954824
17.960349
17.964670
1.342421
1.342613
1.342790
1.343102
1.343529
1.343624
1.343710
1.343854
1.345136
1.347415
9.441679
9.443007
9.450755
9.452555
9.454940
9.455060
9.456859
9.457250
9.460143
9.471149
14.927076
14.951473
14.953387
14.954076
14.960740
14.971557
14.972433
14.988446
14.998727
15.000089
1.152602
1.152828
1.152872
1.153022
1.153024
1.153126
1.153146
1.153598
1.154386
1.154684
8.101228
8.104056
8.111444
8.112540
8.120765
8.122927
8.123258
8.124685
8.126094
8.126341
12.814569
12.828172
12.840957
12.841054
12.841083
12.844525
12.848143
12.848671
12.863897
12.884744
1.009917
1.010277
1.010382
1.010674
1.011499
1.011569
1.011727
1.011749
1.012070
1.012181
7.094522
7.095445
7.107636
7.113403
7.113982
7.115782
7.115788
7.117909
7.118149
7.119460
11.213377
11.244623
11.246764
11.248645
11.251625
11.257971
11.274399
11.276169
11.281887
11.288210
0.899219
0.899579
0.899652
0.899709
0.899769
0.899830
0.900219
0.900231
0.900431
0.900500
6.328375
6.331140
6.332462
6.338253
6.338744
6.341830
6.346616
6.351038
6.352929
6.367343
10.013683
10.019828
10.022535
10.023297
10.055958
10.060335
10.062904
10.065623
10.066142
10.071990
0.810524
0.810584
0.810863
0.811074
0.811251
0.811642
0.811673
0.812207
0.812218
0.812428
5.683542
5.684000
5.686904
5.688421
5.698619
5.699549
5.704962
5.716741
5.717077
5.720054
9.013459
9.014580
9.026935
9.027847
9.028682
9.033721
9.048791
9.053777
9.054141
9.063900
0.738528
0.738681
0.739168
0.739263
0.739292
0.739330
0.739389
0.739759
0.739885
0.740058
5.178512
5.181581
5.182815
5.190749
5.192971
5.194582
5.195547
5.196638
5.199621
5.203723
8.204998
8.205462
8.217376
8.217634
8.232874
8.234680
8.242527
8.261343
8.267276
8.267413
0.678074
0.678121
0.678458
0.678525
0.679359
0.679681
0.679727
0.679983
0.680982
0.682098
4.743285
4.746411
4.746927
4.753897
4.758229
4.759577
4.761998
4.767569
4.772117
4.772698
7.533035
7.533890
7.539902
7.546736
7.552226
7.556848
7.557569
7.558419
7.565937
7.579489

It is organized as follows: the first 10 lines are the execution times for dataset 1 with 1 thread, the next 10 are for dataset 2 with 1 thread, the next 10 are for dataset 3 with 1 thread, and so on up to 12 threads.
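The layout just described can be captured with a small index calculation. The Python sketch below is not part of the original thread and its function names are hypothetical; it assumes exactly 10 runs per dataset, 3 datasets per thread count, and 0-based line numbers (the numbering gnuplot's every uses). It maps a line number to its thread count, dataset and run, and back.

```python
RUNS = 10                 # executions per (dataset, thread count) pair
DATASETS = 3              # datasets per thread count
BLOCK = RUNS * DATASETS   # 30 lines per thread count

def decode_interleaved(line):
    """Map a 0-based line number to (threads, dataset, run), all 1-based."""
    threads = line // BLOCK + 1
    dataset = (line % BLOCK) // RUNS + 1
    run = line % RUNS + 1
    return threads, dataset, run

def encode_interleaved(threads, dataset, run):
    """Inverse of decode_interleaved: 1-based indices to a 0-based line number."""
    return (threads - 1) * BLOCK + (dataset - 1) * RUNS + (run - 1)

if __name__ == "__main__":
    for line in (0, 10, 20, 30, 359):
        print(line, decode_interleaved(line))
```

For example, line 10 decodes to (1, 2, 1): 1 thread, dataset 2, first run.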

The gnuplot script looks like this:

set boxwidth 0.7 relative
set style fill solid 1.0 noborder
set xtics ("1" 15, "2" 75, "3" 135, "4" 195, "5" 255, "6" 315, "7" 375, "8" 435, "9" 495, "10" 555, "11" 615, "12" 675)
unset key
set terminal png size 800,600 enhanced font "Helvetica,10"
set output 'output.png'
set xlabel "Number of threads"
set ylabel "Execution time"
plot for [i=1:12] 'test.dat' using ($0+(i-1)*60):1 every ::((i-1)*30)::(i*30-1) with boxes lt i

There are a few issues with this plot:

  • The datasets: as you can see, there are 10 executions of each dataset for each number of threads. Dataset 1 has the lowest values, dataset 2 the medium values and dataset 3 the highest values. Instead of one color per number of threads, there should be only 3 colors, say red for dataset 1, green for dataset 2 and blue for dataset 3, and these should be identified in a key at the upper right (set key top right).
  • The spacing: the plot starts right against the left y axis. There should be a gap of, say, 15 units between the left y axis and the first histogram, a few units between datasets, again 15 units between thread counts, and 15 units between the last histogram and the right y axis.
  • The xtics: the xtics should take the spacing above into account. Is it possible to set those xtics in a loop?

Thanks in advance.

UPDATE

Matthew, following your remark on point 1, here are the execution times sorted by dataset (dataset 1: lines 1 - 120; dataset 2: lines 121 - 240; dataset 3: lines 241 - 360). Within each dataset, the first 10 lines are the execution times for 1 thread, the next 10 for 2 threads, and so on.

8.001152
7.981428
7.986656
7.979782
7.980070
7.987722
7.980716
7.980209
7.982284
7.979446
4.003821
4.003638
4.003149
4.005415
4.003460
4.002836
4.005717
4.006178
4.004005
4.004230
2.673844
2.673601
2.675926
2.674989
2.673274
2.677280
2.676808
2.673376
2.672576
2.673550
2.008327
2.007571
2.007797
2.007576
2.010466
2.008056
2.007402
2.006869
2.006896
2.007455
1.608951
1.609970
1.608875
1.608379
1.608383
1.608388
1.607613
1.608727
1.609643
1.610055
1.343102
1.342790
1.347415
1.342613
1.343710
1.343529
1.345136
1.343854
1.342421
1.343624
1.153126
1.153022
1.152828
1.154386
1.152602
1.152872
1.153024
1.154684
1.153598
1.153146
1.011499
1.012181
1.011727
1.012070
1.011569
1.009917
1.011749
1.010674
1.010277
1.010382
0.899709
0.900500
0.900231
0.899769
0.899652
0.900219
0.900431
0.899219
0.899579
0.899830
0.811642
0.811074
0.812207
0.810524
0.812218
0.811673
0.810863
0.812428
0.811251
0.810584
0.738681
0.739885
0.740058
0.739330
0.739168
0.739263
0.739292
0.738528
0.739389
0.739759
0.679359
0.678121
0.680982
0.682098
0.679681
0.678525
0.679727
0.679983
0.678458
0.678074
56.425779
56.417835
56.426430
56.500974
56.447586
56.411380
56.453845
56.480448
56.442289
56.394068
28.258078
28.249284
28.264375
28.273207
28.228176
28.268416
28.240788
28.256000
28.262026
28.239830
18.847400
18.849118
18.846318
18.836564
18.859785
18.839250
18.867247
18.832229
18.850751
18.836510
14.150863
14.149070
14.158910
14.137294
14.145802
14.145535
14.123902
14.153041
14.143547
14.163958
11.333145
11.327802
11.347258
11.317486
11.324436
11.331660
11.329350
11.344123
11.326964
11.328852
9.454940
9.452555
9.460143
9.450755
9.457250
9.471149
9.455060
9.441679
9.456859
9.443007
8.126341
8.123258
8.124685
8.122927
8.111444
8.120765
8.104056
8.126094
8.101228
8.112540
7.107636
7.115788
7.095445
7.113982
7.118149
7.094522
7.117909
7.113403
7.115782
7.119460
6.346616
6.338744
6.328375
6.338253
6.341830
6.331140
6.332462
6.351038
6.367343
6.352929
5.704962
5.683542
5.699549
5.716741
5.698619
5.688421
5.717077
5.686904
5.684000
5.720054
5.178512
5.192971
5.195547
5.196638
5.182815
5.181581
5.194582
5.203723
5.190749
5.199621
4.772698
4.761998
4.743285
4.746927
4.746411
4.758229
4.772117
4.767569
4.759577
4.753897
89.251694
89.348788
89.281708
89.278013
89.299754
89.443212
89.300965
89.374555
89.307755
89.342808
44.681847
44.709311
44.668623
44.659865
44.699392
44.686959
44.671320
44.708551
44.731599
44.694352
29.803137
29.806593
29.830610
29.825312
29.808391
29.823173
29.804717
29.798674
29.795157
29.834788
22.407027
22.405512
22.402830
22.397755
22.382443
22.428611
22.408854
22.381285
22.366110
22.377555
17.931164
17.924367
17.929957
17.954824
17.941486
17.960349
17.916997
17.964670
17.927473
17.946694
14.972433
14.927076
14.953387
14.971557
14.960740
14.954076
14.988446
14.998727
14.951473
15.000089
12.814569
12.844525
12.848671
12.863897
12.841083
12.828172
12.841054
12.840957
12.848143
12.884744
11.244623
11.213377
11.288210
11.257971
11.281887
11.274399
11.276169
11.246764
11.251625
11.248645
10.022535
10.055958
10.013683
10.062904
10.071990
10.065623
10.023297
10.066142
10.019828
10.060335
9.053777
9.048791
9.014580
9.054141
9.063900
9.013459
9.028682
9.026935
9.033721
9.027847
8.217634
8.204998
8.232874
8.205462
8.267276
8.267413
8.261343
8.242527
8.234680
8.217376
7.533035
7.556848
7.558419
7.557569
7.533890
7.565937
7.579489
7.546736
7.539902
7.552226
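For comparison with the first file, the sorted layout in the update is dataset-major. The Python sketch below (hypothetical helper names, 0-based line numbers as gnuplot's every uses) decodes a line number and gives the range of lines each dataset occupies. The third index is only the position within a 10-line block, since the order of values inside each block differs from the first file.

```python
RUNS = 10                   # lines per (dataset, thread count) block
THREADS = 12                # thread counts 1..12
DS_BLOCK = RUNS * THREADS   # 120 lines per dataset

def decode_sorted(line):
    """Map a 0-based line number to (dataset, threads, position), all 1-based."""
    dataset = line // DS_BLOCK + 1
    threads = (line % DS_BLOCK) // RUNS + 1
    position = line % RUNS + 1
    return dataset, threads, position

def dataset_line_range(dataset):
    """0-based (first, last) line numbers of one dataset."""
    first = (dataset - 1) * DS_BLOCK
    return first, first + DS_BLOCK - 1

if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, dataset_line_range(d))
```

The 0-based ranges (0, 119), (120, 239) and (240, 359) correspond to the 1-based lines 1 - 120, 121 - 240 and 241 - 360 given above.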

I'm also posting what the plot looked like before, so you can see how much you've helped improve it:

Original raw plot

I expanded my answer to provide a plot command for the sorted dataset, so that it is complete and covers both forms of the data you have provided. Note that with the sorted data set the every selection is simpler, there is only one for loop (as we don't have to deal with interleaving), and the title specification is simpler. The x-coordinate computation becomes slightly more complex, but overall it is a simpler command. - Matthew

1 Answer


I believe that this will accomplish what you want:

set xlabel "Number of threads"
set ylabel "Execution time"

set style fill solid 1.0 noborder

set boxwidth 0.7

set xtics ("1" 19.5)
set for[i=2:12] xtics add (sprintf("%d",i) (i-1)*55+19.5) 

set key top right

set xrange[-15:660]

plot for [i=0:2] for [j=0:9] 'test.dat' using ($0*55+j+i*15):1 every 30::(i*10+j) with boxes lt (i+1) t (j==0)?sprintf("Data Set %d",i+1):""

Here we insert 15 units at both edges and between thread groups, and 5 units between datasets. I have also made the boxwidth absolute instead of relative.

The result is the following:

[Image: resulting plot]

Addressing your points:

Point 3

Yes, it is possible to set the xtics in a loop, using the set for syntax. Here, I specify the first xtic explicitly, which switches off the automatically generated tics so that only the ones I specify are shown; I then use the set xtics add command to add the remaining xtics. If I hadn't set the first one explicitly, I would have gotten the automatically generated tics in addition to mine.

Alternatively, we could have built up the command in a string and then executed it, using

tcommand = "("
do for[i=1:12] {tcommand = sprintf("%s \"%d\" %f,",tcommand,i,(i-1)*55+19.5)}
tcommand = tcommand[1:strlen(tcommand)-1].")"
set xtics @tcommand

Here the contents of tcommand will be

( "1" 19.500000, "2" 74.500000, "3" 129.500000, "4" 184.500000, "5" 239.500000, "6" 294.500000, "7" 349.500000, "8" 404.500000, "9" 459.500000, "10" 514.500000, "11" 569.500000, "12" 624.500000)

However, this only works if string macros are supported (in gnuplot 4.x they must be enabled with set macros; from gnuplot 5.0 on they are always enabled).
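The string-building step can be checked outside gnuplot. This Python sketch (a hypothetical helper, not gnuplot code) uses the same format string as the sprintf in the do for loop and the same trailing-comma trim as tcommand[1:strlen(tcommand)-1], so it should reproduce the contents of tcommand shown above.

```python
def build_xtics_spec(n_threads=12, group_width=55, centre=19.5):
    """Mimic the gnuplot do-for loop that builds the explicit xtics list."""
    spec = "("
    for i in range(1, n_threads + 1):
        # Same format as the gnuplot sprintf: a quoted label, then a position.
        spec = '%s "%d" %f,' % (spec, i, (i - 1) * group_width + centre)
    # Drop the trailing comma (gnuplot: tcommand[1:strlen(tcommand)-1]) and close.
    return spec[:-1] + ")"

print(build_xtics_spec())
```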

Note that the width of a thread section is 3*10 + 2*5 + 15 = 55: three datasets of 10 boxes each, two 5-unit gaps between them, and one 15-unit gap before the next thread section. The offset 19.5 puts the tic mark at the center of the middle dataset, whose boxes sit at x = 15 to 24 within each section.
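The section-width and tic-centre arithmetic can be written out as a short Python check. This is only a sketch: the constant names are hypothetical, and gaps are counted in unit slots between integer box positions, as the answer does.

```python
RUNS = 10        # boxes per dataset within a thread section
DATASETS = 3     # datasets per thread section
DS_GAP = 5       # empty slots after a dataset, inside a section
GROUP_GAP = 15   # empty slots between thread sections

group_width = DATASETS * RUNS + (DATASETS - 1) * DS_GAP + GROUP_GAP
# Each dataset starts 10 boxes + 5 empty slots after the previous one.
ds_offsets = [d * (RUNS + DS_GAP) for d in range(DATASETS)]
middle_first = ds_offsets[1]
middle_last = middle_first + RUNS - 1
tic_centre = (middle_first + middle_last) / 2

print(group_width, ds_offsets, tic_centre)
```

The three dataset offsets, 0, 15 and 30, are the i*15 term in the plot command, and 55 is the multiplier of $0.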

Point 2

This is easily handled by adjusting the xrange. If we place the first box at 0, we can start the xrange at -15 to leave the left gap. As there are 12 thread sections of 55 units each (including gaps), we can end the xrange at 12*55 = 660, so the last section keeps its trailing 15-unit gap.
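The resulting margins can be checked numerically. The Python sketch below assumes the layout of the plot command above (last box of a section at offset 30 + 9) and the absolute boxwidth of 0.7; the variable names are hypothetical. It measures the empty space between the outer box edges and the xrange limits.

```python
GROUP_WIDTH = 55
N_GROUPS = 12
LAST_OFFSET = 30 + 9   # last box of the third dataset within a section
BOXWIDTH = 0.7         # absolute width from `set boxwidth 0.7`

xmin = -15
xmax = N_GROUPS * GROUP_WIDTH
first_centre = 0
last_centre = (N_GROUPS - 1) * GROUP_WIDTH + LAST_OFFSET

# Free space between the outermost box edges and the axis limits.
left_margin = (first_centre - BOXWIDTH / 2) - xmin
right_margin = xmax - (last_centre + BOXWIDTH / 2)
print(xmax, last_centre, round(left_margin, 2), round(right_margin, 2))
```

Both margins come out close to 15 units (about 14.65 on the left and 15.65 on the right), and the right-hand one matches the spacing between thread sections.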

Point 1

This is the trickier part, and I suspect there may be a better way. The difficulty comes from the data layout: if all the values for dataset 1 were listed first, followed by all of those for dataset 2 and so on, it would be easier. The fact that they are interleaved makes it harder.

Here the line numbers for the various thread counts and data points occur like the following:

               Data set 1
     1 thread   2 threads   3 threads ...
1st  0          30          60
2nd  1          31          61
...

               Data set 2
     1 thread   2 threads   3 threads ...
1st  10         40          70
2nd  11         41          71
...

The every option does not support something like "10 consecutive lines starting every 30 points", so we need two loops instead. The first loop (i) is over the datasets; the second (j) is over the lines within a dataset (first line, second line, etc.).

Thus we can read every 30th line starting at line (i*10+j). For example, when i and j are both 0, we read lines 0, 30, 60 and so on, i.e. the first line of each thread block for dataset 1. We place these values at $0*55+j+i*15, where $0 counts only the points selected by every and therefore ranges from 0 to 11, which selects the thread group.
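The double loop can be simulated in Python to confirm that every line lands exactly once and at the intended position. This sketch is a model, not gnuplot itself: the every helper is hypothetical, and it assumes, as stated above, that $0 counts only the points kept by every (so it runs from 0 to 11).

```python
N_LINES = 360

def every(step, start, n_lines=N_LINES):
    """0-based line numbers picked by gnuplot's `every step::start` (modelled)."""
    return list(range(start, n_lines, step))

positions = {}  # line number -> x coordinate
for i in range(3):          # loop over datasets
    for j in range(10):     # loop over lines within a dataset
        # Assumption: $0 is the index among the selected points, 0..11.
        for k, line in enumerate(every(30, i * 10 + j)):
            positions[line] = k * 55 + j + i * 15

def expected_x(line):
    """x implied by the interleaved layout: 30 lines per thread, 10 per dataset."""
    thread, rest = divmod(line, 30)
    dataset, run = divmod(rest, 10)
    return thread * 55 + dataset * 15 + run
```

Each line ends up at thread*55 + dataset*15 + run, so the 30 curves together place every measurement exactly once.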

Because this means we will have 10 curves for each dataset, we set the title on only the first curve (setting it to empty suppresses it).

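Here, I have used the default line types. To get a specific color for each set (such as the requested red, green and blue), the plot command can be modified, for example by replacing lt (i+1) with lc rgb word("red green blue", i+1), or the line types can be redefined.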

EDIT:

For the sorted dataset provided by the OP, the plot command (the remaining commands are left the same) can be simplified to

plot for [i=0:2] 'test.dat' u (i*15 + floor($0/10)*55 + int($0)%10):1 every ::(i*120)::((i+1)*120-1) w boxes t sprintf("Data set %d",i+1)

Here we only need to loop over the datasets. There is no need to loop over the lines, as each dataset is a single block of consecutive lines. For each dataset, the every specification selects the 120 lines belonging to it (lines 0 - 119, 120 - 239 and 240 - 359).

To compute the x-coordinate, we start with an offset for the dataset (0 for the first, 15 for the second and 30 for the third), which accounts for the 10-unit blocks used by the preceding datasets plus the 5-unit gap after each. To that we add the offset for the thread block: we divide the line number within the dataset (0-indexed) by 10, the number of measurements per thread, and take the floor. The first 10 lines thus give 0, the next 10 give 1, and so on. This tells us how many thread blocks to skip, and we multiply it by the width of a thread block (55). Finally, we add the position of the line within its measurement set, using modular arithmetic to get the line number relative to the thread block instead of the dataset (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, ...). Thus i*15 + floor($0/10)*55 + int($0)%10 gives the x-coordinate of a measurement. Even though $0 only takes whole-number values, it is treated as a float, so we cast it to an int before taking the modulus.
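The formula for the sorted file can be checked the same way. In this Python sketch (hypothetical names), integer // and % stand in for gnuplot's floor($0/10) and int($0)%10, which agree for non-negative values, and $0 is again assumed to restart at 0 for each dataset's every range. The set of x positions should match the interleaved version.

```python
def sorted_x(i, n):
    """x for the n-th selected point ($0 = n) of dataset i, per the EDIT's formula."""
    return i * 15 + (n // 10) * 55 + n % 10

xs = {}  # line number -> x coordinate
for i in range(3):
    first, last = i * 120, (i + 1) * 120 - 1   # every ::first::last
    for n, line in enumerate(range(first, last + 1)):
        xs[line] = sorted_x(i, n)

# Positions produced by the interleaved layout: thread*55 + dataset*15 + run.
interleaved = sorted(t * 55 + d * 15 + r
                     for t in range(12) for d in range(3) for r in range(10))
print(sorted(xs.values()) == interleaved)
```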

Adding the title is much simpler, as we loop over each dataset only once instead of 10 times, so we don't have to blank the title on the repeated curves.