2
votes

Ok, so here's the deal. I am starting my undergrad thesis in computational materials science and I am trying to put some scripts together to help prepare for data analysis.

I have been preparing a GAWK script that will basically take some data (arranged in 4 columns) and snag two of them and plot them in GNUPLOT. To achieve this end, I read in the data file which has multiple timesteps and their associated data in it, split the file into individual .dat files for each timestep.

From there I just generate a basic input script for GNUPLOT and plot each timestep as they occur in the data file.

The issue is that for some reason all of the plots generated are exactly the same plot (in this case always the first time step) but they are saved as the correct timestep.

I have already gone through and tracked each variable/filename throughout the script and finally determined that the issue is somehow with GNUPLOT being called from a script. I took out my system command and wrote a short bash script that calls gnuplot from a for loop:

#!/bin/bash
for file in ./*gnu
do
   gnuplot $file
done

And that still causes the same issue where all of the plots are the same. I then just ran the command gnuplot *gnu from the command line in the directory containing the .gnu files and it worked.

I guess I am just wondering if there is some buffer that I need to be flushing or if I am just missing something?

The GAWK script is given below. I am new to this still, so if you wish to comment on the script with some constructive criticism I would greatly appreciate that as well.

#!/opt/local/bin/gawk -v inputf=$1 -f                                                   

# Write gnuplot files and plot RDF data                                                 
function plot_rdf(timestep, Load_RDF_dat)
{
# Set number of digits in filenames to 6 so data is organized                           
    if (timestep < 10){
        pad_timestep="00000"timestep;
    }
    else if (timestep < 100){
        pad_timestep="0000"timestep;
    }
    else if (timestep < 1000){
        pad_timestep="000"timestep;
    }
    else if (timestep < 10000){
        pad_timestep="00"timestep;
    }
    else if (timestep < 100000){
        pad_timestep="0"timestep;
    }
    else{
        pad_timestep=timestep;
    }

# Give output filenames                                                                 
       gnu_file="plot_RDF_"pad_timestep".gnu";
       png_file="RDF_"pad_timestep".png";

# Create input files for gnuplot                                                        
       print "set output \""png_file"\"" >> gnu_file;
       print "set terminal png" >> gnu_file;
       print "plot './"Load_RDF_dat"' u 1:2" >> gnu_file;
       close(gnu_file);
       system("gnuplot "gnu_file);
}


# Main part of script                                                                   
{
# Parse the RDF data and save it to GNUPLOT readable files                              
    while(getline < inputf){
       if ($1 == "#"){
           # skips the three commented header lines                                     
           next;
       }
       else if (NF == 2){
           timestep=$1;
           bin_num=$2;
           print "Reading timestep "timestep;
           RDF_dat="RDF_"timestep".dat";
           next;
       }
       else if (NF == 4){
           print $2" "$3 >> RDF_dat;
           if ($1 == bin_num){
               plot_rdf(timestep, RDF_dat);
               close(RDF_dat);
           }
           next;
       }
    }
    close(inputf);
    close(RDF_dat);
 }

A snippet of the data file I'm reading in is:

# Time-averaged data for fix rdf
# TimeStep Number-of-rows
# Row c_allrdf[1] c_allrdf[2] c_allrdf[3]
500 100
1 0.005 0 0
2 0.015 0 0
3 0.025 0 0
4 0.035 0 0
5 0.045 0 0
6 0.055 1.16597 0.00133333
7 0.065 2.08865 0.00466667
8 0.075 1.56958 0.008
9 0.085 0.733433 0.01
10 0.095 0.587288 0.012
600 100
1 0.005 0 0
2 0.015 0 0
3 0.025 2.79219 0.000666667
4 0.035 2.86766 0.002
5 0.045 0 0.002
6 0.055 0.582985 0.00266667
7 0.065 2.08865 0.006
8 0.075 0.62783 0.00733333
9 0.085 0.488955 0.00866667
10 0.095 1.17458 0.0126667

There is usually 100 sets of data in each timestep section, but I figured I'd shorten here just so you'd get the idea.

2
"and finally determined that the issue is somehow with GNUPLOT being called from a script" - don't you feel how weird that conclusion is? the problem is in your script, not calling gnuplot from a script.Karoly Horvath

2 Answers

0
votes

I'm not sure that I can answer your question -- however, I will say that when I modified your datafile slightly, It (seemed to) work just fine for me.

Here's my modified version of your datafile:

# Time-averaged data for fix rdf
# TimeStep Number-of-rows
# Row c_allrdf[1] c_allrdf[2] c_allrdf[3]
500 100
1 0.005 0 0
2 0.015 0 0
3 0.025 0 0
4 0.035 0 0
5 0.045 0 0
6 0.055 1.16597 0.00133333
7 0.065 2.08865 0.00466667
8 0.075 1.56958 0.008
9 0.085 0.733433 0.01
10 0.095 0.587288 0.012
100 0.095 0.56 0.014     #<-added this line
600 100
1 0.005 0 0
2 0.015 0 0
3 0.025 2.79219 0.000666667
4 0.035 2.86766 0.002
5 0.045 0 0.002
6 0.055 0.582985 0.00266667
7 0.065 2.08865 0.006
8 0.075 0.62783 0.00733333
9 0.085 0.488955 0.00866667
10 0.095 1.17458 0.0126667
100 0.095 1.179 0.12      #<-added this line

Those lines were necessary to "trigger" the gnuplot plotting function because of the lines:

   if ($1 == bin_num){
       plot_rdf(timestep, RDF_dat);
       close(RDF_dat);
   }

Since bin_num is taken from the second field in the "header". (e.g. 600 100).

I'm not sure if you had that correctly set up in your full datafile or not. Also, I called the script as:

gawk -f test.awk -v inputf=test.dat test.dat

which completely ignores your shebang in the beginning, but I've read that a lot of systems have trouble splitting those up correctly.

Finally, what version of gnuplot do you have? If you have 4.6, you can forgo a lot of this pain, pretty much skipping the gawk script entirely and replacing it with a much more simple one.

0
votes

As mgilson notes, you might have been failing to call plot_rdf because of not having $1 == bin_num. Note that invoking awk with the data file name on the command line makes it easy to use awk's built-in file reading loop. This is illustrated in the following rewrite of your awk program. Also note:
• using > instead of >> in two places
• closing RDF_dat before running gnuplot, instead of after
• using pad_timestep = sprintf("%06d", timestep); instead of clumsy series of if statements

For the following, I put the program into file so-gnuplot-awk, data as-is into file data-so-gnuplot, and invoked the program via

awk -f so-gnuplot-awk data-so-gnuplot

Program:

# Parse the RDF data and save it to GNUPLOT readable files
BEGIN { dopen=0 }

NF==2 {
    if (dopen) plot_rdf(timestep, RDF_dat);
    timestep = $1;
    print "Reading timestep "timestep;
    RDF_dat="RDF_"timestep".dat";
    printf "" > RDF_dat     # Init empty file
    dopen = 1;
}

NF == 4 {  if (dopen) print $2" "$3 >> RDF_dat; }

# Write gnuplot files and plot RDF data
function plot_rdf(timestep, Load_RDF_dat) {
# Set output filenames & create gnuplot command file
    pad_timestep = sprintf("%06d", timestep);
    gnu_file="plot_RDF_"pad_timestep".gnu";
    png_file="RDF_"pad_timestep".png";
    print "set output \""png_file"\"" > gnu_file; # Use > first
    print "set terminal png" >> gnu_file;
    print "plot './"Load_RDF_dat"' u 1:2" >> gnu_file;
    close(gnu_file);
    close(RDF_dat);
    print "Plotting with "RDF_dat" into "png_file
    system("gnuplot "gnu_file);
    dopen=0
}

END { if (dopen) plot_rdf(timestep, RDF_dat); }