1
votes

Edit: I should first clarify, when waitpid does not work, it does not work for all processes. As suggested I printed out the return value of waitpid and received interesting results. Firstly, during the unsuccessful runs, waitpid() returns 0 even when WIFEXITED(stats) returns 1. How could the child process have no change in status but return completed?

Secondly, I used a dummy program that prints out a string argument every 1 second for a specified number of times. (This is how I tracked if a program completed). I noticed that during successful runs, the waitpid value was not printed out during context switching, but after all the processes finished running.

like this: (here assuming each prog takes 2 quotas to complete) "prog1 run" "prog2 run" "prog3 run" "prog1 run" "prog2 run" "prog3 run" waitpid: 0 waitpid: 0 waitpid: 0 ...

on the other hand, an unsuccessful run gave me this: "prog1 run" waitpid: 0 program termination detected "prog2 run" waitpid: 0 program termination detected "prog3 run" waitpid: 0 program termination detected

TLDR: is it possible for waitpid(child_PID, stat, WNOHANG) to give a different WIFEXITED(stat) in different runnings of the same program?

I am coding a round robin scheduler. The parent process forks n child processes, which each run a process in the scheduler. Using signals SIGCONT and SIGSTOP, as well as the usleep() function, the parent is able to allocate a specified time quotas for each of the child processes to run sequentially in a cycle. At the end of each quota, the parent checks to see if any process has completed. It does so using the waitpid(child_PID, stat, WNOHANG); and then WIFEXITED(stat). If the process has completed, the parent will not allocate any more time quotas for that process in subsequent cycles.

However, I noticed that in every other time I run the code, WIFEXITED(stat) gives me a 1 after the first cycle of quotas, even thought I have ensured that every process should run much longer than said quota. I know for a fact the programs should not have been completed, because my test programs involve printing a specified number of lines before they exit. Strangest of all is the WIFEXITED gives me the wrong results exactly every OTHER run, and on the first cycle.

I have included the code in case anyone is patient enough to read it. Hopefully, reading the code is not necessary to understand the problem. For those kind enough to read it, thank you this means a lot, and perhaps you might know why my program does not terminate? When t runs correctly, it schedules all the processes correctly and runs them until all of them terminates, but does not terminate itself.

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
#include <stdbool.h>

int main(int argc, char *argv[]) {
  int tick_interval = 10000;
  char opt;
  while ((opt = getopt(argc, argv, "t:")) != -1) {
    switch (opt) {
    case 't':
      tick_interval = atoi(optarg);
      break;
    default:
      goto usage;
    }
  }

  if (optind >= argc) {
    goto usage;
  }

  char *filepath = argv[optind];//filepath to textfile containing name of programs and arguments.
  int n;
  FILE * fp;
  char * line = NULL;
  size_t len = 0;
  ssize_t read;

  printf("parent PID: %d\n", getpid());
  fp = fopen(filepath, "r");
  if (fp == NULL)
      exit(EXIT_FAILURE);
  int PID;
  int *prog_tracker = malloc(0);
  int line_counter = 0;
  int word_counter;
  int word_length;
  char ***lines = malloc(0);
  while ((read = getline(&line, &len, fp)) != -1) {
      //printf("round %d\n", line_counter);
      word_counter = 0;
      word_length = 0;
      lines = realloc(lines, (++line_counter) * sizeof(char**));
      lines[line_counter - 1] = malloc(0);
      int char_counter;
      bool is_new = 1;
      for (char_counter = 0; char_counter < read; char_counter ++) {
          if (is_new) {
              is_new = 0;
              lines[line_counter - 1] = realloc(lines[line_counter - 1], ++word_counter * sizeof(char*));
              lines[line_counter - 1][word_counter - 1] = malloc(0);
          }
          lines[line_counter - 1][word_counter - 1] = realloc(lines[line_counter - 1][word_counter - 1], ++word_length * sizeof(char));
          if (line[char_counter] == ' '||line[char_counter] == '\0' || line[char_counter] == '\n' || line[char_counter] == EOF) {
              is_new = 1;
              lines[line_counter - 1][word_counter - 1][word_length - 1] = '\0';
              word_length = 0;
          } else {
              lines[line_counter - 1][word_counter - 1][word_length - 1] = line[char_counter];
          }
      }
      //first line states number of cores to be used. To be implemented. Ignored for now.
      if (line_counter != 1) {
          PID = fork();
          if (PID != 0) {
              printf("PID: %d created at: %d\n", PID, line_counter);
              kill(PID, SIGSTOP);
              prog_tracker = realloc(prog_tracker, (line_counter - 1)  * sizeof(int));
              prog_tracker[line_counter - 2] = PID;
          } else {
              char *arguments[word_counter + 1];
              int counter;
              for (counter = 0; counter < word_counter; counter ++) {
                  arguments[counter] = lines[line_counter - 1][counter];
              }
              arguments[word_counter] = NULL;
              execv(arguments[0], arguments);//child processes implement processes in file.
              break;
          }
      }
  }
  free(lines);
  fclose(fp);
  if (line)
      free(line);

  if (PID != 0) {
      printf("parent running %d\n", getpid());
      int proc_num = 0;
      int prog_num = line_counter - 1;
      printf("prog_num: %d\n", prog_num);
      while (prog_num != 0) { //The while loop should break when all programs have finished, but it does not.
          kill(prog_tracker[proc_num], SIGCONT);
          usleep(tick_interval * 1000);
          kill(prog_tracker[proc_num], SIGSTOP);
          int stat;
           printf("status: %d", waitpid(prog_tracker[proc_num], &stat, WNOHANG)); //I now print out the return of waitpid.
          printf("%d\n", WIFEXITED(stat));
          if (WIFEXITED(stat)) {
              //printf("%d\n", WIFEXITED(stat));
              printf("program termination detected\n");
              prog_tracker[proc_num] = 0;
              prog_num -= 1;
              printf("processes left %d\n", prog_num);
          }
          proc_num = (++proc_num) % (line_counter - 1);
          while(prog_tracker[proc_num] == 0) {
              proc_num = (++proc_num) % (line_counter - 1);
          }
       }
       printf("All programs ended.");//This never gets printed!
  }
  return 0;
2
You should check waitpid()'s return value to make sure it was successful before trying to use stat. - Shawn
With your relatively big program it is difficult to see possible errors. You should simplify your code to create a minimal reproducible example. I suggest to replace the processing of command line arguments and input file with fixed values. Add debug output to see to which PID the result of waitpid belongs. Check the return value of all functions, maybe a function reports an error. - Bodo
the posted code does not compile! amongst other things, it is missing a final } - user3629249
The posted code causes the compiles to output LOTS of warnings, including this critical one: untitled1.c:108:20: warning: operation on ‘proc_num’ may be undefined [-Wsequence-point] and again on line 110 - user3629249
OT: suggest, when compiling, to always enable the warnings, then fix those warnings. ( for gcc, at a minimum use: -Wall -Wextra -Wconversion -pedantic -std=gnu11 ) Note: other compilers use different options to produce the same results - user3629249

2 Answers

2
votes

this kind of code:

        PID = fork();
        if (PID != 0) 
        {   << then running parent process or error occurred
            printf("PID: %d created at: %d\n", PID, line_counter);
            kill(PID, SIGSTOP);
            prog_tracker = realloc(prog_tracker, (line_counter - 1)  * sizeof(int));
            prog_tracker[line_counter - 2] = PID;
        }

if the call to fork() was successful, is killing the child process. Probably not what you want.

1
votes

waitpid() can and most likely will give inconsistent results in the example. If you would get its status variable initialized properly ("int stat = 0;") like you almost always should, you would find out why right away as the code would not work as expected at all. This is but a single example why at least somewhat bug-resistant coding style is a must for a C programmer!

The reason for the behaviour is that if the requested process is still running then waitpid() just returns the value of 0 and does not even touch your "stat" variable so it remains uninitialized. You can probably do some dirty hack to get around the issue for example by initializing "stat" each time with something like 0xFF but the proper way to go is to check the return value of waitpid(... WNOHANG) and only process the rest if the return value is not 0. In most cases this is actually the only thing you need to check and "WIFEXITED(stat)" is not even needed unless you are really interested in how exacltly and with what return value your child process has terminated.