0
votes

I have completed the assignment (yes it is for a programming class), but I am afraid I didn't go about it in the most efficient way possible. It is basically the uniq program, it will compare adjacent lines in a file and only print one copy of any repeated lines. A few notes: printUniq() is my own function that takes into account various flags, readline() is another function that reads a line of arbitrary length into a char * buffer using malloc and realloc. Here is the part I am worried about:

if(prevline != NULL)
{
  while(thisline != NULL)
  {
     while(thisline != NULL && strcmp(prevline, thisline) == 0)
     {
        count++;
        free(prevline);
        prevline = thisline;
        thisline = readline(stream);
     }
     printUniq(prevline, cflag, dflag, uflag, count);
     count = 1;
     free(prevline);
     if (thisline != NULL)
     {
        prevline = thisline;
        if((thisline = readline(stream)) == NULL)
        {
           printUniq(prevline, cflag, dflag, uflag, count);
        }
     }  
  }

Is there a better way to structure this program? I hate having to check thisline for NULL three times in a loop. The first NULL check in the outer while loop is necessary, and the next check in the nested while is needed in case the last lines are duplicates. The next check after the call to free basically checks if the "Duplicate loop" was exited because of thisline being null, and if not, it will allow the program to get another line. Then the next check is only there for the very last line in the file, because if it weren't there, when readline returns a null (there were no more lines in the file), the loop exits and the prevline was never printed.

Anyways, any help is appreciated.

3

3 Answers

2
votes

I sugest to read the file in only one place, since it will make code more manageable. Maybe something like this might work:

prevline = NULL;
count = 1;
while ((thisline = readline(stream)) != NULL) // will stay in the loop for as long as it reads from file
{
    if (prevline == NULL)
    { // this is the first read from file
         prevline = thisline;
         continue;

    }

    if (strcmp(thisline, prevline) == 0)
    {
         count++;
    } else // found a different line
         if (count > 1) // but after I already counted several identical
         {    // so I will print the line
                printUniq(prevline, cflag, dflag, uflag, count);
                count = 1;
         }
    free(prevline);
    prevline = thisline;
}
if (count > 1) and (prevline != NULL)
{
     printUniq(prevline, cflag, dflag, uflag, count);
}
free(prevline);
1
votes

Using (thisLine = readline()) != NULL as the condition of the loop, and only read one line at a time, would mean that the loop stopped at the end of the file, and the body of the loop could only be entered when thisLine is valid.

It could either read prevLine outside the loop, or deal with no previous line inside the loop:

if ((thisLine = readline()) != NULL) {
    char* prevLine = thisLine;                // got one line
    while ((thisLine = readline()) != NULL) {
       if (strcmp(...) == 0) {
           ...
       } else {
           ...
       }
       ...
    }
    ... deal with prev_line, no need for if because it *must* have been read.
}

vs

prevLine = NULL;
while ((thisLine = readline()) != NULL) {
    if (prevLine == NULL) { // first line?
       ...
    } else if (strcmp(...) == 0) {
       ...
    } else {
       ...
    }
       ...
    prev_line = this_line
}
if (prev_line != NULL) {
   ...
} else {          // only one line in the file?
   ...
}

Dealing with the one line outside the loop makes the the first approach simpler. The flow seems very clear to me. Is there one line? Is there a second line? Okay, uniqueness has meaning ...

In the second approach, dealing with one line inside the loop means the maintenance developer will be looking at the first-line test for every line of input, which IMHO is more for them to worry about.

Code after the loop is necessary because the last two or more lines of the file might be repeated, and that case needs to call printUnique too.

Also one-line-file logic is after the loop (which is less clear, IMHO) and would require mre logic. This logic is needed if the program is intended to emulate other uniq functionality like printing every line with a count.

The clear benefit of the second approach is reading the file in one place, which is in general a good tactic. IMHO, if readline is written properly, it doesn't matter much.

Summary: the first approach needs less logic, and the order of events is more explicit, hence it is simpler to understand. The second reads the file in one place, but needs the last repeating group to be handled outside the loop, so it is even longer. It is also more logic if uniq in general is being programmed.

Note: Both of these flows work.

0
votes

The suggestion by Mihai can be further simplified by using a for() loop instead of a while loop, and by using the continue statement (most of) the nested conditions and duplicated logic can be avoided:

dupstate = 0; uniqcount = 0; totdup = 0; linenum = 0;
for (prevline=NULL; thisline=readline(stream); free(prevline),prevline=thisline) {
{
    linenum++;
    if (!prevline || strcmp(thisline, prevline))
    {
         uniqcount++; dupstate = 0;
         printUniq(thisline, cflag, dflag, uflag, uniqcount);
         continue;
    }
    totdup++;

    if (!dupstate++) /* only the first dup is reported */
    {    
         printDup(thisline, cflag, dflag, uflag, totdup); 
    }
 }

 free(prevline);