3
votes

Goal:

  1. Make global array of string (dictionary) given that size can be computed only in function load().
  2. Print dictionary on the screen with function print().

My approach:

Create global pointer to string, create array of strings in load() and assign local array to global pointer.

Problem:

If I try to print global array (and local as well) inside load(), everything's fine, but in case of printing with print(), segfault occurs somewhere in the end of array. GDB and valgrind outputs seem cryptic to me. I give up. What's wrong?

Source and dictionary are here.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// length of the longest word in dictionary
#define LENGTH 45
// dictionary file
#define DICTIONARY "large"

// prototypes
void load(const char* dictionary);
void print(void);

// global dictionary size
int dict_size = 0;
// global dictionary
char **global_dict;

int main(void)
{
    load(DICTIONARY);
    print();
    return 0;  
}

/**
 * Loads dictionary into memory.
 */
void load(const char* dictionary)
{
    // open dictionary file
    FILE *dict_file = fopen(dictionary, "r");    
    
    // compute size of dictionary
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // look for '\n' (one '\n' means one word)
        if (c == '\n')
        {
            dict_size++;
        }
    }
    
    // return to beginning of file
    fseek(dict_file, 0, SEEK_SET);
    
    // local array
    char *dict[dict_size];
    
    // variables for reading
    int word_length = 0;
    int dict_index = 0;
    char word[LENGTH + 1];   
    
    // iteration over characters
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // allow only letters
        if (c != '\n')
        {
            // append character to word
            word[word_length] = c;
            word_length++;
        }
        // if c = \n and some letters're already in the word
        else if (word_length > 0)
        {
            // terminate current word
            word[word_length] = '\0';
            
            //write word to local dictionary
            dict[dict_index] = malloc(word_length + 1);
            strcpy(dict[dict_index], word);
            dict_index++;
            
            // prepare for next word
            word_length = 0;
        }
    }
    
    // make local dictioinary global
    global_dict = dict;
}

/**
 * Prints dictionary.
 */
void print(void)
{
    for (int i = 0; i < dict_size; i++)
        printf("%s %p\n", global_dict[i], global_dict[i]);
}
3
I only know the answer in unpure c. - Iharob Al Asimi
This should go as an example of one of the most neatly formatted questions. - WedaPashi
@WedaPashi It was formated by the OP, but users also shaped it like @MohitJain. - Iharob Al Asimi
Yes, I saw the edit history. Nicely done @MohitJain - WedaPashi
1) always check (!=NULL) the returned value from fopen() to assure the operation was successful. 2) always check (!=-1) the returned value from fseek() to assure the operation was successful. 3) always check (!=NULL) the returned value from malloc() to assure the operation was successful - user3629249

3 Answers

6
votes

Answer is simple, you are assigning the pointer to a variable that is local to load() and it's deallocated when load() returns, hence it's invalid in print(), that leads to undefined behavior.

You even commented it

// local array <-- this is your comment not mine
char *dict[dict_size];

You have two options:

  1. Don't use a global variable, the pattern you used with a global variable has no benefits whatsoever, instead it's very dangerous. You could return the dynamically allocated poitner from the function load() and then pass it to print().

  2. Allocate the array of pointers using malloc().

    global_dict = malloc(dict_size * sizeof(*global_dict));
    

Why I don't like global variables?

  • Because you can do what you did in your program without even recieving a warning from the compiler.

But of course, you will not do this kind of thing when you gain experience, so it's more your fault than it is of global variables, but It's common to see programmers that are still learning, use global variables to solve the problem of sharing data among functions, which is what parameters are for.

So using global variables + not knowing how to handle them properly is bad, instead learn about function parameters and you will solve every problem that required global variables to pass data through different functions in your program without using global variables.

This is your own code, I removed the global_dict variable, and used dynamic memory allocation inside load(), also I performed some error checking on malloc(), you should improve that part if you want the code to be robust, the rest is self explanatory


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// length of the longest word in dictionary
#define LENGTH 45
// dictionary file
#define DICTIONARY "large"

// prototypes
char **load(const char *dictionary);
void print(char **);

int main(void)
{
    char **dictionary;

    dictionary = load(DICTIONARY);
    if (dictionary == NULL)
        return -1;
    print(dictionary);

    /* Don't forget to free resources, 
       you might need to do it while
       the program is still running,
       leaked resources might quickly 
       become a problem.
     */
    for (int i = 0 ; dictionary[i] != NULL ; ++i)
        free(dictionary[i]);
    free(dictionary);

    return 0;  
}
/**
 * Loads dictionary into memory.
 */
char **load(const char *dictionary)
{
    // open dictionary file
    FILE  *dict_file;    
    size_t dict_size;
    char **dict;
    char   word[LENGTH + 1];
    size_t word_length;
    size_t dict_index;
    dict_file = fopen(dictionary, "r");
    if (dict_file == NULL) /* you should be able to notify this */
        return NULL; /* failure to open file */
    // compute size of dictionary
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // look for '\n' (one '\n' means one word)
        if (c == '\n')
        {
            dict_size++;
        }
    }
    // return to beginning of file
    fseek(dict_file, 0, SEEK_SET);

    // local array
    dict = malloc((1 + dict_size) * sizeof(*dict));
    /*                    ^ add a sentinel to avoid storing the number of words */
    if (dict == NULL)
        return NULL;

    // variables for reading
    word_length = 0;
    dict_index = 0;
    // iteration over characters
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // allow only letters
        if (c != '\n')
        {
            // append character to word
            word[word_length] = c;
            word_length++;
        }
        // if c = \n and some letters're already in the word
        else if (word_length > 0)
        {
            // terminate current word
            word[word_length] = '\0';

            //write word to local dictionary
            dict[dict_index] = malloc(word_length + 1);
            if (dict[dict_index] != NULL)
            {
                strcpy(dict[dict_index], word);
                dict_index++;
            }
            // prepare for next word
            word_length = 0;
        }
    }
    dict[dict_index] = NULL;
    /* We put a sentinel here so that we can find the last word ... */
    return dict;
}

/**
 * Prints dictionary.
 */
void print(char **dict)
{
    for (int i = 0 ; dict[i] != NULL ; i++)
        printf("%s %p\n", dict[i], (void *) dict[i]);
}
1
votes

In load() you create and fill a local variable char *dict[dict_size]; and then you only assign the pointer global_dict to that variable. But as soon as load() returns, you must not access any local variables any more. That's stealing the hotel keys as it is greatly explained in that answer

0
votes

Problem explained (inspired by @Ingo Leonhardt)

Problem here:

char *dict[dict_size];

With such a declaration memory for array was allocated automatically (it's untouchable only in load()) and after load() call dict's memory is accessible for overwriting. It seems that overwriting takes place somewhere in the end of the program (accidentally).

Solution (inspired by @iharob)

char **dict = malloc((1 + dict_size) * sizeof(*dict));

Thus I dynamically allocate memory for dict (untouchable until end of program or free(global_dict))

P.S. I agree that global variables're dangerous, but solving problem with such a design's a constraint of the assignment.