0
votes

I am writing a C program that lists all files in the current directory using DIR, opendir(), and readdir(), reading each entry into a dirent struct as shown below.

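#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
    DIR *d;
    char *dir_name = ".";
    struct stat s;  /* not used yet; intended for a stat() check on each entry */

    d = opendir(dir_name);
    if (!d) {
        perror("opendir");  /* could not open the directory */
        return 1;
    }

    while (1) {
        struct dirent *entry;

        entry = readdir(d);

        if (!entry)  /* NULL means end of directory (or an error) */
            break;

        //how to check if this is a text file before printing?
        printf ("%s\n", entry->d_name);
    }
    closedir(d);
    return 0;
}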

What I need to find out is how to test the file to see if it is a text file. I thought of using stat() to look at the mode. I can exclude directories this way. For binaries I thought I could look for executable bits, but that would be a problem for scripts, for instance, which are executable text files.
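The directory check described above could look roughly like the following. This is only a sketch: is_regular_file is an illustrative name that does not appear in the original program, and it treats any stat() failure as "not a candidate".

```c
#include <sys/stat.h>

/* Returns 1 if path names a regular file, 0 otherwise (including when
 * stat() fails). is_regular_file is a hypothetical helper name. */
int is_regular_file(const char *path)
{
    struct stat s;

    if (stat(path, &s) != 0)
        return 0;                       /* cannot stat: skip this entry */

    /* S_ISREG is false for directories, FIFOs, sockets, devices, ... */
    return S_ISREG(s.st_mode) ? 1 : 0;
}
```

Inside the readdir() loop this would be called as is_regular_file(entry->d_name). It only removes non-files from consideration; it says nothing about whether the contents are text.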

Any suggestions on how I might be able to programmatically filter for only text files?

2
What do you mean by a text file? One that only contains ASCII? Or one that contains written text encoded somehow? - Paul Hankin
You can only make guesses based on file extension or statistics on the file contents. All files are binary. "Text" is just an interpretation based on an encoding. - Mat
Are you allowed to invoke a command line tool? If so, just invoke file and then parse its output. - Dave Newman
I thought about file, and that would be perfect if I could do it within the program. - Acroyear
@user2227422 you will need to read up on fork and exec to be able to invoke file from your code. I like the libmagic suggestion better however. - Dave Newman
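For reference, the fork/exec route mentioned in the comments might be sketched as below. It assumes the file utility is on PATH and accepts -b and --mime-type. The name file_says_text and the rule "a MIME type starting with text/ counts as text" are assumptions made for illustration, not part of the thread.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `file -b --mime-type -- <path>` in a child process.
 * Returns 1 if the reported MIME type starts with "text/", 0 if it does
 * not, and -1 if file could not be run. Hypothetical helper name. */
int file_says_text(const char *path)
{
    int fds[2];
    char buf[256];
    ssize_t n, total = 0;
    int status;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: route stdout into the pipe, then become `file` */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execlp("file", "file", "-b", "--mime-type", "--", path, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }

    /* parent: collect the child's (short) output */
    close(fds[1]);
    while (total < (ssize_t)sizeof(buf) - 1 &&
           (n = read(fds[0], buf + total, sizeof(buf) - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return strncmp(buf, "text/", 5) == 0;
}
```

Because the path is passed as a separate argv entry rather than through a shell, file names with spaces or quotes need no escaping. Starting one process per file is slow for large directories, which is one reason the libmagic suggestion is attractive.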

2 Answers

3
votes

Use libmagic and look at the MIME type it reports for each file. Text files typically get a type that starts with text/, such as text/plain.

-1
votes

I gave up trying to make libmagic work and decided to use the following algorithm instead: loop through the contents of each file and reject anything that cannot be read (such as a directory) or that contains non-ASCII characters. There is probably some kind of flaw here, but it seems to work on the files I have tested it on.

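/* fs is a file descriptor returned by open() */
unsigned char ch;  /* unsigned: plain char is not signed on every platform */
ssize_t r;
int is_text = 1;

while ((r = read(fs, &ch, sizeof(ch))) != 0) {
  if (r < 0) {     /* read error, e.g. fs refers to a directory */
    is_text = 0;
    break;
  }
  if (ch > 127) {  /* byte outside the 7-bit ASCII range */
    is_text = 0;
    break;
  }
}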
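One likely flaw in the loop above is that it accepts NUL and other control bytes, which are common in binary files, and that it issues one read() call per byte. A hedged variant that reads in blocks and also rejects control characters might look like this. The name looks_like_ascii_text, the 4096-byte buffer, and the set of allowed whitespace characters are all choices made for this sketch.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if every byte in the file is printable ASCII or common
 * whitespace, 0 otherwise (including when the file cannot be opened or
 * read). An empty file counts as text. Hypothetical helper name. */
int looks_like_ascii_text(const char *path)
{
    unsigned char buf[4096];
    ssize_t n, i;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return 0;

    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        for (i = 0; i < n; i++) {
            unsigned char c = buf[i];
            int ok = (c >= 0x20 && c < 0x7f) ||
                     c == '\t' || c == '\n' || c == '\r' || c == '\f';
            if (!ok) {
                close(fd);
                return 0;               /* NUL, control byte, or non-ASCII */
            }
        }
    }

    close(fd);
    return n == 0;  /* n < 0: read failed, e.g. path is a directory */
}
```

Like the original loop, this treats UTF-8 and other non-ASCII encodings as "not text", so it only fits cases where plain ASCII is the expected input.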