
I have the string "pid:8792 byr:2000 cols:hkjdp\n" and I only want to extract the number after byr:. I thought I could do this by scanning a formatted string with sscanf(str, "byr:%d", &number);. Unfortunately that doesn't work, because there are other characters before and after the number. Then I saw that sscanf seems to support some sort of regex, as in the question "How to use regex in sscanf".

So I tried something like this: sscanf(passport, "%*[^byr:]:%[^\h]%*[^\n]", byr); where byr is now defined as char *byr;. But you can't use regular regex escapes such as \h for whitespace. Long story short: is there any way to parse many strings with sscanf and always extract the number after byr:? And where can I find a cheat sheet for all the characters that can be used in a format string? (Of course I know about the obvious %f, %d, %s, %c and so on, but these don't really help in this case.)

Maybe you need a proper regular expression library? - tadman
Or just use other means? Such as strstr followed with strtoul - Eugene Sh.
You could use sscanf and a format like "pid:%*d byr:%d cols:%*s". A * in a scanf format string like that means "scan, but don't assign to anything". - Steve Summit
But no, scanf does not and cannot do true regular expressions, as that other question explains. - Steve Summit
man scanf contains a complete description of scanf conversion specifications. You won't find regular expressions there because scanf does not do regular expressions. In C, escape sequences like \n are turned into the corresponding character by the compiler, long before a function like scanf is actually executed. In a scanf format string, whitespace characters are treated identically, as described in man scanf; they match any sequence of whitespace characters in the input. (In most regex libraries, whitespace is \s, not \h. But neither of those are valid C escape sequences.) - rici

1 Answer


scanf is not well-suited to this situation. It does not provide anything like the general-purpose regexp support you'd need.

If all you care about is the "byr:" part, a completely different approach is to use strstr to search for that specific string. Here is an example:

const char *str = "pid:8792 byr:2000 cols:hkjdp\n";  /* string literals must not be modified */
const char *tag = "byr:";
const char *p = strstr(str, tag);  /* NULL if the tag is not present */
if(p != NULL) {
    int n = atoi(p + strlen(tag));
    printf("%s %d\n", tag, n);
}

str = "pid:8792 uid:412 byr:3000 cols:etfvq\n";
p = strstr(str, tag);
if(p != NULL) {
    int n = atoi(p + strlen(tag));
    printf("%s %d\n", tag, n);
}

tag = "uid:";
p = strstr(str, tag);
if(p != NULL) {
    int n = atoi(p + strlen(tag));
    printf("%s %d\n", tag, n);
}

Normally, atoi is discouraged, because it offers no way to detect errors and quietly ignores trailing garbage, but here ignoring the trailing text is just what you want.
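
If you do want to tell "no number after the tag" apart from a genuine 0, strtol is the usual substitute for atoi. Here is a minimal sketch under that assumption; parse_tagged_int is a hypothetical helper name, not a library function. Like atoi, it still accepts trailing text after the digits.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find `tag` in `str` and parse the decimal integer
 * that follows it. Returns 1 and stores the value in *out on success,
 * 0 if the tag is missing, no digits follow it, or the value overflows. */
static int parse_tagged_int(const char *str, const char *tag, int *out)
{
    const char *p = strstr(str, tag);
    if (p == NULL)
        return 0;               /* tag not present */

    const char *start = p + strlen(tag);
    char *end;
    errno = 0;
    long v = strtol(start, &end, 10);

    if (end == start)
        return 0;               /* no digits after the tag */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;               /* value does not fit in an int */

    *out = (int)v;
    return 1;
}
```

You would call it as parse_tagged_int(str, "byr:", &n) and check the return value before using n.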

One drawback of this technique is that it would wrongly match a longer tag that merely ends in the one you are looking for, say, an "abyr:" tag.
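
One way around that, assuming fields are always separated by whitespace as in your examples, is to accept a match only at the start of the string or right after a whitespace character. A rough sketch (find_field is a hypothetical helper name):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the text just after `tag`,
 * but only where `tag` begins a field (start of string or preceded by
 * whitespace). Returns NULL if there is no such occurrence.
 * Assumes fields are whitespace-separated. */
static const char *find_field(const char *str, const char *tag)
{
    size_t len = strlen(tag);
    const char *p = str;

    while ((p = strstr(p, tag)) != NULL) {
        if (p == str || isspace((unsigned char)p[-1]))
            return p + len;     /* tag sits at a field boundary */
        p++;                    /* e.g. the "byr:" inside "abyr:"; keep looking */
    }
    return NULL;
}
```

The returned pointer can be passed to atoi in place of p + strlen(tag) in the code above.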