I want to do 2 things at the same time: Match a string against a pattern and extract groups.

The string consists of white spaces and digits. I want to match the string against this pattern. Additionally I want to extract the digits (not numbers, single digits only) using std::smatch.

I tried a lot, but no success.

For the dupe hunters: I checked many many answers on SO, but I could not find a solution.

Then I tried std::sregex_token_iterator, and the result baffled me as well:

#include <string>
#include <regex>
#include <vector>
#include <iterator>

const std::regex re1{ R"(((?:\s*)|(\d))+)" };

const std::regex re2{ R"(\s*(\d)\s*)" };

int main() {
    std::string test("   123 45 6   ");
    std::smatch sm;

    bool valid1 = std::regex_match(test, sm, re1);
    std::vector<std::string> v(std::sregex_token_iterator(test.begin(), test.end(), re2), {});
    return 0;
}

The vector contains not only the digits, but also spaces. I would like to have digits only.

The smatch does not contain any digits.

I know that I can first remove all whitespace from the string, but there should be a better, one-step solution.


What is the proper regex to 1. match the string against my described pattern and 2. extract all single digits into the smatch?

Try \s*\d, it will match 0+ spaces and then a single digit. – Michał Turczyn
@MichałTurczyn: Sorry, does not work. – Armin Montigny

1 Answer

The pattern you need to validate is

\s*(?:\d\s*)*

See the regex demo. Note that I added ^ and $ there so that the pattern has to match the whole string on the regex testing site; std::regex_match in your code already requires a full-string match, so the anchors are not needed in C++.
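
To make the validation step concrete, here is a minimal sketch that wraps the pattern in a function. The name is_whitespace_and_digits is made up for illustration. Note that the pattern also accepts an empty string and a whitespace-only string, since every part of it is optional.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original answer): returns true when
// s consists only of whitespace and digits. Empty input is accepted too.
bool is_whitespace_and_digits(const std::string& s) {
    static const std::regex re{ R"(\s*(?:\d\s*)*)" };
    // regex_match succeeds only if the pattern covers the entire string
    return std::regex_match(s, re);
}
```

Calling is_whitespace_and_digits("   123 45 6   ") should return true, while an input such as "12a3" should be rejected.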

Next, once your string is validated with the first regex, you just need to extract any single digit:

const std::regex re2{ R"(\d)" };
// ...
std::vector<std::string> v(std::sregex_token_iterator(test.begin(), test.end(), re2), {});
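
The second pass is needed because a capturing group inside a repetition keeps only the text from its last iteration, so a single regex_match cannot hand back every digit in the smatch. The sketch below illustrates this with a capturing variant of the validation pattern; last_captured_digit is a hypothetical name used only for this demonstration.

```cpp
#include <regex>
#include <string>

// Hypothetical demonstration helper: returns whatever capture group 1 holds
// after a full match, or an empty string if the match fails or the group
// never participated.
std::string last_captured_digit(const std::string& s) {
    // Same as the validation pattern, but with the digit captured
    static const std::regex re{ R"(\s*(?:(\d)\s*)*)" };
    std::smatch sm;
    if (!std::regex_match(s, sm, re))
        return {};
    // The group was overwritten on every iteration; only the last one is left
    return sm[1].str();
}
```

For the test string "   123 45 6   " the group holds only "6", which is why the digits have to be collected with an iterator instead.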

Full working snippet:

#include <string>
#include <regex>
#include <vector>
#include <iterator>
#include <iostream>

const std::regex re1{ R"(\s*(?:\d\s*)*)" };

const std::regex re2{ R"(\d)" };

int main() {
    std::string test("   123 45 6   ");
    std::smatch sm;

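    // Step 1: validate that the string holds only whitespace and digits
    bool valid1 = std::regex_match(test, sm, re1);
    if (!valid1)
        return 1;

    // Step 2: extract every single digit
    std::vector<std::string> v(std::sregex_token_iterator(test.begin(), test.end(), re2), {});
    for (const auto& i : v)
        std::cout << i << std::endl;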

    return 0;
}

Output:

1
2
3
4
5
6
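
If the two steps are needed in more than one place, they can be combined in a small function. This is only a sketch: extract_digits and its bool-plus-output-parameter signature are assumptions for illustration, not something the standard library provides.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: validates s, then fills out with its digits.
// Returns false and leaves out empty when s contains anything other than
// whitespace and digits.
bool extract_digits(const std::string& s, std::vector<std::string>& out) {
    static const std::regex valid{ R"(\s*(?:\d\s*)*)" };
    static const std::regex digit{ R"(\d)" };

    out.clear();
    if (!std::regex_match(s, valid))
        return false;

    // Each match of \d becomes one element of the vector
    out.assign(std::sregex_token_iterator(s.begin(), s.end(), digit),
               std::sregex_token_iterator());
    return true;
}
```

The caller checks the return value first and reads the vector only when it is true.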

Alternative solution using Boost

With Boost, a single regex can match every digit separately, but only if the whole string consists of whitespace and digits:

\G\s*(\d)(?=[\s\d]*$)

See the regex demo.

Details

  • \G - start of string or end of the preceding successful match
  • \s* - 0+ whitespaces
  • (\d) - a digit captured in Group 1 (we'll return only this value when passing 1 as the last argument in boost::sregex_token_iterator iter(test.begin(), test.end(), re2, 1))
  • (?=[\s\d]*$) - a positive lookahead: only zero or more whitespace characters or digits, followed by the end of the string, may appear immediately to the right of the current location.
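
The submatch argument described for Group 1 is not specific to Boost: std::sregex_token_iterator accepts the same trailing index. As a sketch, applying it to the second pattern from the question returns the digits without the surrounding spaces. The function name digits_via_group1 is hypothetical, and this variant does not validate the string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture group 1 of every match.
// No validation is done here; non-matching characters are simply skipped.
std::vector<std::string> digits_via_group1(const std::string& s) {
    static const std::regex re{ R"(\s*(\d)\s*)" };
    // The trailing 1 selects capture group 1. The default, 0, yields the
    // whole match, which includes the whitespace around each digit.
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), re, 1),
        std::sregex_token_iterator());
}
```

This explains why the vector in the question contained spaces: without the index, each token is the entire match rather than the captured digit.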

See the whole C++ snippet (linked with the -lboost_regex option):

#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>

int main()
{
    std::string test("   123 45 6   ");
    boost::regex re2(R"(\G\s*(\d)(?=[\s\d]*$))");
    boost::sregex_token_iterator iter(test.begin(), test.end(), re2, 1);
    boost::sregex_token_iterator end;
    std::vector<std::string> v(iter, end);
    for (auto i: v)
        std::cout << i << std::endl;

    return 0;
}
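
std::regex has no \G, but a similar effect can probably be had without Boost by passing std::regex_constants::match_continuous to std::regex_search, which forces each match to start exactly where the previous one ended. The following is a sketch under that assumption, with the hypothetical name digits_if_valid; the lookahead rescans the rest of the string on every match, so it may be slow on very long inputs.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical std-only counterpart of the Boost snippet: returns all digits
// if s consists only of whitespace and digits, otherwise an empty vector.
std::vector<std::string> digits_if_valid(const std::string& s) {
    // Same pattern as the Boost version, minus the unsupported \G
    static const std::regex re{ R"(\s*(\d)(?=[\s\d]*$))" };
    std::vector<std::string> digits;
    std::smatch m;
    auto pos = s.cbegin();
    // match_continuous anchors every search at pos, standing in for \G
    while (std::regex_search(pos, s.cend(), m, re,
                             std::regex_constants::match_continuous)) {
        digits.push_back(m[1].str());
        pos = m[0].second;  // resume right after the current match
    }
    return digits;
}
```

As with the Boost version, an empty result does not distinguish an invalid string from a valid one that simply contains no digits.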