I'm trying to match a pattern in a bunch of files with grep. The files contain G-code (CNC machine code). Every number should have a letter associated with it (examples: X4.5, G71, Z-0.75). Many files have typos and are missing the letters. I'm trying to use grep to identify these files by matching any decimal number in the file that is not immediately preceded by a letter. However, I do not want to match the same pattern if it occurs within parentheses. Anything in parentheses is a comment and should not be matched by the regex.

Test text:

%
O01934 (AWC C011469)
(MATL: 4.0 X 2.0 X A020)
N90 G00 4.2 z0.1
Z0.1125 F0.004 
N150 X2.2 .01 (inline comment)
0.03

Line 3 technically contains the pattern I'm looking for, but I don't want to match it because it's within parentheses.

Lines 4, 6, and 7 are examples of the pattern I'm trying to match: numbers not preceded by a letter and not inside parentheses.

I've been on regextester.com for well over an hour and I've got a headache now. Maybe someone more seasoned with regex can help.

The best pattern I could come up with is ([[:space:]]|^)-?[[:digit:]]*\.[[:digit:]]+([[:space:]]|$), which matches what I want on lines 4, 6, and 7 but also matches the numbers in the comment on line 3. I can't figure out how to match one but not the other.

Sounds like you want pcregrep -o '\([^()]*\)(*SKIP)(*F)|(?<!\S)-?\d*\.\d+(?!\S)' file. Though you say "numbers not preceded by a letter", the regex you tried only matches numbers between whitespace. Please clarify. - Wiktor Stribiżew
Can you highlight what should be matched? It's hard to tell. Never mind, I'll learn CNC software and try to answer in a few years. - user12097764
Thank you. It was not my intention to be imprecise. My regex was the best I could come up with. I don't even understand your regex, but it works. Thank you! - PatMcTookis
@x15 I cannot highlight text in a code block. "Lines 4, 6, 7: numbers not immediately preceded by a letter and not within parentheses." If you can't tell what I'm getting at with that description then you probably can't help me with this problem anyway. - PatMcTookis

1 Answer

Your regex can be fixed and used as:

pcregrep -o '\([^()]*\)(*SKIP)(*F)|(?<!\S)-?\d*\.\d+(?!\S)' file

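The \([^()]*\)(*SKIP)(*F) part matches a parenthesized substring that contains no nested parentheses. (*F) then forces that match to fail, and (*SKIP) makes the engine resume searching after the closing parenthesis instead of retrying inside it. As a result, nothing inside parentheses can be matched by the second alternative.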
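If pcregrep is not available, the same "consume the comment first" idea can be sketched in Python. This is only an illustration, not the command above: Python's re module has no (*SKIP)(*F), so the sketch swaps in a different technique, an alternation whose first branch swallows a comment and whose second branch captures the number. The function name bare_numbers and the SAMPLE string (the question's test text) are assumptions made for the example.

```python
import re

# Branch 1 consumes a whole (...) comment without capturing anything.
# Branch 2 captures a decimal number that stands between whitespace.
# This alternation stands in for PCRE's (*SKIP)(*F), which Python's re lacks.
PATTERN = re.compile(r"\([^()]*\)|(?<!\S)(-?\d*\.\d+)(?!\S)")

def bare_numbers(line):
    """Return the decimal numbers on a line that are outside comments."""
    return [m.group(1) for m in PATTERN.finditer(line) if m.group(1) is not None]

SAMPLE = """%
O01934 (AWC C011469)
(MATL: 4.0 X 2.0 X A020)
N90 G00 4.2 z0.1
Z0.1125 F0.004
N150 X2.2 .01 (inline comment)
0.03"""

for lineno, line in enumerate(SAMPLE.splitlines(), start=1):
    found = bare_numbers(line)
    if found:
        print(lineno, found)
```

On the test text this reports lines 4, 6, and 7 and stays silent on the comment in line 3, because the comment is consumed by the first branch before the number branch gets a chance to look inside it.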

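If you only need to avoid matches after a letter, replace (?<!\S) with (?<![\p{L}\d.-]). A plain (?<!\p{L}) is not enough: because \d* can match zero digits, the pattern could then start in the middle of a valid word and match .1125 in Z0.1125, or 0.75 in Z-0.75.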
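The letter-only variant can be tried out the same way in Python. This is a rough sketch assuming ASCII letters only (Python's re has no \p{L}); the names NAIVE, STRICT, and hits are made up for the example. It compares a lookbehind that rejects only letters with a wider one that also rejects a preceding digit, '.', or '-', so that a match can only begin where a number begins.

```python
import re

# Letter-only lookbehind: a match may begin in the middle of a number,
# because \d* is allowed to match zero digits.
NAIVE = re.compile(r"\([^()]*\)|(?<![A-Za-z])(-?\d*\.\d+)(?!\S)")

# Wider lookbehind (an assumption of this sketch): also forbid a preceding
# digit, '.', or '-' so a match can only begin at the start of a number.
STRICT = re.compile(r"\([^()]*\)|(?<![A-Za-z0-9.\-])(-?\d*\.\d+)(?!\S)")

def hits(pattern, line):
    """Return captured numbers; comment matches have no capture and are dropped."""
    return [m.group(1) for m in pattern.finditer(line) if m.group(1) is not None]

print(hits(NAIVE, "Z0.1125 F0.004"))     # ['.1125', '.004']
print(hits(STRICT, "Z0.1125 F0.004"))    # []
print(hits(STRICT, "N90 G00 4.2 z0.1"))  # ['4.2']
```

Unlike the whitespace-delimited pattern, this variant also flags a number that follows punctuation, such as 0.25 in X1.0,0.25.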