I need to ignore a set of files from version control, except for files in three particular folders (let's call them Folder1, Folder2 and Folder3). I could list every folder I need to ignore as a plain list, but I don't consider that elegant, so I wrote the following regex:

.*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*

My thoughts were as follows, from left to right:

  • .* - Any number of any characters.
  • / - Slash symbol, which separates folders from one another.
  • (Bin|bin) - Folder with "Bin" or "bin" name.
  • / - Slash symbol, which separates folders from one another.
  • (?!Folder1/|Folder2/|Folder3/) - Folder name is not "Folder1/" and is not "Folder2/" and is not "Folder3/". This part was the most complicated, but I found it by googling. I don't understand why it should work, but it works in my tests.
  • .* - Any number of any characters.
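The breakdown above can be checked outside Mercurial with a small Python sketch. It assumes that unanchored matching can be modeled with re.search; the sample paths are hypothetical and exist only for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r".*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*")

# Hypothetical sample paths, made up for this sketch.
SAMPLES = [
    "proj/Bin/Debug/app.exe",
    "proj/bin/obj/cache.dat",
    "proj/Bin/Folder1/keep.dll",
    "proj/bin/Folder3/sub/keep.txt",
    "proj/src/main.c",
]


def is_ignored(path):
    """Return True if the pattern matches somewhere in the path."""
    return PATTERN.search(path) is not None


for path in SAMPLES:
    print(f"{path}: {'ignored' if is_ignored(path) else 'kept'}")
```

This only exercises the regex itself against plain strings, much like regex101.com does; it says nothing about how Mercurial feeds names to the pattern.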

This expression works perfectly when I test it at regex101.com with a couple of text strings, representing paths to files, but nothing works when I put it in my .hgignore file, as follows:

syntax: regexp
.*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*

For some reason it ignores all files and sub-folders in all "Bin" and "bin" folders. How can I accomplish my task?

P.S. As far as I know, Mercurial/TortoiseHG uses Python/Perl regular expressions.

Many thanks in advance.

You say nothing works, yet you also say it ignores files/sub-folders in bin. Which is it? Perhaps the ignore regex should be positive rather than negative. Try changing it to (?:fold1|2|3) - user557597
The right words would be "It doesn't work as needed". If I change the regex to a positive one, folders 1, 2 and 3 will match it, and I need it to match all folders and files except the files in those folders. The ".hgignore" file is used by the version control system to list files that should not be versioned, i.e. that Mercurial should ignore. I want all files in all Bin and bin folders (except for Folder1, Folder2 and Folder3) to be ignored, so they should match the regex. Files in Folder1, Folder2 and Folder3 shouldn't be ignored, so they should not match the regex. - Igor Nikiforov
Something is not right. If the regex matches, the file will be processed? Otherwise it won't? If it is Python, try this regex: ^(?:(?!(?:Bin|bin)/(?:Folder1|Folder2|Folder3)/).)*$ which extends the negation out to the bin folder, where it should be. Otherwise it will match all other folders. - user557597
OK. Mercurial is the version control system, right? By default it tracks changes in all files/folders you put in the repository folder, right? Sometimes one does NOT need Mercurial to track some files and folders (as these files would differ for every repository user). Sometimes you can have plenty of such files (my case). Mercurial has a dedicated file called ".hgignore", where you can use a regex to select all the files you want Mercurial to NOT track (i.e. ignore). So, if a file matches the regex, it is ignored by Mercurial. If a file doesn't match the regex, it is not ignored by Mercurial. - Igor Nikiforov
I want Mercurial to ignore all files in all Bin and bin folders except for ".../Bin/Folder1/...", ".../Bin/Folder2/..." and ".../Bin/Folder3/...". None of the regexes you suggested does the job. ^ and $ are also not applicable, as Bin and bin folders could be anywhere within the folder/file hierarchy. - Igor Nikiforov

1 Answer

To adjust the question a bit to make it clearer (at least to me): we have any number of names of the form .../bin/somename/... and .../bin/anothername/... that should be ignored, along with three sets of names, .../bin/folder1/..., .../bin/2folder/..., and .../Bin/third/..., that should not be ignored.

Hence, we want a regular expression that (without anchoring) will match the names-to-be-ignored but not the ones-to-be-kept. (Furthermore, glob matching won't work, since it's not as powerful: we'll either match too little or too much, and Mercurial lacks the "override with later un-ignore" feature of Git.)

The shortest regular expression for this should be:

/[Bb]in/(?!(folder1|2folder|third)/)

(The part of this regex that actually matches a string like /bin/somename/... is only the /bin/ part, but Mercurial does not look at what matched, only whether something matched.)
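As a rough illustration of that point, here is a minimal Python sketch, assuming Mercurial's unanchored matching behaves like re.search. The paths are hypothetical; the helper name ignored is made up for this example.

```python
import re

# The short pattern from above.
SHORT = re.compile(r"/[Bb]in/(?!(folder1|2folder|third)/)")


def ignored(path):
    """Return True if the pattern matches anywhere in the path."""
    return SHORT.search(path) is not None


# Only "/bin/" is consumed; the lookahead itself matches no characters.
match = SHORT.search("src/bin/somename/a.o")
print(repr(match.group(0)))

# Hypothetical names: the first three should be kept, the rest ignored.
for path in [
    "src/bin/folder1/a.o",
    "src/bin/2folder/b.o",
    "src/Bin/third/c.o",
    "src/bin/somename/a.o",
    "src/bin/thirdparty/d.o",
]:
    print(path, "->", "ignored" if ignored(path) else "kept")
```

The last path shows why the trailing / inside the lookahead matters: thirdparty starts with third, but it is a different component, so the name is still ignored.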

The thing is, your example regular expression should work: it's just a longer variant of the same thing, with an unnecessary but harmless (except for performance) .* added at the front and back. So if yours isn't working, the above probably won't work either. A sample repository with some dummy files, which one could clone and experiment with, would help diagnose the issue.
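One way to convince yourself that the two forms agree is to run both over the same names. This is only a sketch on plain strings, again assuming re.search models the unanchored match; the short form is written here with the question's folder names, and the paths are hypothetical.

```python
import re

# The question's pattern and the short form, using the same folder names.
LONG = re.compile(r".*/(Bin|bin)/(?!Folder1/|Folder2/|Folder3/).*")
SHORT = re.compile(r"/[Bb]in/(?!(Folder1|Folder2|Folder3)/)")

# Hypothetical paths, made up for this comparison.
PATHS = [
    "a/Bin/Debug/x.dll",
    "a/bin/Folder2/y.txt",
    "a/bin/Folder22/y.txt",
    "docs/readme.txt",
]

# For each path, record whether (long form, short form) matched.
results = {
    p: (LONG.search(p) is not None, SHORT.search(p) is not None) for p in PATHS
}

for path, (long_hit, short_hit) in results.items():
    print(f"{path}: long={long_hit} short={short_hit}")
```

On these inputs both patterns give the same answer for every path, which is consistent with the claim that the leading and trailing .* change nothing but the amount of work the regex engine does.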


Original (wrong) answer (to something that's not the question)

The shortest regular expression for the desired case is:

/[Bb]in/Folder[123]/

However, if the directory / folder names do not actually meet this kind of pattern, we need:

/[Bb]in/(somedir|another|third)/

Explanation

First, a side note: the default syntax is regexp, so the initial syntax: regexp line is unnecessary. Separately, it's possible that your .hgignore file is not in proper UTF-8 format: see Mercurial gives "invalid pattern" error for simple GLOB syntax. (But that would produce different behavior, so that's probably not the problem. It's just worth mentioning in any answer about .hgignore files malfunctioning.)

Next, it's worth noting a few items:

  • Mercurial tracks only files, not directories / folders. So the real question is whether any given file name matches the pattern(s) listed in .hgignore. If they do match, and the file is currently untracked, the file will not be automatically added with a sweeping "add everything" operation, and Mercurial will not gripe that the file is untracked.

  • If some file is already tracked, the fact that its name matches an ignore pattern is irrelevant. If the file a/b/c.ext is not tracked and does match a pattern, hg add a/b/c.ext will add it anyway, while hg add a/b will en-masse add everything in a/b but won't add c.ext because it matches the pattern. So it's important to know whether the file is already tracked, and consider what you explicitly list to hg add. See also How to check which files are being ignored because of .hgignore?, for instance.

  • Glob patterns are much easier to write correctly than regular expressions. Unless you're doing this for learning or teaching purposes, or glob is just not powerful enough, stick with the glob patterns. (In very old versions of Mercurial, glob matching was noticeably slower than regexp matching, but that's been fixed for a long time.)

  • Mercurial's regexp ignore entries are not automatically anchored: if you want anchored behavior, use ^ at the front, and $ at the end, as desired. Here, you don't want anchored behavior, so you can eliminate the leading and trailing .*. (Mercurial refers to this as rooted rather than anchored, and it's important to note that some patterns are anchored, but .hgignore ones are not.)

  • Python/Perl regexp (?!...) syntax is negative lookahead: (?!...) succeeds, without consuming any characters, if the parenthesized expression does not match at that point in the string. This is part of the problem.

  • We need not worry about capturing groups (see capturing group in regex) as Mercurial does nothing with the groups that come out of the regular expression. It only cares if we match.

  • Path names are really slash-separated components. The leading components are the various directories (folders) above the file name, and the final component is the file name. (That is, try not to think of the first parts as folders: it's not that it's wrong, it's that it's less general than "components", since the last part is also a component.)
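The (?!...) item in the list above can be seen in isolation with a short Python sketch. The pattern and paths here are made up for illustration, reusing Folder1 from the question as the example name.

```python
import re

# "/bin/" followed by anything except the component "Folder1/".
LOOKAHEAD = re.compile(r"/bin/(?!Folder1/)")

hit = LOOKAHEAD.search("x/bin/Other/file.txt")
miss = LOOKAHEAD.search("x/bin/Folder1/file.txt")

# The match consumes only "/bin/"; the lookahead contributes no characters.
print(repr(hit.group(0)))

# No match: the lookahead rejected the only position where "/bin/" occurs.
print(miss)
```

Since Mercurial only cares whether a match exists, the fact that the lookahead consumes nothing makes no practical difference to which names are ignored.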

What we want, in this case, is to match, and therefore "ignore", names that have one component that matches either bin or Bin followed immediately by another component that matches Folder1, Folder2, or Folder3 that is followed by a component-separator (so that we haven't stopped at /bin/Folder1, for instance, which is a file named Folder1 in directory /bin).

The strings bin and Bin both end with a common trailing part of in, so this is recognizable as (B|b)in, but single-character alternation is more easily expressed as a character class: [Bb], which eliminates the need for parentheses and vertical-bars.

The same holds for the names Folder1, Folder2, and Folder3, except that their common string leads rather than trails, so we can use Folder[123].
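A quick Python check, offered as a sketch, shows that the alternation and character-class spellings accept the same strings. The test words are arbitrary; re.fullmatch is used so that each word must match in its entirety.

```python
import re

# Alternation versus character class for the bin/Bin component.
ALT_BIN = re.compile(r"(B|b)in")
CLS_BIN = re.compile(r"[Bb]in")

# Spelled-out alternation versus character class for the folder names.
ALT_FOLDER = re.compile(r"(Folder1|Folder2|Folder3)")
CLS_FOLDER = re.compile(r"Folder[123]")


def same_verdict(first, second, word):
    """Return True if both patterns agree on whether word matches in full."""
    return (first.fullmatch(word) is None) == (second.fullmatch(word) is None)


for word in ["bin", "Bin", "BIN", "tin"]:
    print(word, same_verdict(ALT_BIN, CLS_BIN, word))

for word in ["Folder1", "Folder3", "Folder4", "folder1"]:
    print(word, same_verdict(ALT_FOLDER, CLS_FOLDER, word))
```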

Suppose we had anchored matches. That is, suppose Mercurial demanded that we match the whole file name, which might be, say, /foo/hello/bin/Folder2/bar/world.ext. Then we'd need .*/[Bb]in/Folder[123]/.*, because we'd need to match any number of characters to skip over /foo/hello before matching /bin/Folder2/, and again skip over any number of characters to match bar/world.ext, in order to match the whole string. But since we don't have anchored matches, we'll find the pattern /bin/Folder2/ within the whole string, and hence ignore this file, using the simpler pattern without the leading and trailing .*.
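The anchored/unanchored difference can be sketched in Python, using re.fullmatch as a stand-in for a matcher that demands the whole name and re.search for one that does not. Treating these as models of Mercurial's behavior is an assumption of this example.

```python
import re

# The example file name from the paragraph above.
PATH = "/foo/hello/bin/Folder2/bar/world.ext"

BARE = r"/[Bb]in/Folder[123]/"
PADDED = r".*/[Bb]in/Folder[123]/.*"

# Whole-string matching: the bare pattern fails, the padded one succeeds.
anchored_bare = re.fullmatch(BARE, PATH)
anchored_padded = re.fullmatch(PADDED, PATH)

# Unanchored matching: the bare pattern is found inside the name.
unanchored_bare = re.search(BARE, PATH)

print(anchored_bare)
print(anchored_padded is not None)
print(repr(unanchored_bare.group(0)))
```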