0
votes

I need to find filenames which contain exactly three 7s, in any positions.

My attempt

First, we need to find the names which do not contain a seven

ls | grep [^7]

Then, we could remove these matches from the whole space

ls [remove] ls | grep [^7]

The problem is that my pseudo-code quickly starts to repeat itself.

How can you find the names which contain exactly three 7s, in any positions, using AWK/Python/Bash?

[edit] The name can contain any number of letters, and it contains three 7s in total.

5
Can you give some example filenames - it'll help clarify which should match and which shouldn't. - Andy
@Andy: Examples of filenames which should be matched are d41d8Zcd978fABe98009978ecf8427e, 7d41d8CoA00b204e7980U0998ecf8427e and 77d41d8CD98f00204E9800998Ecf8427e - Léo Léopold Hertz 준영
@Andy: Examples of filenames which should not be matched are d41d8cd98f00b2049800998eCf842e, d41d8cd98F00b204e9800998ecf8427e and 77d41d8cd98f00b204E79800998ecf8427e. - Léo Léopold Hertz 준영
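The examples in the comments above can be checked mechanically. This is a minimal sketch, assuming the criterion is "exactly three 7 characters anywhere in the name"; the helper name `has_three_sevens` is hypothetical and only introduced for illustration.

```python
def has_three_sevens(name: str) -> bool:
    """Return True when the name contains exactly three '7' characters."""
    return name.count("7") == 3


# Names taken from the comments: these should be matched ...
should_match = [
    "d41d8Zcd978fABe98009978ecf8427e",
    "7d41d8CoA00b204e7980U0998ecf8427e",
    "77d41d8CD98f00204E9800998Ecf8427e",
]
# ... and these should not (zero, one and four 7s respectively).
should_not_match = [
    "d41d8cd98f00b2049800998eCf842e",
    "d41d8cd98F00b204e9800998ecf8427e",
    "77d41d8cd98f00b204E79800998ecf8427e",
]

for name in should_match + should_not_match:
    print(name, has_three_sevens(name))
```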

5 Answers

7
votes

I don't understand the part about "random order". How do you differentiate between the "order" when it's the same token that repeats? Is "a7b7" different from "c7d7" in the order of the 7s?

Anyway, this ought to work:

 ls *7*7*7*

It just lets the shell solve the problem, but maybe I didn't understand properly.

EDIT: The above is wrong: it also matches names with more than three 7s, which is not wanted. Assuming this is bash and extended globbing is enabled (shopt -s extglob), this works:

ls *([^7])7*([^7])7*([^7])7*([^7])

This reads as "zero or more characters which are not sevens, followed by a seven, followed by zero or more characters that are not sevens", and so on. It's important to understand that the asterisk is a prefix operator here, operating on the expression ([^7]) which means "any character except 7".
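To see why the plain glob over-matches, the two patterns can be compared on a few sample names. This is a sketch in Python, assuming `fnmatch.fnmatchcase` as a stand-in for the shell's plain glob and a regular expression as a stand-in for the extended glob (Python's `fnmatch` does not support the `*(...)` syntax). The sample names and the helper names `loose_match` and `strict_match` are made up for illustration.

```python
import fnmatch
import re

# Regex counterpart of the extended glob *([^7])7*([^7])7*([^7])7*([^7]).
STRICT = re.compile(r"[^7]*7[^7]*7[^7]*7[^7]*")


def loose_match(name: str) -> bool:
    """Plain glob: at least three 7s, anything in between."""
    return fnmatch.fnmatchcase(name, "*7*7*7*")


def strict_match(name: str) -> bool:
    """Exactly three 7s: only non-7 characters are allowed in the gaps."""
    return STRICT.fullmatch(name) is not None


for name in ["a7b7c7", "a7b7c7d7", "a7b7"]:
    print(name, loose_match(name), strict_match(name))
```

The name with four 7s passes the plain glob but not the strict pattern, which is the difference the edit describes.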

5
votes

I'm guessing you want to find files whose names contain exactly three 7s, and no more. Using GNU grep with the extended regexp switch (-E):


ls | grep -E '^([^7]*7){3}[^7]*$'

Should do the trick.

Basically, that matches three occurrences of "zero or more non-7 characters followed by a 7", then any number of non-7 characters. The ^ and $ at the beginning and end of the pattern anchor it to the whole string, so a fourth 7 anywhere makes the match fail.
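The role of the anchors can be illustrated in Python, whose `re` module accepts this pattern syntax unchanged. This is a small sketch with made-up names: the anchored pattern rejects a name with four 7s, while the same pattern without ^ and $ still finds three 7s somewhere inside it.

```python
import re

# Same pattern as in the grep -E call above.
anchored = re.compile(r"^([^7]*7){3}[^7]*$")
# Anchors removed: the pattern may now match just part of the name.
unanchored = re.compile(r"([^7]*7){3}[^7]*")

for name in ["x7y7z7w", "777", "7777", "17"]:
    print(name, bool(anchored.search(name)), bool(unanchored.search(name)))
```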

2
votes

Something like this:

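printf '%s\n' * | awk -F7 'NF==4'

With 7 as the field separator, a name splits into four fields exactly when it contains three 7s.

2
votes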

A Perl solution:

$ ls | perl -ne 'print if (tr/7/7/ == 3)'
3777
4777
5777
6777
7077
7177
7277
7377
7477
7577
7677
...

(I happen to have a directory with 4-digit numbers. 1777 and 2777 don't exist. :-)

1
votes

Or instead of doing it in a single grep, use one grep to find files with 3-or-more 7s and another to filter out 4-or-more 7s.

ls -f | egrep '7.*7.*7' | grep -v '7.*7.*7.*7'

You could move some of the work into the shell glob with the shorter

ls -f *7*7*7* | grep -v '7.*7.*7.*7'

though if a large number of files match that pattern, the latter won't work, because the expanded glob will exceed the system's limit on command-line length.

The '-f' in the 'ls' is to prevent 'ls' from sorting the results. If there is a huge number of files in the directory then the sort time can be quite noticeable.

This two-step filter process is, I think, more understandable than using the [^7] patterns.

Also, here's the solution as a Python script, since you asked for that as an option.

import os

# Print every name in the current directory that contains exactly three 7s.
for filename in os.listdir("."):
    if filename.count("7") == 3:
        print(filename)

This will handle a few cases that the shell commands won't, like (evil) filenames which contain a newline character. Even so, the printed output for such a name would likely still be misleading, or at least not something downstream programs are prepared for.
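One way to keep the output unambiguous for such names is to separate them with NUL bytes instead of newlines, which tools such as `xargs -0` can consume. This is a minimal sketch, assuming the exactly-three-7s criterion; the function name `names_with_three_sevens` is hypothetical.

```python
import os
import sys


def names_with_three_sevens(directory="."):
    """Yield entries of `directory` whose names contain exactly three '7's."""
    for filename in os.listdir(directory):
        if filename.count("7") == 3:
            yield filename


if __name__ == "__main__":
    for filename in names_with_three_sevens():
        # A NUL byte cannot occur inside a filename, so it is a safe separator.
        sys.stdout.buffer.write(os.fsencode(filename) + b"\0")
```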