1
votes

I have a string in Clojure and I'd like to name and extract various parts of a match. The standard way to do this is:

(re-seq #"\d{3}-\d{4}" "My phone number is 000-1234")
;; returns ("000-1234")

However I want to be able to name and access just the matched parts.

Here's an example:

(def mystring "Find sqrt of 6 and the square of 2")
(def patterns '(#"sqrt of \d" #"square of \d"))

When I match mystring against my list of patterns, I'd like the result to be something like {:sqrt 6, :root 2}.

Update

I found a third-party package, https://github.com/rufoa/named-re, that supports named groups, but I was hoping for a solution within a core library.

3 Answers

4
votes

You can do it using the named groups of Java's regular expressions. The problem is that there is no API to get all the group names, so you have to extract them from the regex source yourself:

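(defn find-named [re s]
  (let [m (re-matcher re s)
        ;; Pull the group names out of the regex source; Java group names
        ;; start with a letter and contain only ASCII letters and digits.
        names (map second (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str re)))]
    ;; Only the first match is examined; nil is returned when nothing matches.
    (when (.find m)
      (into {} (map (fn [name]
                      [(keyword name) (.group m name)])
                    names)))))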

in repl:

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of 2")
{:sqrt "6", :root "2"}

user> (find-named #"sqrt of (?<sqrt>\d).*?square of (?<root>\d)"
                  "Find sqrt of 6 and the square of fff")
nil
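
If you need every match rather than just the first, the same idea can be extended by calling .find repeatedly. This is only a sketch: group-names and find-all-named are hypothetical helper names, and the name extraction still relies on scanning the regex source as above.

```clojure
(defn group-names
  "Hypothetical helper: extracts named-group names from the regex source text."
  [re]
  (map second (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str re))))

(defn find-all-named
  "Hypothetical helper: returns a vector with one map per match of re in s."
  [re s]
  (let [m     (re-matcher re s)
        names (group-names re)]
    (loop [acc []]
      (if (.find m)
        ;; Each successful .find advances the matcher to the next match.
        (recur (conj acc (into {} (map (fn [n] [(keyword n) (.group m n)])
                                       names))))
        acc))))

(find-all-named #"(?<op>sqrt|square) of (?<n>\d)"
                "Find sqrt of 6 and the square of 2")
;; => [{:op "sqrt", :n "6"} {:op "square", :n "2"}]
```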

update:

The discussion made me think that you don't really need named groups here, but rather named patterns:

user> 
(defn get-named [patterns s]
  (into {} (for [[k ptrn] patterns]
             [k (second (re-find ptrn s))])))
#'user/get-named

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of 2")
{:sq "6", :rt "2"}

user> (get-named {:sq #"sqrt of (\d)"
                  :rt #"square of (\d)"}
                 "Find sqrt of 6 and the square of xxx")
{:sq "6", :rt nil}
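
Note that the captured values are strings, while the question asks for numbers like {:sqrt 6, :root 2}. A minimal sketch of one way to convert them, assuming every pattern captures only digits in its first group (get-named-longs is a hypothetical name, not a core function):

```clojure
(defn get-named-longs
  "Hypothetical variant of get-named: parses each captured digit string
  into a long and leaves nil where a pattern did not match."
  [patterns s]
  (into {}
        (for [[k ptrn] patterns]
          ;; some-> short-circuits on nil, so a failed match stays nil.
          [k (some-> (re-find ptrn s) second Long/parseLong)])))

(get-named-longs {:sqrt #"sqrt of (\d)"
                  :root #"square of (\d)"}
                 "Find sqrt of 6 and the square of 2")
;; => {:sqrt 6, :root 2}
```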

1
votes

You need to wrap the part you want in a capturing group, e.g.:

(re-seq #"sqrt of (\d)" "Find sqrt of 6")
;; returns (["sqrt of 6" "6"])

Or, if you want just the first group of the first match:

(def matcher (re-matcher #"sqrt of (\d)" "Find sqrt of 6"))
(re-find matcher)             ;; advances the matcher; returns ["sqrt of 6" "6"]
(second (re-groups matcher))  ;; returns "6"

See the docs for re-groups.

As for naming captured groups: I didn't look closely at the library you mentioned in the question, but I would expect the only practical difference to be that a capturing group is referenced by a name rather than by its numeric left-to-right position in the regex (starting from 1).
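
To illustrate that point with plain Java interop rather than the library, here is a small sketch that reads the same group once by position and once by name. The function name digit-both-ways and the group name n are made up for the example.

```clojure
(defn digit-both-ways
  "Hypothetical example: fetches the same capture by index and by name."
  [s]
  (let [m (re-matcher #"sqrt of (?<n>\d)" s)]
    (when (.find m)
      {:by-index (.group m 1)      ; numeric position, counted from 1
       :by-name  (.group m "n")}))) ; name given inside (?<n>...)

(digit-both-ways "Find sqrt of 6")
;; => {:by-index "6", :by-name "6"}
```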

1
votes

Depending on what you intend to do with the ‘named matches’ you may also find it useful to simply destructure the matches and bind them to symbols.

For a single match:

(if-let [[_ digit letter] (re-find #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => ["1" "x"]

For multiple matches:

(for [[_ digit letter] (re-seq #"(\d)([a-z])" "1x 2y 3z")]
  [digit letter])  ; => (["1" "x"] ["2" "y"] ["3" "z"])
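
If you would rather end up with maps than bound symbols, the same destructuring can be paired with zipmap. This is just a sketch: named-matches is a hypothetical helper, and it assumes the regex has at least one capturing group (without groups, re-seq returns plain strings instead of vectors).

```clojure
(defn named-matches
  "Hypothetical helper: pairs each capture group with a caller-supplied key."
  [ks re s]
  (for [[_ & groups] (re-seq re s)]
    ;; Drop the whole-match string, then zip keys with the remaining groups.
    (zipmap ks groups)))

(named-matches [:digit :letter] #"(\d)([a-z])" "1x 2y 3z")
;; => ({:digit "1", :letter "x"} {:digit "2", :letter "y"} {:digit "3", :letter "z"})
```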