Filtering unnecessary keys in huge clojure map

4 votes

I have a really big nested map in Clojure, and I am looking for the most idiomatic way to remove keys that should not be passed to the frontend (yes, this Clojure service runs in the backend).

The data structure looks like this:

As you can see, I have a map in which some keys hold lists of maps, which in turn contain more lists... so, I know, not pretty.

But is there some way of describing the data I want to get, so that every key I do not want gets filtered out?

Thx

Tags: dictionary, clojure, nested

Maybe, but is this the best way? I could also do it iteratively... but maybe a schema could be written that I can somehow apply to the data. – Tobias S

What do you want to extract from the input? You've pasted a big ol' map and said you want only some of it, but what part? How do you decide what is worth including? – amalloy

10 votes

Yet another way, without any external libraries, using clojure.walk:

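(require '[clojure.walk :as walk])

;; prewalk visits every node top-down; each map it meets loses the given keys
(defn remove-deep [key-set data]
  (walk/prewalk (fn [node]
                  (if (map? node)
                    (apply dissoc node key-set)
                    node))
                data))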


user> (remove-deep [:i :l] data)
;;=> {:a 1, :b 2, :c 3, :d [{:e 5} {:f 6, :g {:h 8, :j 10}}]}

user> (remove-deep [:f :p] data)
;;=> {:a 1, :b 2, :c 3, :d [{:e 5} {:g {:h 8, :i 9, :j 10}, :l [{:m 11, :n 12} {:m 16, :n 17}]}]}

prewalk and postwalk exist for exactly your use case: walking down heterogeneous nested collections and transforming values where necessary.
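The practical difference between the two walkers shows up in what gets visited. The sketch below (strip-keys-fn, count-visits and the sample map are made-up names for illustration) counts how many nodes each walker hands to the function: prewalk strips a map before descending into it, so the values of removed keys are never visited, while postwalk visits everything bottom-up and only then drops the keys. For plain key removal both should return the same result.

```clojure
(require '[clojure.walk :as walk])

(defn strip-keys-fn
  "Returns a node function that removes every key in key-set from a map node."
  [key-set]
  (fn [node]
    (if (map? node)
      (apply dissoc node key-set)
      node)))

(defn count-visits
  "Hypothetical helper: runs walker (walk/prewalk or walk/postwalk) with the
  key-stripping function and returns [result number-of-nodes-visited]."
  [walker key-set data]
  (let [visits (atom 0)
        strip  (strip-keys-fn key-set)
        result (walker (fn [node]
                         (swap! visits inc)
                         (strip node))
                       data)]
    [result @visits]))

;; Made-up sample: the whole vector under :l is going to be removed
(def sample {:a 1 :l [{:m 11 :n 12} {:m 16 :n 17}]})

;; Both calls return {:a 1} as the result; the prewalk visit count is
;; smaller because it never descends into the removed vector
(count-visits walk/prewalk [:l] sample)
(count-visits walk/postwalk [:l] sample)
```

So prewalk is the slightly cheaper choice when large subtrees are being dropped.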

5 votes

If you only want to filter the top-level keys, you can use select-keys.
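For example, with a small made-up map, select-keys keeps only the listed keys and does not look inside the values, so anything nested under a kept key comes along unchanged:

```clojure
;; Made-up sample map, shaped like the one in the question
(def data {:a 1 :b 2 :c 3 :d [{:e 5} {:f 6}]})

;; Keep only :a and :d at the top level; the vector under :d is kept as-is
(select-keys data [:a :d])
;;=> {:a 1, :d [{:e 5} {:f 6}]}
```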

If you want to remove deeply nested keys, you can use Specter. For example, to remove the :h key from the :g map of every item in the vector under :d, just write:

user> (require '[com.rpl.specter :refer [setval ALL NONE]])
user> (setval [:d ALL :g :h] NONE data)
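If pulling in Specter for a single path feels heavy, a similar targeted removal can be written with update and mapv from clojure.core. This is a sketch, not how Specter works internally; drop-h-under-g and the sample data are made up, and it only changes items whose :g value is a map:

```clojure
;; Made-up sample data, shaped like the :d part of the question's map
(def data
  {:a 1
   :d [{:e 5}
       {:f 6 :g {:h 8 :i 9 :j 10}}]})

(defn drop-h-under-g
  "Hypothetical helper: removes :h from the :g map of every item under :d,
  roughly what (setval [:d ALL :g :h] NONE data) does with Specter."
  [data]
  (update data :d
          (fn [items]
            (mapv (fn [item]
                    ;; only items that actually have a :g map are changed
                    (if (map? (:g item))
                      (update item :g dissoc :h)
                      item))
                  items))))

(drop-h-under-g data)
;;=> {:a 1, :d [{:e 5} {:f 6, :g {:i 9, :j 10}}]}
```

Unlike the walk-based answers, this removes :h only at that one path; an :h anywhere else in the map is left alone.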

4 votes

The simplest approach is to use clojure.walk/postwalk. I'm assuming you don't need to handle key combinations like "remove :i only if it's a child of :f".

Here is an example:

(ns tst.demo.core
  (:use demo.core tupelo.core tupelo.test)
  (:require [clojure.walk :as walk]))

(def data
  {:a 1
   :b 2
   :c 3
   :d [{:e 5}
       {:f 6
        :g {:h 8
            :i 9
            :j 10}
        :l [{:m 11
             :n 12
             :p {:q 13
                 :r 14
                 :s 15}}
            {:m 16
             :n 17
             :p {:q 18
                 :r 19
                 :s 20}}]}]})

(defn remove-keys [data keys]
  (let [proc-node  (fn [node]
                     (spyx node))
        result (walk/postwalk proc-node data) ]
    (spyx-pretty result)))

(def bad-keys #{:b :f :i :p :n})

(dotest
  (remove-keys data bad-keys))

This shows the recursive processing of postwalk, with output:

Testing tst.demo.core
node => :a
node => 1
node => [:a 1]
node => :b
node => 2
node => [:b 2]
node => :c
node => 3
node => [:c 3]
node => :d
node => :e
node => 5
node => [:e 5]
node => {:e 5}
node => :f
node => 6
node => [:f 6]
node => :g
node => :h
node => 8
node => [:h 8]
node => :i
node => 9
node => [:i 9]
node => :j
node => 10
node => [:j 10]
node => {:h 8, :i 9, :j 10}
node => [:g {:h 8, :i 9, :j 10}]
node => :l
node => :m
node => 11
node => [:m 11]
node => :n
node => 12
node => [:n 12]
node => :p
node => :q
node => 13
node => [:q 13]
node => :r
node => 14
node => [:r 14]
node => :s
node => 15
node => [:s 15]
node => {:q 13, :r 14, :s 15}
node => [:p {:q 13, :r 14, :s 15}]
node => {:m 11, :n 12, :p {:q 13, :r 14, :s 15}}
node => :m
node => 16
node => [:m 16]
node => :n
node => 17
node => [:n 17]
node => :p
node => :q
node => 18
node => [:q 18]
node => :r
node => 19
node => [:r 19]
node => :s
node => 20
node => [:s 20]
node => {:q 18, :r 19, :s 20}
node => [:p {:q 18, :r 19, :s 20}]
node => {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}
node => [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]
node => [:l [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]]
node => {:f 6, :g {:h 8, :i 9, :j 10}, :l [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]}
node => [{:e 5} {:f 6, :g {:h 8, :i 9, :j 10}, :l [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]}]
node => [:d [{:e 5} {:f 6, :g {:h 8, :i 9, :j 10}, :l [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]}]]
node => {:a 1, :b 2, :c 3, :d [{:e 5} {:f 6, :g {:h 8, :i 9, :j 10}, :l [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}} {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]}]}
result => 
{:a 1,
 :b 2,
 :c 3,
 :d
 [{:e 5}
  {:f 6,
   :g {:h 8, :i 9, :j 10},
   :l
   [{:m 11, :n 12, :p {:q 13, :r 14, :s 15}}
    {:m 16, :n 17, :p {:q 18, :r 19, :s 20}}]}]}

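You can see that each map is first broken down into key-value pairs (map entries) like [:n 17]. So when you get a 2-element vector like that, just look at the first item and return nil if you don't want it. This works because postwalk rebuilds each map with into, and conj-ing nil onto a map is a no-op, so the entry simply disappears: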

(defn len-2-vec? [node]
  (and (sequential? node)
    (= 2 (count node))))

(defn remove-keys [data bad-keys]
  (let [proc-node (fn [node]
                    (if (and (len-2-vec? node)
                          (contains? bad-keys (first node)))
                      (do
                        (spyx :removed node)
                        nil)
                      node))
        result (walk/postwalk proc-node data) ]
    (spyx-pretty result)))

(def bad-keys #{:b :f :i :p :n})

(dotest
  (remove-keys data bad-keys))

and output:

Testing tst.demo.core
:removed    node => [:b 2]
:removed    node => [:f 6]
:removed    node => [:i 9]
:removed    node => [:n 12]
:removed    node => [:p {:q 13, :r 14, :s 15}]
:removed    node => [:n 17]
:removed    node => [:p {:q 18, :r 19, :s 20}]

(remove-keys data bad-keys) => 
{:a 1, 
 :c 3, 
 :d [{:e 5} 
     {:g {:h 8, 
          :j 10}, 
      :l [{:m 11}
          {:m 16}]}]}

Ran 2 tests containing 0 assertions.
0 failures, 0 errors.

Don't forget the Clojure CheatSheet.

Here is the doc for spyx.
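One caveat with the len-2-vec? check: it matches any two-element sequence whose first item is a bad key, not only map entries, so an ordinary value such as [:b 7] would also be replaced by nil. A sketch that avoids this (remove-keys-safe is a made-up name, and it leaves out tupelo's spyx logging) checks for maps instead, the same test the prewalk answer uses:

```clojure
(require '[clojure.walk :as walk])

(defn remove-keys-safe
  "Hypothetical variant: drops bad-keys from every map, at any depth.
  Only maps are inspected, so ordinary two-element vectors are left alone."
  [data bad-keys]
  (walk/postwalk (fn [node]
                   (if (map? node)
                     (apply dissoc node bad-keys)
                     node))
                 data))

;; The vector [:b 7] survives because it is a value, not a map entry
(remove-keys-safe {:a 1 :b 2 :x [:b 7]} #{:b})
;;=> {:a 1, :x [:b 7]}
```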

3 votes

This might be using more "manual lifting" than required, but a simple recursive function handles this well:

(defn filter-nested [root keys-to-remove]
  (let [should-remove? (set keys-to-remove)

        ; A recursive function to search through the map
        f (fn rec [node]
            (reduce-kv (fn [acc k v]
                         (cond
                           ; If it's in the set, remove the key from the node
                           (should-remove? k) (dissoc acc k)

                           ; If the value is a map, recursively search it too
                           (map? v) (assoc acc k (rec v))

                           ; If it's a vector, recurse into the maps it contains
                           ; (other elements, e.g. numbers, are kept unchanged)
                           (vector? v) (assoc acc k (mapv #(if (map? %) (rec %) %) v))

                           ; Else do nothing
                           :else acc))
                       node
                       node))]
    (f root)))

(filter-nested data #{:l})
=> {:a 1, :b 2, :c 3, :d [{:e 5} {:f 6, :g {:h 8, :i 9, :j 10}}]}

Once you set aside the explanatory comments, it isn't as big as it looks. f (named rec internally) is a recursive function that dissocs a key from the map it is given whenever that key is in the supplied collection of keys. When a value it finds is a map or a vector, it recurses into it as well.
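The core trick is reduce-kv with the node itself as the starting accumulator. A minimal one-level sketch (drop-top-level is a made-up name) shows why dissoc-ing from the accumulator while iterating the same map is safe: Clojure maps are immutable, so each dissoc returns a new map and the map being iterated never changes.

```clojure
(defn drop-top-level
  "Hypothetical helper: removes the keys in ks from the top level of m only."
  [m ks]
  (let [drop? (set ks)]
    ;; iterate over m, but start the accumulator from m itself
    (reduce-kv (fn [acc k _v]
                 (if (drop? k)
                   (dissoc acc k)
                   acc))
               m
               m)))

(drop-top-level {:a 1 :b 2 :c 3} [:b])
;;=> {:a 1, :c 3}
```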

2 votes

Instead of a blacklist, we wanted some kind of whitelist. In production it is not a good idea to rely on a blacklist: if the response object gets extended for some reason, the new keys are sent to the frontend unless someone remembers to blacklist them. Therefore we now use https://github.com/metosin/spec-tools with the strip-extra-keys-transformer, like this:

(ns sexy.helper.transformer
  (:require [spec-tools.core :as st]
            [spec-tools.data-spec :as ds]))

(def my-abc {:a "12345"
             :b "00529"
             :c [{:d "Kartoffel"
                  :e 5}
                 {:d "Second Item"
                  :e 9999}]})

(def the-abc
  {:a string?
   :c [{:d string?}]})

(def abc-spec
  (ds/spec ::abc the-abc))

(st/conform abc-spec my-abc st/strip-extra-keys-transformer)
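If you'd like the whitelist idea without pulling in spec-tools, a small recursive function can apply a "shape" map to the data, which is close to what the question asks for. This is a hypothetical sketch, not part of spec-tools: in the shape, true means "keep this value as-is", a map lists the allowed keys, and a one-element vector applies its shape to every element of a sequential value.

```clojure
(defn select-shape
  "Hypothetical helper: keeps only the parts of data described by shape.
  A shape is true (keep the value unchanged), a map of key -> sub-shape
  (keep only those keys), or a one-element vector [sub-shape] (apply
  sub-shape to every element of a sequential value)."
  [shape data]
  (cond
    (true? shape)   data
    (map? shape)    (when (map? data)
                      (reduce-kv (fn [acc k sub-shape]
                                   (if (contains? data k)
                                     (assoc acc k (select-shape sub-shape (get data k)))
                                     acc))
                                 {}
                                 shape))
    (vector? shape) (when (sequential? data)
                      (mapv #(select-shape (first shape) %) data))
    :else           nil))

(def my-abc {:a "12345"
             :b "00529"
             :c [{:d "Kartoffel" :e 5}
                 {:d "Second Item" :e 9999}]})

;; Same whitelist as the-abc above, with true instead of predicates
(select-shape {:a true :c [{:d true}]} my-abc)
;;=> {:a "12345", :c [{:d "Kartoffel"} {:d "Second Item"}]}
```

As with the strip-extra-keys-transformer, keys missing from the shape are dropped; unlike a spec, this sketch does no validation of the values it keeps.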
