3
votes

I just started learning Clojure and functional programming and I'm having a difficult time trying to implement the following task:

I have a vector of vectors like this [[a b] [a c] [b c] [c d] [d b]]. And I need to iterate through it removing the items that appears on the second column that had already appeared on the second column. For example the items [b c] and [d b] (because both c and b already appeared on the second column). I managed to get a function that remove one item at the time, but I need to iterate through the vector for each item checking and removing the items. How can I achieve that? I thought about using recursion to achieve that, but every attempt ended up in failure Sorry if this is a trivial question, but I am stuck with that.

For example

Input: [[a b] [a c] [b c] [c d] [a d] [b e]]

Ouput (Expected): [[a b] [a c] [c d] [b e]]

Removed items: [[b c] [a d]]

As you can see, both c and d had already appeared on the previous items [a c] and [c d] respectively, so I have to remove the items [b c] and [a d].

So far, I have the following code

This function returns a vector of items to be removed. In our scenario, it returns the vector [[b c] [a d]]

(defn find-invalid [collection-data item-to-check]
    (subvec (vec (filter #(= (second %) item-to-check) collection-data)) 1))

(defn find-invalid [collection-data item-to-check]
    (subvec (vec (filter #(= (second %) item-to-check) collection-data)) 1))

This other function removes one item at a time from the original vector by a given index of the item

(defn remove-invalid [collection-data item-position]
    (vec (concat (subvec collection-data 0 item-position) (subvec collection-data (inc item-position)))))

This last function is what I did to test this logic

(defn remove-invalid [original-collection ]
    (dorun (for [item original-collection]
       [
        (dorun (for [invalid-item (find-invalid original-collection (second item))]
                 [
                  (cond (> (count invalid-item) 0)
                        (println (remove-invalid original-collection (.indexOf original-collection invalid-item)))
                        )
                  ]))
        ])))

I think recursion could solve my problem, but I would appreciate any help to get that done :).

Thanks in advance.

5
great question. I am sure a clojure pro will come to the rescue =] - sova

5 Answers

3
votes

One way to implement this would be to use reduce:

(first (reduce (fn [[result previous] [a b]]
                 [(if (contains? previous b)
                    result
                    (conj result [a b]))
                  (conj previous b)])
               [[] #{}]
               '[[a b] [a c] [b c] [c d] [d b]]))
;=> [[a b] [a c] [c d]]

We want to keep track of the result we've built up so far (result) and the set of items we've previously found in the second column (previous). For each new item [a b], then, we check whether previous contains the second item, b. If it does, we don't add anything to our result. Otherwise, we conj the new item [a b] onto the end of result. We also conj the second item, b, into previous. Since previous is a set, this won't do anything if previous already contained b. Finally, after the reduce completes, we take the first item from the result, which represents our final answer.

2
votes

If I understand your question correctly, this should do it:

(defn clear [v]
  (loop [v v existing #{} acc []]
    (if (empty? v)
      acc
      (recur (rest v)
             (conj existing (second (first v)))
             (if (some existing [(ffirst v)]) acc (conj acc (first v)))))))

Solved with loop / recur. If I got some time I will see if I can use something like reduce or whatever function is appropriate here.

This filters: [["a" "b"] ["a" "c"] ["b" "c"] ["c" "d"] ["d" "b"]] to [["a" "b"] ["a" "c"]].

2
votes

If you can rely on the duplicates being successive as in the example, go with

(->> '[[a b] [a c] [b c] [c d] [a d] [b e]]
     (partition-by second)
     (map first))
;-> ([a b] [a c] [c d] [b e])

Otherwise implement a distinct-by transducer based on Clojures distinct transducer.

(sequence (distinct-by second)
          '[[a b] [a c] [b c] [c d] [a d] [b e]])

;-> ([a b] [a c] [c d] [b e])

Implementation

(defn distinct-by [f]
   (fn [rf]
     (let [seen (volatile! #{})]
       (fn
         ([] (rf))
         ([result] (rf result))
         ([result input]
          (let [vinput (f input)] ; virtual input as seen through f
            (if (contains? @seen vinput)
              result
              (do (vswap! seen conj vinput)
                  (rf result input)))))))))
1
votes

The following is similar to @Elogent's answer, but uses :as clauses to avoid reconstructing things:

(defn filtered [stuff]
  (second
   (reduce
    (fn [[seconds ans :as sec-ans] [x y :as xy]]
      (if (seconds y)
        sec-ans
        [(conj seconds y) (conj ans xy)]))
    [#{} []]
    stuff)))

For example,

(filtered '[[a b] [a c] [b c] [c d] [d b]])
;[[a b] [a c] [c d]]
0
votes

just for fun: these ones do not preserve the result's order, but if it is ok with you, they're quite expressive (the duplicates can be in any order, unlike the partition-by variant above):

one is to just group everything by second value, and take first item from every val:

(map (comp first val)
     (group-by second '[[a b] [a c] [b c] [c d] [a d] [b e]]))

;; => ([a b] [a c] [c d] [b e])

there is also a nice way to do it, using sorted sets:

(into (sorted-set-by #(compare (second %1) (second %2)))
      '[[a b] [a c] [b c] [c d] [a d] [b e]])
;; => #{[a b] [a c] [c d] [b e]}

and one more, also not preserving the order:

(vals (into {} (map (juxt second identity)
                    (rseq '[[a b] [a c] [b c] [c d] [a d] [b e]]))))
;; => ([b e] [c d] [a c] [a b])

but yes, loop/recur is always faster i guess:

(defn remove-dupes [v]
  (loop [[[_ i2 :as pair] & xs :as v] v present #{} res []]
    (cond (empty? v) res
          (present i2) (recur xs present res)
          :else (recur xs (conj present i2) (conj res pair)))))