Working on my first Clojure project. I am fetching a large set of links with titles, which I have put into a vector of maps that looks like:

[{:title "A", :id 1, :link "https://example1.com"}
 {:title "B", :id 2, :link "https://example2.com"}
 {:title "C", :id 3, :link "https://example3.com"}
 {:title "AA", :id 4, :link "https://example4.com"}
 {:title "AB", :id 5, :link "https://example5.com"}
 {:title "AC", :id 6, :link "https://example6.com"}
 {:title "ABC", :id 7, :link "https://example7.com"}
 {:title "AAAA", :id 8, :link "https://example8.com"}]

I am now trying to find similar titles for each title within this same vector of maps. If two titles are sufficiently similar, I want to add the matching id to a new key in the map that holds the group of all similar ids. I was thinking a vector under that key would be ideal for this.

For example: on the first pass it would take title A and match it against titles B, C, AA, AB, AC, ABC, and AAAA. It is close enough to AA, AB, AC, ABC, and AAAA that we would call them similar, so A's id is added to a new key on each of those maps. I want to call this key :tags, and ideally it would be a vector so it can easily hold multiple ids. The updated vector of maps would look like:

[{:title "A", :id 1, :link "https://example1.com"}
 {:title "B", :id 2, :link "https://example2.com"}
 {:title "C", :id 3, :link "https://example3.com"}
 {:title "AA", :id 4, :link "https://example4.com", :tags [1]}
 {:title "AB", :id 5, :link "https://example5.com", :tags [1]}
 {:title "AC", :id 6, :link "https://example6.com", :tags [1]}
 {:title "ABC", :id 7, :link "https://example7.com", :tags [1]}
 {:title "AAAA", :id 8, :link "https://example8.com", :tags [1]}]

Then, on a second iteration, we would take title B and match it against all the titles. We can skip title A since we already compared them; if they were similar, there would already be a :tags key with a 1 in its vector. During this pass it would only match titles AB and ABC. When this occurs I want to update the :tags vector with the additional id, so the vector of maps would now look like:

[{:title "A", :id 1, :link "https://example1.com"}
 {:title "B", :id 2, :link "https://example2.com"}
 {:title "C", :id 3, :link "https://example3.com"}
 {:title "AA", :id 4, :link "https://example4.com", :tags [1]}
 {:title "AB", :id 5, :link "https://example5.com", :tags [1 2]}
 {:title "AC", :id 6, :link "https://example6.com", :tags [1]}
 {:title "ABC", :id 7, :link "https://example7.com", :tags [1 2]}
 {:title "AAAA", :id 8, :link "https://example8.com", :tags [1]}]

The order of the ids in the :tags vector does not matter. After that pass it would move on to title C, which would match two titles, AC and ABC. For each title it matches, the id of title C would be added to the vector, so our updated vector of maps looks like:

[{:title "A", :id 1, :link "https://example1.com"}
 {:title "B", :id 2, :link "https://example2.com"}
 {:title "C", :id 3, :link "https://example3.com"}
 {:title "AA", :id 4, :link "https://example4.com", :tags [1]}
 {:title "AB", :id 5, :link "https://example5.com", :tags [1 2]}
 {:title "AC", :id 6, :link "https://example6.com", :tags [1 3]}
 {:title "ABC", :id 7, :link "https://example7.com", :tags [1 2 3]}
 {:title "AAAA", :id 8, :link "https://example8.com", :tags [1]}]

I am trying to avoid simple loops here, since I think there is a good way to do this with the core Clojure functions. I was wondering if anyone had any simple ideas on how best to achieve this without falling back on imperative loops.

I can easily see this being a simple for loop with two vars, one for the original dataset and one for the modified one. It would loop through the original and, in the modified dataset, update the :tags key of each matching title's map with the ids of the original entries that match. This is not functional, though, and goes against best practices in Clojure.

Any ideas on how this can be done in Clojure using functional programming?

Thanks for the help!

The data you are starting with is invalid. Try adding quotes to your :title and :link values so they are strings. As they are, those values are symbols that Clojure can't resolve. - jmargolisvt

Thanks for the tip, I think I just represented the data here incorrectly. The link and the title are strings, but I am using println to show the data. When I make the link I am passing it to the str function, so I am pretty sure they are strings, but I did not represent that data here correctly. Is there a better way to show the core data than using println? - Polar Bear

Ah, got it. I just ended up outputting the data instead of using println. They are all strings; let me update the example. - Polar Bear

That's not quite right either. For example, the first line should look like this: {:title "A", :id 1, :link "https://example1.com"} - jmargolisvt

OK, so if I use (println (vector {:link (str "HELLO") :title (str "HEE") :id 2})) it shows up as [{:link HELLO, :title HEE, :id 2}]. Without the println, (vector {:link (str "HELLO") :title (str "HEE") :id 2}) outputs [{:link "HELLO", :title "HEE", :id 2}]. It seems println drops the quotes around the strings in the map entries. Updating the scenario to reflect how it should actually look. - Polar Bear

1 Answer

Here is a straightforward variant.

First, let's make a function to check the similarity of two links (in this case I just check whether the first title contains the second):

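(require 'clojure.string)

;; True when the titles differ and title1 contains title2 as a substring.
(defn similar? [{title1 :title} {title2 :title}]
  (and (not= title1 title2)
       (clojure.string/includes? title1 title2)))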

And now the transforming function:

(defn tag-links [links]
  (mapv (fn [link]
          (reduce #(update %1 :tags (fnil conj #{}) (:id %2))
                  link
                  (filter #(similar? link %) links)))
        links))

It maps every link record as follows: it finds all the similar links, then uses reduce to add each one's id to the :tags key (fnil supplies an empty set when the key is absent).
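
To see the reducing step in isolation, here is a minimal sketch of what (update ... (fnil conj #{}) id) does to a single map. The map and the names untagged, tagged-once, and tagged-twice are made up for illustration and are not part of the code above.

```clojure
;; Hypothetical one-off map, used only to illustrate the reducing step.
(def untagged {:title "AB" :id 5})

;; :tags is absent, so update hands nil to the function;
;; fnil swaps that nil for #{} before conj runs.
(def tagged-once (update untagged :tags (fnil conj #{}) 1))
;; => {:title "AB", :id 5, :tags #{1}}

;; On the second call :tags already holds a set, so conj just adds to it.
(def tagged-twice (update tagged-once :tags (fnil conj #{}) 2))
;; => {:title "AB", :id 5, :tags #{1 2}}
```

Because the default is a set, adding the same id twice has no effect. Swapping #{} for [] would give a vector instead, which is closer to what the question asked for.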

In the REPL:

(def links '[{:title "A", :id 1, :link "https://example1.com"}
             {:title "B", :id 2, :link "https://example2.com"}
             {:title "C", :id 3, :link "https://example3.com"}
             {:title "AA", :id 4, :link "https://example4.com"}
             {:title "AB", :id 5, :link "https://example5.com"}
             {:title "AC", :id 6, :link "https://example6.com"}
             {:title "ABC", :id 7, :link "https://example7.com"}
             {:title "AAAA", :id 8, :link "https://example8.com"}])

(clojure.pprint/pprint (tag-links links))

;;[{:title "A", :id 1, :link "https://example1.com"}
;; {:title "B", :id 2, :link "https://example2.com"}
;; {:title "C", :id 3, :link "https://example3.com"}
;; {:title "AA", :id 4, :link "https://example4.com", :tags #{1}}
;; {:title "AB", :id 5, :link "https://example5.com", :tags #{1 2}}
;; {:title "AC", :id 6, :link "https://example6.com", :tags #{1 3}}
;; {:title "ABC", :id 7, :link "https://example7.com", :tags #{1 3 2 5}}
;; {:title "AAAA", :id 8, :link "https://example8.com", :tags #{1 4}}]

You can also do it without reduce, this way:

(defn tag-links [links]
  (mapv (fn [link]
          (if-let [similar (seq (keep 
                                  #(when (similar? link %) (:id %))
                                  links))]
            (assoc link :tags similar)
            link))
        links))

In the REPL:

user> (clojure.pprint/pprint (tag-links links))

[{:title "A", :id 1, :link "https://example1.com"}
 {:title "B", :id 2, :link "https://example2.com"}
 {:title "C", :id 3, :link "https://example3.com"}
 {:title "AA", :id 4, :link "https://example4.com", :tags (1)}
 {:title "AB", :id 5, :link "https://example5.com", :tags (1 2)}
 {:title "AC", :id 6, :link "https://example6.com", :tags (1 3)}
 {:title "ABC", :id 7, :link "https://example7.com", :tags (1 2 3 5)}
 {:title "AAAA", :id 8, :link "https://example8.com", :tags (1 4)}]

And if you allow empty tags (which I think is more consistent), you can even do without the if-let:

(defn tag-links [links]
  (mapv (fn [link]
          (assoc link :tags (keep #(when (similar? link %) (:id %))
                                  links)))
        links))

user> (clojure.pprint/pprint (tag-links links))

[{:title "A", :id 1, :link "https://example1.com", :tags ()}
 {:title "B", :id 2, :link "https://example2.com", :tags ()}
 {:title "C", :id 3, :link "https://example3.com", :tags ()}
 {:title "AA", :id 4, :link "https://example4.com", :tags (1)}
 {:title "AB", :id 5, :link "https://example5.com", :tags (1 2)}
 {:title "AC", :id 6, :link "https://example6.com", :tags (1 3)}
 {:title "ABC", :id 7, :link "https://example7.com", :tags (1 2 3 5)}
 {:title "AAAA", :id 8, :link "https://example8.com", :tags (1 4)}]
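
The versions above leave :tags as a lazy sequence, which prints as a list. If a real vector is wanted, as in the question, one possible sketch is to build it with into and a transducer. The name tag-links-vec and the three-element sample are assumptions for illustration; similar? is repeated here only so the block stands on its own.

```clojure
(require '[clojure.string :as str])

;; Same containment check as similar? above, repeated so this block is self-contained.
(defn similar? [{title1 :title} {title2 :title}]
  (and (not= title1 title2)
       (str/includes? title1 title2)))

;; Hypothetical variant: filter the similar links, take their ids,
;; and pour them straight into a vector.
(defn tag-links-vec [links]
  (mapv (fn [link]
          (assoc link :tags (into []
                                  (comp (filter #(similar? link %))
                                        (map :id))
                                  links)))
        links))

;; Small made-up sample, not the data from the question.
(def sample [{:title "A" :id 1} {:title "B" :id 2} {:title "AB" :id 3}])

(tag-links-vec sample)
;; => [{:title "A", :id 1, :tags []}
;;     {:title "B", :id 2, :tags []}
;;     {:title "AB", :id 3, :tags [1 2]}]
```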

Notice that you can change the behavior of the code just by modifying the similar? function.
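
As a rough sketch of that idea, the predicate can also be passed in as an argument, so different similarity rules can be tried without redefining anything. Both names below, similar-ignoring-case? and tag-links-by, are hypothetical, and case-insensitive containment is just one example rule, not something the question requires.

```clojure
(require '[clojure.string :as str])

;; Hypothetical alternative rule: containment that ignores case.
(defn similar-ignoring-case? [{title1 :title} {title2 :title}]
  (let [a (str/lower-case title1)
        b (str/lower-case title2)]
    (and (not= a b)
         (str/includes? a b))))

;; Hypothetical name: same shape as tag-links, but the predicate is a parameter.
;; vec is used here so :tags is always a vector.
(defn tag-links-by [similar-fn links]
  (mapv (fn [link]
          (assoc link :tags (vec (keep #(when (similar-fn link %) (:id %))
                                       links))))
        links))

;; Small made-up sample, not the data from the question.
(def sample [{:title "a" :id 1} {:title "AB" :id 2} {:title "b" :id 3}])

(mapv :tags (tag-links-by similar-ignoring-case? sample))
;; => [[] [1 3] []]
```

A fuzzier rule (edit distance, shared words, and so on) could be dropped in the same way, as long as it takes two link maps and returns a truthy or falsy value.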