1
votes

I have the following data:

({:seriesId "series 0", :episodeId "0"}
 {:seriesId "series 1", :episodeId "1"}
 {:seriesId "series 1", :episodeId "2"}
 {:seriesId "series 2", :episodeId "3"}
 {:seriesId "series 2", :episodeId "4"}
 {:seriesId "series 2", :episodeId "5"})

And would like to associate each episode to its series, like this:

[{:put-request
  {:item {:seriesId "series 0", :episodeCount 1, :episodeIds #{"0"}}}}
 {:put-request
  {:item {:seriesId "series 1", :episodeCount 2, :episodeIds #{"1" "2"}}}}
 {:put-request
  {:item {:seriesId "series 2", :episodeCount 3, :episodeIds #{"3" "4" "5"}}}}]

Currently I am stuck with the following:

[{:put-request
  {:item {:seriesId "series 0", :episodeCount 1, :episodeIds #{"0"}}}}
 {:put-request
  {:item {:seriesId "series 1", :episodeCount 1, :episodeIds #{"1"}}}}
 {:put-request
  {:item {:seriesId "series 1", :episodeCount 1, :episodeIds #{"2"}}}}
 {:put-request
  {:item {:seriesId "series 2", :episodeCount 1, :episodeIds #{"3"}}}}
 {:put-request
  {:item {:seriesId "series 2", :episodeCount 1, :episodeIds #{"4"}}}}
 {:put-request
  {:item {:seriesId "series 2", :episodeCount 1, :episodeIds #{"5"}}}}]

I am using the create-or-update-series function below. I don't know how to look up a previously added series (if one was added at all) by its seriesId. I tried many things, but they all turned out to be dead ends.

(ns clojure-sscce.core
  (:gen-class)
  (:require clojure.pprint))

(defn create-or-update-series
  ([episodes]
    (create-or-update-series episodes []))
  ([episodes result]
    (if (zero? (count episodes))
      result
      (create-or-update-series (rest episodes)
        (conj result {
          :put-request {
            :item {
              :seriesId (:seriesId (first episodes))
              :episodeCount 1
              :episodeIds #{(:episodeId (first episodes))}}}})))))

;; Tests
(defn -main [& args]
  (let 
    [series0 (mapv (fn [episode-id] {
      :seriesId "series 0"
      :episodeId (str episode-id)}) (range 0 1))
    series1 (mapv (fn [episode-id] {
      :seriesId "series 1"
      :episodeId (str episode-id)}) (range 1 3))
    series2 (mapv (fn [episode-id] {
      :seriesId "series 2"
      :episodeId (str episode-id)}) (range 3 6))]

    (clojure.pprint/pprint
      (concat series0 series1 series2))

    (clojure.pprint/pprint 
      (create-or-update-series (concat series0 series1 series2)))))

Note that {:put-request {:item { ... is needed because the new maps are expected to be PUT to DynamoDB.

Would love your help!

2
Are you attached to using strings for values? If you are OK using integers, that might make this task easier. - jmargolisvt
Yes, seriesId and episodeId must be strings. - Stéphane Bruckert

2 Answers

1
votes

If we want to treat this as a "create-or-update" problem, there are a couple of ways to implement it. Like your attempt, we need to recursively build up a collection of series, but, as group-by does, it's better to make that collection a map keyed on the series ID. That way, when we find a new episode in the input, we can easily and efficiently look up the series it belongs to.

First, let's make a little convenience function to update such a map for just one episode. It should:

  • Take a series map and an episode.
  • Look up the right series, if it's there, or else create one.
  • Add the episode to the series and the series to the series map.

Here's my approach:

(defn- update-series-map [series-map {:keys [seriesId episodeId] :as episode}]
  (let [current-series (get series-map seriesId
                            {:seriesId seriesId :episodeIds #{} :episodeCount 0})
        updated-series (-> current-series
                           (update-in [:episodeCount] inc)
                           (update-in [:episodeIds] conj episodeId))]
    (assoc series-map seriesId updated-series)))

Here we use the not-found argument of get to create an appropriate empty series if the series doesn't have an entry yet; otherwise we get the entry that's already there. In either case we then update the entry to add the episode: we conj the episode ID into the episode set and inc the episode count. I used update-in for both, but if you're on Clojure 1.7+, update is a better fit for cases like this, where the key sequence is only one key deep.
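As a rough sketch of the update variant mentioned above (assuming Clojure 1.7 or later), the helper could look like this. The name update-series-map* is hypothetical and only chosen to avoid clashing with the helper above.

```clojure
;; Sketch: the same helper written with `update` (Clojure 1.7+).
;; `update-series-map*` is a hypothetical name used for illustration.
(defn update-series-map* [series-map {:keys [seriesId episodeId]}]
  (let [empty-series {:seriesId seriesId :episodeIds #{} :episodeCount 0}]
    (assoc series-map seriesId
           (-> (get series-map seriesId empty-series) ; existing or fresh series
               (update :episodeCount inc)
               (update :episodeIds conj episodeId)))))

;; Adding two episodes of the same series, one at a time:
(-> {}
    (update-series-map* {:seriesId "series 1" :episodeId "1"})
    (update-series-map* {:seriesId "series 1" :episodeId "2"}))
;=> {"series 1" {:seriesId "series 1", :episodeIds #{"1" "2"}, :episodeCount 2}}
```

The second call finds the entry created by the first one, which is exactly the "find a previously added series" step the question asks about.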

With that building block we can make something to loop through several episodes. We can do it with a multi-arity recursive approach like in create-or-update-series:

(defn group-by-series-multiarity
  ([episodes]
   (group-by-series-multiarity {} episodes))
  ([series-map episodes]
   ;; seq returns nil for an empty collection, so empty input yields no series
   (if-let [[ep & more] (seq episodes)]
     (recur (update-series-map series-map ep) more)
     (vals series-map))))

In structure this is basically the same as your version. I use recur rather than calling the function by name mainly as an optimization: explicit self-calls consume call stack space (and can overflow the stack on long inputs), while recur reuses the current frame. Checking for emptiness with seq is another small optimization, since we don't have to walk the remaining episodes in order to count them.

At the end it needs a little cleanup, because we don't want the whole map we've created, only the values. That's why I do vals at the end.

Alternatively we could use loop as the target for our recur. This can be nice if our "public API" doesn't fit with the way we do our recursion:

(defn group-by-series-looping [episodes]
  (loop [series-map {}
         remaining episodes]
    ;; stop once no episodes remain; empty input yields no series
    (if-let [[ep & more] (seq remaining)]
      (recur (update-series-map series-map ep) more)
      (vals series-map))))

loop basically works like creating a local helper function (in this case with arity 2) and using recur in that.
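To illustrate that comparison, here is a minimal sketch that uses a named local function where loop would normally go. The function count-episodes-per-series and the inner name step are hypothetical, and update assumes Clojure 1.7+.

```clojure
;; Hypothetical example: count episodes per series with a local two-argument
;; function instead of loop. recur jumps back to the enclosing fn, just as
;; it would jump back to a loop.
(defn count-episodes-per-series [episodes]
  ((fn step [counts remaining]
     (if-let [[ep & more] (seq remaining)]
       ;; (fnil inc 0) treats a missing count as 0 before incrementing
       (recur (update counts (:seriesId ep) (fnil inc 0)) more)
       counts))
   {} episodes))

(count-episodes-per-series
  [{:seriesId "series 1" :episodeId "1"}
   {:seriesId "series 1" :episodeId "2"}
   {:seriesId "series 2" :episodeId "3"}])
;=> {"series 1" 2, "series 2" 1}
```

In practice loop is the more readable choice; the local function is only shown to make the equivalence visible.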

We could also notice that these recursive functions follow a well-known pattern called 'left fold' or 'reduction' and abstract that pattern using higher-order functions:

(defn group-by-series-reducing [episodes]
  (vals (reduce update-series-map {} episodes)))

Note how reduce basically takes care of the whole loop from group-by-series-looping if we just give it the reducing function it should use (update-series-map) and the initial value {}.
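To see what reduce does step by step, here is a small self-contained sketch. The reducing function add-episode and the sample episodes e1, e2, e3 are made up for illustration and track only the episode IDs; update assumes Clojure 1.7+.

```clojure
;; Hypothetical, simplified reducing function: it only collects episode IDs
;; per series, so the intermediate values stay short.
(defn add-episode [series-map {:keys [seriesId episodeId]}]
  (update series-map seriesId (fnil conj #{}) episodeId))

(def e1 {:seriesId "series 1" :episodeId "1"})
(def e2 {:seriesId "series 1" :episodeId "2"})
(def e3 {:seriesId "series 2" :episodeId "3"})

;; reduce threads the accumulator through one call per element:
(= (reduce add-episode {} [e1 e2 e3])
   (add-episode (add-episode (add-episode {} e1) e2) e3))
;=> true

;; reductions returns every intermediate accumulator, starting with
;; the initial value:
(reductions add-episode {} [e1 e2 e3])
;=> ({}
;    {"series 1" #{"1"}}
;    {"series 1" #{"1" "2"}}
;    {"series 1" #{"1" "2"}, "series 2" #{"3"}})
```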

5
votes

group-by is pretty good for things like this. Here's one try in combination with a for comprehension:

(defn group-by-series [episodes]
  (let [grouped (group-by :seriesId episodes)]
    (for [[series eps-in-series] grouped]
      {:seriesId series 
       :episodeCount (count eps-in-series)
       :episodeIds (into #{} (map :episodeId eps-in-series))})))

(group-by-series example-data)
;=> ({:seriesId "series 0", :episodeCount 1, :episodeIds #{"0"}} 
;    {:seriesId "series 1", :episodeCount 2, :episodeIds #{"1" "2"}}
;    {:seriesId "series 2", :episodeCount 3, :episodeIds #{"3" "4" "5"}})

You can add the DynamoDB wrapping ({:put-request {:item ...}}) right in the for comprehension if you want, or write a small wrapping function and map it over the results.
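A minimal sketch of the wrapping-function option might look like the following. The names ->put-request and series-put-requests are hypothetical; the map shape is taken from the desired output in the question.

```clojure
;; Hypothetical helper: wrap one series item in the shape from the question.
(defn ->put-request [item]
  {:put-request {:item item}})

;; Hypothetical function: group the episodes by series, build one item per
;; series, and wrap each item as a put request.
(defn series-put-requests [episodes]
  (mapv (fn [[series-id eps]]
          (->put-request {:seriesId     series-id
                          :episodeCount (count eps)
                          :episodeIds   (set (map :episodeId eps))}))
        (group-by :seriesId episodes)))

(series-put-requests
  [{:seriesId "series 0" :episodeId "0"}
   {:seriesId "series 1" :episodeId "1"}
   {:seriesId "series 1" :episodeId "2"}])
;; (the order of the groups is not guaranteed in general)
;=> [{:put-request {:item {:seriesId "series 0", :episodeCount 1, :episodeIds #{"0"}}}}
;    {:put-request {:item {:seriesId "series 1", :episodeCount 2, :episodeIds #{"1" "2"}}}}]
```

Keeping the wrapper separate means the grouping logic stays independent of the DynamoDB request format.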