0
votes

I am new to Clojure and want to do this correctly. I have two sources of date-stamped data from two CSV files. I have pulled them in and put them in vector-of-vectors format. I would like to combine the data with something like an outer join.

;--- this is how I am loading the data for each file.... works great ---
(def csvfile (slurp "table.csv"))
(def csvdat (clojure.string/split-lines csvfile))
(def final (vec (rest (map (fn [x] (clojure.string/split x #",")) csvdat))))

CSV File 1: date value1 value2 value3

CSV File 2: date valueA valueB valueC

Resulting vector of vectors format: date value1 value2 value3 valueA valueB valueC

I have several ugly ideas; I just want to pick the best ugly idea. :)

Option 1: get a unique set of times in sequence and map all the data from the two vectors of vectors into a new vector of vectors.

Option 2: is there a clever way to map from two vectors of vectors to a new vector of vectors (a more advanced mapping than my experience lets me describe)?

What is the most idiomatic Clojure method of doing "joins"? Should I be using maps? I like vectors because I will be doing a lot of range calculations after the CSVs are joined, like moving a window (groups of rows) down the rows of the joined data.

2
Please edit your question, and post what the ideas are. As a leg up, I use clojure-csv to do this work. - octopusgrabbus
What would one input line from each .csv file look like? What would a merged line look like? - octopusgrabbus
thanks for asking octopusgrabbus.... I have added the "schema" for the input files and desired clojure data structure in the post. - user1536528
Is each value of date unique? Are the .csv files sorted? - octopusgrabbus
many/most of the dates from both files will match up. - user1536528

2 Answers

0
votes

Your data:

(def csv1 [["01/01/2012" 1 2 3 4]["06/15/2012" 38 24 101]])
(def csv2 [["01/01/2012" 99 98 97 96]["06/15/2012" 28 101 43]])

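Convert each CSV's vector-of-vectors representation to a map from date to a vector of values:

;; vec matters here: into on a seq would prepend, reversing the appended values
(defn to-map [v] (into {} (map (fn [[date & data]] [date (vec data)]) v)))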

Merge the maps:

(merge-with into (to-map csv1) (to-map csv2))
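The merge above keeps only whatever values exist for a date, so a date present in one file but not the other yields a shorter row. A minimal sketch of a padded full outer join that returns a sorted vector of vectors might look like the following. It assumes dates are MM/dd/yyyy strings (as in the sample data) and that you know the number of value columns in each file; the names index-by-date, date-key, outer-join, file1 and file2 are made up for illustration.

```clojure
(require '[clojure.string :as str])

(defn index-by-date
  "Turns rows of [date & values] into a map of date -> vector of values."
  [rows]
  (into {} (map (fn [[date & data]] [date (vec data)]) rows)))

(defn date-key
  "Sort key for MM/dd/yyyy strings (assumed format): [year month day]."
  [date]
  (let [[m d y] (map #(Integer/parseInt %) (str/split date #"/"))]
    [y m d]))

(defn outer-join
  "Full outer join of two row collections on the first column.
   width1 and width2 are the number of value columns in each file;
   a date missing from one side is padded with nils."
  [rows1 width1 rows2 width2]
  (let [m1    (index-by-date rows1)
        m2    (index-by-date rows2)
        dates (sort-by date-key (distinct (concat (keys m1) (keys m2))))]
    (mapv (fn [d]
            (vec (concat [d]
                         (get m1 d (repeat width1 nil))
                         (get m2 d (repeat width2 nil)))))
          dates)))

;; Hypothetical sample rows: one date in common, one date unique to each file.
(def file1 [["01/01/2012" 1 2 3] ["06/15/2012" 4 5 6]])
(def file2 [["06/15/2012" 7 8 9] ["03/10/2011" 10 11 12]])

(outer-join file1 3 file2 3)
;; => [["03/10/2011" nil nil nil 10 11 12]
;;     ["01/01/2012" 1 2 3 nil nil nil]
;;     ["06/15/2012" 4 5 6 7 8 9]]
```

Because the result is again a vector of vectors, something like (partition 3 1 joined) should give sliding windows of three consecutive rows for the range calculations mentioned in the question.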

-2
votes

As I understand it, you have data that sort of looks like this:

(def csv1 [["01/01/2012" 1 2 3 4]["06/15/2012" 38 24 101]])
(def csv2 [["01/01/2012" 99 98 97 96]["06/15/2012" 28 101 43]])

Well, you can make a map out of that.

repl-test.core=> (map #(hash-map (keyword (first %1)) (vec (rest %1))) csv1)
({:01/01/2012 [1 2 3 4]} {:06/15/2012 [38 24 101]})

Now, you have another csv file that may or may not be in the same order (csv2 above).

Suppose I take one line of csv1:

(def l1 (first csv1))
["01/01/2012" 1 2 3 4]

and concat the vector for the same date from csv2 onto that one line (here the literal [44 43 42] stands in for the matching csv2 values):

(concat (hash-map (keyword (first l1)) (vec (concat (rest l1) [44 43 42]))))
([:01/01/2012 [1 2 3 4 44 43 42]])

I'm going to leave the writing of the functions to you as an exercise.

Is this what you wanted to do?
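For what it's worth, one possible shape for the function left as an exercise is sketched below. It is only an illustration, not necessarily what was intended above: it keeps the date strings as keys instead of converting them to keywords, and the names append-matching and csv2-shuffled are invented here. The second data set is deliberately in a different order to show that row order does not matter.

```clojure
(def csv1 [["01/01/2012" 1 2 3 4] ["06/15/2012" 38 24 101]])
;; Same rows as csv2 above, reordered on purpose.
(def csv2-shuffled [["06/15/2012" 28 101 43] ["01/01/2012" 99 98 97 96]])

(defn append-matching
  "For each row of rows1, appends the values of the rows2 row that has
   the same date (first column). Rows of rows1 without a match are
   returned unchanged, so this behaves like a left join."
  [rows1 rows2]
  (let [by-date (into {} (map (juxt first #(vec (rest %))) rows2))]
    (mapv (fn [[date :as row]]
            (into (vec row) (get by-date date [])))
          rows1)))

(append-matching csv1 csv2-shuffled)
;; => [["01/01/2012" 1 2 3 4 99 98 97 96] ["06/15/2012" 38 24 101 28 101 43]]
```

Building the lookup map once keeps each row's match a single map lookup, rather than a scan of the second file per row.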


Here are some components after using lein new bene-csv:

project.clj

(defproject bene-csv "1.0.4-SNAPSHOT"
  :description "A csv parsing library"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [clojure-csv/clojure-csv "1.3.2"]
                 [util "1.0.2-SNAPSHOT"]]


  :aot [bene-csv.core]
  :omit-source true)

core.clj (just the header)

(ns ^{:author "Charles M. Norton",
      :doc "bene-csv is a small library to parse a .csv file.
        Created on March 8, 2012"}
  bene-csv.core
  (:require [clojure.string :as cstr]
            [util.core :as utl])
  (:use clojure-csv.core))

routine in core.clj to parse csv file

(defn ret-csv-data
  "Returns a vector of the rows produced by parse-csv, minus the last row.
   Uses utl/open-file, which will return nil if there is an exception
   in opening fnam; in that case nothing is parsed."
  [fnam]
  (let [csv-file (utl/open-file fnam)
        ;; parse only when the file was opened successfully
        inter-csv-data (when-not (nil? csv-file)
                         (parse-csv csv-file))
        ;; keep rows that are non-empty and have more than one field
        csv-data (vec
                  (filter
                   #(and (pos? (count %)) (seq (rest %)))
                   inter-csv-data))]
    ;; drops the last row; note that pop throws on an empty vector
    (pop csv-data)))