1
votes

I have loaded data into a .csv within Julia.

I wish to convert my string Date to Date format:

julia> head(df)
6×7 DataFrames.DataFrame
│ Row │ Date       │ Open    │ High    │ Low     │ Close   │ Adj_Close │ Volume  │
├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┤
│ 1   │ 1993-01-29 │ 43.9687 │ 43.9687 │ 43.75   │ 43.9375 │ 27.6073   │ 1003200 │
│ 2   │ 1993-02-01 │ 43.9687 │ 44.25   │ 43.9687 │ 44.25   │ 27.8036   │ 480500  │
│ 3   │ 1993-02-02 │ 44.2187 │ 44.375  │ 44.125  │ 44.3437 │ 27.8625   │ 201300  │
│ 4   │ 1993-02-03 │ 44.4062 │ 44.8437 │ 44.375  │ 44.8125 │ 28.1571   │ 529400  │
│ 5   │ 1993-02-04 │ 44.9687 │ 45.0937 │ 44.4687 │ 45.0    │ 28.2749   │ 531500  │
│ 6   │ 1993-02-05 │ 44.9687 │ 45.0625 │ 44.7187 │ 44.9687 │ 28.2552   │ 492100  │

The type is:

julia> showcols(df)
6258×7 DataFrames.DataFrame
│ Col # │ Name      │ Eltype                           │ Missing │ Values                    │
├───────┼───────────┼──────────────────────────────────┼─────────┼───────────────────────────┤
│ 1     │ Date      │ Union{Missings.Missing, String}  │ 0       │ 1993-01-29  …  2017-12-01 │
│ 2     │ Open      │ Union{Float64, Missings.Missing} │ 0       │ 43.9687  …  264.76        │
│ 3     │ High      │ Union{Float64, Missings.Missing} │ 0       │ 43.9687  …  265.31        │
│ 4     │ Low       │ Union{Float64, Missings.Missing} │ 0       │ 43.75  …  260.76          │
│ 5     │ Close     │ Union{Float64, Missings.Missing} │ 0       │ 43.9375  …  264.46        │
│ 6     │ Adj_Close │ Union{Float64, Missings.Missing} │ 0       │ 27.6073  …  264.46        │
│ 7     │ Volume    │ Union{Int64, Missings.Missing}   │ 0       │ 1003200  …  159947700     │

Right now the Date is a string.

So wish to convert the column a Date format.

trying:

df[:Date, DateFormat("yyyy-mm-dd")]

and

df[df[:Date] = DateFormat("yyyy-mm-dd")]

with error:

MethodError: Cannot convert an object of type DateFormat{Symbol("yyyy-mm-dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}} to an object of type String This may have arisen from a call to the constructor String(...), since type constructors fall back to convert methods. in setindex! at DataFrames\src\dataframe\dataframe.jl:376 in fill! at base\multidimensional.jl:841 REPL

encase my syntax is wrong I make a vector x from the date column:

x = df[:Date]

Date(x, "yyyy-mm-dd")

MethodError: Cannot convert an object of type Array{Union{Missings.Missing, String},1} to an object of type Int64 REPL

This is easy with R but Julia can not find that great of information, any assistance appreciated.

I am also following this link:

https://docs.julialang.org/en/release-0.4/manual/dates/

Here is an example:

julia> df = Dates.DateFormat("y-m-d");

julia> dt = Date("2015-01-01",df)
2015-01-01

julia> dt2 = Date("2015-01-02",df)
2015-01-02

Why cant I pass a vector or data frame column through this??

Update:

This works when I pass one element from the vector:

julia> Date(x[1], Dates.DateFormat("yyyy-mm-dd"))
1993-01-29

I just want to convert every element to this format and store in the data frame

2
Based on the link you posted, be aware that you appear to be reading the docs from julia v0.4. A lot has changed since v0.4, eg the broadcasting operator . in the (very nice) answer provided. Best to read the docs labeled stable and use v0.6 if you've just come across to julia. (although v0.7 should be ready fairly soon, and involves many changes from v0.6 - but it will also be compatible with v1.0 and so should be the last big set of breaking changes that users have to deal with for many years).Colin T Bowers
Ok great - thanks for pointing this out. Many google searches leading to v.04. Cheers, AndrewAndrew Bannerman

2 Answers

3
votes

Simply write Date.(x, Dates.DateFormat("yyyy-mm-dd")) to get what you want.

Notice the . after Date - it tells Julia to apply Date function to all elements of x and Dates.DateFormat("yyyy-mm-dd") will be reused in every call as it is a scalar.

The details are explained here https://docs.julialang.org/en/latest/base/arrays/#Broadcast-and-vectorization-1.

As a side note if you use latest version of CSV.jl package then it should detect Date type automatically:

julia> data="""Date,Open,High,Low,Close,Adj_Close,Volume
       1993-01-29,43.9687,43.9687,43.75,43.9375,27.6073,1003200
       1993-02-01,43.9687,44.25,43.9687,44.25,27.8036,480500
       1993-02-02,44.2187,44.375,44.125 ,44.3437,27.8625,201300"""
"Date,Open,High,Low,Close,Adj_Close,Volume\n1993-01-29,43.9687,43.9687,43.75,43.9375,27.6073,1003200\n1993-02-01,43.9687,44.25,43.9687,44.25,27.8036,480500\n1993-02-02,44.2187,44.375,44.125 ,44.3437,27.8625,201300"

julia> showcols(CSV.read(IOBuffer(data)))
3×7 DataFrames.DataFrame
│ Col # │ Name      │ Eltype                           │ Missing │ Values                    │
├───────┼───────────┼──────────────────────────────────┼─────────┼───────────────────────────┤
│ 1     │ Date      │ Union{Date, Missings.Missing}    │ 0       │ 1993-01-29  …  1993-02-02 │
│ 2     │ Open      │ Union{Float64, Missings.Missing} │ 0       │ 43.9687  …  44.2187       │
│ 3     │ High      │ Union{Float64, Missings.Missing} │ 0       │ 43.9687  …  44.375        │
│ 4     │ Low       │ Union{Float64, Missings.Missing} │ 0       │ 43.75  …  44.125          │
│ 5     │ Close     │ Union{Float64, Missings.Missing} │ 0       │ 43.9375  …  44.3437       │
│ 6     │ Adj_Close │ Union{Float64, Missings.Missing} │ 0       │ 27.6073  …  27.8625       │
│ 7     │ Volume    │ Union{Int64, Missings.Missing}   │ 0       │ 1003200  …  201300        │

And even if it would not you can pass types argument (in example below it avoids an union with Missing if you would not want this for some reason):

julia> showcols(CSV.read(IOBuffer(data), types=[String; fill(Float64, 5); Int]))
3×7 DataFrames.DataFrame
│ Col # │ Name      │ Eltype  │ Missing │ Values                    │
├───────┼───────────┼─────────┼─────────┼───────────────────────────┤
│ 1     │ Date      │ String  │ 0       │ 1993-01-29  …  1993-02-02 │
│ 2     │ Open      │ Float64 │ 0       │ 43.9687  …  44.2187       │
│ 3     │ High      │ Float64 │ 0       │ 43.9687  …  44.375        │
│ 4     │ Low       │ Float64 │ 0       │ 43.75  …  44.125          │
│ 5     │ Close     │ Float64 │ 0       │ 43.9375  …  44.3437       │
│ 6     │ Adj_Close │ Float64 │ 0       │ 27.6073  …  27.8625       │
│ 7     │ Volume    │ Int64   │ 0       │ 1003200  …  201300        │

EDIT: Under DataFrames.jl version 0.14 or later use describe instead of showcols.

0
votes

Here is what I came up with:

# Pull date column and store in vector
    x = df[:Date]

    # loop to iterate through each element in vector, converting to Date format 
    v = []
    for i in 1:length(x)
        z = Date(x[i], Dates.DateFormat("yyyy-mm-dd"))
        push!(v,z)
    end

    # Check format 
    julia> v[1] - v[3]
    -4 days

    # cbind() R equivalent hcat() to existing data frame
    df = hcat(df,v)

With the output:


    julia> head(df)
    6×8 DataFrames.DataFrame
    │ Row │ Date       │ Open    │ High    │ Low     │ Close   │ Adj_Close │ Volume  │ x1         │
    ├─────┼────────────┼─────────┼─────────┼─────────┼─────────┼───────────┼─────────┼────────────┤
    │ 1   │ 1993-01-29 │ 43.9687 │ 43.9687 │ 43.75   │ 43.9375 │ 27.6073   │ 1003200 │ 1993-01-29 │
    │ 2   │ 1993-02-01 │ 43.9687 │ 44.25   │ 43.9687 │ 44.25   │ 27.8036   │ 480500  │ 1993-02-01 │
    │ 3   │ 1993-02-02 │ 44.2187 │ 44.375  │ 44.125  │ 44.3437 │ 27.8625   │ 201300  │ 1993-02-02 │
    │ 4   │ 1993-02-03 │ 44.4062 │ 44.8437 │ 44.375  │ 44.8125 │ 28.1571   │ 529400  │ 1993-02-03 │
    │ 5   │ 1993-02-04 │ 44.9687 │ 45.0937 │ 44.4687 │ 45.0    │ 28.2749   │ 531500  │ 1993-02-04 │
    │ 6   │ 1993-02-05 │ 44.9687 │ 45.0625 │ 44.7187 │ 44.9687 │ 28.2552   │ 492100  │ 1993-02-05 │