3
votes

I'm new to Julia and am following this blog post to build a neural network:

http://blog.yhathq.com/posts/julia-neural-networks.html

I am confused about data types and error messages in Julia. This is my code (again, following the blog post on neural networks):

# read in df to train
train_df = readtable("data/winequality-red.csv", separator=';')
# create train and test data splits
y = train_df[:quality]
x = train_df[:, 1:11] # matrix of all except quality
# vector() and matrix() from blog post

n = length(y)
is_train = shuffle([1:n] .> floor(n * .25))

x_train,x_test = x[is_train,:],x[!is_train,:]
y_train,y_test = y[is_train],y[!is_train]

type StandardScalar
  mean::Vector{Float64}
  std::Vector{Float64}
end

# initialize empty scalar
function StandardScalar()
  StandardScalar(Array(Float64, 0), Array(Float64, 0))
end

# compute mean and std of each col
function fit_std_scalar!(std_scalar::StandardScalar, x::Matrix{Float64})
  n_rows, n_cols = size(x)  # use the argument, not the global x_test
  std_scalar.std = zeros(n_cols)
  std_scalar.mean = zeros(n_cols)

  for i = 1:n_cols
    std_scalar.mean[i] = mean(x[:,i])
    std_scalar.std[i] = std(x[:,i])
  end
end

# further vectorize the transformation
function transform(std_scalar::StandardScalar, x::Matrix{Float64})
  # element wise subtraction of mean and division of std
  (x .- std_scalar.mean') ./ std_scalar.std'
end

# fit and transform
function fit_transform!(std_scalar::StandardScalar, x::Matrix{Float64})
  fit_std_scalar!(std_scalar, x)
  transform(std_scalar, x)
end

# fit scalar on training data and then transform the test
std_scalar = StandardScalar()

n_rows, n_cols = size(x_test)

# cols before scaling
println("Col means before scaling: ")
for i = 1:n_cols
  # C printf function
  @printf("%0.3f ", (mean(x_test[:, i])))
end

I am getting the error:

'.-' has no method matching .-(::DataFrame, ::Array{Float64,2}) in fit_transform! ... 

For this code:

x_train = fit_transform!(std_scalar, x_train)
x_test = transform(std_scalar, x_test)

# after transforming
println("\n Col means after scaling:")
for i = 1:n_cols
  @printf("%0.3f ", (mean(x_test[:,i])))
end

I am new to Julia and just don't understand what the issue is. The vector() and matrix() functions from the blog post do not work. I assume they came from an older version of DataFrames.

What I think my issue is: these functions take a ::Matrix{Float64}, and I am passing in a DataFrame. I assume that the (deprecated?) matrix() would have fixed this, but I'm not sure. How do I analyze this error and pass these functions the correct types (if that is the problem here)?

Thank you!

2 Answers

2
votes

I believe vector(...) and matrix(...) were both replaced with just array(...), but I can't find an issue number to correspond with that change.

2
votes

The error message says that you're attempting an element-wise subtraction, .-, between a DataFrame and an Array, but no method of that operation is defined for those types. A silly example of this sort of situation:

julia> "a" .- [1, 2, 3]
ERROR: `.-` has no method matching .-(::ASCIIString, ::Array{Int64,1})

My guess is that if you add

println(typeof(x_train))

in front of

x_train = fit_transform!(std_scalar, x_train)

you'll be told that it's a DataFrame rather than an array that you're trying to work with. I'm not experienced with the DataFrames library but may be able to dig up the conversion sometime tomorrow. This is all I have time for just now.
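A minimal sketch of that kind of type check, without needing DataFrames at all. The helper col_count and the variables good and bad are hypothetical names made up for illustration; an integer matrix stands in for the DataFrame, since either one fails to match a ::Matrix{Float64} signature:

```julia
# Hypothetical helper (not from the post): accepts only a Matrix{Float64},
# like the functions in the question.
col_count(x::Matrix{Float64}) = size(x, 2)

good = [1.0 2.0; 3.0 4.0]   # a Float64 matrix
bad  = [1 2; 3 4]           # an integer matrix, standing in for the DataFrame

println(typeof(good))                 # shows the concrete type of the value
println(isa(good, Matrix{Float64}))   # true
println(isa(bad, Matrix{Float64}))    # false

# Calling the helper with the wrong type raises a MethodError,
# the same class of "no method matching" error as in the question.
failed = try
    col_count(bad)
    false
catch err
    isa(err, MethodError)
end
println(failed)                       # true
```

Printing typeof (or testing with isa) just before the failing call is usually the quickest way to see which argument does not match the signature.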

Added comments after obtaining data file

I retrieved winequality-red.csv and worked with its DataFrame:

julia> VERSION
v"0.3.5"

julia> using DataFrames

julia> train_df = readtable("data/winequality-red.csv", separator=';')

julia> y = train_df[:quality]
1599-element DataArray{Int64,1}:

julia> x = train_df[:, 1:11]
1599x11 DataFrame

julia> typeof(x)
DataFrame (constructor with 22 methods)

x and y are at this point array-like objects. The blog post apparently uses vector and matrix to convert these to true arrays, but these functions are unfamiliar to me. As IainDunning points out in his answer (I'd like to cite this properly but haven't puzzled that out yet), this conversion is now done via array. Perhaps this is what you need to do:

julia> y = array(train_df[:quality])
1599-element Array{Int64,1}:

julia> x = array(train_df[:, 1:11])
1599x11 Array{Float64,2}:

I've not followed through with an analysis of all of the other code, so this is a hint at the answer rather than a fully fleshed out and tested solution to your problem. Please let me know how it works out if you give it a try.

I'm accustomed to seeing and using Array{Float64,1} and Array{Float64,2} rather than Vector{Float64} and Matrix{Float64}. Possibly the Vector and Matrix synonyms for specific types of arrays are deprecated.
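For what it's worth, the capitalized type names Vector{T} and Matrix{T} are aliases for Array{T,1} and Array{T,2} in Base Julia, so as far as the type system is concerned the two spellings are interchangeable (this is separate from the lowercase vector() and matrix() conversion functions mentioned above). A quick check, as a sketch with a made-up variable v:

```julia
# The alias and the explicit Array spelling name the same type.
println(Vector{Float64} == Array{Float64,1})   # true
println(Matrix{Float64} == Array{Float64,2})   # true

# A value of one is therefore a value of the other.
v = [1.0, 2.0, 3.0]
println(isa(v, Vector{Float64}))               # true
println(isa(v, Array{Float64,1}))              # true
```

So a function declared with x::Matrix{Float64} accepts exactly the same arguments as one declared with x::Array{Float64,2}.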