
I am new to Julia and have a Python function that I want to call from Julia. Basically, the function accepts a dataframe (passed as a NumPy ndarray), a filter value, and a list of column indices into that array, and runs a logistic regression using the statsmodels package in Python. So far I have tried this:

using PyCall

py"""
import pandas as pd
import numpy as np
import random
import statsmodels.api as sm
import itertools
def reg_frac(state, ind_vars):
    rows = 2000
    total_rows = rows*13
    data = pd.DataFrame({
    'state': ['a', 'b', 'c','d','e','f','g','h','i','j','k','l','m']*rows, \
    'y_var': [random.uniform(0,1) for i in range(total_rows)], \
    'school': [random.uniform(0,10) for i in range(total_rows)], \
    'church': [random.uniform(11,20) for i in range(total_rows)]}).to_numpy()
    try:
        X, y = sm.add_constant(np.array(data[(data[:,0] == state)][:,ind_vars], dtype=float)), np.array(data[(data[:,0] == state), 1], dtype=float)
        model = sm.Logit(y, X).fit(cov_type='HC0', disp=False)      
        rmse = np.sqrt(np.square(np.subtract(y, model.predict(X))).mean())
    except:
        rmse = np.nan
    return [state, ind_vars, rmse] 
"""

reg_frac(state, ind_vars) = (py"reg_frac"(state::Char, ind_vars::Array{Any}))

However, when I run this, the result is NaN, which I do not expect. The call itself seems to work, so I think I am missing something.

reg_frac('b', Any[i for i in 2:3])

  0.000244 seconds (249 allocations: 7.953 KiB)
3-element Array{Any,1}:
    'b'
    [2, 3]
 NaN

Any help is appreciated.

Does the code work in Python (without calling it from Julia)? You've added an except clause that sets rmse to np.nan, so it wouldn't be too surprising if it ended up being NaN. Also, is there any reason you don't just fit the logit model in Julia? - Nils Gudat
Yes, the code works in Python. The model is just an example; I already have the model in Julia. I just want to be able to import Python functions as part of my Julia journey. - Kay
@PrzemyslawSzufel It works in Python. I just ran it and it works. Are you sure you ran it correctly? Just run reg_frac('b',[2,3]). This was my result: ['b', [2, 3], 0.28999238875117006] - Kay
It does not work. And it can't work because there are several variables undefined in your code such as rows or total_rows. - Przemyslaw Szufel
@PrzemyslawSzufel you are right. My bad, I had those variables already loaded. I have updated the post - Kay

1 Answer


In the Python code the state column holds strs, while from Julia you are passing a Char. These are not the same type.

Python:

>>> type('a')
<class 'str'>

Julia:

julia> typeof('a')
Char

Hence your comparisons do not work. Your function could look like this:

reg_frac(state, ind_vars) = (py"reg_frac"(state::String, ind_vars::Array{Any}))

And now:

julia> reg_frac("b", Any[i for i in 2:3])
3-element Array{Any,1}:
  "b"
  [2, 3]
 0.2853707270515166

However, I recommend passing a concretely typed vector rather than a Vector{Any}: here that would be a Vector{Int} for the column indices, and in general PyCall converts numeric vectors such as Vector{Float64} in flight into NumPy arrays. So your code could still be improved, depending on what you are actually planning to do.
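
To illustrate the mechanism, here is a minimal sketch in plain Python (standard library only, not your statsmodels code). The names OpaqueWrapper, select_rows, mean_of_column and summarize are hypothetical. The wrapper stands in for whatever non-str object reaches Python when a Julia Char is passed, which is an assumption on my part based on the Char coming back unchanged in your output. The point is that a filter value of the wrong type matches no rows, the computation then fails on the empty selection, and a broad except turns that failure into NaN. Keeping the exception around at least tells you what went wrong.

```python
import math


class OpaqueWrapper:
    """Hypothetical stand-in for a foreign object that is not a Python str."""

    def __init__(self, value):
        self.value = value


def select_rows(rows, state):
    # Keep only the rows whose first field equals the filter value.
    return [row for row in rows if row[0] == state]


def mean_of_column(rows, col):
    # Fails with ZeroDivisionError when no rows were selected.
    values = [row[col] for row in rows]
    return sum(values) / len(values)


def summarize(rows, state, col):
    try:
        result = mean_of_column(select_rows(rows, state), col)
        error = None
    except Exception as exc:
        # Still returns NaN, but records the cause instead of hiding it.
        result = math.nan
        error = repr(exc)
    return result, error


rows = [("a", 1.0), ("b", 2.0), ("b", 4.0)]

print(summarize(rows, "b", 1))                 # str filter: rows match
print(summarize(rows, OpaqueWrapper("b"), 1))  # non-str filter: no rows match
```

In your function the same idea would mean replacing the bare except: with except Exception as e: and printing or returning e while debugging, so that an empty selection shows up as an error message instead of a silent NaN.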