I am new to Julia and have a Python function that I want to call from Julia. The function takes a dataframe (converted to a NumPy ndarray), a filter value, and a list of column indices into that array, and fits a logistic regression using Python's statsmodels package. So far I have tried this:
using PyCall
py"""
import pandas as pd
import numpy as np
import random
import statsmodels.api as sm
import itertools
def reg_frac(state, ind_vars):
    rows = 2000
    total_rows = rows*13
    data = pd.DataFrame({
        'state': ['a', 'b', 'c','d','e','f','g','h','i','j','k','l','m']*rows, \
        'y_var': [random.uniform(0,1) for i in range(total_rows)], \
        'school': [random.uniform(0,10) for i in range(total_rows)], \
        'church': [random.uniform(11,20) for i in range(total_rows)]}).to_numpy()
    try:
        X, y = sm.add_constant(np.array(data[(data[:,0] == state)][:,ind_vars], dtype=float)), np.array(data[(data[:,0] == state), 1], dtype=float)
        model = sm.Logit(y, X).fit(cov_type='HC0', disp=False)
        rmse = np.sqrt(np.square(np.subtract(y, model.predict(X))).mean())
    except:
        rmse = np.nan
    return [state, ind_vars, rmse]
"""
reg_frac(state, ind_vars) = (py"reg_frac"(state::Char, ind_vars::Array{Any}))
However, when I run this, the RMSE comes back as NaN, which I don't expect. The call itself seems to work, so I think I am missing something.
reg_frac('b', Any[i for i in 2:3])
0.000244 seconds (249 allocations: 7.953 KiB)
3-element Array{Any,1}:
'b'
[2, 3]
NaN
Any help is appreciated.
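For reference, here is a minimal sketch of just the row filter and column selection that the function performs, on a tiny hand-made array instead of the random data. The three sample rows are made up for illustration; the column order (state, y_var, school, church) matches the function above. Note that on the Python side the column indices are 0-based, so [2, 3] selects school and church.

```python
import numpy as np

# Tiny stand-in for the output of DataFrame.to_numpy(): mixed columns
# give an object-dtype array. Values are made up for illustration.
data = np.array([
    ['a', 0.2, 1.0, 11.0],
    ['b', 0.7, 2.0, 12.0],
    ['b', 0.1, 3.0, 13.0],
], dtype=object)

state = 'b'
ind_vars = [2, 3]  # 0-based column indices: school, church

mask = data[:, 0] == state                          # keep rows for this state
X = np.array(data[mask][:, ind_vars], dtype=float)  # predictors
y = np.array(data[mask, 1], dtype=float)            # response (y_var)

print(X.shape, y.shape)
```

If state or ind_vars arrive from Julia in a form that this indexing or the float conversion cannot handle, this is the step that would raise, and the surrounding try/except would turn that into NaN.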
[...] except clause that sets rmse to np.nan, so it wouldn't be too surprising if it ended up being NaN. Also, any reason you don't just fit the logit model in Julia? - Nils Gudat
[...] reg_frac('b',[2,3]). This was my answer: ['b', [2, 3], 0.28999238875117006] - Kay
[...] rows or total_rows. - Przemyslaw Szufel
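Following up on the point about the except clause: a bare except hides whatever actually went wrong, so one way to debug this is to catch the exception and return its message alongside the NaN. Below is a minimal sketch of that pattern using only the standard library. The names rmse_or_nan and failing_fit are hypothetical, and failing_fit is just a stand-in for the regression step, not the real statsmodels call.

```python
import math

def rmse_or_nan(compute_rmse):
    """Run compute_rmse(); on failure return NaN plus the error that caused it."""
    try:
        return compute_rmse(), None
    except Exception as exc:  # narrower than a bare `except:`
        # Keep the reason instead of silently discarding it.
        return math.nan, f"{type(exc).__name__}: {exc}"

def failing_fit():
    # Hypothetical stand-in for the regression step: a bad conversion
    # to float raises ValueError.
    return float("b")

value, error = rmse_or_nan(failing_fit)
print(value, error)
```

Applying the same idea inside reg_frac (for example, printing the exception before setting rmse to np.nan) should show whether the NaN comes from the arguments passed from Julia or from the model fit itself.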