
I'm a student trying to complete a university assignment involving an empirical analysis. We're doing multiple regression in Python at the moment, and I'm wondering if I went about this the right way.

What I'm trying to do is a hypothesis test to check whether the effect of one variable is the same as that of another. It's just a snippet, but you can imagine I have a dataframe in which I am currently interested in the variables in columns 1 and 2. Column 0 is a constant added to the model. Am I correct?

    import os
    import numpy as np
    import pandas as pd
    import scipy.stats as stats
    import statsmodels.api as sm
    import matplotlib.pyplot as plt
    import seaborn as sns

    newvars3 = data[['w_a', 'gender', 'gkclasssize', 'gkclasstype', 'gktyears', 'gkabsent']]
    newvars3 = sm.add_constant(newvars3)
    modelnewvars3 = sm.OLS(ymath, newvars3, missing='drop')
    resultnewvars3 = modelnewvars3.fit()
    print(resultnewvars3.summary())
    csvnewvars3 = resultnewvars3.summary().as_csv()
    open(report_dir + 'summ_newvars3_math.csv', 'w').write(csvnewvars3)

    ##Testing the effect of gender vs race
    R = np.array([0, 1, 1, 0, 0, 0, 0])
    tvalue = R @ resultnewvars3.params / (R @ resultnewvars3.cov_params() @ R.T)
    pvalue = 2*(1 - stats.norm.cdf(tvalue))
    gen_race_hypo_test = pd.Series(np.array([tvalue, pvalue]), index=['T-value', 'P-value'])
    gen_race_hypo_test.name = 'Hypothesis test for same effect: Gender vs Race'
    print('\n', gen_race_hypo_test)

data['w_a'] is a dummy variable for race: 0 for white/Asian, 1 for other. Statistical theory/knowledge is required to answer this question.

Please don't paste images of your code, paste the actual code - Chris

1 Answer


When you fit a regression, you get a model of the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$$

There you could see whether β1 and β2 have opposite signs. But I don't think that's how you should test your hypothesis. Fitting a simple linear regression for each variable and looking at the resulting model and its attributes may be the best way to do so.
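As a rough illustration of reading the signs of β1 and β2 off a fitted model, here is a minimal sketch. It uses synthetic data and plain NumPy least squares rather than the asker's dataset or statsmodels; the variable names `x1`, `x2` and the true coefficients are assumptions made up for the example.

```python
import numpy as np

# Synthetic data (hypothetical, not the asker's dataset):
# two 0/1 dummy regressors with effects of opposite sign.
rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 2, size=n).astype(float)
x2 = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Design matrix: column 0 is the constant, columns 1 and 2 are the regressors.
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimate of (beta_0, beta_1, beta_2).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta

# True when the two estimated slopes point in opposite directions.
opposite_signs = bool(np.sign(b1) != np.sign(b2))
print("beta_0, beta_1, beta_2:", b0, b1, b2)
print("opposite signs:", opposite_signs)
```

Comparing signs only describes the point estimates; it says nothing about whether the difference between the two coefficients is statistically distinguishable from zero.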

I'm assuming that you are interested in columns 1 and 2 because those are your independent variables (x). Would that make column 0 your dependent variable (y), for models y ~ x1 and y ~ x2?

You should provide more information about this and be clearer about the steps you are doing. The snippet only shows the calculation of the t-value and p-value, not the columns you are referencing.
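One way to make the referenced columns explicit is to build the restriction vector from column names instead of hard-coded positions. The sketch below is only an illustration under stated assumptions: the data are synthetic, the names `const`, `x1`, `x2` are hypothetical, and the p-value uses the same normal approximation as the question. It tests the hypothesis that two coefficients are equal, which corresponds to a contrast of +1 and -1 (so the tested quantity is β1 − β2), and it divides by the standard error, that is, the square root of R·Cov·Rᵀ rather than the variance itself.

```python
import math
import numpy as np

# Synthetic data (hypothetical): both regressors have the same true effect.
rng = np.random.default_rng(1)
n = 1000
names = ["const", "x1", "x2"]
x1 = rng.integers(0, 2, size=n).astype(float)
x2 = rng.integers(0, 2, size=n).astype(float)
y = 0.5 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

# OLS coefficients and their estimated covariance matrix.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X.T @ X)

# Restriction vector for H0: beta_x1 - beta_x2 = 0, built by column name.
R = np.zeros(len(names))
R[names.index("x1")] = 1.0
R[names.index("x2")] = -1.0

estimate = R @ beta                 # estimated difference of the two effects
std_err = math.sqrt(R @ cov @ R)    # standard error of that difference
t_value = estimate / std_err

# Two-sided p-value, normal approximation: 2 * (1 - Phi(|t|)).
p_value = math.erfc(abs(t_value) / math.sqrt(2.0))

print("difference:", estimate, "t-value:", t_value, "p-value:", p_value)
```

With a fitted statsmodels OLS result, the `t_test` method accepts a restriction of this kind and may be a convenient cross-check against a hand-computed value.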