I can call the Stata Do-file from Python successfully, but what would be the best way to get a local macro into Python from Stata? I intend to use the Do-file in a loop in Python.
What I have so far:
Python:
import subprocess
InputParams = [' -3','0',' -3','0',' -3','0']
# /e makes it run quietly, i.e., Stata doesn't open a window
dofile = '<Do-file path here>'
cmd = ['C:\\Program Files (x86)\\Stata14\\StataMP-64.exe','/e','do',dofile] + InputParams
subprocess.call(cmd,shell=True)
And in Stata I run a regression and get a local macro containing the mean square error, like
local MSE = 0.0045
What would be the best way to return the local macro to Python? Write it to a file? I could not find anything on writing a macro to a file.
Bonus question: if I put InputParams = ['-3' , '0'] in Python (I removed the space in front of the negative three), Stata gives the error "/3 invalid name". Why?
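To make the difference between the two variants visible, here is a small sketch that prints the command line Python would build from each list. It uses `subprocess.list2cmdline`, which, as far as I know, is the helper `subprocess` relies on under Windows to join a list into one string. The executable and Do-file names are shortened placeholders, not the real paths. One possible reading (an assumption, not something confirmed here) is that the unquoted `-3` is taken as a command-line option instead of a Do-file argument, whereas the version with the leading space ends up quoted.

```python
import subprocess

# Placeholder names, used only to build the strings; nothing is executed.
base = ["StataMP-64.exe", "/e", "do", "test.do"]

# Arguments containing a space are wrapped in double quotes.
with_space = subprocess.list2cmdline(base + [" -3", "0"])
without_space = subprocess.list2cmdline(base + ["-3", "0"])

print(with_space)     # StataMP-64.exe /e do test.do " -3" 0
print(without_space)  # StataMP-64.exe /e do test.do -3 0
```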
EDIT
Adding the Stata Do-file. It isn't the actual script, just a representation of the operations I do in the real script.
quietly {
capture log close
clear all
cls
version 14.2
set more off
cd "<path here>"
local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
local timestamp = subinstr("$S_TIME",":","-",2)
log using "Logs\log_`datestamp'_`timestamp'_UTC.log"
set matsize 10000
use "<dataset path here>"
gen date = dofc(TimeVar)
encode ID, generate(uuid)
xtset uuid date
gen double DepVarLagSum = 0
gen double IndVar1LagMax = 0
gen double IndVar2LagMax = 0
local DepVar1LagStart = `1' // INPUT PARAMS GO HERE
local DepVar1LagEnd = `2'
local IndVar1LagStart = `3'
local IndVar1LagEnd = `4'
local IndVar2LagStart = `5'
local IndVar2LagEnd = `6'
** number of folds for cross validation
scalar kfold = 5
set seed 42
gen byte randint = runiformint(1,kfold)
** thanks to Álvaro A. Gutiérrez-Vargas for the matrix operations
matrix results = J(kfold,4,.)
matrix colnames results = "R2_fold" "MSE_fold" "R2_hold" "MSE_hold"
matrix rownames results = "1" "2" "3" "4" "5"
local MSE = 0
** rolling sum, thanks to Nick Cox for the algorithm
forval k = `DepVar1LagStart'(1)`DepVar1LagEnd' {
if `k' < 0 {
local k1 = -(`k')
replace DepVarLagSum = DepVarLagSum + L`k1'.DepVar
}
else replace DepVarLagSum = DepVarLagSum + F`k'.DepVar
}
** rolling max, thanks to Nick Cox for the algorithm
local IndVar1_arg IndVar1
forval k = `IndVar1LagStart'(1)`IndVar1LagEnd' {
if `k' <= 0 {
local k1 = -(`k')
local IndVar1_arg `IndVar1_arg', L`k1'.IndVar1
}
}
local IndVar2_arg IndVar2
forval k = `IndVar2LagStart'(1)`IndVar2LagEnd' {
if `k' <= 0 {
local k1 = -(`k')
local IndVar2_arg `IndVar2_arg', L`k1'.IndVar2
}
}
gen resid_squared = .
forval b = 1(1)`=kfold' {
** perform regression on 4/5 parts
xtreg c.DepVarLagSum ///
c.IndVar1LagMax ///
c.IndVar2LagMax ///
if randint != `b' ///
, fe vce(cluster uuid)
** store results
matrix results[`b',1] = e(r2)
matrix results[`b',2] = e(rmse)*e(rmse) // to get MSE
** test set
predict predDepVarLagSum if randint == `b', xb
predict residDepVarLagSum if randint == `b', residuals
** get R-squared
corr DepVarLagSum predDepVarLagSum if randint == `b'
matrix results[`b',3] = r(rho)^2
** calculate squared residuals
replace resid_squared = residDepVarLagSum*residDepVarLagSum
summarize resid_squared if randint == `b'
matrix results[`b',4] = r(mean)
drop predDepVarLagSum
drop residDepVarLagSum
mat U = J(rowsof(results),1,1)
mat sum = U'*results
mat mean_results = sum/rowsof(results)
local MSE = mean_results[1,4]
}
}
And I want to feed MSE back into Python.
Sorry if I missed small typos, I cannot directly copy code from the machine on which I am running Stata.
The idea is to feed input parameters that determine the lag periods, run regressions based on the new variables, get the average test-set mean square error, and feed that back into Python.
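A minimal sketch of that round trip, assuming the Do-file writes the macro to a text file with Stata's `file open`, `file write`, and `file close` commands before it finishes. The file name `mse.txt`, the Do-file placeholder, and the helper names `parse_mse`, `run_stata`, and `collect_mse` are hypothetical and not part of the original script. Because the Do-file changes directory with `cd`, the result path on the Python side would have to point at that same directory.

```python
import subprocess
from pathlib import Path

# Assumed addition near the end of the Do-file (Stata, shown as comments);
# "mse.txt" is a hypothetical file name:
#   file open fh using "mse.txt", write replace
#   file write fh "`MSE'"
#   file close fh

STATA_EXE = r"C:\Program Files (x86)\Stata14\StataMP-64.exe"  # from the question
DOFILE = "<Do-file path here>"   # placeholder, fill in the real path
RESULT_FILE = Path("mse.txt")    # hypothetical; must match the Do-file


def parse_mse(text):
    """Convert the text Stata wrote (e.g. '.0045') into a float."""
    return float(text.strip())


def run_stata(params):
    """Run the Do-file once in batch mode and return the MSE it wrote."""
    cmd = [STATA_EXE, "/e", "do", DOFILE] + list(params)
    subprocess.run(cmd, check=True)  # waits until Stata has exited
    return parse_mse(RESULT_FILE.read_text())


def collect_mse(candidates, runner=run_stata):
    """Call the runner for every parameter list and pair it with its MSE."""
    return [(params, runner(params)) for params in candidates]
```

Calling `collect_mse([[' -3','0',' -3','0',' -3','0']])` would then return a list of `(params, mse)` pairs. Passing the runner as an argument is only a convenience so the loop can be tried with a stand-in function on a machine without Stata. If the macro holds a Stata missing value (`.`), `float` would raise a `ValueError`, which may or may not be the desired behaviour.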
EDIT 2
I added more items to the InputParams list to reflect the expected number of inputs to the Stata Do-file.
InputParams is doing here. – Álvaro A. Gutiérrez-Vargas
InputParams. I still don't quite get where you recover (and use!) the ['-3' , '0']. – Álvaro A. Gutiérrez-Vargas
// INPUT PARAMS GO HERE. I recover the ' -3' in InputParams in DepVarLagStart = `1'. I added the appropriate number of input arguments in Python. Stata expects 6 and I only gave 2. So, Python gives Stata ['-3','0','-3','0','-3','0'] and Stata gets these values via the macros 1, 2, 3, 4, 5, and 6. I am using these values in the lagging algorithms to determine the rolling window size. – PencilBox