Return value from Stata Do-file to Python

Question

I can call the Stata Do-file from Python successfully, but what would be the best way to get a local macro into Python from Stata? I intend to use the Do-file in a loop in Python.

What I have so far:

Python:

import subprocess

dofile = "<do-file path here>"  # placeholder; the actual path is not shown
InputParams = [' -3','0',' -3','0',' -3','0']

# /e makes it run quietly, i.e., Stata doesn't open a window
cmd = ['C:\\Program Files (x86)\\Stata14\\StataMP-64.exe','/e','do',dofile] + InputParams
subprocess.call(cmd,shell=True)
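A minimal sketch of the same call wrapped in helper functions, so it can be reused inside a Python loop. It assumes the Stata path from the question, uses subprocess.run with check=True instead of subprocess.call(..., shell=True), and keeps the question's leading-space workaround for negative numbers. The names build_stata_cmd and run_stata are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Path taken from the question; adjust for your installation.
STATA_EXE = r"C:\Program Files (x86)\Stata14\StataMP-64.exe"


def build_stata_cmd(stata_exe: str, dofile: str, params: Sequence[int]) -> List[str]:
    """Return the argument list for one batch-mode (/e) run of the do-file.

    Negative values get the leading space used in the question (' -3'),
    since without it Stata reported '/3 invalid name'.
    """
    args = [f" {p}" if p < 0 else str(p) for p in params]
    return [stata_exe, "/e", "do", dofile] + args


def run_stata(stata_exe: str, dofile: str, params: Sequence[int]) -> None:
    # Passing a list without shell=True lets subprocess quote each argument.
    # check=True raises CalledProcessError on a nonzero exit code; whether
    # Stata's batch mode reports do-file errors that way is not assumed here.
    subprocess.run(build_stata_cmd(stata_exe, dofile, params), check=True)
```

For example, run_stata(STATA_EXE, dofile, [-3, 0, -3, 0, -3, 0]) reproduces the call above with all six arguments.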

And in Stata I run a regression and get a local macro containing the mean square error, like

local MSE = 0.0045

What would be the best way to return the local macro to Python? Write it to a file? I could not find anything on writing a macro to a file.

Bonus question: if I set InputParams = ['-3' , '0'] in Python (i.e., without the space in front of the negative three), Stata gives the error "/3 invalid name". Why?

EDIT

Adding the Stata do-file. It isn't the actual script; it is just a representation of the operations I perform in the real script.

quietly {

capture log close
clear all
cls
version 14.2
set more off
cd "<path here>"
local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
local timestamp = subinstr("$S_TIME",":","-",2)
log using "Logs\log_`datestamp'_`timestamp'_UTC.log"
set matsize 10000

use "<dataset path here>"

gen date = dofc(TimeVar)
encode ID, generate(uuid)

xtset uuid date

gen double DepVarLagSum = 0
gen double IndVar1LagMax = 0
gen double IndVar2LagMax = 0

local DepVar1LagStart = `1' // INPUT PARAMS GO HERE
local DepVar1LagEnd = `2'
local IndVar1LagStart = `3' 
local IndVar1LagEnd = `4'
local IndVar2LagStart = `5'
local IndVar2LagEnd = `6'

** number of folds for cross validation
scalar kfold = 5
set seed 42
gen byte randint = runiformint(1,kfold) // runiform(1,kfold) is continuous on (1,5), so byte truncation never yields fold 5

** thanks to Álvaro A. Gutiérrez-Vargas for the matrix operations
matrix results = J(kfold,4,.)
matrix colnames results = "R2_fold" "MSE_fold" "R2_hold" "MSE_hold"
matrix rownames results = "1" "2" "3" "4" "5"

local MSE = 0

** rolling sum, thanks to Nick Cox for the algorithm
forval k = `DepVar1LagStart'(1)`DepVar1LagEnd' {
    if `k' < 0 {
        local k1 = -(`k')
        replace DepVarLagSum = DepVarLagSum + L`k1'.DepVar
    }
    else replace DepVarLagSum = DepVarLagSum + F`k'.DepVar
}

** rolling max, thanks to Nick Cox for the algorithm
local IndVar1_arg IndVar1 
forval k = `IndVar1LagStart'(1)`IndVar1LagEnd' {
    if `k' <= 0 {
        local k1 = -(`k')
        local IndVar1_arg `IndVar1_arg', L`k1'.IndVar1
    }    
}
replace IndVar1LagMax = max(`IndVar1_arg')

local IndVar2_arg IndVar2 
forval k = `IndVar2LagStart'(1)`IndVar2LagEnd' {
    if `k' <= 0 {
        local k1 = -(`k')
        local IndVar2_arg `IndVar2_arg', L`k1'.IndVar2
    }    
}
replace IndVar2LagMax = max(`IndVar2_arg')

gen resid_squared = .

forval b = 1(1)`=kfold' {
    ** perform regression on 4/5 parts
    xtreg c.DepVarLagSum ///
    c.IndVar1LagMax ///
    c.IndVar2LagMax ///
    if randint != `b' ///
    , fe vce(cluster uuid)

    ** store results
    matrix results[`b',1] = e(r2)
    matrix results[`b',2] = e(rmse)*e(rmse) // to get MSE
    
    ** test set
    predict predDepVarLagSum if randint == `b', xb
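    ** xtreg's predict has no residuals option; use the out-of-fold residual from the xb prediction
    gen double residDepVarLagSum = DepVarLagSum - predDepVarLagSum if randint == `b'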
   

    ** get R-squared
    corr DepVarLagSum predDepVarLagSum if randint == `b'
    matrix results[`b',3] = r(rho)^2
 
    ** calculate squared residuals
    replace resid_squared = residDepVarLagSum*residDepVarLagSum
    summarize resid_squared if randint == `b'
    matrix results[`b',4] = r(mean)

    drop predDepVarLagSum
    drop residDepVarLagSum
}

** average each column of results across the folds (once, after the loop)
mat U = J(rowsof(results),1,1)
mat sum = U'*results
mat mean_results = sum/rowsof(results)

local MSE = mean_results[1,4]
}

And I want to feed MSE back into Python.
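To make the role of the six arguments concrete, here is a small Python sketch of what the two kinds of lag loops in the do-file compute, under the simplifying assumption of a single panel with consecutive, gap-free periods (Stata's L. and F. operators follow the xtset time variable, so gaps would change the result). The sum becomes missing (None) whenever any term in the window is missing, as with Stata's +, while the max ignores missing terms, as Stata's max() does, and always includes lag 0 because the local starts from the unlagged variable. The function names lag_sum and lag_max are hypothetical.

```python
from typing import List, Optional, Sequence

Value = Optional[float]  # None plays the role of Stata's missing value "."


def lag_sum(y: Sequence[Value], start: int, end: int) -> List[Value]:
    """out[t] = sum of y[t + k] for k in start..end (negative k = lag, positive k = lead)."""
    out: List[Value] = []
    for t in range(len(y)):
        total: Value = 0.0  # an empty window (start > end) leaves the sum at 0, like the do-file
        for k in range(start, end + 1):
            j = t + k
            if j < 0 or j >= len(y) or y[j] is None:
                total = None  # one missing term makes the whole sum missing
                break
            total += y[j]
        out.append(total)
    return out


def lag_max(x: Sequence[Value], start: int, end: int) -> List[Value]:
    """out[t] = max of x[t] and x[t - k1] for each k in start..end with k <= 0, where k1 = -k."""
    lags = {0} | {-k for k in range(start, end + 1) if k <= 0}
    out: List[Value] = []
    for t in range(len(x)):
        vals = [x[t - lag] for lag in lags if 0 <= t - lag < len(x) and x[t - lag] is not None]
        out.append(max(vals) if vals else None)  # missing only if every term is missing
    return out
```

With arguments -3 and 0, for instance, DepVarLagSum at period t corresponds to y[t-3] + y[t-2] + y[t-1] + y[t].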

Sorry if I missed some small typos; I cannot copy code directly from the machine on which I run Stata.

The idea is to feed in input parameters that determine the lag periods, run regressions on the resulting variables, compute the average test-set mean squared error, and feed that back into Python.
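The loop described above can be sketched in Python as a grid search over the three lag windows. Here run_once is a hypothetical callable that runs the do-file for one parameter tuple and returns its average test-set MSE (for example, a subprocess call followed by reading a file the do-file writes); the function name grid_search and the candidate windows are illustrative assumptions.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, Optional, Tuple

Window = Tuple[int, int]  # (lag start, lag end): one pair of do-file arguments
Params = Tuple[int, ...]  # the six values passed to the do-file as `1' ... `6'


def grid_search(
    run_once: Callable[[Params], float],
    dep_windows: Iterable[Window],
    ind1_windows: Iterable[Window],
    ind2_windows: Iterable[Window],
) -> Tuple[Optional[Params], float, Dict[Params, float]]:
    """Try every combination of windows and keep the one with the lowest MSE."""
    best_params: Optional[Params] = None
    best_mse = math.inf
    all_mse: Dict[Params, float] = {}
    for dep, ind1, ind2 in product(dep_windows, ind1_windows, ind2_windows):
        params = (*dep, *ind1, *ind2)
        mse = run_once(params)  # one Stata run per combination
        all_mse[params] = mse
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse, all_mse
```

Keeping every result in all_mse, not just the best one, makes it easy to inspect how sensitive the MSE is to each window afterwards.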

EDIT 2

I added more items to the InputParams list to reflect the expected number of inputs to the Stata Do-file.

Could you post the do file you want to invoke from Python? I can only guess what InputParams is doing here. — Álvaro A. Gutiérrez-Vargas
@ÁlvaroA.Gutiérrez-Vargas Please let me know if there is anything else I can provide; I added a stripped-down version of my Stata do-file. — PencilBox
Could you specify exactly where in the do-file you use the information contained in InputParams? I still don't quite get where you recover (and use!) the ['-3' , '0']. — Álvaro A. Gutiérrez-Vargas
@ÁlvaroA.Gutiérrez-Vargas There is a commented line in the Stata script that says // INPUT PARAMS GO HERE. I recover the ' -3' from InputParams in DepVar1LagStart = `1'. I added the appropriate number of input arguments in Python: Stata expects 6 and I had only given 2. So Python gives Stata ['-3','0','-3','0','-3','0'] and Stata gets these values via the macros 1, 2, 3, 4, 5, and 6. I use these values in the lagging algorithms to determine the rolling window size. — PencilBox

Álvaro A. Gutiérrez-Vargas · Accepted Answer · 2021-01-10T08:55:40

Better integration between Python and Stata is available in Stata 16.1, but a pragmatic solution for earlier versions is to write your Stata results matrix to disk (here I use an Excel file) and then read it from Python. Here is an example of lines you can put at the end of your do-file to write the desired matrix.

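* Demo: -clear all- and the dummy matrix M only make this snippet self-contained.
* At the end of your own do-file, omit -clear all- (it also drops matrices)
* and export your results matrix (e.g. mean_results) instead of M.
clear all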
version 14.1
matrix M = J(5,2,999)
matrix colnames M = "col1"  "col2" 
matrix rownames M ="1" "2" "3" "4" "5"
global route = "C:\Users\route_to_your_working_directory"
putexcel set "${route}\M.xlsx", sheet("M")  replace
putexcel A1 = matrix(M)   , names
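A lighter variant of the same write-then-read idea (a swapped-in alternative, not part of the answer above) is to have the do-file write the MSE local to a plain text file with Stata's file command and read it back with only the Python standard library. The file name mse.txt and the function read_mse are hypothetical; the assumed Stata lines are shown as comments.

```python
from pathlib import Path
from typing import Union

# Assumed Stata side, placed after the MSE local is computed in the do-file:
#   file open mse_out using "mse.txt", write replace
#   file write mse_out "`MSE'" _n
#   file close mse_out


def read_mse(path: Union[str, Path]) -> float:
    """Read the single number the do-file wrote, e.g. '.0045' -> 0.0045."""
    text = Path(path).read_text().strip()
    if text in ("", "."):  # "." is how Stata writes a missing value
        raise ValueError(f"no MSE found in {path!r} (got {text!r})")
    return float(text)  # float() accepts Stata's '.0045' style without a leading zero
```

Because the do-file runs cd before anything else, mse.txt lands in that working directory, so the Python side should read it from there after each run.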
