
I am using GEKKO to study the effect of MPC and MHE on building energy use. To improve performance, I switched to a dedicated solving server by creating the model with GEKKO(remote=True, server='http://10.0.0.10'). This server is a freshly installed Ubuntu 20.04.1 machine hosting just the Linux APMonitor server. Everything works fine so far, except that my control and estimation cycle times increase linearly with every additional solve when using the remote option. I use my building model with IMODE=6 for MPC and IMODE=5 for MHE, and both show the same behaviour.

In the following figure you can see the cycle times and solve times for local and remote solve. These tests were conducted with the IPOPT MA57 Solver, but others show the same behavior. My program does not change except for switching from remote=False to my solving server remote=True, server='http://10.0.0.10'.

Figure of increased cycle time while solving time stays constantly low

In addition, I have added the timing diagnosis output of the solver after a few steps (left) and after a few hundred steps (right). It clearly shows a much higher system time while the solve time stays very similar, but I could not find the culprit here either. Except for this test, I have always set DIAGLEVEL=0 and m.solve(disp=False) to reduce the console output.

Timing diagnosis output

However, re-initializing the GEKKO model and starting again immediately drops the cycle times. This shows that it is not a problem with the server’s compute capacity reducing over time. To me it looks like some file that the process reads and/or writes grows with the number of iterations, but I could not find anything significant in the current model directory. Maybe someone knows the problem and can help me with this one.
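A check for growing files can be scripted by comparing file sizes between cycles. The following is a minimal sketch and not part of GEKKO: snapshot_sizes and grown_files are hypothetical helper names, and the temporary folder only stands in for a model directory. For a local solve the folder to watch would be m.path; for a remote solve the relevant files may live on the server and would have to be inspected there.

```python
import os
import tempfile


def snapshot_sizes(folder):
    """Return {filename: size_in_bytes} for regular files directly in folder."""
    sizes = {}
    for name in os.listdir(folder):
        full = os.path.join(folder, name)
        if os.path.isfile(full):
            sizes[name] = os.path.getsize(full)
    return sizes


def grown_files(before, after):
    """Return {filename: bytes_added} for files that grew between two snapshots."""
    return {name: size - before.get(name, 0)
            for name, size in after.items()
            if size > before.get(name, 0)}


# Demo on a temporary folder (a stand-in for the model directory).
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "demo.dbs")
with open(demo_file, "wb") as f:
    f.write(b"a = 1\n")
first = snapshot_sizes(demo_dir)
with open(demo_file, "ab") as f:  # append, as a growing file would
    f.write(b"a = 2\n")
second = snapshot_sizes(demo_dir)
print(grown_files(first, second))
```

Taking one snapshot before the loop and one every few hundred cycles would show which file, if any, grows with the cycle count.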

Edit 1:

Trying the example script provided by John Hedengren in his answer showed a different result for me. The following picture again shows the same linearly increasing behavior. I ran this test with a different physical machine, also using a different physical network adapter at the solving server. Previously I even switched the physical machine of the solving server. Tested with Python versions 2.7.16, 3.7.7 and 3.8.5.

Example script output

For me this leads to the conclusion that this situation is not caused by my MPC application. It also doesn't seem to be related to a hardware problem at the client, the network, or the server. IMHO it looks like a server software problem.

A few specs here:

  • Ubuntu 20.04.1
  • Apache 2.4.41
  • PHP 7.4.3
  • APMonitor, Version 0.9.2

Edit 2:

After using John Hedengren's sample problem to test multiple solving servers, I came to the conclusion that the behavior of rising cycle times is not unique to my Ubuntu Linux setup. Here are a few examples:

  • Custom APMonitor server on CentOS7 with MA97 solver (LAN) [Figure: Custom APM on CentOS7]

  • APMonitor server on Windows 10 (LAN) [Figure: Standard APM on Windows 10]

  • Official APMonitor solving server (Web) [Figure: Official APMonitor solving server]

    To make it more obvious, I removed the outliers in the next picture. [Figure: Official APMonitor solving server (outliers removed)]

In my case a cycle time increase of about 1 second per 1000 cycles (for this sample problem) happens regardless of the remote solver used (local solve is always fine). The effect of this steady cycle time increase is not as prominent if the solve/cycle times are longer anyway or the number of cycles is not high enough. For my application, however, this is very problematic, as I have to run about 40k cycles as fast as possible.
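To put a number on this, here is a rough estimate of the accumulated overhead. It assumes the growth stays purely linear over the whole run, which the tests above only show for a few thousand cycles; extra_time is a hypothetical helper name.

```python
def extra_time(slope_s_per_cycle, n_cycles):
    """Total added seconds if cycle k (k = 0..n-1) takes slope*k longer than cycle 0."""
    return slope_s_per_cycle * n_cycles * (n_cycles - 1) / 2.0


slope = 1.0 / 1000  # about 1 s of added cycle time per 1000 cycles
total = extra_time(slope, 40000)
print("{:.0f} s, about {:.1f} days".format(total, total / 86400.0))
```

Under that assumption the overhead alone adds up to roughly 800,000 seconds, or about 9 days, for 40k cycles, and the last cycles would each take about 40 seconds longer than the first.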

Edit 3:

Implementing the changes to gekko.py and apm_line.php that John Hedengren suggested in his answer fixed the problem partly, but not entirely. There is still an increase in cycle time; once again, it only becomes visible when looking closer and over a longer run. The following image shows the output of the usual test script, this time with the updated versions of gekko.py and apm_line.php, a Linux APM server (Windows shows the same behavior), and 5k steps. Looking very closely, you can even see this effect in the diagram (1k runs) John posted in the edit to his answer.

Rising cycle times for updated gekko

1 Answer


Here is an example script that solves for 100 cycles with remote (public server) or local solve. The remote (public server) solve is on a Linux server with solver IPOPT. The local solve is on Windows 10 using apm.exe. Both are solved with Python 3.7.

Clock and Solve Time

import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as plt
import time

for ri,r in enumerate([True,False]):
    m = GEKKO(remote=r)
    m.time = np.linspace(0,10,21)
    mass = 500; b = m.Param(value=50); K = m.Param(value=0.8)
    p = m.MV(value=0, lb=0, ub=100); v = m.CV(value=0)
    m.Equation(mass*v.dt() == -v*b + K*b*p)
    m.options.IMODE = 6 #control
    p.STATUS = 1; p.DCOST = 0.1; p.DMAX = 20
    v.STATUS = 1; m.options.CV_TYPE = 2
    v.SP = 40; v.TR_INIT = 1; v.TAU = 5

    s=time.time(); ct=[]; st=[] # cycle time, solve time
    for i in range(100):
        m.solve(disp=False)
        e=time.time(); ct.append(e-s); s=e
        st.append(m.options.SOLVETIME)

    plt.subplot(2,1,1+ri)
    plt.plot(ct,label='clock time')
    plt.plot(st,label='solve time')
    plt.legend()
    if r:
        plt.title('Remote Solve')
    else:
        plt.title('Local Solve')
        plt.xlabel('cycle')
    plt.ylabel('time (sec)')
plt.show()

There is no linear increase in solve time with this example. There may be something else happening in the solve loop of your application that is giving a linear increase in time for remote=True. It may also be something with the network.
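A trend can also be checked numerically rather than by eye, by fitting a straight line to the recorded cycle times. This is a minimal sketch in plain Python: cycle_time_slope is a hypothetical helper, and the synthetic list only stands in for the ct list collected in the script above.

```python
def cycle_time_slope(cycle_times):
    """Least-squares slope (seconds per cycle) of cycle time versus cycle index.

    Needs at least two samples.
    """
    n = len(cycle_times)
    k_mean = (n - 1) / 2.0
    t_mean = sum(cycle_times) / float(n)
    num = sum((k - k_mean) * (t - t_mean) for k, t in enumerate(cycle_times))
    den = sum((k - k_mean) ** 2 for k in range(n))
    return num / den


# Synthetic data: 0.2 s base time plus 1 ms of growth per cycle.
synthetic = [0.2 + 0.001 * k for k in range(1000)]
print(cycle_time_slope(synthetic))
```

A slope close to zero for both the remote and the local run would support the "no increase" reading. A small slope is hard to tell apart from noise on only 100 cycles, so a longer run gives a more reliable estimate.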

Information from gekko.py

The gekko library has different methods for solving remotely versus locally. When solving remotely, the files are created and sent to the server. Results are then generated and passed back to the local temporary directory m.path or m._path that can be viewed with m.open_folder(). One thing to try is to time the steps in the gekko.py file. Another thing to observe is the size of the files that are passed to the server.

def send_if_exists(extension):
    path = os.path.join(self._path,self._model_name + '.' + extension)
    if os.path.isfile(path):
        with open(path) as f:
            file = f.read()
        cmd(self._server, self._model_name, extension+' '+file)

#clear apm and csv files already on the server
cmd(self._server,self._model_name,'clear apm')
cmd(self._server,self._model_name,'clear csv')

#send model file
with open(os.path.join(self._path,self._model_name + '.apm')) as f:
    model = f.read()
cmd(self._server, self._model_name, ' '+model)
#send csv file
send_if_exists('csv')
#send info file
send_if_exists('info')
#send dbs file
with open(os.path.join(self._path,'measurements.dbs')) as f:
    dbs = f.read()
cmd(self._server, self._model_name, 'option '+dbs)
#solver options
if self.solver_options:
    opt_file=self._write_solver_options()
    cmd(self._server,self._model_name, ' '+opt_file)

#extra files (eg solver.opt, cspline.data)
for f_name in self._extra_files:
    with open(os.path.join(self._path,f_name)) as f:
        extra_file_data = f.read() #read data
        extra_file_data = 'File ' + f_name + '\n' + extra_file_data + 'End File \n'
    cmd(self._server,self._model_name, ' '+extra_file_data)

#solve remotely
response = cmd(self._server, self._model_name, 'solve', disp, debug)

#print APM error message and die
if (debug >= 1) and ('@error' in response):
    raise Exception(response)

#load results
def byte2str(byte):
    if type(byte) is bytes:
        return byte.decode().replace('\r','')
    else:
        return byte

try:
    results = byte2str(get_file(self._server,self._model_name,'results.json'))
    f = open(os.path.join(self._path,'results.json'), 'w')
    f.write(str(results))
    f.close()
    options = byte2str(get_file(self._server,self._model_name,'options.json'))
    f = open(os.path.join(self._path,'options.json'), 'w')
    f.write(str(options))
    f.close()
    if self.options.CSV_WRITE >= 1:
        results = byte2str(get_file(self._server,self._model_name,'results.csv'))
        with open(os.path.join(self._path,'results.csv'), 'w') as f:
            f.write(str(results))
        if self.options.CSV_WRITE >1:
            results_all = byte2str(get_file(self._server,self._model_name,'results_all.csv'))
            with open(os.path.join(self._path,'results_all.csv'), 'w') as f:
                f.write(str(results_all))
except:
    raise ImportError('No solution or server unreachable.\n'+\
                      '  Show errors with m.solve(disp=True).\n'+\
                      '  Try local solve with m=GEKKO(remote=False).')

If a variable such as response is growing in size then some of the string operations may be taking longer.
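Timing the individual steps can be done with a small context manager. This is only a sketch: timed and timings are hypothetical names, and the time.sleep calls are stand-ins for the real work. In a local copy of gekko.py the blocks would instead wrap the cmd(...) and get_file(...) calls shown above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # label -> list of durations in seconds


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


# Stand-in workload for three cycles.
for _ in range(3):
    with timed("send files"):
        time.sleep(0.01)
    with timed("solve"):
        time.sleep(0.02)

# Print how many samples were recorded and the total time per step.
for label, durations in timings.items():
    print(label, len(durations), round(sum(durations), 3))
```

Plotting each list in timings against the cycle index would show which step accounts for the growing cycle time.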

Edit: Tried the script with 1000 cycles and there is an increase in cycle time for the remote server.

1000 cycles

I checked the server directory and the problem is with the file overrides.dbs that grows each cycle when using remote=True.

-rw-r--r--. 1 apache apache  76911 Jan 26 07:43 overrides.dbs
-rw-r--r--. 1 apache apache 119305 Jan 26 07:43 overrides.dbs
-rw-r--r--. 1 apache apache 219745 Jan 26 07:44 overrides.dbs

As this file grows, it takes longer to interpret the longer list of options. To resolve this problem, I made changes to gekko.py in Gekko and to the server file apm_line.php.
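The growth pattern itself is easy to reproduce locally. The sketch below is an illustration only, not the server code: the option lines are made up and size_after_cycles is a hypothetical helper. It compares appending the same block on every cycle with truncating the file before each write.

```python
import os
import tempfile

options_block = b"nlc.imode = 6\nnlc.solver = 3\n"  # made-up option lines


def size_after_cycles(n_cycles, clear_first):
    """Write options_block once per cycle and return the final file size in bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    for _ in range(n_cycles):
        mode = "wb" if clear_first else "ab"  # "wb" truncates, "ab" appends
        with open(path, mode) as f:
            f.write(options_block)
    size = os.path.getsize(path)
    os.remove(path)
    return size


print(size_after_cycles(1000, clear_first=False))  # grows with the cycle count
print(size_after_cycles(1000, clear_first=True))   # stays at one block
```

With appending, the file size is proportional to the number of cycles, so anything that parses the whole file on every solve does linearly more work each cycle.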

Changes to gekko.py (to appear in the next Gekko release beyond 0.2.8, or available on GitHub):

            cmd(self._server,self._model_name,'clear meas')
            cmd(self._server, self._model_name, 'meas '+dbs)

Changes to apm_line.php also found on GitHub.

        } elseif (strtolower(substr($add, 0, 5))=="meas ") {
                $option = trim(substr($add,5)); // starting at 5th position - extract option
                $fn_opt = $d . "/measurements.dbs";
                if ( !file_exists($fn_opt)) {
                    $meas = fopen ($fn_opt, 'w');
                } else {
                    $meas = fopen ($fn_opt, 'a');
                }
                fwrite ($meas, "\n".$option);
                fclose($meas);
                echo "Successfully added meas: " . $meas;
                fclose($handle);
        } elseif  (strtolower($add)=="clear meas") {
                // clear file contents by opening in write mode
                $fn_csv = $d . "/measurements.dbs";
                $meas_file = fopen ($fn_csv, 'w');
                // write a space to the file to clear contents
                fwrite ($meas_file, " ");
                fclose ($meas_file);

                //$clear = "! Cleared measurements.dbs file contents";
                //fwrite ($handle, "\n".$clear);
                fclose($handle);
                echo "Cleared measurements.dbs file";

This gives the improved results with no increase in time.

Improved results

Thanks for your help in finding this bug!