Hello,
When we run the MTB with all 152 cases enabled, the simulation fails with an Out of Memory (OOM) error in PSCAD. We have observed this on multiple workstations within the team. We contacted Manitoba to find out whether the issue is caused by the hardware or by something else. After some back-and-forth, they gave the following answer:
"I can verify that PSCAD is not the problem with memory. This is obvious when I look at the loading on my test machine while the simulation is running. I think the problem is in the Python modules you are using and what you are doing with the data."
During our investigation, we found that whether the OOM occurs depends on how many output signals are recorded during the simulation. By minimizing the number of recorded signals, we can run more cases in parallel before hitting the OOM issue. However, we still cannot run all 152 cases in a single run and have to split them into batches. We also tried reducing the volley size to half the number of CPU cores, but the issue still appears. It is very difficult to debug because, when it occurs, the whole system crashes and the workstation has to be restarted, so we cannot see anything in Task Manager or in any logs.
Another workaround is a script that, instead of running all cases in one run, splits the simulation into batches based on the volley size defined in the config.ini file. The script assigns the task_id values to the first X cases from the testcases.xlsx file, where X is the volley size. After the first batch finishes, it exports the output files to the destination folder and then moves on to the next batch of X cases running in parallel. With this approach, we were able to run all 152 cases without hitting the OOM issue. The only downside is a small delay, because after each batch the script has to export the results and recreate the study cases before starting the next batch. However, the difference is only a few minutes, which in my view is negligible when the total simulation time is already around 7–8 hours.
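To make the batching idea concrete, here is a minimal sketch of how such a script could be structured. It is not the actual script: the config section and key names (`Simulation`, `volley`), the `*.out` file pattern, the restart of task_id at 1 in every batch, and the `run_batch` callback (a stand-in for whatever actually launches the PSCAD runs) are all assumptions made for illustration.

```python
import configparser
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Sequence


def read_volley_size(config_path: str, section: str = "Simulation",
                     key: str = "volley", default: int = 8) -> int:
    """Read the batch size from config.ini (section/key names are assumed)."""
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is silently ignored
    return parser.getint(section, key, fallback=default)


def make_batches(case_ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size cases."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(case_ids), batch_size):
        yield list(case_ids[start:start + batch_size])


def export_outputs(work_dir: Path, dest_dir: Path, pattern: str = "*.out") -> int:
    """Move finished output files out of the working folder; return the count."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = 0
    for path in sorted(work_dir.glob(pattern)):
        shutil.move(str(path), str(dest_dir / path.name))
        moved += 1
    return moved


def run_in_batches(case_ids: Sequence[str], batch_size: int,
                   run_batch: Callable[[Dict[int, str]], None],
                   work_dir: Path, dest_dir: Path) -> int:
    """Run all cases batch by batch, exporting results after each batch."""
    exported = 0
    for batch in make_batches(case_ids, batch_size):
        # Assumption: task_id restarts at 1 for every batch.
        tasks = {task_id: case for task_id, case in enumerate(batch, start=1)}
        run_batch(tasks)  # hypothetical hook that runs the cases in parallel
        exported += export_outputs(work_dir, dest_dir)
    return exported
```

With 152 cases and a volley size of 12, for example, this gives 13 batches, the last one containing 8 cases. Exporting after every batch keeps the working folder small, at the cost of the short pause between batches described above.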
One more observation: the problem started when the MTB switched from .csv to .out output files. Before this change, we were able to run all cases without any issues.
At this point, I do not know whether other developers are experiencing the same issue. We have a dedicated workstation for this type of work. For example, I am running PSCAD on a machine with 24 cores and 32 GB of RAM.