When working with simulations, one often faces the task of running simulation studies with varying parameters. The simulation results are then plotted as a family of curves. Especially in the case of discrete event simulations, the task of running the complete simulation study can be efficiently distributed across multiple compute nodes, as every parameter combination can be treated as an individual simulation. This article describes a scalable yet inexpensive setup to execute such simulation studies.
As a disclaimer, scalability of course has its limits, and there are good reasons why dedicated architectures for high performance computing exist. Yet for the purpose of discrete event simulations, the presented setup is a suitable choice.
The diagram below depicts the setup, which consists of multiple compute nodes, a central, shared storage system and a node to control the simulation. Although the diagram shows these roles as individual nodes, a single node can act in multiple roles. For example, a compute node can at the same time act as storage system or control node. The bottleneck in such a setup is usually the storage system, but alternatives like distributed file systems can improve the performance.
The structure of the simulation study, i.e. the different parameters and their values, can be directly reflected in the shared file system provided by the central storage system.
File System Structure
As all nodes share the same file system, the parameters, their values and also the simulation results can be stored in this file system. This makes it possible to manage the simulation study, i.e. to add or remove parameters, with native tools provided by the operating system. Of course, the simulation study can also be managed with specialized tools.
In the following, we call the file system directory that holds the simulation study the study root. The study root can contain the parameters, values and results on the one hand, and all scripts or binaries needed to actually execute the simulation on the other. The listing below shows an example of an empty study root.
├── MetaData
└── Results
    ├── ParaS__Strategy__ProportionalFair
    │   ├── ParaF__Load__0.1
    │   ├── ParaF__Load__0.5
    │   └── ParaF__Load__1.0
    └── ParaS__Strategy__RoundRobin
        ├── ParaF__Load__0.1
        ├── ParaF__Load__0.5
        └── ParaF__Load__1.0
The directory MetaData contains configuration files, the simulation binaries and parameter file templates. The directory Results holds the structure of the simulation study, i.e. the parameters and their values. In this case, there are two parameters: Strategy of type string with the values ProportionalFair and RoundRobin, and Load of type float with the values 0.1, 0.5 and 1.0. All files created by the simulation binary would be created in the leaf directory of a specific parameter combination.
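Because the directory names encode the parameters, a parameter combination can be recovered from a leaf path alone. The following is a minimal sketch of this idea, assuming that the prefix ParaS marks a string parameter, ParaF a float parameter, and that prefix, name and value are separated by double underscores, as the listing suggests. The function name parse_leaf is hypothetical and not part of any existing tool.

```python
from pathlib import PurePosixPath

# Assumed mapping from directory prefix to Python type, inferred from the
# listing: "ParaS" marks a string parameter, "ParaF" a float parameter.
PREFIX_TYPES = {"ParaS": str, "ParaF": float}


def parse_leaf(relative_path):
    """Turn a leaf path below Results into a {name: value} dictionary."""
    parameters = {}
    for part in PurePosixPath(relative_path).parts:
        # Split only twice so that values may themselves contain "__".
        prefix, name, raw_value = part.split("__", 2)
        parameters[name] = PREFIX_TYPES[prefix](raw_value)
    return parameters


combination = parse_leaf("ParaS__Strategy__ProportionalFair/ParaF__Load__0.1")
print(combination)
```

An unknown prefix raises a KeyError in this sketch, which makes directories that do not follow the naming scheme easy to spot.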
The parameter file template could look as follows.
Network.Load = %%Load%%
Scheduler.Strategy = %%Strategy%%
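The %%Name%% placeholders in such a template are meant to be replaced by concrete parameter values before a simulation run. A minimal sketch of this substitution could look as follows. The function name render_template is hypothetical, and the sketch assumes that placeholder names consist of letters, digits and underscores only.

```python
import re

TEMPLATE = "Network.Load = %%Load%%\nScheduler.Strategy = %%Strategy%%\n"


def render_template(template, values):
    """Replace every %%Name%% placeholder with the value of parameter Name."""

    def substitute(match):
        # A KeyError here signals a placeholder without a matching parameter.
        return str(values[match.group(1)])

    return re.sub(r"%%(\w+)%%", substitute, template)


rendered = render_template(TEMPLATE, {"Strategy": "ProportionalFair", "Load": 0.1})
print(rendered)
```

Failing loudly on a missing parameter is a deliberate choice in this sketch: a parameter file with leftover placeholders would otherwise only fail later, inside the simulation binary.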
The tasks of the simulation control are twofold. The first task is to manage the study root, i.e. creating it and adding parameters and values. The second task is to actually execute the simulation study. While the first task can also be performed with native programs, the second task requires special tooling.
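The first task, creating a study root, amounts to creating one directory per parameter combination. The sketch below illustrates this under the assumption that the directory names follow the scheme ParaS__Name__Value for strings and ParaF__Name__Value for floats. The function create_study_root is a hypothetical helper, and the example writes into a temporary directory rather than onto a real shared file system.

```python
import itertools
import tempfile
from pathlib import Path

# Assumed prefixes, inferred from the listing: ParaS = string, ParaF = float.
TYPE_PREFIXES = {str: "ParaS", float: "ParaF"}


def create_study_root(study_root, parameters):
    """Create MetaData and one Results leaf per parameter combination."""
    study_root = Path(study_root)
    (study_root / "MetaData").mkdir(parents=True, exist_ok=True)
    names = list(parameters)
    leaves = []
    # The Cartesian product yields every combination of parameter values.
    for values in itertools.product(*(parameters[name] for name in names)):
        leaf = study_root / "Results"
        for name, value in zip(names, values):
            leaf = leaf / f"{TYPE_PREFIXES[type(value)]}__{name}__{value}"
        leaf.mkdir(parents=True, exist_ok=True)
        leaves.append(leaf)
    return leaves


with tempfile.TemporaryDirectory() as tmp:
    leaves = create_study_root(
        tmp,
        {"Strategy": ["ProportionalFair", "RoundRobin"], "Load": [0.1, 0.5, 1.0]},
    )
    relative = [leaf.relative_to(tmp).as_posix() for leaf in leaves]
    for path in relative:
        print(path)
```

Since the directories are created with exist_ok=True, calling the function again with additional values extends an existing study root without touching the leaves that are already there.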
To execute the simulation for a single parameter combination, the simulation control reads the parameter file template located in the directory MetaData, replaces all placeholders with the actual values of the parameters and writes the parameter file to the corresponding leaf in the study root. For example, for the parameter combination Strategy = ProportionalFair and Load = 0.1, the simulation control would replace the placeholder %%Strategy%% with ProportionalFair and the placeholder %%Load%% with 0.1. The resulting parameter file would be written in the directory
After creating the parameter file, the simulation control executes the simulation binary remotely on one of the available compute nodes. In doing so, it sets the working directory to the leaf of the parameter combination and passes the created parameter file to the simulation binary. For this purpose, standard tooling like SSH is used.
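A remote execution of this kind can be sketched with the standard ssh client, which accepts a host followed by a command to run on that host. The example below only builds and prints the command. The host name, all paths, the parameter file name and the convention of passing the parameter file as the only argument are assumptions made for illustration, and the helper functions are hypothetical.

```python
import shlex
import subprocess


def build_ssh_command(host, leaf_dir, binary, parameter_file):
    """Return the argv list that starts one simulation run on a remote host."""
    # The remote shell changes into the leaf directory first, so all files
    # written by the simulation binary end up next to the parameter file.
    remote = " ".join(
        ["cd", shlex.quote(leaf_dir), "&&", shlex.quote(binary), shlex.quote(parameter_file)]
    )
    return ["ssh", host, remote]


def run_remote(host, leaf_dir, binary, parameter_file):
    """Run the simulation remotely and return the exit code of ssh."""
    command = build_ssh_command(host, leaf_dir, binary, parameter_file)
    return subprocess.run(command, check=False).returncode


# Hypothetical host, paths and file names, for illustration only.
command = build_ssh_command(
    "node01",
    "/study/Results/ParaS__Strategy__ProportionalFair/ParaF__Load__0.1",
    "/study/MetaData/simulation",
    "parameters.txt",
)
print(command)
```

Because all nodes share the same file system, the same absolute paths are valid on the control node and on the compute node, so no files need to be copied before or after the run.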
The simulation control can also take additional scheduling rules into account, e.g. to prioritize some parameter combinations or compute nodes. The available capacity of the compute nodes with respect to compute resources or memory can be considered as well.
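One possible way to combine priorities and node capacity is sketched below. This is a simple greedy strategy chosen for illustration, not the scheduling used by any particular tool: runs are taken in order of priority and each run goes to the node that currently has the most free slots. The function assign_runs, the notion of slots and the node names are assumptions.

```python
import heapq


def assign_runs(runs, node_slots):
    """Assign prioritized runs to nodes with free slots.

    runs: list of (priority, leaf) tuples; a lower value runs first.
    node_slots: dict mapping a node name to its number of free slots.
    Returns (assignments, waiting), where waiting holds the unassigned leaves.
    """
    queue = list(runs)
    heapq.heapify(queue)
    free = dict(node_slots)
    assignments = []
    while queue:
        # Pick the node with the most free slots; ties go to the first node.
        node = max(free, key=free.get, default=None)
        if node is None or free[node] == 0:
            break
        _, leaf = heapq.heappop(queue)
        free[node] -= 1
        assignments.append((leaf, node))
    waiting = [leaf for _, leaf in sorted(queue)]
    return assignments, waiting


runs = [
    (1, "ParaS__Strategy__RoundRobin/ParaF__Load__1.0"),
    (0, "ParaS__Strategy__ProportionalFair/ParaF__Load__0.1"),
    (2, "ParaS__Strategy__RoundRobin/ParaF__Load__0.1"),
]
assignments, waiting = assign_runs(runs, {"node01": 1, "node02": 1})
print(assignments)
print(waiting)
```

In this example, the two runs with the lowest priority values are placed on node01 and node02, while the third run waits until a slot becomes free.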
As mentioned, special tooling to manage simulation studies exists. An example of such a tool is IKR SimTree, which is developed by the Institute of Communication Networks and Computer Engineering of the University of Stuttgart. However, IKR SimTree is not actively maintained and has some special dependencies on the infrastructure it is executed on.
This is why we will publish a complete rewrite named SimTreeNG in the near future. In future articles, we will also discuss the details of the setup we briefly introduced in this article.