Traceability in Simulation Studies

This article deals with one particular challenge of simulation studies: traceability. We define what the term traceability means and what is not meant by it. Additionally, we illustrate a concept for achieving traceability with established methods from software engineering.

What is traceability

The term traceability describes the property that all steps necessary to produce the results of a simulation study are linked with each other. Traceability is thereby the foundation of reproducible simulation results. A basic workflow for producing results from a simulation study consists of the following steps.

  1. Development of the simulation model
  2. Configuring the simulation model
  3. Executing the simulation study
  4. Collecting the simulation results
  5. Post-processing the simulation results, e.g. into plots

With traceability, all of these steps can be linked. For example, a plot can be linked to the execution environment used to run the study, the execution environment to the configuration of the simulation model, and the configuration finally to the source code of the model itself.

It is important to mention that traceability does not guarantee that the simulation results are correct. This brings us to the next section, which discusses what traceability is not.

What traceability is not

Traceability must not be confused with the validity or credibility of simulations. While there are methods, some of them also known from software engineering, to ensure the correctness of simulations, it is in general much harder to guarantee their credibility.

This topic deserves an article of its own and will be covered in a later post.

How to achieve traceability

Traceability is already widely practiced in software engineering, even if it is not known under this term. The remainder of this article discusses the corresponding concepts and illustrates how they can be applied to simulation studies.

Version control

A version control system keeps a complete history of the changes applied to a set of files. This makes it possible to track all changes as well as to restore older versions. Each version is identified by a unique identifier. While version control systems can manage all kinds of files, they work especially well for text-based files, because changes to them can be displayed directly and read by the users of the version control system. Therefore, all source code used in simulation studies, e.g. the source code of the model, configuration files, scripts to execute the simulation or to collect the results, and also the results themselves, should be managed by version control systems.
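As a minimal sketch, assuming the simulation sources are tracked with Git, the unique identifier of the currently checked-out version can be queried and stored next to the results; the directory and file names used here are illustrative.

import subprocess
from pathlib import Path

def current_revision(repository: Path) -> str:
    # "git rev-parse HEAD" returns the unique identifier of the
    # currently checked-out version of the repository.
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repository, text=True
    ).strip()

# Storing the identifier next to the simulation results links the results
# to the exact version of the model, configuration, and scripts.
results = Path("results")
results.mkdir(exist_ok=True)
(results / "REVISION").write_text(current_revision(Path(".")) + "\n")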

Versioning

If the simulation depends on external libraries or tools, the exact versions of these libraries and tools should be recorded in the configuration of the study. The configuration itself is managed by a version control system. Thereby, the different components are linked and traceability is achieved.
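A small sketch of how such version information could be collected automatically and stored as a configuration file; the package names, the gcc tool, and the file name versions.json are only examples.

import importlib.metadata
import json
import subprocess

# Exact versions of external Python libraries the simulation depends on
# (the package names here are just examples).
versions = {
    "numpy": importlib.metadata.version("numpy"),
    "matplotlib": importlib.metadata.version("matplotlib"),
    # Version of an external tool, queried from its command line interface.
    "gcc": subprocess.check_output(["gcc", "--version"], text=True).splitlines()[0],
}

# The file is placed next to the other configuration files and committed
# to the version control system together with them.
with open("versions.json", "w") as handle:
    json.dump(versions, handle, indent=2)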

Metadata

Finally, metadata makes it possible to embed additional information in other data, e.g. in plots of simulation results. A plot should contain the exact version of the simulation environment as metadata. This makes it traceable how the plot was created, and thereby also how the simulation model was executed and configured and which exact version of the simulation model was used.
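As an illustration, assuming the plots are created with matplotlib and written as PNG files, version information can be embedded via the metadata argument of savefig; the plotted values as well as the metadata keys and values below are placeholders.

import matplotlib.pyplot as plt

# Placeholder results of a fictitious study: mean delay over offered load.
load = [0.1, 0.5, 1.0]
delay = [1.2, 2.3, 5.8]

fig, ax = plt.subplots()
ax.plot(load, delay, marker="o")
ax.set_xlabel("Load")
ax.set_ylabel("Mean delay")

# Embed the exact versions of the simulation environment and of the model
# as metadata in the PNG file; keys and values are freely chosen examples.
fig.savefig(
    "delay_over_load.png",
    metadata={
        "simulation-environment": "simulator 2.3.1",
        "model-revision": "4f2c9a1",
    },
)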

Scalable Setup for Simulation Studies

Introduction

When executing simulations, one often faces the task of running simulation studies with varying parameters. The simulation results are then plotted as a family of curves. Especially in the case of discrete event simulations, running the complete simulation study can be efficiently distributed over multiple compute nodes, as every parameter combination can be treated as an individual simulation. This article describes a scalable yet inexpensive setup to execute such simulation studies.

As a disclaimer, it should be mentioned that scalability of course has its limits and that there are good reasons why dedicated architectures for high-performance computing exist. Yet, for the purpose of discrete event simulations, the presented setup is a suitable choice.

Setup

The diagram below depicts the setup, which consists of multiple compute nodes, a central, shared storage system, and a node to control the simulation. Even though the diagram shows these roles as individual nodes, a single node can take on multiple roles. For example, a compute node can simultaneously act as the storage system or as the control node. The bottleneck in such a setup is usually the storage system, but alternatives like distributed file systems can improve the performance.

Simulation Setup

The structure of the simulation study, i.e. the different parameters and their values, can be directly reflected in the shared file system provided by the central storage system.

File System Structure

As all nodes share the same file system, the parameters and their values, but also the simulation results, can be stored in this file system. This makes it possible to manage the simulation study, i.e. to add or remove parameters and values, with the native tools provided by the operating system. Of course, the simulation study can also be managed with specialized tools.

In the following, we will call the file system directory that holds the simulation study the study root. The study root contains on the one hand the parameters, values, and results, and on the other hand all scripts or binaries needed to actually execute the simulation. The listing below shows an example of an empty study root.

├── MetaData
└── Results
    ├── ParaS__Strategy__ProportionalFair
    │   ├── ParaF__Load__0.1
    │   ├── ParaF__Load__0.5
    │   └── ParaF__Load__1.0
    └── ParaS__Strategy__RoundRobin
        ├── ParaF__Load__0.1
        ├── ParaF__Load__0.5
        └── ParaF__Load__1.0

The directory MetaData contains configuration files, the simulation binaries, and parameter file templates. The directory Results holds the structure of the study, i.e. the parameters and their values. In this case, there is a parameter Strategy of type string with the values ProportionalFair and RoundRobin, and a parameter Load of type float with the values 0.1, 0.5, and 1.0. All files created by the simulation binary are placed in the leaf directory of the specific parameter combination.
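As a sketch, such a structure could be created with a few lines of Python; the study root name MyStudy is illustrative, while the parameter names, types, and values correspond to the listing above.

import itertools
from pathlib import Path

parameters = {
    ("S", "Strategy"): ["ProportionalFair", "RoundRobin"],
    ("F", "Load"): ["0.1", "0.5", "1.0"],
}

study_root = Path("MyStudy")
(study_root / "MetaData").mkdir(parents=True, exist_ok=True)

# One leaf directory per parameter combination, following the
# Para<Type>__<Name>__<Value> naming scheme of the listing above.
for combination in itertools.product(*parameters.values()):
    leaf = study_root / "Results"
    for (type_tag, name), value in zip(parameters.keys(), combination):
        leaf = leaf / f"Para{type_tag}__{name}__{value}"
    leaf.mkdir(parents=True, exist_ok=True)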

The parameter file template could look as follows.

Network.Load = %%Load%%
Scheduler.Strategy = %%Strategy%%

Simulation Control

The tasks of the simulation control are twofold. The first task is to manage the study root, i.e. to create it and to add parameters and values. The second task is to actually execute the simulation study. While the first task can also be performed with native programs, the second one requires special tooling.

To execute the simulation for a single parameter combination, the simulation control reads the parameter file template located in the directory MetaData, replaces all placeholders with the actual values of the parameters, and writes the parameter file to the corresponding leaf in the study root. For example, for the parameter combination Strategy = ProportionalFair and Load = 0.1, the simulation control would replace the placeholder %%Strategy%% with ProportionalFair and %%Load%% with 0.1. The resulting parameter file would be written to the directory ./Results/ParaS__Strategy__ProportionalFair/ParaF__Load__0.1.
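A minimal sketch of this substitution step, assuming the template is stored as MetaData/parameters.template and the resulting file is simply called parameters; both file names, like the study root MyStudy, are illustrative.

from pathlib import Path

study_root = Path("MyStudy")
values = {"Strategy": "ProportionalFair", "Load": "0.1"}

# Read the template from MetaData and replace every %%Name%% placeholder.
template = (study_root / "MetaData" / "parameters.template").read_text()
for name, value in values.items():
    template = template.replace(f"%%{name}%%", value)

# Write the resulting parameter file into the leaf of this combination.
leaf = (study_root / "Results"
        / f"ParaS__Strategy__{values['Strategy']}"
        / f"ParaF__Load__{values['Load']}")
leaf.mkdir(parents=True, exist_ok=True)
(leaf / "parameters").write_text(template)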

After creating the parameter file, the simulation control executes the simulation binary remotely on one of the available compute nodes. Additionally, the working directory is set to the corresponding leaf directory, and the created parameter file is passed to the simulation binary. For this purpose, standard tooling such as SSH is used.
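A sketch of the remote execution via SSH, assuming the shared file system is mounted under the same path (here /shared) on all nodes; the node name, the paths, and the binary name simulator are illustrative.

import shlex
import subprocess

node = "compute01"  # one of the available compute nodes
study_root = "/shared/MyStudy"  # shared file system, mounted identically on all nodes
leaf = f"{study_root}/Results/ParaS__Strategy__ProportionalFair/ParaF__Load__0.1"
binary = f"{study_root}/MetaData/simulator"

# Run the simulation binary on the remote node with the leaf directory as
# working directory; the previously created parameter file is passed as argument.
remote_command = f"cd {shlex.quote(leaf)} && {shlex.quote(binary)} parameters"
subprocess.run(["ssh", node, remote_command], check=True)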

The simulation control can also apply additional scheduling rules, e.g. to prioritize certain parameter combinations or compute nodes. Likewise, the available capacity of the compute nodes with respect to compute resources or memory can be taken into account.

Tooling

As mentioned, specialized tooling to manage simulation studies exists. An example of such a tool is IKR SimTree, which was developed by the Institute of Communication Networks and Computer Engineering at the University of Stuttgart. However, IKR SimTree is not actively maintained and has some special dependencies on the infrastructure it is executed on.

This is why we will publish a complete rewrite named SimTreeNG in the near future. In future articles, we will also discuss the details of the setup briefly introduced in this article.