Keith Swenson's Go Flow blog continues to produce thought-provoking discussions of BPM issues. Check it out if you are not a subscriber. His latest concerns simulation, one of my hot buttons. A couple years ago I wrote that simulation was a "fake feature" - one of those things vendors put in the tool to tick off the Gartner checklist but which don't do anything useful. Since then the situation has not improved to any great degree. This is too bad, because, as Keith suggests, simulation can be of great value in projecting the expected performance improvement from a process change before committing the resources needed to make that change.
But it would be better to say it could be of great value, if the tools were any good. I recently did a small consulting project for a BPMS vendor on what was good and not so good about their product. They really hyped their simulation tool, but I had to tell them it was, in my opinion, mostly useless, because it did not distinguish between the active time of a process activity, which consumes the assigned resource, and wait time (sometimes called lag time), which does not. It considered the total time to be active time.
The reason I call it useless is that in most process improvement projects, the problem is not too few widget-tweakers assigned to the widget-tweaking step, causing a backlog when there is a spike in widget orders. That simulation use case, which I call optimizing resource allocation, is real in certain heads-down scenarios, such as call centers and backend clerical processes, but it's not the main one BPM project teams are dealing with.
Far more common is the process improvement use case, which aims for improvement in some metric, usually cycle time but occasionally cost or quality, by changing the flow of process activities. The resource assigned to an activity - approval by a manager, for instance - is not fully dedicated to that activity, nor even to the other activities described in concurrent simulation models. The active time to perform the task bears no relationship to the actual time to complete it, which is dominated by wait time.
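To make the distinction concrete, here is a minimal sketch in Python. It is not taken from any vendor's tool. It assumes a single manager working a first-in, first-out queue of approvals, that wait time comes before the active work, and some invented numbers: 5 minutes of active time, 240 minutes of wait time, and one request arriving every hour. The function name `simulate_approvals` and its parameters are hypothetical.

```python
def simulate_approvals(arrivals, active, lag, lump_lag_into_active):
    """Simulate one resource working a FIFO queue of approval tasks.

    Returns (mean cycle time, resource utilization). Times are in minutes.
    """
    free_at = 0.0      # time at which the resource next becomes available
    busy = 0.0         # total time the resource is counted as occupied
    last_finish = 0.0  # completion time of the most recent task
    cycle_times = []
    for t in arrivals:
        if lump_lag_into_active:
            # Flawed model: the resource is held for wait time plus active time.
            start = max(t, free_at)
            finish = start + lag + active
            busy += lag + active
        else:
            # Separated model: the task waits without holding the resource,
            # then occupies it only for the active time.
            ready = t + lag
            start = max(ready, free_at)
            finish = start + active
            busy += active
        free_at = finish
        last_finish = finish
        cycle_times.append(finish - t)
    mean_cycle = sum(cycle_times) / len(cycle_times)
    return mean_cycle, busy / last_finish


# Illustrative inputs only: eight requests, one per hour.
arrivals = [60.0 * i for i in range(8)]
lumped = simulate_approvals(arrivals, active=5.0, lag=240.0,
                            lump_lag_into_active=True)
separated = simulate_approvals(arrivals, active=5.0, lag=240.0,
                               lump_lag_into_active=False)
print("lumped    (mean cycle, utilization):", lumped)
print("separated (mean cycle, utilization):", separated)
```

With these made-up numbers, the lumped model shows the manager fully occupied and the backlog growing with every arrival, while the separated model shows a steady 245-minute cycle time with the manager barely loaded. Same process, opposite conclusions.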
Separation of active time and lag time parameters for an activity is just one of many real-world requirements ignored by many simulation tools. Here are some others:
- Event probability and time of occurrence. In BPMN, events provide an expressive visual language for describing the exceptions that occur in real-world processes. In fact, these exceptions are usually at the root of performance problems in the as-is process. To project the expected to-be improvement, you need to be able to assign a probability and mean time of occurrence for events in the process model. Most simulation tools ignore events.
- Repeating activities. BPMN has two types of repeating activities, called looping (DoWhile) and multi-instance (ForEach). You need a simulation parameter to model the number of iterations.
- Instance properties. In most simulation models, the probabilities at each node are uncorrelated. In real-world processes they are highly correlated. For instance, the duration of a particular task, the probability of taking a particular gateway output, and the probability of some event occurring usually track together. In other words, certain classes of instances tend to take longer, tend to take path 1 rather than path 2 out of the gateway, and have a higher than usual probability of some in-flight event. Some BPM Suites have optimization routines in their BAM/Analytics component that actually figure these correlations out for you! But try to find a simulation tool that lets you correlate them in the simulation model. One way is to define the simulation parameters not as a simple number (mean and standard deviation) but as an expression of one or more instance properties, such as orderValue, which could take values high, medium, and low. This makes configuring instance generation more complex, as you need to define the rate of each type, but it could provide much better output.
- Contingent resource assignment. Most simulation tools let you assign tasks to roles with some defined cost-per-hour or cost-per-use parameter, or groups of same. But not many let you say: assign to role A as the primary resource, and if no member of role A is available, assign to role B.
- Prepopulation of backlogs. Simulation models generally start empty, meaning no instances in the system. Besides the obvious distortion at the beginning of the simulation period, this does not allow the resource allocation optimization use case to apply when it is really needed, i.e. when an actual running process is jammed up with backlogs and you need to try various alternatives for working out of it.
- Access to raw output. The prebuilt metrics and charts provided by simulation tools are convenient but they rarely provide the detail you need for real analysis. For that you need the raw output, one record for each process instance, and also one record for each activity or event instance, dumped into Excel or a database. From that you can create histograms, provide activity-based costing, and perform other useful analysis. Without it you basically have eye candy.
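Two of the items above, instance properties and raw output, can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not how any particular tool works: the orderValue mix, the parameter tables, and the function names (`generate_instances`, `to_csv`, `mean_by_class`) are all hypothetical, and a single "review" task stands in for a whole process. The point is that task duration, gateway path, and event probability are all driven by the same instance property, and that every instance leaves behind one raw record.

```python
import csv
import io
import random

# Hypothetical share of generated instances in each orderValue class.
ORDER_MIX = {"high": 0.2, "medium": 0.3, "low": 0.5}

# Simulation parameters expressed as functions of the instance property
# rather than as a single number. All values are invented for illustration.
MEAN_REVIEW_HOURS = {"high": 8.0, "medium": 3.0, "low": 1.0}
P_ESCALATION_PATH = {"high": 0.40, "medium": 0.15, "low": 0.05}
P_CANCEL_EVENT = {"high": 0.10, "medium": 0.05, "low": 0.01}


def generate_instances(n, seed=1):
    """Return one raw record per simulated process instance."""
    rng = random.Random(seed)
    classes = list(ORDER_MIX)
    weights = [ORDER_MIX[c] for c in classes]
    records = []
    for instance_id in range(n):
        order_value = rng.choices(classes, weights=weights, k=1)[0]
        records.append({
            "instanceId": instance_id,
            "orderValue": order_value,
            # Duration, gateway path and event all depend on orderValue,
            # so they are correlated across instances.
            "reviewHours": rng.expovariate(1.0 / MEAN_REVIEW_HOURS[order_value]),
            "escalated": rng.random() < P_ESCALATION_PATH[order_value],
            "cancelled": rng.random() < P_CANCEL_EVENT[order_value],
        })
    return records


def to_csv(records):
    """Dump the raw records as CSV text, ready for Excel or a database."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def mean_by_class(records, field):
    """Average a numeric or boolean field per orderValue class."""
    groups = {}
    for r in records:
        groups.setdefault(r["orderValue"], []).append(float(r[field]))
    return {k: sum(v) / len(v) for k, v in groups.items()}


records = generate_instances(3000)
raw_csv = to_csv(records)
print(mean_by_class(records, "reviewHours"))
print(mean_by_class(records, "escalated"))
```

Because the raw records are kept, the per-class averages printed at the end are just one of many analyses you could run. Histograms or activity-based costing would work from the same CSV.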