What is a "Process"?

The whole reason for BPMN's existence is to provide a notation that is outwardly familiar to flowcharters but has precise semantics and rules as demanded by analysts, architects, and developers. Nevertheless, the BPMN 2.0 spec fails to offer a business-understandable definition of its most basic concepts... starting with "process".

What does BPMN mean by a process, anyway? It doesn't mean the same thing many long-time flowcharters think it means, as evidenced in a recent comment thread here. Blame for this lies mainly with the BPMN spec itself, which adds to its sins of omission by continual reference to both pools and lanes as "swimlanes," when they mean entirely different things. (Microsoft is even worse by using the same shape for pool and lane in Visio Premium's BPMN stencil!) For some unknown reason, the BPMN spec is reluctant to acknowledge that a pool is primarily a container for a process (in addition to its secondary value as a participant in a collaboration).

A BPMS Watch reader wrote to ask why a diagram like this

gives BPMN validation errors. The tool's validation function is correct. BPMN (and any good BPMN tool) requires a sequence flow between task 1 and task 4. Otherwise the pool named "Process 1" contains two processes, one containing task 1 and another containing task 4, and a pool can only contain one process. Or, you could conceivably argue that there is one process but that task 1 and task 4 are on alternative paths within it... certainly not what the modeler intended.

In BPMN, a process model depicts all the possible paths from the initial state of any process instance to its final state, and the logic that causes the instance to take one path vs another. That routing logic is based on a combination of instance data (via gateways) and events that occur. Because the logic for all possible paths from start to end is known in advance, the process could in principle be automated in an engine... even though few modeled processes are in fact automated.

A sequence flow, drawn as a solid arrow, is BPMN's symbol for flow of control within a process. When the node at the tail of the sequence flow is complete, the node at the arrowhead is enabled to start. All nodes in a process must be linked via a continuous chain of sequence flows from the process's start to an end node. The shape that separates what is inside a process from what is outside is called a pool. Sequence flows are confined within a pool; they cannot cross its boundary. Other than its rectangular shape, a pool has almost nothing to do with a lane, which is a graphical subdivision of a process used to categorize activities. Historically, lanes have been widely used to indicate the role or department that performs or is responsible for the activity, but technically they can be used for any categorization (critical vs non-critical, etc.).
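To make the chain rule concrete, here is a minimal Python sketch of how a validator might check it. It is not taken from the spec or from any BPMN tool: the function names, the node ids, and the representation of sequence flows as (source, target) pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def reachable(seeds, edges):
    """Return every node reachable from the seed nodes by following edges."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def off_chain_nodes(nodes, flows, start, ends):
    """Nodes that do not lie on any chain of sequence flows from start to an end."""
    forward = reachable([start], flows)                      # reachable from the start
    backward = reachable(ends, [(t, s) for s, t in flows])   # able to reach an end
    return sorted(set(nodes) - (forward & backward))


def boundary_crossing_flows(flows, pool_of):
    """Sequence flows whose two endpoints sit in different pools (not allowed)."""
    return [(s, t) for s, t in flows if pool_of[s] != pool_of[t]]


# Hypothetical process: task4 is drawn in the pool but not wired in.
nodes = ["start", "task1", "task4", "end"]
disconnected = [("start", "task1"), ("task1", "end")]
connected = [("start", "task1"), ("task1", "task4"), ("task4", "end")]

print(off_chain_nodes(nodes, disconnected, "start", ["end"]))  # ['task4']
print(off_chain_nodes(nodes, connected, "start", ["end"]))     # []
```

A node passes only if it is both reachable from the start and able to reach an end, which is one reading of "linked via a continuous chain." The second helper flags any sequence flow drawn across a pool boundary.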

Usually the start node of a process is denoted by a start event and the end node by an end event. But BPMN also allows implicit start and end nodes. Any flow object (activity, gateway, or event) with no incoming sequence flow is, with a few special exceptions, an implicit start, and any flow object with no outgoing sequence flow is similarly an implicit end. Until recently, BPMN had a rule that you could not have implicit start or end nodes in a process level if it contained any actual start or end events. That's in fact the reason why the diagram above, in which task 4 is an implicit start node of Process 1, gives a validation error.
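As a rough illustration, the implicit start and end definitions and the older rule could be checked as in the Python sketch below. The names are invented for this example, the "few special exceptions" are deliberately ignored, and the sample process is only loosely modeled on the situation described, not an exact copy of the reader's diagram.

```python
def implicit_nodes(nodes, flows):
    """Return (implicit_starts, implicit_ends) for one process level.

    nodes: dict of node id -> kind, e.g. "startEvent", "endEvent", "task"
    flows: list of (source_id, target_id) sequence flows
    """
    has_incoming = {target for _, target in flows}
    has_outgoing = {source for source, _ in flows}
    implicit_starts = sorted(
        n for n, kind in nodes.items()
        if n not in has_incoming and kind != "startEvent"
    )
    implicit_ends = sorted(
        n for n, kind in nodes.items()
        if n not in has_outgoing and kind != "endEvent"
    )
    return implicit_starts, implicit_ends


def violates_old_rule(nodes, flows):
    """Older rule as described above: no implicit start or end nodes in a
    process level that contains any explicit start or end event."""
    starts, ends = implicit_nodes(nodes, flows)
    has_explicit = any(k in ("startEvent", "endEvent") for k in nodes.values())
    return has_explicit and bool(starts or ends)


# Hypothetical Process 1: task4 has no incoming sequence flow.
process_1 = {
    "start": "startEvent",
    "task1": "task",
    "end1": "endEvent",
    "task4": "task",
    "end2": "endEvent",
}
flows_1 = [("start", "task1"), ("task1", "end1"), ("task4", "end2")]

print(implicit_nodes(process_1, flows_1))     # (['task4'], [])
print(violates_old_rule(process_1, flows_1))  # True
```

Under these assumptions, task4 shows up as an implicit start in a level that also has an explicit start event, which is the condition the older rule rejected.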

But Steve White reports that this rule was eliminated in the final BPMN 2.0 spec. The diagram above shows why that was a mistake. With the elimination of constraints on implicit start and end nodes, you could argue that the diagram above is "valid." But it still makes no sense, and it certainly does not reflect the modeler's intent. Here's why.

It would mean that in Process 1, task 4 is a second start node of the process, along with the start event. What does it mean when a process has more than one start node? Here BPMN is a bit confusing. Multiple start events represent alternative start nodes, i.e., only one of them represents the start node for any given instance. That is certainly the case for triggered starts, denoted by a trigger icon inside the event. You could have an order that comes in through the call center and an order that comes in through the web. They would have separate Message start events, and the paths out of those events would merge downstream (if they are not, in fact, entirely independent processes). For None starts, I believe the same applies. I interpret None start events in a top-level process as manual start by a task performer, but the spec does not insist on that. The BPMN 2.0 spec says that an implicit start node acts as if it were preceded by a None start event.

If the diagram above is considered valid, the start event and task 4 become alternative start nodes of Process 1. But task 4 was certainly not intended to represent the initial state of Process 1. Also, an instance of Process 1 that contains task 4 would be independent of the instance that contained task 1, rather than a continuation of it, as the modeler obviously intended.

It's actually very simple: if task 1 and task 4 are meant to represent actions on the same process instance, they need to both lie on the same chain of sequence flows from process start to end. That's what a "process" in BPMN means.
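The same point can be put as a small Python check: two tasks can belong to one process instance only if some single start node leads to both of them. This is a sketch under assumptions. The function names and node ids are hypothetical, and the test is necessary but not sufficient, since tasks on exclusive gateway branches would also pass it.

```python
def nodes_reachable_from(start, flows):
    """All nodes an instance created at `start` could visit via sequence flows."""
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for source, target in flows:
            if source == node and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


def may_share_instance(a, b, starts, flows):
    """True if at least one start node reaches both a and b."""
    return any({a, b} <= nodes_reachable_from(s, flows) for s in starts)


# As drawn (assumed layout): task4 is an implicit, alternative start.
as_drawn = [("start", "task1"), ("task1", "end1"), ("task4", "end2")]
print(may_share_instance("task1", "task4", ["start", "task4"], as_drawn))  # False

# With a sequence flow from task1 to task4, both lie on one chain.
repaired = [("start", "task1"), ("task1", "task4"), ("task4", "end")]
print(may_share_instance("task1", "task4", ["start"], repaired))  # True
```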

(By the way, Process 2 should have a start and end event as well, and the message flow from task 1 should really go to the Message start event of Process 2. A Message start event means: create a new instance of this process whenever this message arrives.)

The best solution of all is to make those pools lanes instead, and replace the message flows with sequence flows. That's probably what the modeler intended... before the BPMN spec (and Visio) muddied the waters with the notion that pools and lanes are sort of the same thing.