What's Wrong With This Picture? Part 2

Only one reader (thank you, Kevin Brennan!) tried to identify the errors, so I'm going to do it myself. In the first example (images 1 and 2 in the original post), I've circled each of the errors and given them numbers, referenced in the explanation below.

Figure: whatswrong11.png

Let's start with the simple notation errors. Errors 1, 2, 3, 4, and 5 are all essentially the same. The small diamond on the tail of a sequence flow (solid arrow) means the sequence flow is conditional. The spec says it is used only when the sequence flow originates in an activity (task or subprocess), not when it originates in a gateway (diamond shape). Two unconditional sequence flows (i.e., without the little diamond) originating from the same activity mean a parallel split (same as a gateway with a + inside). If some of the sequence flows from an activity have a condition (little diamond on the tail), the result is the same as what BPMN calls an OR-gateway (diamond with an O inside).
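The rule for flows leaving an activity can be written down as a tiny classifier. This is only a sketch of the reading given above; the function name and the return labels are my own invention, not terms from the spec or from any tool.

```python
def split_semantics(conditional_flags):
    """Classify the outgoing sequence flows of a single activity.

    conditional_flags holds one boolean per outgoing flow:
    True if the flow is drawn with the small diamond on its tail.
    (Hypothetical helper for illustration only.)
    """
    if len(conditional_flags) < 2:
        return "no split"
    if any(conditional_flags):
        # At least one conditional flow: behaves like an OR-gateway.
        return "inclusive (OR) split"
    # Two or more flows, none conditional: behaves like a + gateway.
    return "parallel (AND) split"


print(split_semantics([False, False]))
print(split_semantics([True, False, True]))
```

The first call describes two plain arrows out of a task, the second a mix of conditional and unconditional arrows.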

An exclusive gateway (diamond with an X inside) means that only one of its outgoing sequence flows is enabled, based on the gateway condition (a property of the gateway, not a property of the sequence flow). So to correct errors 1-3, get rid of the sequence flow conditions (little diamonds).

Error 4 is a conditional sequence flow from a timer intermediate event, indicating an exception flow. A conditional sequence flow cannot originate from an intermediate event, per the BPMN spec. Get rid of the little diamond here too.

Error 5 is like errors 1-3 except here it is an event-triggered gateway, basically the same as an exclusive gateway except the process takes the path determined by the first event received. Get rid of the little diamonds.
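Errors 1-5 all break the same rule, so they are the kind of thing a tool could check mechanically. Here is a minimal sketch of such a check, assuming a made-up in-memory representation of sequence flows. The class, the field names, and the element-kind strings are hypothetical and do not come from any BPMN tool or file format.

```python
from dataclasses import dataclass

# Hypothetical element kinds for this sketch. Only activities
# (tasks and subprocesses) may be the source of a conditional flow.
ACTIVITY_KINDS = {"task", "subprocess"}


@dataclass(frozen=True)
class SequenceFlow:
    source_id: str
    source_kind: str           # e.g. "task", "exclusive_gateway", "intermediate_event"
    target_id: str
    conditional: bool = False  # True when drawn with the small diamond on its tail


def misplaced_conditions(flows):
    """Return the flows that carry the conditional marker but do not leave an activity."""
    return [f for f in flows if f.conditional and f.source_kind not in ACTIVITY_KINDS]


flows = [
    SequenceFlow("check_credit", "task", "approve", conditional=True),         # allowed
    SequenceFlow("gw1", "exclusive_gateway", "book_air", conditional=True),    # like errors 1-3
    SequenceFlow("timer1", "intermediate_event", "cancel", conditional=True),  # like error 4
    SequenceFlow("gw2", "event_gateway", "update", conditional=True),          # like error 5
    SequenceFlow("gw1", "exclusive_gateway", "book_hotel"),                    # fine: no marker
]

print([f.source_id for f in misplaced_conditions(flows)])
```

The check flags the gateway, timer event, and event-gateway flows and leaves the task's conditional flow alone.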

Error 6 I've shown in 2 pieces, 6a and 6b. The spec says a sequence flow into an expanded subprocess should be drawn either to the boundary of the subprocess or to the start event drawn attached to the boundary. 6a shows the flow drawn to the start event inside the subprocess. OK, that's kind of minor, until you pair it with 6b, which shows the sequence flow out of the subprocess drawn from one of the end events. There are 2 other end events in this subprocess, one labeled Rejection Response (which generates a message) and another labeled Bookings. If you trace through the diagram, you'll see that only one of these 3 end events can be reached, so reaching any one of them ends the subprocess, and the flow continues to the next step, which in this case is an end event for the entire process (but it could be another activity). The diagram is confusing because all 3 end events in the subprocess terminate the subprocess, but the sequence flow is drawn from only one of them. The fix is to attach both sequence flows to the subprocess boundary.

Errors 7-8 are the same -- intermediate events intended to be compensation events (perhaps not obvious from the diagram, but in this case so indicated by the event properties in the tool itself). Compensation events should be indicated with the rewind symbol inside.

But the problem here is much more serious than that. A compensation event attached to an activity means that in response to a compensate signal (generated by a compensation end event or a transaction cancel end event), IF the activity has completed successfully, the specified compensation activity is executed to undo the completed activity, as part of the recovery logic of a business transaction. Modeling business transactions is a little complicated, but suffice it to say for now that the link from a compensation event to the compensating activity is an "association" (dotted line), NOT a sequence flow. A compensation event does not trigger an exception flow that interrupts the activity and then proceeds down its exception sequence flow. The activity must have already completed for the compensation event to have effect, and its effect is limited to executing the compensating activity linked by the association. So errors 9 and 10 are associations drawn as sequence flows. Errors 11 and 12 are the compensating activities shown without the compensation (rewind) symbol. Errors 13 and 14 are sequence flows out of the compensating activity, which is forbidden by BPMN. In a nutshell, this diagram confuses fault handling with compensation, and combines them in a flow triggered by a compensation event. They are separate things, but both are part of recovering from an exception (business exception or system fault) in a business transaction.
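For readers who think in code, the difference between fault handling and compensation can be sketched as follows. This is an illustration of the idea, not BPMN execution semantics. The Transaction class and the activity names are hypothetical, the two bookings run one after the other instead of in parallel to keep the example short, and undoing in reverse order is my own assumption.

```python
class Transaction:
    """Toy model: remembers which activities completed and how to undo each one."""

    def __init__(self):
        self._completed = []  # (activity name, compensating callable)

    def run(self, name, action, compensate):
        action()  # if this raises, the activity is NOT recorded as completed
        self._completed.append((name, compensate))

    def compensate(self):
        """Undo only the activities that completed; return their names."""
        undone = []
        while self._completed:
            name, undo = self._completed.pop()  # reverse order (an assumption)
            undo()
            undone.append(name)
        return undone


log = []
tx = Transaction()


def fail_hotel():
    raise RuntimeError("hotel unavailable")


undone = []
try:
    tx.run("book_air", lambda: log.append("air booked"),
           lambda: log.append("air cancelled"))
    tx.run("book_hotel", fail_hotel,
           lambda: log.append("hotel cancelled"))
except RuntimeError:
    # Fault handling: react to the failure of book_hotel.
    # Compensation: a separate step that undoes what had already completed.
    undone = tx.compensate()

print(undone)
print(log)
```

The hotel booking never completed, so its compensating activity never runs; only the air booking is undone.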

Error 15 is not a notation error but a business semantics error. I think it's just the tip of the iceberg in this particular example, but it's illustrative of the exception handling problem in general.

Let's look at what this travel booking process is trying to do. The first step is to verify the form of payment (but not charge the customer). If the verification fails, the process ends with a message to the customer. If the verification succeeds, the customer's trip request is processed to create one or more candidate itineraries, which are sent back to the customer with a request for confirmation. If the confirmation is not received in 2 days, the travel service terminates the process and sends a cancellation notice to the customer. If the confirmation is received in time, air and hotel are booked in parallel. After both are booked, the customer is charged. If there is a problem with the debit, a notice is sent to the customer, and the process waits for an event in return, either an Update message or a Cancel message. "Update" presumably means "try again," since it loops back to the same Charge Buyer activity, and "Cancel" means abort the booking. If no response is received in 2 days, the task Contact Customer executes, after which the process retries the debit (that doesn't sound exactly right!). This retry cycle can repeat up to N times, after which the process terminates.

Attached to the entire process (except for the first verify form of payment step) is an intermediate message event listening for a Cancel message. (Presumably this is a different Cancel than the one specific to the Payment Problem notification described above.) No matter where in the process this occurs -- before booking, during booking, after booking but before debiting the customer, etc. -- the processing is the same (Handle Cancel Message). That is incorrect, since how you handle the cancel depends on the state of the process... how far you've gotten. Of course you could put all that state-dependent processing inside Handle Cancel Message, but then why did you draw this diagram in the first place?
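To make "state-dependent" concrete, here is a rough sketch of what cancellation handling might have to do at each point. The stage names follow the list above, but the steps assigned to each stage are my own guesses at plausible business rules, not something taken from the diagram.

```python
from enum import Enum, auto


class Stage(Enum):
    BEFORE_BOOKING = auto()
    BOOKING_IN_PROGRESS = auto()
    BOOKED_NOT_CHARGED = auto()
    CHARGED = auto()


# Hypothetical cancellation steps per stage, for illustration only.
CANCEL_STEPS = {
    Stage.BEFORE_BOOKING: ["notify customer"],
    Stage.BOOKING_IN_PROGRESS: ["stop booking", "cancel completed bookings",
                                "notify customer"],
    Stage.BOOKED_NOT_CHARGED: ["cancel air booking", "cancel hotel booking",
                               "notify customer"],
    Stage.CHARGED: ["cancel air booking", "cancel hotel booking",
                    "refund customer", "notify customer"],
}


def handle_cancel(stage):
    """Return the steps needed to cancel when the process is at the given stage."""
    return CANCEL_STEPS[stage]


for stage in Stage:
    print(stage.name, "->", handle_cancel(stage))
```

A single Handle Cancel Message box hides exactly this table, which is the part a business analyst most needs to see.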

I intended error 15 to mean modeling a single cancellation handling activity for the process regardless of state, where in reality the cancellation behavior has to be state-dependent. On reflection now, I believe that the author was trying to model a business transaction in which a Cancel message aborted the transaction and triggered compensation of the bookings and debits, but the exception flows, compensation events, and other BPMN rules got him so befuddled that the business semantics were completely lost.

It is tempting to draw from this example the conclusion that BPMN is too hard for business analysts to understand, or perhaps exception handling in BPMN is too hard. I don't believe it. It's just that there isn't much educational material out there that shows people how to use BPMN correctly. Surely a business analyst should be able to model a customer transaction process that describes when and how to charge the customer, issue credits, allow cancellation, etc. Should all the exception-handling be left out of the model and turned over to the programmers? I don't think so.

In a few days I'll work out what I think would have been the right way to do this. It's good practice for me as I put together my BPMN training.