I rarely get comments on my obscure techie BPMN 2.0 posts, but this one seems to have legs. Kris Verlaenen of jBPM has a thoughtful response, posted both as a comment to mine and on his own site (to show a diagram). He says,
It seems to me, from reading the specification, that data input and output associations are meant to be “local”, by which I mean they are not intended to be referenced from outside the element in which they are defined. On top of that, it also seems to me that they are intended to be “immediate” or “instantaneous”… meaning they are “executed” when they are reached, but they don’t exist anymore after that. Therefore, I would agree with your statement that process data inputs should not be the source of a data input association, as the process data input association only exists when the process is invoked and at that point, that data input association is executed…. I think the ioSpecification of the process, that contains these process data inputs, should also contain data output associations that map these data inputs to more persistent (as opposed to immediate or instantaneous) item-aware elements like a property or a data object. These can then serve as the source of other data input associations.
Intuitively I agree with Kris. Also, this is the way Oracle said their BPMN 2.0 engine works. And it is different from the proposal from Falko Menge of Camunda/Activiti. Pictures may be helpful.
Here is a diagram showing a process data input directly feeding an activity data input, as favored by Falko Menge:
And here is the alternative suggested by Kris Verlaenen, in which the process data input is first stored in a data object, which then feeds the activity data input:
In a way, both these diagrams imply that the input data just shows up magically, but that is not the case. There are only two ways, in reality, that the input data can arrive at the process data input, either from the start event trigger (such as a message) or from a call activity in a “parent” BPMN process. The BPMN spec allows a message start event to have a dataOutputAssociation to the process data input. So in this case, I believe Falko’s picture would look like this:
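In serialized form, that diagram would look something like the sketch below. This is only a rough sketch: the element ids and item definitions are invented, and namespaces and required attributes are omitted for brevity.

```xml
<process id="orderProcess">
  <!-- process-level data input, read here as if it were a store -->
  <ioSpecification>
    <dataInput id="procIn" itemSubjectRef="orderItem"/>
    <inputSet>
      <dataInputRefs>procIn</dataInputRefs>
    </inputSet>
    <outputSet/>
  </ioSpecification>

  <startEvent id="start">
    <dataOutput id="msgOut" itemSubjectRef="orderItem"/>
    <!-- the spec allows the message start event to map its
         payload to the process data input -->
    <dataOutputAssociation>
      <sourceRef>msgOut</sourceRef>
      <targetRef>procIn</targetRef>
    </dataOutputAssociation>
    <messageEventDefinition messageRef="orderMessage"/>
  </startEvent>

  <task id="handleOrder">
    <ioSpecification>
      <dataInput id="taskIn" itemSubjectRef="orderItem"/>
      <inputSet>
        <dataInputRefs>taskIn</dataInputRefs>
      </inputSet>
      <outputSet/>
    </ioSpecification>
    <!-- the disputed association: the process data input feeds
         the activity data input directly -->
    <dataInputAssociation>
      <sourceRef>procIn</sourceRef>
      <targetRef>taskIn</targetRef>
    </dataInputAssociation>
  </task>
</process>
```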
And I think Kris’s picture (also that of Oracle) would look like this:
Here the process data input just serves invocation by call activity. When triggered by a message, the data flow is from the message start event to a data object to the activity data input.
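Serialized, that reading might look roughly like this (again a sketch only, with ids invented and namespaces and required attributes omitted): the start event writes into a data object, the task reads from that data object, and the process data input is touched only when the process is invoked by a call activity.

```xml
<process id="orderProcess">
  <!-- process data input serves only invocation by call activity -->
  <ioSpecification>
    <dataInput id="procIn" itemSubjectRef="orderItem"/>
    <inputSet>
      <dataInputRefs>procIn</dataInputRefs>
    </inputSet>
    <outputSet/>
  </ioSpecification>

  <!-- persistent, process-scoped "variable" -->
  <dataObject id="orderData" itemSubjectRef="orderItem"/>

  <startEvent id="start">
    <dataOutput id="msgOut" itemSubjectRef="orderItem"/>
    <!-- message payload lands in the data object, not the
         process data input -->
    <dataOutputAssociation>
      <sourceRef>msgOut</sourceRef>
      <targetRef>orderData</targetRef>
    </dataOutputAssociation>
    <messageEventDefinition messageRef="orderMessage"/>
  </startEvent>

  <task id="handleOrder">
    <ioSpecification>
      <dataInput id="taskIn" itemSubjectRef="orderItem"/>
      <inputSet>
        <dataInputRefs>taskIn</dataInputRefs>
      </inputSet>
      <outputSet/>
    </ioSpecification>
    <!-- the activity data input is fed from the data object -->
    <dataInputAssociation>
      <sourceRef>orderData</sourceRef>
      <targetRef>taskIn</targetRef>
    </dataInputAssociation>
  </task>
</process>
```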
OK, so what? Two things: First, whichever approach is correct, clarification in the BPMN 2.0 (2.1?) spec is needed. Falko proposed some language that explicitly allows a direct data association from process data input to activity data input; whether that is allowed is ambiguous in the spec as it stands. But I think that language would allow all four of these diagrams, and this might limit interoperability of BPMN models where vendors of executable BPMN 2.0 engines make different assumptions about what a process data input means. For that reason, I think it is important that all interested parties weigh in, so that any clarifying language in the spec makes things better, not worse.
Thanks for sticking with this Bruce.
From my Lombardi background, our memory model persists the process input data for the life of the process instance, so it’s implicitly available as an input for every activity. Activities pass data to and from their enclosing process – no Activity to Activity data mapping is possible. Other systems are similar.
Folks coming from a BPEL background (like WPS) had a completely different data model – Data mapping from Activity to Activity is the norm, with explicit Data Stores if you need to persist data for use by subsequent Activities.
Not saying one approach is right or better… but I think this may be the root of these different interpretations of the spec. Now that you are highlighting the discrepancy we may get clarification.
John,
For executable BPMN 2.0 it should not be a matter of interpretation. There is one way. We just need to figure out what it is (and have implementers agree). A BPMN 2.0 dataObject is the same as a BPEL (or Lombardi) “variable”. Scope can be process or subprocess, depending on where it is defined. No question on that. A data input for an activity is like a portType, an interface spec. It is not a variable. A data input for a process is ambiguous. BPEL was clearer because data always arrived via a message, whereas BPMN sometimes implies data arrives “by magic”. I think this is the issue.
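The distinction shows up in the serialization itself (a sketch, with ids invented and namespaces omitted): a dataObject is declared in the process body like a variable, while a dataInput exists only inside an ioSpecification, like a parameter in an interface declaration.

```xml
<process id="proc">
  <!-- dataObject: a "variable" scoped to the process (or
       sub-process) that declares it -->
  <dataObject id="orderData" itemSubjectRef="orderItem"/>

  <task id="approveOrder">
    <!-- dataInput: part of the task's interface spec,
         not a storage location -->
    <ioSpecification>
      <dataInput id="approveIn" itemSubjectRef="orderItem"/>
      <inputSet>
        <dataInputRefs>approveIn</dataInputRefs>
      </inputSet>
      <outputSet/>
    </ioSpecification>
    <!-- the association copies the variable into the interface -->
    <dataInputAssociation>
      <sourceRef>orderData</sourceRef>
      <targetRef>approveIn</targetRef>
    </dataInputAssociation>
  </task>
</process>
```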
Hi Bruce, Hi Kris,
Sorry for warming up this topic again. 😉
A Data Association between two Item-aware Elements is not allowed in BPMN 2.0, because it can only be owned by an Activity or Event:
p221 (PDF 251) “The DataAssociation class is a BaseElement contained by an Activity or Event, used to model how data is pushed into or pulled from item-aware elements.”
@Kris: Where does the spec say that Process Data Inputs are less persistent than other Item-aware Elements?
If I interpret the spec correctly, Process Data Inputs are similar to the parameters of a method in an object-oriented programming language, which can be read at any time during the execution of that method, just like any other local variable. I understand Process Data Inputs and Outputs as “ordinary” Data Objects that are marked as coming from or going to the outside world.
In “BPMN 2.0 by Example, Version 1.0” (http://www.omg.org/cgi-bin/doc?dtc/10-06-02) on page 36 (PDF 44), Stephen A. White also uses a Process Data Input in the way I described.
I’d suggest we file an issue at the OMG containing my proposal and a couple of open questions. The specification team, which consists of a large number of BPMN tool vendors and consultant companies, can then discuss the topic and decide for a solution in a ballot.
Greetings from Berlin,
Falko
No problem with bringing it up again. It just further demonstrates that the spec is sufficiently ambiguous that reasonable people can disagree. Resolving it in BPMN 2.1 is a good idea, but I don’t think we really want to wait 2-3 more years for that, as the major BPMN 2.0 vendors are developing their code now. It may all be a tempest in a teapot, as the serialization is just an interchange format not necessarily an execution language… although some vendors may treat it as such. But because data flow is part of even the Descriptive subclass, resolving this now is important for BPMN interoperability. I wish IBM, Oracle, SAP, TIBCO etc would weigh in on the question.
Hi, interesting discussion. I agree that data objects need some clarification at the standard specification level.
One question I have wrt the examples you are showing is the following:
in your version of Falko’s picture (the second-to-last image in the post), I’m not really happy about using an input object with an incoming arrow. What if the starting point were a task instead of an event? In that case I would have an input object coming OUT of a task (and seeing it coming out of an event makes me uncomfortable anyway).
Falko’s picture is the first one, but since he views the Data Input as a storage object (like a data object), the question is how the data gets passed into it. It comes from the message. A message start event then maps it via data association to the storage object. With a regular data object it certainly works like that (the last picture, omitting the process data input). The second-to-last picture adds the mapping from the start event to the data input in Falko’s picture. Kris’s picture, which seems better to me, just uses the process data input to handle triggering by Call Activity (the alternative to a message start). This was supposedly the intent of the original drafting team, but it was changed (in an ambiguous way) in the FTF.
One could say that DataObjects are intended to be the “internal” data and DataInputs and DataOutputs the “external” data of a given process. Anyway, there is no “better” term than “inputs” and “outputs”. The term “local” suggests that there is something “global”. Unfortunately, the word “global” is used for the term “global process”, which has nothing to do with data. On the other hand, “external” suggests that there is a local space for data and an external space for data. Unfortunately, there is a “data store” term which is meant to be an “external space for the data”. 🙂
I am not so sure about Falko being wrong. The process input may not always be the same as the process trigger (or the process trigger may not always carry the input).
Let’s say there’s a custom guitar workshop. Guitars are only made upon request. The request is the trigger – the start event. But if you look at the workshop’s SIPOC, a sawmill is the supplier and wood blanks are the input. And since there are always blanks in the warehouse, the luthier takes them himself upon process start. It’s not like the customer brings his own blank – “here, make me a guitar out of this.”
In this example the DI is retrieved from a Data Store (DS). In other examples the DI can be received with the trigger from outside the process owner’s domain. I think BPMN should be able to handle both.
I also didn’t get the idea of transforming the DI into a DO:
– why would you do that before the process starts?
– and isn’t the transformation itself some sort of task?
I don’t think this is about what is logical. It is about what the spec says. From my investigations, the original drafters (Oracle, IBM, SAP) meant one thing, the FTF (Falko) meant something different, and the spec is ambiguous at best. The issue isn’t “transforming” DI to DO, it’s that DI is an interface and DO is a variable. They are not the same thing. You cannot persist or retrieve a DI. Isn’t it like partnerLink vs input variable in BPEL? In any case, I am confident that a BPMS designer could deal with any agreed-upon interpretation of this, but we won’t have that. This section of the spec is messed up to the point of precluding one serialization interoperable across tools… which supposedly is the point of a standard.
I’m really in over my head here in this discussion where you talk about “partnerLink vs input variable in BPEL”. I am an analyst and a business user, so all I am concerned about is logic and intuitive visual representation. I am pretty sure the point of BPMN was to have the same simple semantics for both business users and IT. The IT problem (interoperability across tools) will become an end in itself; coders and vendors will eventually ruin BPMN for business users, who in turn won’t use it, which will make BPMN useless.
Version 2.0 already is “scarier” than 1.0.
The standard says there are 3 levels (sub-classes) of conformance:
– Descriptive
– Analytic
– Common Executable
I believe BPMN would benefit if it were split into three clearly separable sections (or even separate documents) according to those sub-classes, separating visual elements and attributes from the technical mumbo-jumbo intended for vendors. Of course, that would mean changing the scope of those sub-classes:
– Descriptive – focuses on logic and intuitive visual representation – don’t read anything but this if you are a business user
– Analytic – focuses on standard attributes for quantitative analysis and additional elements to make models compact. This level should use all the visual elements there are.
– Interoperability – technical stuff. Don’t read it unless you are (or plan to be) a vendor.
OOOPS – sorry for going so far off topic 😉
And sorry for the duplicate posts
[…] BPMN 2.0 spec are the worst part of the document, so vague as to be mostly unusable. And I have debated the student’s question previously, in a slightly different form, with Falko Menge of camunda, so even BPMN experts do not agree on […]