I have been working on rounding out the BPMN Interoperability (BPMN-I) spec and tool in the area of data flow, and I am puzzled by a fundamental concept where the BPMN 2.0 spec and non-normative “BPMN by Example” documents disagree. I wrote to the experts on the BPMN 2.0 committee but have not heard back, so let me just put it out there and maybe BPMS Watch readers will help sort it out.
The issue has to do with dataInput and dataOutput, elements which are defined for a process, task, or event (but not subProcess). For a process or task, they are part of the ioSpecification, which defines one or more inputSets and outputSets. Each inputSet and outputSet references zero or more dataInputs and dataOutputs. The operational semantics of ioSpecification and inputSet say that a process or task cannot start until the inputSet data is available (unless marked “can start without me”). In that sense, ioSpecification defines the interface or signature of the process or task, i.e. the data requirements.
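As a rough illustration of that start rule, here is a minimal Java sketch. It assumes that inputSets are checked in the order they are listed and that the first one whose non-optional dataInputs are all available is used. The class, field and method names (InputSetStartCheck, InputSet, optionalInputs, firstSatisfied) are hypothetical and do not come from the BPMN schema or from any tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model; names are illustrative, not taken from the BPMN schema. */
final class InputSetStartCheck {

    /** Stand-in for an inputSet: the dataInputs it references and which of them are optional. */
    static final class InputSet {
        final String name;
        final List<String> dataInputs;
        final Set<String> optionalInputs; // the "can start without me" inputs

        InputSet(String name, List<String> dataInputs, Set<String> optionalInputs) {
            this.name = name;
            this.dataInputs = dataInputs;
            this.optionalInputs = optionalInputs;
        }

        /** True when every non-optional dataInput of this set is available. */
        boolean isSatisfiedBy(Set<String> available) {
            for (String input : dataInputs) {
                if (!optionalInputs.contains(input) && !available.contains(input)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the first satisfied inputSet in listed order, or null if the task must wait. */
    static InputSet firstSatisfied(List<InputSet> inputSets, Set<String> available) {
        for (InputSet set : inputSets) {
            if (set.isSatisfiedBy(available)) {
                return set;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed example data: voterList is optional, issueList is required.
        InputSet full = new InputSet("full",
                Arrays.asList("issueList", "voterList"),
                new HashSet<>(Arrays.asList("voterList")));
        Set<String> available = new HashSet<>(Arrays.asList("issueList"));
        InputSet chosen = firstSatisfied(Arrays.asList(full), available);
        System.out.println(chosen == null ? "wait" : "start with " + chosen.name);
    }
}
```

With only issueList available, the sketch prints "start with full", because voterList is marked optional. How an engine learns that a dataInput has become available is not modeled here.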
But for a task (including callActivity) or event, dataInput and dataOutput are also used to model the data flow. A dataInput is the target of a dataInputAssociation, which maps data from a variable (dataObject) or external record (dataStore), and a dataOutput is similarly the source of a dataOutputAssociation. The spec is explicit that this applies to tasks and events, but is mysterious when it comes to dataInput and dataOutput of a process. The key question is this: Can a process dataInput be the source of a dataInputAssociation, and can a process dataOutput be the target of a dataOutputAssociation? The spec (mostly) says No, but BPMN by Example says Yes. And what does a process dataInput or dataOutput mean, anyway? Is it just a signature, like a WSDL portType, or is it actual instance data, like a variable? Let’s look at both sides of the argument.
1. Just a signature, not actual instance data
The evidence for this comes mainly from the spec.
- p213. “Data Inputs MAY have incoming Data Associations.” [It does not say may have outgoing.]
- p213. “If the Data Input is directly contained by the top-level Process, it MUST not be the target of Data Associations within the underlying model. Only Data Inputs that are contained by Activities or Events MAY be the target of Data Associations in the model.” [This would imply NO data associations can connect to a data input for a process.]
- p215. “Data Outputs MAY have outgoing Data Associations.” [It does not say may have incoming.]
- p215. “If the Data Output is directly contained by the top-level Process, it MUST not be the source of Data Associations within the underlying model. Only Data Outputs that are contained by Activities or Events MAY be the target of Data Associations in the model.” [Again this would imply NO data associations can connect to a data output for a process.]
Subsequent discussion explicitly refers to data inputs and data outputs of activities and events, not processes, except for this:
- p225. “In the case of a Start Event, the Data Inputs of the enclosing process are available as targets to the DataOutputAssociations of the Event. This way the Process Data Inputs can be filled using the elements that triggered the Start Event. In the case of an End Event, the Data Outputs of the enclosing process are available as sources to the DataInputAssociations of the Event. This way the resulting elements of the End Event can use the Process Data Outputs as sources.” [In other words, a process data input can have incoming data association from a start event, and a process data output can have outgoing data association to an end event. The purpose of this – it seems to me – would be to allow transformation between request/response message data and the signature defined by the process ioSpecification.]
The spec (p213, 215) only mentions displaying the dataInput or dataOutput shape for a process, not a task or event. From the above discussion, it would appear that no data association shape should connect to that dataInput or dataOutput shape, with the possible exception of incoming from a startEvent shape and outgoing to an endEvent shape.
2. Actual instance data, not just a signature
This alternative interpretation is used in more than one diagram and serialization in BPMN By Example. It is characterized by using the process dataInput as the source of a dataInputAssociation to a task or event. Here, for instance, is a clip from the Email Voting example:
You might say that from this diagram, the dataInput belongs to the task Review Issue List not the process (and I would agree with you!), but the serialization provided shows the dataInput defined for the process, and mapped from there to the task dataInput by direct dataInputAssociation (i.e. no intervening dataObject).
This would appear to be illegal, according to the statements from the spec quoted above. The one puzzling statement from the spec in its favor is this one:
- p221. “The purpose of retrieving data from Data Objects or Process Data Inputs is to fill the Activities inputs and later push the output values from the execution of the Activity back into Data Objects or Process Data Outputs.”
This is a strong statement, but it is the ONLY mention of using a process dataInput as the source of a dataInputAssociation, while there are numerous statements that suggest this is not allowed.
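To make the disagreement concrete, the two readings can be written as one validation rule with a switch. This is only a sketch: the names (ProcessDataRule, Reading, Element, End, Owner, isAllowed) are hypothetical, it covers only process-level dataInput and dataOutput, and it treats the p225 start event and end event cases as allowed under both readings.

```java
/** Hypothetical rule check; the enum and method names are illustrative only. */
final class ProcessDataRule {

    /** The two readings of a process-level dataInput or dataOutput. */
    enum Reading { SIGNATURE_ONLY, INSTANCE_DATA }

    /** Process-level element at one end of the data association. */
    enum Element { PROCESS_DATA_INPUT, PROCESS_DATA_OUTPUT }

    /** Which end of the data association the element sits on. */
    enum End { SOURCE, TARGET }

    /** The node that owns the data association. */
    enum Owner { START_EVENT, END_EVENT, TASK }

    static boolean isAllowed(Reading reading, Element element, End end, Owner owner) {
        // p225: a start event may fill a process dataInput ...
        if (element == Element.PROCESS_DATA_INPUT && end == End.TARGET) {
            return owner == Owner.START_EVENT;
        }
        // ... and an end event may read from a process dataOutput.
        if (element == Element.PROCESS_DATA_OUTPUT && end == End.SOURCE) {
            return owner == Owner.END_EVENT;
        }
        // Remaining cases: dataInput as source, dataOutput as target.
        // Only the "instance data" reading permits these.
        return reading == Reading.INSTANCE_DATA;
    }

    public static void main(String[] args) {
        // Email Voting: the process dataInput feeds the task Review Issue List.
        for (Reading r : Reading.values()) {
            System.out.println(r + ": "
                    + isAllowed(r, Element.PROCESS_DATA_INPUT, End.SOURCE, Owner.TASK));
        }
    }
}
```

For the Email Voting serialization, the sketch prints false under SIGNATURE_ONLY and true under INSTANCE_DATA, which is exactly the point in dispute.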
Why does it matter?
Is it worth quibbling over these details? If your purpose is to make the diagram understandable to the viewer, maybe we should just agree to live and let live. The email voting diagram conveys the information clearly. But if your goal is interoperation between modeling tools, then resolving the issue is important. The serialization in BPMN by Example is either correct or it is not (and even if legal per the spec, it could be declared interoperable or not by BPMN-I.) And this is not just an issue for executable processes. Even the Descriptive subclass (non-executable) contains data objects and data associations, so this serialization issue affects even the most basic model interchange.
So… what do you think? And why? Please comment on this post.
By the way, I think a better way to model the start of Email Voting (with data flow) would be something like this:
It shows the source and internal flow of the data more clearly (defining a data input for a process triggered by a timer is inherently confusing, I think), and maps to execution more cleanly as well. But that’s just me.
Hi Bruce,
Data Inputs and Data Outputs are defined as Item Aware Objects on page 214/215 of the spec and thus Data Associations can connect to them as explicitly stated on page 221.
The statements you quoted from page 213 and 215 only limit Process Data Inputs to be read-only and Process Data Outputs to be write-only. A Process Data Input is allowed to be the source of a Data Association whereas a Process Data Output can be the target of a Data Association.
Hence, Process Data Inputs and Outputs are signature as well as instance data.
Greetings from Berlin,
Falko
Falko,
Thank you for commenting, and greetings to you from sunny California. I admire your certainty but am not entirely convinced by your logic. How do you conclude from the statement that data input may have incoming data association that it is read-only? By definition, incoming data association means writable not readable. The data association copies the source to the data input. Also, while I agree the spec says a data association can connect to any item-aware element, it also says that certain connections are explicitly not allowed. It explicitly allows incoming data association to data input, but never mentions any outgoing data association from data input.
–Bruce
Bruce,
I am grateful that you have the patience to go through the spec at this level of detail… “quibbling over details” is the only way any BPM Systems are ever going to be interoperable.
I prefer your model of Email Voting to the original, because it’s more explicit regarding the implementation details of “How to get the issue list”. Your model clearly shows that the issue list will be retrieved and then passed to an Activity. The original indicates that there is an issue list that is available for review by some undefined mechanism – and it’s those mechanisms that are likely to be incompatible across BPM Systems.
Thanks John. My Email Voting example is more about “BPMN style” (clarity from the diagram) than “BPMN-I” (interoperability). The debate with Camunda is the result of contradictions in the BPMN 2.0 spec itself. It’s too bad there is no effort to fix them, or even to post errata. BPMN-I needs to take a stand, but I think we need additional voices to weigh in. I might be waiting a while.
Hi Bruce,
Admittedly, I’m that certain because I’m a member of the specification team and took part in the discussions of the Finalization Task Force on this topic. However, just stating this would be a lame argument, as it only documents the intention of the specification team, whereas only the specification text is normative. So let me try again to derive the answer from the actual specification:
p213. “The Data Input is an item-aware element.”
p215. “The Data Output is an item-aware element.”
p221. “The DataAssociation class is a BaseElement contained by an Activity or Event, used to model how data is pushed into or pulled from item-aware elements. DataAssociation elements have one or more sources and a target; the source of the association is copied into the target.”
So in general, Data Inputs and Data Outputs can be source as well as target of a Data Association.
p213. “Data Inputs MAY have incoming Data Associations.”
This does not prohibit outgoing Data Associations, otherwise the sentence must have used ‘MAY ONLY’.
There is no statement in the spec saying that Data Inputs MUST NOT have outgoing Data Associations. In fact, there are multiple statements saying the opposite:
p221. “Data Associations are used to move data between Data Objects, Properties, and inputs and outputs of Activities, Processes, and GlobalTasks.”
p221. “The purpose of retrieving data from Data Objects or Process Data Inputs is to fill the Activities inputs […]”
p213. “If the Data Input is directly contained by the top-level Process, it MUST not be the target of Data Associations within the underlying model. Only Data Inputs that are contained by Activities or Events MAY be the target of Data Associations in the model.”
Data Associations are directed, i.e., have source and target. The two sentences only speak about the target of Data Associations. They do not prohibit Process Data Inputs from being the source of Data Associations. Hence, a Process can read its own top-level Data Inputs, but not write to them.
The only exception to this read-only access, which you quoted from p225, allows the Process Data Inputs of a single Process to be filled by either a Call Activity or a StartEvent with an EventDefinition that has an ItemDefinition, such as a Message, Escalation, Error or Signal.
The argument is analogous for Process Data Outputs and incoming Data Associations:
p215. “Data Outputs MAY have outgoing Data Associations.”
Again, no ‘MAY ONLY’ here.
p215. “If the Data Output is directly contained by the top-level Process, it MUST not be the source of Data Associations within the underlying model. Only Data Outputs that are contained by Activities or Events MAY be the target of Data Associations in the model.”
The two sentences only prohibit Process Data Outputs from being the source of Data Associations. Thus, a Process can write to its own top-level Data Outputs, but not read from them.
So I can’t see a contradiction here.
However, I’m also not too happy with the readability of the data modeling section. Properties and semantics of an element are often derived from an abstract base class rather than being explicitly stated. This might be acceptable for implementers of the spec but not for end-users. In an attempt to help end-users understand the spec, we at camunda have been driving the effort on ‘BPMN 2.0 by Example’.
If you think that BPMN 2.1 should contain more explicit statements about Process Data Inputs and Outputs, please consider filing an issue at http://www.omg.org/technology/agreement.htm preferably containing not just criticism but also a proposal of what exactly should be added or modified in the specification text. In the end, the spec is mostly driven by volunteers, who are willing to invest their time and money into it.
As I couldn’t find your name or company on the Voters List for BPMN 2.1, I would volunteer to take over responsibility of such an issue and make sure your proposal is heard by the specification team.
Greetings from Karlsruhe,
Falko
Falko,
Ouch. Guess I touched a nerve there. Well, I will take your word for it that the intention is that process dataInput can have outgoing data association. I don’t really have a problem with the concept; I just disagree that the spec says that. And we certainly can agree that it is “lame” to rely on the intentions of the FTF team when that intention is not expressed in the specification itself. And, let me add, it is even more lame to assert that the real semantics and rules are fully expressed in the UML class structure of BPMN 2.0, used by implementers like yourself, whereas the narrative of the spec is just there for the less intelligent end user population. I guess I must have missed the class diagrams that show the data association constraints for subprocesses, start events, and end events. Perhaps you could refer me to them? Finally, you may not realize that it is not necessary to be a Voter to participate in drafting the BPMN specification, something I did for 7 months until the beginning of FTF, including participation on the Examples team (funny, I don’t recall you at those meetings). And since you apparently find my proposal for this data input issue obscure, let me spell it out for you: If a dataInput can have outgoing data association, just say so in the spec.
Hi Bruce,
I’m sorry. I didn’t mean to question your contributions to BPMN. I highly appreciate all the work you did and still do as pioneer of our field.
In an attempt to improve the situation, I posted a proposal for BPMN 2.1 as a comment on your follow-up post:
https://www.methodandstyle.com/2011/04/20/more-on-bpmn-2-0-process-data-input/comment-page-1/#comment-1415
Cheers,
Falko
I tried to write this in Java:
1. BPMN by Example:
(in-process scope)
void eMailVoting_Process(List issueList) {
    // issueList as source in DataAssociation
    reviewIssueList(issueList);
    // issueList as source in DataAssociation
    announceIssueForDiscussion(issueList);
}
In some process caller (out-of-process scope):
// issueList as target in DataAssociation
eMailVoting_Process(myIssueList);
2. Oracle’s implementation:
void eMailVoting_Process(List issueList) {
    // a DataObject
    List dataObject = issueList;
    // dataObject as source in DataAssociation
    reviewIssueList(dataObject);
    // dataObject as source in DataAssociation
    announceIssueForDiscussion(dataObject);
}
3. Bruce’s “a better way to model the start of Email Voting”:
void eMailVoting_Process() {
    List issueList = retrieveIssueListFromDataStore();
    // issueList as source in DataAssociation
    reviewIssueList(issueList);
}
Which one do you like?
I like the first.
The problem lies in two different meanings of InputOutputSpecification.
One meaning is the signature (data requirements) of a callable element. Process (as a subclass of CallableElement) defines its InputOutputSpecification in this way. DataInputs and DataOutputs from this type of IOSpecification act as FORMAL parameters INSIDE the Process, so a DataInput can be the source of a DataInputAssociation for some Task inside the Process.
The second, opposite meaning is to present a “socket” for the actual call, as in a CallActivity or ServiceTask, for example. In this role we can’t define the IOSpecification, but only COPY it from the corresponding (called) CallableElement. DataInputs and DataOutputs from this type of IOSpecification are used as points to transfer ACTUAL parameters from/to OUTSIDE of the called Process or anything else. So, a DataInput from such an IOSpecification can be the target of some DataInputAssociation of this call.
The gap between the two meanings is covered on page 218:
Call Activity Mapping
The DataInputs and DataOutputs of the Call Activity are mapped to the corresponding elements in the CallableElement without any explicit DataAssociation.
So, the IOSpecification of a CallActivity or other Task looks like a redundant element in BPMN Definitions, since it should be equal to the corresponding original IOSpecification defined in the CallableElement or some called web service.
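If the Call Activity’s IOSpecification really is just a copy, a tool could verify that with a simple consistency check. The Java sketch below is hypothetical throughout: IoSpec flattens an ioSpecification to ordered name lists, and comparing by name and order is an assumption, since the quoted text only says the elements are mapped to the “corresponding” ones.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical check; matching by name and order is an assumption, not a rule quoted from the spec. */
final class CallActivityIoCheck {

    /** Flattened stand-in for an ioSpecification: ordered dataInput and dataOutput names. */
    static final class IoSpec {
        final List<String> dataInputs;
        final List<String> dataOutputs;

        IoSpec(List<String> dataInputs, List<String> dataOutputs) {
            this.dataInputs = dataInputs;
            this.dataOutputs = dataOutputs;
        }
    }

    /** True when the Call Activity's copy declares the same inputs and outputs as the called element. */
    static boolean mirrors(IoSpec callActivity, IoSpec calledElement) {
        return callActivity.dataInputs.equals(calledElement.dataInputs)
                && callActivity.dataOutputs.equals(calledElement.dataOutputs);
    }

    public static void main(String[] args) {
        // Formal parameters declared by the called process (assumed example data).
        IoSpec process = new IoSpec(Arrays.asList("issueList"), Arrays.asList("votingResult"));
        // The "socket" copied onto the Call Activity.
        IoSpec call = new IoSpec(Arrays.asList("issueList"), Arrays.asList("votingResult"));
        System.out.println(mirrors(call, process));
    }
}
```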
Hi Bruce,
I’m studying how to use data elements inside BPMN models and found your article.
Could you tell me if there is any news on this subject in the new spec version?
@cristiano: I doubt it.
Dear Bruce,
I read your book and see you as my mentor for BPMN 🙂
Could you help me with one open question please. You say that “The operational semantics of ioSpecification and inputSet say that a process or task cannot start until the inputSet data is available (unless marked “can start without me”)”.
Actually, I understand how to use the data association, but I do not understand how an activity knows that all inputSet data is available. How would that work for a BPMS?
If that doesn’t fit into this post, I would be more than happy to receive a private message at nils dot leideck at leidex dot net.
I am looking forward to your reply, sitting here and continuing writing my very own BPMS 🙂
— Cheers, Nils
Nils,
This is but one of many examples in BPMN where “and then some magic occurs” in order to implement in a real BPMN engine. For example, an interrupting boundary event on a subprocess is supposed to terminate all activity in the subprocess immediately. Difficult to implement but a common pattern in BPMN. The one you mention is, I suspect, ignored by every real BPMS, so I wouldn’t worry about it.
–Bruce
Hi Bruce,
thanks for your feedback. I wonder how a BPMS then knows when to go to the next activity.
I would understand that if a condition on the sequence flow is always required, but as it seems, there is no rule that tells a BPMS to move on to the next activity.
I also understand that service task activities can move on once the service has been connected and the communication has ended. Same for script task activities.
But for user task activities, this is different. My software has the option to define multiple forms for one user task activity. Each form can store data into dataStores and dataObjects. But the only way to continue is to define some kind of expression that tells the BPMS to move on when the “data” matches the expression. But this concept seems to be missing in the BPMN.
Am I right, or is there something more for me to learn about BPMN?
— Cheers, Nils
User task completion is straightforward, since the BPMS is presenting the task UI, in which completion is indicated by the user clicking a button (e.g., “Approved” or “Task complete”).
Hi Bruce,
the concept you described is okay if there is only one form per activity, with mandatory fields only. But suppose you have 3 forms per activity, and each form can have optional and mandatory fields.
Submitting one of the forms doesn’t mean “Task completed”, so there must be some kind of expression that tells the process engine to move to the next activity.
For now, I am implementing this with expressions on the sequence flows, but that violates the BPMN spec, because if I have a sequence flow with an expression, I also need one (1) sequence flow defined as the default sequence flow.
So my guess is that there is no concept in BPMN for defining some kind of expression that tells the process engine that a task can be considered “completed”.
Any idea how to define this in pure BPMN?
— Cheers, Nils
I’m not really following you. A form is not what I mean. I mean the user’s view of the task from the process portal runtime. That is how it is normally done in real BPMSs. If you want to go by the spec, Chapter 13 discusses the operational semantics of engines conforming to BPMN Execution. The activity implementation is responsible for maintaining its state and reporting to the process engine by some means (unspecified). It is not done by expressions on sequence flows. It is part of the implementation.
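A minimal Java sketch of that division of labor, for the multi-page case discussed above: the task implementation keeps its own state and tells the engine when it is done, so no expression on a sequence flow is involved. All names here (MultiPageUserTask, CompletionListener, submitPage) are hypothetical, and the rule “complete once all required fields are present” is an assumption for illustration, not something defined by BPMN.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical user task implementation that decides for itself when it is complete. */
final class MultiPageUserTask {

    /** Hypothetical callback through which the task reports completion to the engine. */
    interface CompletionListener {
        void completed(Map<String, String> data);
    }

    private final Set<String> requiredFields;
    private final CompletionListener engine;
    private final Map<String, String> data = new HashMap<>();
    private boolean completed = false;

    MultiPageUserTask(Set<String> requiredFields, CompletionListener engine) {
        this.requiredFields = requiredFields;
        this.engine = engine;
    }

    /** Called each time the user submits one page; pages may also carry optional fields. */
    void submitPage(Map<String, String> fields) {
        if (completed) {
            return; // ignore submissions after completion
        }
        data.putAll(fields);
        if (data.keySet().containsAll(requiredFields)) {
            completed = true;
            engine.completed(new HashMap<>(data)); // report once, with a copy of the data
        }
    }

    boolean isCompleted() {
        return completed;
    }
}
```

The engine only ever sees the completed callback; which pages exist and which fields are mandatory stays inside the task implementation.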
Hi Bruce,
thanks for your valuable time and feedback!
My conclusion is that I need to implement some kind of data-check mechanism per activity to enable the process engine to decide if an activity is “completed”. The GUI in my software doesn’t have a button like “Completed”. Instead it is designed to provide one or more data entry pages per activity (e.g. Activity: “Incident analysis” with the pages “Log file results”, “Customer data” and “On site support travel expense data”). The process engine needs to know that it can only proceed with the next activity if “all required data is available” – and this logic is missing in BPMN.
If you have any more ideas on how to implement this in the most BPMN’ish way, I would be more than grateful to listen 🙂
— Cheers, Nils