In Trisotech Decision Modeler, that is the case even if you never created such a service yourself. That’s because the tool has created one for you automatically, the Whole Model Decision Service, and used it by default in model testing. In fact, a single decision model is typically the source for multiple decision services, some created automatically by the tool and some you define manually, and you can select any one of them for testing or deployment. In this post we’ll see how that works.
As with any service, the interface to a decision service is defined by its inputs and outputs. The DMN model also defines the internal logic of the decision service, the logic that computes the service’s output values from its input values. Unlike a BKM, where the internal logic is a single boxed expression, the internal logic of a decision service is defined by a DRD.
When a decision service is invoked by a decision in a DMN model, it has typically been defined in another DMN model and imported into the invoking model, for example by dragging it from the Digital Enterprise Graph. Otherwise, the service definition is a fragment of a larger decision model – possibly the entire DRD – and is defined in that model.
Within that fragment, certain decisions represent the service outputs, other decisions and input data represent the inputs, and decisions in between the outputs and inputs are defined as “encapsulated”, meaning they are used in the service logic but their values are not returned in the service output. When you execute a decision service, you supply values to the service inputs and the service returns the values of its outputs.
In Trisotech’s Whole Model Decision Service, the inputs are all the input data elements in the model, the outputs are all “top-level” decisions – that is, decisions that are not information requirements for other decisions. All other decisions in the DRD are encapsulated. In addition to this service, Trisotech automatically creates a service for each DRD page in the model, named Diagram Page N. If there is only one DRD page in the model, it will be the same as the Whole Model Decision Service, but if your model has imported a decision service or defines one independently of the invoking DRD, an additional Diagram Page N service will reflect that one.
All that is just scratching the surface, because quite often you will want to define additional decision services besides those automatically created. For example, you might want your service to return one or more decisions that are defined as encapsulated in the Whole Model Decision Service. Or you might want some inputs to your service to be not input data elements but supporting decisions. In fact, this is very common in Trisotech Low-Code Business Automation models, where executable BPMN processes typically invoke a sequence of decision services, each a different fragment of a single DMN model. In BPMN, if you are invoking the Whole Model Decision Service it’s best to rename it, because the BPMN task that invokes it inherits the decision service name as the task name.
So how do you define a decision service? The DMN spec describes one way, but it is not the best way: a separate DRD page containing an expanded decision service shape, a resizable rounded rectangle bisected by a horizontal line. The shape label is the service name. Decisions drawn above the line are output decisions; those below the line are encapsulated decisions. Service inputs – whether input data or decisions – are drawn outside the service shape, as are any BKMs invoked by decisions in the service. Information requirements and knowledge requirements are drawn normally.
The better way, at least in Trisotech Decision Modeler, is the decision service wizard.
In the wizard, you first select the output decisions from those defined in your model, and the wizard populates the service inputs with their direct information requirements. You can then promote selected inputs to encapsulated, and the wizard recalculates the needed inputs. You can keep doing that until all service inputs are input data, or you can stop anywhere along the way. This method is better because it ensures that all the logic needed to compute the output values from the inputs is properly captured in encapsulated decisions. You cannot guarantee that with the expanded decision service shape method.
Trisotech’s Automation feature lets you test the logic of your DMN models, and I believe that is critically important. On the Execution ribbon, the Test button invites you first to select a particular decision service to test. If you forget to do this, the service it selects by default depends on the model page you have open at the time you click Test.
In Test, the service selector dropdown lists even more services than the automatically generated ones and those you created manually, which are listed above a gray line in the dropdown. Below the line is listed a separate service for every decision in the model, named with the decision name, with direct information requirements as the inputs. (For this reason, you should not name a decision service you create with the name of a decision, as this name conflicts with the generated service.) In addition, below the line is listed one more: Whole model, whose inputs are all the input data elements and outputs are all the decisions in the model. It’s important to note that these below-the-line services are available only for testing in Decision Modeler. If you want to deploy one of them, you need to manually create it, in which case it is listed above the line.
In Test, your choice of decision service from the dropdown determines the inputs expected by the tool. As an alternative to the normal HTML form, which is based on the datatypes you have assigned to the inputs, you can select an XML or JSON file with the proper datatype, or use a previously saved test case.
Invoking a decision service in DMN works the same way as invoking a BKM. On the DRD containing the invocation, the decision service is shown as a collapsed decision service shape linked to the invoking decision with a knowledge requirement.
The invoking decision can use either a boxed invocation or literal invocation. In the former, service inputs are identified by name; in the latter, input names are not used. Arguments are passed in the order of the parameters in the service definition, so you may need to refer to the service definition to make sure you have that right.
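The difference is analogous to positional versus keyword arguments in a general-purpose language. Here is a minimal Python sketch, with an invented service name, parameters, and logic purely for illustration (the values echo the credit example used later in this post):

```python
# Hypothetical decision service with two parameters, in definition order.
def credit_decision(credit_score, affordability):
    """Return "Approved" only when both inputs are favorable."""
    if credit_score == "High" and affordability == "OK":
        return "Approved"
    return "Disapproved"

# Boxed-invocation style: inputs identified by name, so order is irrelevant.
by_name = credit_decision(affordability="OK", credit_score="High")

# Literal-invocation style: arguments passed in parameter order, so getting
# the order wrong silently changes the meaning.
positional = credit_decision("High", "OK")

assert by_name == positional == "Approved"
```

As the sketch shows, swapping the two positional arguments would not raise an error; it would simply compute the wrong answer, which is why checking the service definition matters.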
In Business Automation models it is common to model almost any kind of business logic as a DMN decision service invoked by a business rule task, also called a decision task. In Trisotech Workflow Modeler, you need to link the task to a decision service in your workspace; it is not necessary to first Deploy the service. (Deployment is necessary to invoke the service from an external client.) As mentioned previously, the BPMN task inherits the name of the decision service. By default, the task inputs are the decision service inputs and the task outputs are the decision service outputs.
Data associations provide data mappings from process variables – BPMN data objects and data inputs – to the task inputs, and from task outputs to other process variables – BPMN data objects and data outputs. On the Trisotech platform, these data mappings are boxed expressions using FEEL, similar to those used in DMN.
The important takeaway is that a decision service is more than a fancy BKM that you will rarely use. If you are actually employing DMN in your work, you will use decision services all the time, both for logic testing and deployment, and for providing business logic in Business Automation models. The decision service wizard makes it easy.
If you want to find out more about how to define and use decision services, check out DMN Method and Style 3rd edition, with DMN Cookbook, or my DMN Method and Style training, which includes post-class certification.
One key reason why FEEL is more business-friendly than the Excel Formula language, which Microsoft now calls Power FX, is its operators. FEEL has many, and Power FX has very few. In this post we’ll discuss what operators are, how they simplify the expression syntax, and how DMN boxed expressions make some FEEL operators more easily understood by business users.
It bears repeating that an expression language is not the same as a programming language. A programming language has statements. It defines variables, calculates and assigns their values. You could call DMN as a whole a programming language, but the expression language FEEL does not define variables or assign their values. Those things are done graphically, in diagrams and tables – the DRD and boxed expressions. FEEL expressions are simply formulas that calculate values: data values in, data values out.
Those formulas are based on two primary constructs: functions and operators.
The logic of a function is specified in the function definition in terms of inputs called parameters. The same logic can be reused simply by invoking the function with different parameter values, called arguments. The syntax of function invocation – both in FEEL and Excel Formulas – is the function name immediately followed by parentheses enclosing a comma-separated list of arguments. FEEL provides a long list of built-in functions, meaning the function names and their parameters are defined by the language itself. Excel Formulas do the same. In addition, DMN allows modelers to create custom functions in the form of Business Knowledge Models (BKMs) and decision services, something Excel does not allow without programming.
Operators are based on reserved words and symbols in the expression with meaning defined by the expression language itself. There are no user-defined operators. They do not use the syntax of a name followed by parentheses enclosing a list of arguments. As a consequence, the syntax of an expression using operators is usually shorter, simpler, and easier to understand than an expression using functions.
You can see this from a few examples in FEEL where you could use either a function or an operator. One is simple addition. Compare the syntax of the expression adding variables a and b using the sum() function
sum(a, b)
with its equivalent using the addition operator +:
a + b
The FEEL function list contains() and the in operator do the same thing: test containment of a value in a list. Compare
list contains(myList, "abc")
with
"abc" in myList
Both FEEL and Excel support the basic arithmetic operators like +, -, *, and /, comparison operators like =, >, or <=, and string concatenation. But those are essentially the only operators provided by Excel, whereas FEEL provides several more. It is with these more complex operators that FEEL’s business-friendliness advantage stands out.
Let’s start with the conditional operator, if..then..else. These keywords comprise an operator in FEEL, whereas Excel must use functions. Compare the FEEL expression
if Credit Score = "High" and Affordability = "OK" then "Approved" else "Disapproved"
with Excel’s function-based equivalent:
IF(AND(Credit Score = "High", Affordability = "OK"), "Approved", "Disapproved")
The length is about the same but the FEEL is more human-readable. Of course, the Excel expression assumes you have assigned a variable name to the cells – something no one ever does. So you would be more likely to see something like this:
IF(AND(B3 = "High", C3 = "OK"), "Approved", "Disapproved")
That is a trivial example. A more realistic if..then..else might be
if Credit Score = "High" and Affordability = "OK" then "Approved" else if Credit Score = "High" and Affordability = "Marginal" then "Referred" else "Disapproved"
That’s longer but still human-readable. Compare that with the Excel formula:
IF(AND(Credit Score = "High", Affordability = "OK"), "Approved", IF(AND(Credit Score = "High", Affordability = "Marginal"), "Referred", "Disapproved"))
Even though the FEEL syntax is fairly straightforward, DMN includes a conditional boxed expression that enters the if, then, and else expressions in separate cells, in theory making the operator friendlier for some users and less like code. Using that boxed expression, the logic above looks like this:
The FEEL filter operator is square brackets enclosing either a Boolean or integer expression, immediately following a list. When the enclosed expression is a Boolean, the filter selects items from the list for which the expression is true. When the enclosed expression evaluates to positive integer n, the filter selects the nth item in the list. (With negative integer n, it selects the nth item counting backward from the end.) In practice, the list you are filtering is usually a table, a list of rows representing table records, and the Boolean expression references columns of that table. I wrote about this last month in the context of lookup tables in DMN. As we saw then, if variable Bankrates is a table of available mortgage loan products like the one below,
then the filter
Bankrates[lenderName = "Citibank"]
selects the Citibank record from this table. Actually, a Boolean filter always returns a list, even if it contains just one item, so to extract the record from that list we need to append a second integer filter [1]. So the correct expression is
Bankrates[lenderName = "Citibank"][1]
Excel Formulas do not include a filter operator, but again use a function: FILTER(table, condition, else value). So if we had assigned cells A2:D11 to the name Bankrates and the column A2:A11 to the name lenderName, the equivalent Excel Formula would be
FILTER(Bankrates, lenderName = "Citibank", "")
but would more likely be entered as
FILTER(A2:D11, A2:A11 = "Citibank", "")
FEEL’s advantage becomes even more apparent with multiple query criteria. For example, the list of zero points/zero fees loan products in FEEL is
Bankrates[pointsPct = 0 and fees = 0]
whereas in Excel you would have
FILTER(A2:D11, (C2:C11=0)*(D2:D11=0), "")
There is no question here that FEEL is more business-friendly.
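For readers who think in code, the FEEL filters above behave like Python list comprehensions over a list of records. The table values below are invented for illustration; only the column names come from the article:

```python
bankrates = [  # invented rows, illustrative values only
    {"lenderName": "Citibank", "ratePct": 6.625, "pointsPct": 1.0, "fees": 595},
    {"lenderName": "AimLoan",  "ratePct": 6.5,   "pointsPct": 0.5, "fees": 995},
    {"lenderName": "eLend",    "ratePct": 6.875, "pointsPct": 0.0, "fees": 0},
]

# FEEL: Bankrates[lenderName = "Citibank"] -- a Boolean filter returns a list...
citibank_rows = [row for row in bankrates if row["lenderName"] == "Citibank"]

# ...so FEEL appends the integer filter [1] to extract the single record.
# Note that FEEL indexes from 1 while Python indexes from 0.
citibank = citibank_rows[0]

# FEEL: Bankrates[pointsPct = 0 and fees = 0]
zero_cost = [row for row in bankrates
             if row["pointsPct"] == 0 and row["fees"] == 0]
```

The FEEL version stays closer to business vocabulary because the column names are referenced bare, without the `row[...]` plumbing.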
The for..in..return operator iterates over an input list and returns an output list. It means for each item in the input list, to which we assign a dummy range variable name, calculate the value of the return expression:
for <range variable> in <input list> return <return expression, based on range variable>
It doesn’t matter what you name the range variable, also called the iterator, as long as it does not conflict with a real variable name in the model. I usually just use something generic like x, but naming the range variable to suggest the list item makes the expression more understandable. In the most common form of iteration, the input list is some expression that represents a list or table, and the range variable is an item in that list or row in that table.
For example, suppose we want to process the Bankrates table above and create a new table Payments by Lender with columns Lender Name and Monthly Payment, using a requested loan amount of $400,000. And suppose we have a BKM Lender Payment, with parameters Loan Product and Requested Amount, that creates one row of the new table, a structure with components Lender Name and Monthly Payment. We will iterate a call to this BKM over the rows of Bankrates using the for..in..return operator. Each iteration will create one row of Payments by Lender, so at the end we will have a complete table.
The literal expression for Payments by Lender is
for product in Bankrates return Lender Payment(product, Requested Amount)
Here product is the range variable, meaning one row of Bankrates, a structure with four components as we saw earlier. Bankrates is the input list that we iterate over. The BKM Lender Payment is the return expression. Beginners are sometimes intimidated by this literal expression, so, as with if..then..else, DMN provides an iterator boxed expression that enters the for, in, and return expressions in separate cells.
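In code terms, for..in..return is a mapping operation, much like a Python list comprehension. Here is a sketch with an invented stand-in for the Lender Payment BKM; a placeholder interest-only calculation is used because the real BKM's payment math is not shown at this point:

```python
def lender_payment(product, requested_amount):
    # Hypothetical stand-in for the BKM "Lender Payment": maps one Bankrates
    # row to one output row. Placeholder interest-only math, not the real
    # amortization logic.
    return {
        "Lender Name": product["lenderName"],
        "Monthly Payment": round(requested_amount * product["ratePct"] / 100 / 12, 2),
    }

bankrates = [  # invented rows, for illustration only
    {"lenderName": "Citibank", "ratePct": 6.625},
    {"lenderName": "AimLoan",  "ratePct": 6.5},
]
requested_amount = 400000

# FEEL: for product in Bankrates return Lender Payment(product, Requested Amount)
payments_by_lender = [lender_payment(product, requested_amount)
                      for product in bankrates]
```

Each iteration produces one output row, so the comprehension, like the FEEL expression, yields the complete table.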
The BKM Lender Payment uses a context boxed expression with no final result box to create each row of the table. The context entry Monthly Payment invokes another BKM, Loan Amortization Formula, which calculates the value based on the adjusted loan amount, the interest rate, and fees.
Excel Formulas do not include an iteration function. Power FX’s FORALL function provides iteration, but it is not available in Excel. To iterate an expression in Excel you are expected to fill down in the spreadsheet.
The FEEL operators some..in..satisfies and every..in..satisfies represent another type of iteration. The range variable and input list are the same as with for..in..return. But in these expressions the satisfies clause is a Boolean expression, and the iteration operator returns not a list but a simple Boolean value. The one with some returns true if any iteration returns true, and the one with every returns true only if all iterations return true.
For example, again using Bankrates,
some product in Bankrates satisfies product.pointsPct = 0 and product.fees = 0
returns true, while
every product in Bankrates satisfies product.pointsPct = 0 and product.fees = 0
returns false. The iterator boxed expression works with this operator as well.
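These two operators map directly onto Python's any() and all() built-ins. Again the table values are invented for illustration:

```python
bankrates = [  # invented rows
    {"lenderName": "Citibank", "pointsPct": 1.0, "fees": 595},
    {"lenderName": "eLend",    "pointsPct": 0.0, "fees": 0},
]

# FEEL: some product in Bankrates satisfies
#         product.pointsPct = 0 and product.fees = 0
some_zero_cost = any(p["pointsPct"] == 0 and p["fees"] == 0 for p in bankrates)

# FEEL: every product in Bankrates satisfies
#         product.pointsPct = 0 and product.fees = 0
every_zero_cost = all(p["pointsPct"] == 0 and p["fees"] == 0 for p in bankrates)

assert some_zero_cost is True    # at least one zero-cost product exists
assert every_zero_cost is False  # but not all products are zero-cost
```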
The bottom line is this: FEEL operators are key to its combination of expressive power and business-friendliness, surpassing that of Microsoft Excel Formulas. Modelers should not be intimidated by them. For detailed instruction and practice in using these and other DMN constructs, check out my DMN Method and Style training. You get 60-day use of Trisotech Decision Modeler and post-class certification at no additional cost.
The best way to model a lookup table is a filter expression on a FEEL data table. There are several ways to model the data table – as submitted input data, a cloud datastore, a zero-input decision, or a calculated decision. Each way has its advantages in certain circumstances. In this post we’ll look at all the possibilities.
As an example, suppose we have a table of available 30-year fixed rate home mortgage products, differing in the interest rate, points (an addition to the requested amount as a way to “buy down” the interest rate), and fixed fees. You can find this data on the web, and the rates change daily. In our decision service, we want to allow users to find the current rate for a particular lender, and in addition find the monthly payment for that lender, which depends on the requested loan amount. Finding the lender’s current rate is a basic lookup of unmodified external data. Finding the monthly payment using that lender requires additional calculation. Let’s look at some different ways to model this.
We can start with a table of lenders and rates, either keyed in or captured by web scraping. In Excel it looks like this:
When we import the Excel table into DMN, we get a FEEL table of type Collection of tBankrate, shown here:
Each row of the table, type tBankrate, has components matching the table columns. Designating a type as tPercent simply reminds us that the number value represents a percent, not a decimal.
Here is one way to model this, using a lookup of the unmodified external data, and then applying additional logic to the returned value.
We define the input data Bankrates as type Collection of tBankrate and look up my rate – the one that applies to input data my lender. The lookup decision my rate uses a filter expression. A data table filter typically has the format
<table>[<Boolean expression of table columns>]
in which the filter, enclosed in square brackets, contains a Boolean expression. Here the table is Bankrates and the Boolean expression is lenderName = my lender. In other words, select the row for which column lenderName matches the input data my lender.
A filter always returns a list, even if it contains just one item. To extract the item from this list, we use a second form of a filter, in which the square brackets enclose an integer:
<list>[<integer expression>]
In this case, we know our data table just has a single entry for each lender, so we can extract the selected row from the first filter by appending the filter [1]. The result is no longer a list but a single row in the table, type tBankrate.
The decision my payment uses a BKM holding the Loan Amortization Formula, a complicated arithmetic expression involving the loan principal (p), interest rate (r), and number of payments (n), in this case 360.
Decision my payment invokes this BKM using the lookup result my rate. Input data loan amount is just the borrower’s requested amount, but the loan principal used in Loan Amortization Formula (parameter p) also includes the lender’s points and fees. Since pointsPct and ratePct in our data table are expressed as percent, we need to divide by 100 to get their decimal value used in the BKM formula.
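The article does not reproduce the formula itself, but the standard fixed-rate amortization formula – presumably what the BKM implements – can be sketched in Python. The rate and points values below are illustrative, not taken from the article's table:

```python
def loan_amortization(p, r, n):
    """Monthly payment for principal p, monthly rate r, and n payments
    (standard fixed-rate amortization formula, assumed to match the BKM)."""
    return p * r * (1 + r) ** n / ((1 + r) ** n - 1)

requested_amount = 400000
rate_pct, points_pct, fees = 6.5, 1.0, 595   # illustrative percent values

# The principal includes the lender's points and fees; the table's percent
# values are divided by 100, and the annual rate by 12 for a monthly rate.
principal = requested_amount * (1 + points_pct / 100) + fees
my_payment = loan_amortization(principal, rate_pct / 100 / 12, 360)
```

With these assumed inputs the payment comes out in the mid-$2,500s, on the order of the figures discussed later in this post.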
When we run it with my lender “Citibank” and loan amount $400,000, we get the result shown here.
That is one way to do it. Another way is to enrich the external data table with additional columns, such as the monthly payment for a given loan amount, and then perform the lookup on this enriched data table. In that case the data table is a decision, not input data.
Here the enriched table Payments by Bank has an additional column, payment, based on the input data loan amount. Adding a column to a table involves iteration over the table rows, each iteration generating a new row including the additional column. In the past I have typically used a context BKM with no final result box to generate each new row. But actually it is simpler to use a literal expression with the context put() function, as no BKM is required to generate the row, although we still need the Loan Amortization Formula. (Simpler for me, but the resulting literal expression is admittedly daunting, so I’ll also show you an alternative way using boxed expressions that breaks it into simpler pieces.)
context put(), with parameters context, keys, and value, appends components (named by keys) to an existing structure (context), and assigns their value. If keys includes an existing component of context, value overwrites the previous value. Here keys is the new column name “payment”, and value is calculated using the BKM Loan Amortization Formula. So, as a single literal expression, Payments by Bank looks like this:
Here we used literal invocation of the BKM instead of boxed invocation, and we applied the decimal() function to round the result.
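In Python terms, context put() corresponds to copying a dict and adding a key. Here is a sketch of the enrichment, again with invented table values and the standard amortization formula standing in for the BKM:

```python
def loan_amortization(p, r, n):
    # Standard fixed-rate amortization formula, standing in for the BKM.
    return p * r * (1 + r) ** n / ((1 + r) ** n - 1)

bankrates = [  # invented rows, for illustration only
    {"lenderName": "Citibank", "ratePct": 6.625, "pointsPct": 1.0, "fees": 595},
    {"lenderName": "eLend",    "ratePct": 6.875, "pointsPct": 0.0, "fees": 0},
]
loan_amount = 400000

# FEEL context put(row, "payment", value) ~ Python {**row, "payment": value}:
# a copy of the row with the new component appended (or overwritten).
payments_by_bank = [
    {**row, "payment": round(loan_amortization(
        loan_amount * (1 + row["pointsPct"] / 100) + row["fees"],
        row["ratePct"] / 100 / 12, 360), 2)}
    for row in bankrates
]
```

Note that, like context put(), the `{**row, ...}` form leaves the original row untouched and returns an enriched copy.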
Alternatively, we can use the iterator boxed expression instead of the literal for..in..return operator and invocation boxed expressions for the built-in functions decimal() and context put() as well as the BKM. With FEEL built-in functions you usually use literal invocation but you can use boxed invocation just as well.
Now my payment is a simple lookup of the enriched data table Payments by Bank, appending the [1] filter to extract the row and then .payment to extract the payment value for that row.
When we run it, we get the same result for Citibank, loan amount $400,000:
The enriched data table now allows more flexibility in the queries. For example, instead of finding the payment for a particular lender, you could use a filter expression to find the loan product(s) with the lowest monthly payment:
Payments by Bank[payment=min(Payments by Bank.payment)]
which returns a single record, AimLoan. Of course, you can also use the filter query to select a number of records meeting your criteria. For example,
Payments by Bank[payment < 2650]
will return records for AimLoan, AnnieMac, Commonwealth, and Consumer Direct.
Payments by Bank[pointsPct=0 and fees=0]
will return records for zero-points/zero-fee loan products: Aurora Financial, Commonwealth, and eLend.
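The same queries can be sketched in Python over an invented enriched table; the payment figures below are made up, chosen only so that each filter returns something (with AimLoan cheapest, matching the article's result):

```python
payments_by_bank = [  # invented enriched rows, illustrative values only
    {"lenderName": "AimLoan",      "pointsPct": 0.5, "fees": 995, "payment": 2557.30},
    {"lenderName": "Commonwealth", "pointsPct": 0.0, "fees": 0,   "payment": 2640.00},
    {"lenderName": "Citibank",     "pointsPct": 1.0, "fees": 595, "payment": 2701.50},
]

# FEEL: Payments by Bank[payment = min(Payments by Bank.payment)]
lowest = min(row["payment"] for row in payments_by_bank)
cheapest = [row for row in payments_by_bank if row["payment"] == lowest]

# FEEL: Payments by Bank[payment < 2650]
under_2650 = [row for row in payments_by_bank if row["payment"] < 2650]

# FEEL: Payments by Bank[pointsPct = 0 and fees = 0]
zero_cost = [row for row in payments_by_bank
             if row["pointsPct"] == 0 and row["fees"] == 0]
```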
Both of these methods require submitting the data table Bankrates at time of execution. Our example table was small, but in real projects the data table could be quite large, with thousands of rows. This is more of a problem for testing in the modeling environment, since with the deployed service the data is submitted programmatically as JSON or XML. But to simplify testing, there are a couple ways you can avoid having to input the data table each time.
You can make the data table a zero-input decision using a Relation boxed expression. On the Trisotech platform, you can populate the Relation with upload from Excel. To run this you merely need to enter values for my lender and loan amount. You can do this in production as well, but remember, with a zero-input decision you cannot change the Bankrates values without versioning the model.
Alternatively, you can leave Bankrates as input data but bind it to a cloud datastore. Via an admin interface you can upload the Excel table into the datastore, where it is persisted as a FEEL table. So in the decision model, you don’t need to submit the table data on execution, and you can periodically update the Bankrates values without versioning the model. Icons on the input data in the DRD indicate its values are locked to the datastore.
Lookup tables using filter expressions are a basic pattern you will use all the time in DMN. For more information on using DMN in your organization’s decision automation projects, check out my DMN Method and Style training or my new book, DMN Method and Style 3rd edition, with DMN Cookbook.
I still hear from managers, “We spend all this money on process mapping projects, but in the end the only ones who understand the diagrams are the people who created them.” At the root of this problem is the fact that the basic look of BPMN is based on traditional swimlane flowcharts, with boxes and arrows, long preceding BPMN but without any common precise meaning to the shapes. The challenge for the BPMN 2.0 task force was to provide the shapes precise meanings and operational behavior so that, in fact, what you model with those shapes is indeed what you execute, and execution works the same way in any compliant tool.
It was purely by accident that in 2009 I found myself on the IBM-SAP-Oracle team competing to draft the BPMN 2.0 spec. I had been doing training on BPMN 1.x for a couple years by then and had a good idea how modelers used the diagrams and the kinds of mistakes they made. My students were not interested in automating their processes, simply describing them clearly for documentation, analysis, and improvement. But the other team members had little interest in that. They were singularly focused on the execution details, most of which were invisible in the diagrams. They desperately needed BPMN diagrams to be executable in order to win business support for their major IT push to service-oriented architecture (SOA). Use of the diagrams for purely descriptive modeling was of no interest. And this stance was evident in the resulting spec.
For example, while the spec provides various icons and symbols to clarify the meaning of activities, their use is optional. And not only is there no guidance for labeling of the various diagram elements, there is no requirement to provide labels at all! That is why, when you validate the diagram below against the rules of the spec, you get no errors:
But what does that diagram mean? Basically it means nothing at all. Contrast that with the same diagram adorned with the optional icons, markers, and labels:
Now the diagram has concrete meaning. The style part of Method and Style is based on rules that require the icons and labels, sometimes with a particular format, and assign specific meanings to them. But to get modelers to pay attention to those rules, it’s best when they are validated in the BPMN tool itself. The Style Validation feature of Trisotech Workflow Modeler, for example, provides this. When you test the first diagram with Style Validation, you see a list of errors:
The error messages here just say that labels are required, but Method and Style actually prescribes their format. A Message start event label, for example, should be labeled “Receive [message flow name]”, and the message flow must be drawn and labeled (a noun, the name of the message). End events, if there are more than one, should be labeled Noun-adjective to indicate the end state of the process instance, i.e., how did it end? With Message start events, the process instance is indicated by the start message label, here Request, so the end event labels indicate how that Request was fulfilled, successfully or otherwise.
Complicating matters is the fact that process data and the data expressions executed by the process engine, while they are actually what controls execution behavior, are not shown in the diagram. Process variables in the form of data objects do have diagram shapes, but they are used only in executable models. The reason is that BPMN does not standardize the language of process data and expressions. Most executable BPMN tools use Java, thus outside the understanding of business users. The challenge for Method and Style, then, is to imply the data suggested by the flow, and use that implied data to label the diagrams.
Here Method and Style uses the concept of activity end states, meaning how did the activity instance end? I’ve mentioned already process end states, indicated by labeled end events. The end states of a subprocess are similarly indicated by labeled end events. Because you cannot look inside a task, there are no labeled shapes to indicate its end states. Instead they are implied by the gateway that immediately follows the task. Enumerating the end states of an activity would seem to be completely subjective, but Method and Style prescribes the way: Each activity end state corresponds to a distinct next step – activity or event – following completion of the activity. If there are two distinct next steps, that implies two activity end states, named by the gate labels leading to those respective next steps. For example, in the diagram above, the task Validate request has two possible next steps, the task Handle request and the end event Request rejected. That means Validate request must be followed by an XOR gateway with two gates, appropriately labeled to indicate the task end state leading to each possible next step. If Validate request were instead a subprocess, Method and Style would require the child level diagram to have exactly two end events, labeled to match the parent level gates, “Valid” and “Invalid”. And this would be checked by Style Validation.
Another problem of the BPMN 2.0 spec is that the vocabulary of shapes and symbols is too large. Like any work of a committee, the spec contains some elements of little utility. Moreover, some elements make sense only in executable models and should not be used in descriptive models. Method and Style pares down the vocabulary. For example, a Script task is automated logic executed on the process engine, while a Service task is automated by some other engine. Similarly, a User task is a human task assigned and monitored by the process engine, while a Manual task is a human task invisible to the process engine. Since these distinctions are meaningless in descriptive modeling, where there is no process engine, Method and Style does not use Script or Manual tasks. We also try to limit events to Message, Timer, and Error. There is really little need for the others in descriptive BPMN, and students who use them usually do so incorrectly. Simplifying the BPMN vocabulary leads to better models.
All of this relates to the style part of Method and Style. There is also a method piece, although it is used less often. The method is a procedure for creating a properly structured model by top-down decomposition, as opposed to the normal practice of stringing together activities noted in the process discovery sessions. It’s hard for students, because most are used to thinking concretely from the bottom up rather than abstractly from the top down. In the method, you start by identifying the process instance, when it starts and the ways it possibly ends. That nails down the process start and end events. Then you divide everything that happens in the process into a handful of subprocesses. Just name them at this point. Now for each subprocess, think of where the instance could possibly go next, after the subprocess completes. It can only be one of the other subprocesses or one of the end events. That list of possible next steps determines the subprocess end states, and if there is more than one possible next step, the gates of a gateway following the subprocess. Now you have your top-level diagram. I like to do this without lanes, which only complicate the drawing at this stage. You can always add lanes at the end.
Next, create the child level of each subprocess. You already know its end states. And you can continue this decomposition to further child levels. The method does not cover adding events, which you can do at the end, but it results in a properly structured BPMN model, always a good starting point.
So what’s the benefit of all these new rules and methodology? They are not part of the spec, that’s true. But if you get your whole team – actually, all the stakeholders in your process models – to understand and use them, an amazing thing happens: The meaning of the process model is understood by all, simply from the printed diagram alone! In the descriptive modeling world, that is a huge win. Otherwise, it rarely happens.
OK, so how do you learn to do this? One way is my book BPMN Quick and Easy. It’s fairly short and to the point. My earlier book BPMN Method and Style is more comprehensive, but it doesn’t pare down the vocabulary as much as I’ve learned to do after many years of BPMN training. A better way is my BPMN Method and Style training. It’s better because it’s hands-on with a great BPMN tool, Trisotech Workflow Modeler, which has the Style Validation built in. Students get 60-day use of the tool for the many in-class exercises as well as the post-class certification, in which students must create a model containing certain required elements and submit it for my review. If there are errors – and usually there are some – the student must fix and resubmit. It’s actually in this cycle of fixing and resubmitting that the weak parts of the student’s understanding come to light and the student has that Aha! moment. It is satisfying for student and teacher alike. There is a lot more information on Method and Style and the training on my website methodandstyle.com. Check it out!
The new edition is based on the draft DMN 1.6 spec, which contains many features unavailable in the second edition, based on DMN 1.2. And while that book was targeted at business users, the new edition balances the needs of business and technical modelers. Part I is the Guide to Decision Modeling, which explains all the modeling elements – DRDs, decision tables, other boxed expressions, and FEEL – in a business-friendly way. Part II is the DMN Cookbook, formerly a separate volume, with “recipes” for various modeling challenges, aimed at more technical modelers. Looking back on the original DMN Cookbook, based on the draft DMN 1.2 spec, I am struck by how few of those recipes would be done in the same way today.
Another impetus to writing a new edition is my own experience in DMN engagements. Seemingly obscure parts of FEEL – double iteration, handling lists of lists, interpolation of published data, complex data validation – can be critical in real-world projects, and those things all made their way into the new edition.
Below is the table of contents, and you can read the Preface here:
Preface to the Third Edition
1. What Is DMN?
2. DMN Elements
3. Decision Requirements
4. Decision Tables
5. Data Modeling and Reuse
6. Literal Expressions
7. Business Knowledge Models
8. Contexts
9. Decision Services
10. Calendar Arithmetic
11. Lists and Tables
12. Data Validation
13. Orchestrating DMN
14. About the DMN Cookbook
15. String Recipes
16. Calendar Arithmetic Recipes
17. List and Table Recipes
18. Math Recipes
19. Machine Learning Recipes
20. Data Validation Recipes
Index
About the Author
Consider the table Dataset, a list of x-y pairs. Here x represents the number of members in a group health insurance plan, and y represents the administrative cost of the plan as a percentage of claim value. We would expect the admin cost percentage to decrease as the number of members increases, which is what we see when we graph the table.
Here we have 6 data points {x[i], y[i]}, and we want to find the best straight line fit, described as
y = m*x + b
where m is the slope of the line and b is the intercept of the line at x = 0, using linear regression.
Best fit means minimizing the sum of the square of errors, where error is the actual y value minus the straight line y value. When the dependent variable y depends only on a single variable x, we can find the best fit analytically. When there are additional independent variables, we typically need to use numerical methods.
Using the analytical solution, the slope m is given by
(n*sumxy - sumx*sumy)/(n*sumx2 - sumx**2)
and b is given by
(sumy - m*sumx)/n
where
n = count(Dataset)
sumxy = sum(for i in 1..n return Dataset.x[i]*Dataset.y[i])
sumx = sum(Dataset.x)
sumy = sum(Dataset.y)
sumx2 = sum(for i in 1..n return Dataset.x[i]**2)
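As a cross-check on these formulas, here is the same closed-form calculation sketched in Python. The x-y values below are invented for illustration; the actual Dataset values appear only in the figure.

```python
# Hypothetical x-y pairs standing in for the Dataset in the figure:
# x = plan members, y = admin cost as a percent of claim value.
dataset = [(100, 36.0), (200, 30.5), (400, 25.0),
           (600, 21.0), (800, 18.5), (1000, 16.0)]

n = len(dataset)
sumx = sum(x for x, _ in dataset)
sumy = sum(y for _, y in dataset)
sumxy = sum(x * y for x, y in dataset)
sumx2 = sum(x * x for x, _ in dataset)

# Closed-form least-squares fit of y = m*x + b
m = (n * sumxy - sumx * sumy) / (n * sumx2 - sumx ** 2)
b = (sumy - m * sumx) / n
```

With the actual Dataset, this same computation reproduces the m and b values reported next.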
Modeling this in DMN, we get m = -0.274, b=37.0:
OK, it’s good to know how to do this, but it’s not all that profound. More interesting is the case where we do not have an analytical solution and we need to do it another way.
When the dependent variable y is a linear expression of multiple variables, we must resort to numerical methods to estimate the coefficients of each independent variable that minimize the cost function, the sum of the squared difference between the modeled value and the observed or actual value.
One common way to do this is called gradient descent. We calculate the cost function for some initial guess at the coefficients, then iteratively adjust the coefficients and calculate the cost function again until it approaches a minimum. The adjustment is based on the gradient of the cost function, meaning the slope of the cost function curve when you vary one coefficient while keeping the other values constant. Doing this successfully depends on the proper choice of a parameter called the learning rate. If you set the learning rate too high, the iterations do not converge on a minimum; if you set it too low, it will converge but could take thousands of iterations.
Gradient descent is used in many machine learning algorithms besides linear regression, including classification using logistic regression, even neural networks. But here we will see how it works with linear regression using the same dataset we used previously.
We calculate the cost function in FEEL as
cost = sum(for i in 1..n return (h[i] - y[i])**2)/(2*n)
where h[i] is the estimated value of y[i]. In the case of linear regression,
h[i] = c1*x[i] + c0
where c1 and c0 are the coefficients we are trying to optimize. If we are successful they will equal the values of m and b calculated previously.
Gradient descent means we adjust our initial guess of the coefficients by multiplying the learning rate a times the partial derivative of the cost function with respect to each coefficient. The adjustment with each iteration is given by
c0 := c0 - a/n*sum(for i in 1..n return h[i] - y[i])
c1 := c1 - a/n*sum(for i in 1..n return x[i]*(h[i] - y[i]))
To do this in DMN requires recursion, a function that calls itself with new arguments each time. Each iteration generates new values for c0 and c1, from which we calculate cost, and we iterate until cost converges on a minimum.
When we use the raw Dataset values, we need to set the learning rate to a very small number, and it requires too many iterations to converge to the minimum cost. We can get much faster convergence by normalizing the data:
xn[i] = (x[i] - xmin)/(xmax - xmin)
yn[i] = (y[i] - ymin)/(ymax - ymin)
Now the normalized values xn and yn all lie in the interval [0..1]. We can use gradient descent on the normalized dataset to find values of cn0 and cn1 that minimize the cost function for the regression line
yn = cn1*xn + cn0
and then transform the coefficients to get c0 and c1, which apply to the raw Dataset.
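The full procedure – normalize, run gradient descent, then de-normalize the coefficients – can be sketched in Python. The dataset values below are invented for illustration, and a plain loop stands in for DMN’s recursive BKM:

```python
# Hypothetical Dataset values (the real ones are shown in the figure).
dataset = [(100, 36.0), (200, 30.5), (400, 25.0),
           (600, 21.0), (800, 18.5), (1000, 16.0)]
xs = [x for x, _ in dataset]
ys = [y for _, y in dataset]
xmin, xmax = min(xs), max(xs)
ymin, ymax = min(ys), max(ys)
xn = [(x - xmin) / (xmax - xmin) for x in xs]
yn = [(y - ymin) / (ymax - ymin) for y in ys]

n = len(dataset)
a = 0.1              # learning rate
c0, c1 = 0.0, 0.0    # naive initial guess
for _ in range(5000):                # loop in place of the recursive BKM
    h = [c1 * x + c0 for x in xn]    # estimated yn values
    c0 = c0 - a / n * sum(h[i] - yn[i] for i in range(n))
    c1 = c1 - a / n * sum(xn[i] * (h[i] - yn[i]) for i in range(n))

# Transform cn1 (c1) and cn0 (c0) back to raw-data coefficients:
# y = ymin + (ymax - ymin)*(cn1*(x - xmin)/(xmax - xmin) + cn0)
m = c1 * (ymax - ymin) / (xmax - xmin)
b = ymin + (ymax - ymin) * c0 - m * xmin
```

On this normalized data, the de-normalized m and b converge to the same values the closed-form solution gives.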
The DRD for our model is shown below:
Decision NormData applies the normalization transformation to the raw Dataset. Input data initial is our initial guess at the coefficients, which we naively set to cn1=0, cn0=0. We play around with the learning rate a until we find one that converges well. The decision regression calls the BKM recursion once, with the initial coefficient values. The BKM calculates the cost function, adjusts values of the coefficients, and then iteratively calls itself again with the new coefficients.
In the BKM, context entry h is the collection of estimated y values using the coefficients specified by the BKM parameter coeff. Context entry costFunction is the calculated cost using h. Context entry newCoeff uses the gradient descent formula with learning rate a to generate new coefficient values. From that we calculate new values of h and a new value for the cost function.
We can set recursion to end at a certain number of iterations or when the cost function change is essentially zero. In the final result box, we recursively call the BKM 750 times with adjusted coefficient values. Context entry out is the structure output by the BKM when we exit. It provides values for the coefficients, the fractional change in the cost function from the previous iteration (ideally near zero), and the count of iterations.
Executing this model with initial guess {c0:0, c1:0} for the coefficients and 0.1 for the learning rate a, we get the result below:
Convergence is excellent; after 750 iterations the fractional change in the cost function is a few parts per billion. Now we need to convert the coefficients – here labeled c1 and c0 but actually what we earlier called cn1 and cn0 – to the c1 and c0 values for the raw Dataset.
Plugging the calculated values of -0.897 for cn1 and 0.826 for cn0 into the normalization equations
yn = cn1*xn + cn0
xn = (x - xmin)/(xmax - xmin)
yn = (y - ymin)/(ymax - ymin)
y = c1*x + c0
we get c1 = -0.274, c0 = 36.99, which matches the solution calculated analytically.
You would likely use the analytical solution, not gradient descent, for linear regression on a single variable, but for many other machine learning problems, numerical methods for minimizing the cost function are the only way. To be fair, many other languages do this more easily than DMN, but if you need to sneak a bit of machine learning into your DMN model, you can do it!
These examples, along with many others, are taken from the DMN Cookbook section of my new book, DMN Method and Style 3rd edition, coming soon.
Each process instance has a defined start and end. The start is the triggering event, a BPMN start event. The end occurs when the instance reaches an end state of the process instance, which in Method and Style is an end event. It helps to have a concrete idea of what the process instance represents, but I have found in my BPMN Method and Style training that most students starting out cannot tell you. Actually it’s very easy: It is the handling of the triggering event, which in Method and Style is one of only three kinds: a Message event, representing an external request; a Timer event, representing a scheduled recurring process; or a None start event, representing manual start by an activity performer in the process, which you could call an internal request. Of these three, Message start is by far the most common. That request message could take the form of a loan application, a vacation request, or an alarm sent by some system. The process instance in that case is then essentially the loan application, the vacation request, or the alarm. In Method and Style, it’s the label of the message flow into the process start event. With a Timer start event, the instance is that particular occurrence of the process, as indicated by the label of the start event.
Here is why knowing what the process instance represents is important. The instance of every activity in the process must have one-to-one correspondence with the process instance! Of course, there are a few exceptions, but failure to understand this fundamental point leads to structural errors in your BPMN model. And those structural errors are commonplace in beginner models, because other corners of the BPM universe don’t apply that constraint to what they call “processes”.
Take, for example, the Process Classification Framework of APQC, a well-known BPM architecture organization. It is a catalog of processes and process activities commonly found in organizations. But these frequently are not what BPMN would call processes. Even those that qualify as BPMN processes may contain activities that are not performed repeatedly on instances or whose instances are not aligned with the process instance. Here is one called Process Expense Reimbursements, listing five activities.
But notice that two of the five (8.6.2.1 and 8.6.2.5) are not activities aligned one-to-one with the process instance. That is, they are not performed once for each expense report. That means that if we were to model 8.6.2 Process Expense Reimbursements in BPMN, activities 8.6.2.1 and 8.6.2.5 could not be BPMN activities in that BPMN process. So where do they go? They need to be modeled in separate processes… if they can be modeled as BPMN activities at all! Take 8.6.2.1 Establish and communicate policies and limits. For simplicity, let’s assume that establishing and communicating have one-to-one correspondence, so they could be part of a single process. How does an instance of that process start? It could be a recurring process – that’s Timer start – performed annually. Or it could be triggered occasionally on demand – Message or None start. The point is that 8.6.2.1 needs to be modeled as a process separate from Process Expense Reimbursements. The result of that process, the policy and limits information, is accessible to Process Expense Reimbursements through shared data, such as a datastore.
Activity 8.6.2.5 Manage personal accounts is not a BPMN activity at all. It cannot be a subprocess, because there is no specified set of activity sequences from start to end. To me it is an instance of a case in CMMN, not an activity in this BPMN process.
All this is simply to point out that instance alignment is a problem specific to BPMN because other parts of BPM do not require it.
Since “business processes” in the real world often involve actions that are not one-to-one aligned with the main BPMN process instance, how do we handle them? We’ve already seen one way: Put the non-aligned activity in a separate process – or possibly case. Communication of data and state between the main process and the external process or case is achieved by a combination of messages and shared data.
Repeating activities are another way to achieve instance alignment.
When instance alignment requires two BPMN processes working in concert, it is often helpful to draw the top level of both processes in the same diagram. This can clarify the relationship between the instances as well as the coordination mechanism, a combination of messages and shared data. You can indicate a one-to-N relationship between instances of Process A and Process B by placing a multi-instance marker on the pool of Process B.
An example of this we use in the BPMN Method and Style training is a hiring process. The instance of the main process is a job opening to be filled. It starts when the job is posted and ends when it is filled or the posting is withdrawn. So it qualifies as a BPMN process. But most of the work is dealing with each individual applicant. You don’t know how many applications you will need to process. You want processing of multiple applicants to overlap in time, but they don’t start simultaneously; each starts when the application is received. So repeating activities don’t work here. One possible solution is shown below.
Here there is one instance of Hiring Process for N instances of Evaluate Candidate, so the latter has the multi-participant marker. Hiring Process starts manually when the job is posted and ends when either the job is filled or the posting expires unfilled after three months. Each instance of Evaluate Candidate starts when the application is received, and there are various ways it could end. It could end right at the start if the job is already filled, since before the instance is routed to any person, the process checks a datastore for the current status of the job opening. It could end after Screen and interview if the candidate is rejected. If an offer is extended, it could end if the candidate rejects the offer, or successfully if the offer is accepted. And there is one more way: Each running instance could be terminated in a Message event subprocess upon receiving notice from Hiring Process that the posting is either filled or canceled. While not perfect, this BPMN model illustrates instance alignment between multiple processes working in concert, including how information is communicated between them via messages and shared data.
There is yet another way to do it… all in a single process! It uses a non-interrupting Message event subprocess, and is an exception to the rule that all process activities must align one-to-one with the process instance. It looks like this:
Now instead of being a separate process, Evaluate Applicant is a Message event subprocess. Each Application message creates a new instance of Evaluate Applicant. You don’t know how many will be received, and they can overlap in time. As before, each instance checks the datastore Job status. Since everything is now in one process, we can no longer use messages to communicate between Evaluate Applicant and the main process. Here we have a second datastore, candidates, updated by Evaluate Applicant and queried by Get shortlist to find newly passed applicants. Instead of an interrupting event subprocess to end the instance, we use a Terminate event after notifying all in-process candidates.
If you are just creating descriptive, i.e., non-executable, BPMN models, you may wonder why instance alignment matters. It certainly can make your models more complicated. But even in descriptive models, in order for the process logic to be clear and complete from the printed diagrams alone – the basic Method and Style principle – the BPMN must be structurally correct. If it is not, the other details of the model cannot be trusted. If you want to get your whole team on board with BPMN Method and Style, check out my training. The course includes 60-day use of Trisotech Workflow Modeler, lots of hands-on exercises, and post-class certification.
DMN, which stands for Decision Model and Notation, is a model-based language for business decision logic. Furthermore, it is a vendor-neutral standard maintained by the Object Management Group (OMG), the organization behind BPMN, CMMN, and other standards. As with OMG’s other business modeling standards, “model” means the names, meaning, and execution behavior of each language element are formally defined in a UML metamodel, and “notation” means that a significant portion of the language is defined graphically, in diagrams and tables using specific shapes and symbols linked to model elements. In other words, the logic of a business decision, business process, or case is defined by diagrams having a precise meaning, independent of the tool that created them. The main reason for defining the logic graphically is to engage non-programmers, aka business people, in their creation and maintenance.
DMN models the logic of operational decisions, those made many times a day following the same explicit rules. Examples include approval of a loan, validation of submitted data, or determining the next best action in a customer service request. These decisions typically depend on multiple factors, and the logic is frequently complex. The most familiar form of DMN logic is the decision table. All DMN tools support decision tables, and that’s because business people understand them readily with zero training. Consider the decision table below, which estimates the likelihood of qualifying for a home mortgage:
Qualifying for a home mortgage depends primarily on three factors: the borrower’s Credit Score, a measure of creditworthiness; the Loan-to-Value ratio, dividing the loan amount by the property appraised value, expressed as a percent; and the borrower’s Debt-to-Income ratio, dividing monthly housing costs plus other loan payments by monthly income, expressed as a percent. Those three decision table inputs are represented by the columns to the left of the double line. The decision table output, here named Loan Prequalification, with possible values “Likely approved”, “Possibly approved”, “Likely disapproved”, or “Disapproved”, is the column to the right of the double line. Below the column name is its datatype, including allowed values. Each numbered row of the table is a decision rule. Cells in the input columns are conditions on the input, and if all input conditions for a rule evaluate to true, the rule is said to match and the value in the output column is selected as the decision table output value.
A hyphen in an input column means the input is not used in the rule; the condition is true by default. So the first rule says, if Credit Score is less than 620, Loan Prequalification is “Disapproved”. Numeric ranges are shown as values separated by two dots, all enclosed in parentheses or square brackets. Parenthesis means exclude the endpoint in the range; square bracket means include it. So rule 4 says, if Credit Score is greater than or equal to 620 and less than 660, and LTV Pct is greater than 75 and less than or equal to 80, and DTI Pct is greater than 36 and less than or equal to 45, then Loan Prequalification is “Likely disapproved”. Once you get the numeric range notation, the meaning of the decision table is clear, and this is a key reason why DMN is considered business-friendly.
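For programmers, the table semantics are easy to state in code. Here are rules 1 and 4 rendered as Python predicates, just to spell out the hyphen and range notation; the other rules in the figure would follow the same pattern:

```python
def rule_1(credit_score, ltv_pct, dti_pct):
    # Hyphens in the LTV and DTI columns mean those inputs are not used
    # in this rule; their conditions are true by default.
    return credit_score < 620

def rule_4(credit_score, ltv_pct, dti_pct):
    # [620..660) includes 620 and excludes 660; (75..80] excludes 75
    # and includes 80; (36..45] excludes 36 and includes 45.
    return 620 <= credit_score < 660 and 75 < ltv_pct <= 80 and 36 < dti_pct <= 45
```

A rule matches when all of its input conditions are true, and the matching rule’s output value (“Disapproved” for rule 1, “Likely disapproved” for rule 4) becomes the decision table output.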
But if you think harder about it, you see that while Credit Score might be a known input value, LTV Pct and DTI Pct are not. They are derived values. They are calculated from known input values such as the loan amount, appraised property value, monthly income, housing expense including mortgage payment, tax, and insurance, and other loan payments. In DMN, those calculations are provided as supporting decisions to the top-level decision Loan Prequalification. Each calculation itself could be complex, based on other supporting decisions. This leads to DMN’s other universally supported feature, the Decision Requirements Diagram, or DRD. Below you see the DRD for Loan Prequalification. The ovals are input data, known input values, and the rectangles are decisions, or calculated values. The solid arrows pointing into a decision, called information requirements, define the inputs to the decision’s calculations, either input data or supporting decisions.
Like decision tables, DRDs are readily understood by business users, who can create them to outline the dependencies of the overall logic. In the view above, we show the datatype of each input data and decision in the DRD. Built-in datatypes include things like Text, Number, Boolean, and collections of those, but DMN also allows the modeler to create user-defined types representing constraints on the built-in types – such as the numeric range 300 to 850 for type tCreditScore – and structured types, specified as a hierarchy of components. For example, tLoan, describing the input data Loan, is the structure seen below:
Individual components of the structure are referenced using a dot notation. For example, the Loan Amount value is Loan.Amount.
A complete DRD, therefore, including datatypes for all supporting decisions down to input data, provides significant business value and can be created easily by subject matter experts. As a consequence, all DMN tools support DRDs. But by themselves, the DRD and a top-level decision table are not enough to evaluate the decision. For that you need to provide the logic for the supporting decisions. And here there is some disagreement within the DMN community. Some tool vendors believe that DMN should be used only to provide model-based business requirements. Those requirements are then handed off to developers for completion of the decision logic using some other language, either a programming language like Java or a proprietary business rule language like IBM ODM. I call those tools DMN Lite, because fully implemented DMN allows subject matter experts to define the complete, fully executable decision logic themselves, without programming.
Full DMN adds two key innovations to DRDs and decision tables: the expression language FEEL and standardized tabular formats called boxed expressions. Using boxed expressions and FEEL, real DMN tools let non-programmers create executable decision models, even when the logic is quite complex. So you can think of DMN as a Low-Code language for decision logic that is business-friendly, transparent, and executable.
In that language, the shapes in the DRD define variables (with assigned datatypes), with the shape labels defining the variable names. Defining variables by drawing the DRD explains an unusual feature of FEEL, which is that variable names may contain spaces and other punctuation not normally allowed by programming languages. The value expression of each individual decision is the calculation of that decision variable’s value based on the values of its inputs, or information requirements. It is the intention of FEEL and boxed expressions that subject matter experts who are not programmers can create the value expressions themselves.
FEEL is called an expression language, a formula language like Excel formulas, as opposed to a programming language. FEEL just provides a formula for calculating an output value based on a set of input values. It does not create the output and input variables; the DRD does that. Referring back to our DRD, let’s look at the value expression for LTV Pct, the Loan-to-Value ratio expressed as a percent. The FEEL expression looks like this:
It’s simple arithmetic. Anyone can understand it. This is the simplest boxed expression type, called a literal expression, just a FEEL formula in a box, with the decision name and datatype in a tab at the top. Decision table is another boxed expression type, and there are a few more. Each boxed expression type has a distinct tabular format and meaning, and cells in those tables are FEEL expressions. In similar fashion, here is the literal expression for DTI Pct:
The tricky one is Mortgage Payment. It’s also just arithmetic, based on the components of Loan. But the formula is hard to remember, even harder to derive. And it’s one that lending uses all the time. For that, the calculation is delegated to a bit of reusable decision logic called a Business Knowledge Model, or BKM. In the DRD, it’s represented as a box with two clipped corners, with a dashed arrow connecting it to a decision. A BKM does not have incoming solid arrows, or information requirements. Instead, its inputs are parameters defined by the BKM itself. BKMs provide two benefits: One, they allow the decision modeler to delegate the calculation to another user, possibly with more technical or subject matter knowledge, and use it in the model. Two, they allow that calculation to be defined once and reused in multiple decision models. The dashed arrow, called a knowledge requirement, signifies that the decision at the head of the arrow passes parameter values to the BKM, which then returns its output value to the decision. We say the decision invokes the BKM, like calling an API. The BKM parameter names are usually different from the variable names in the decision that invokes them, so the invocation is a data mapping.
Here I have that BKM previously saved in my model repository under the name Loan Amortization Formula. On the Trisotech platform, I can simply drag it out onto the DRD and replace the BKM Payment Calculation with it. The BKM definition is shown below, along with an explanation of its use from the BKM Description panel. It has three parameters – p, r, and n, representing the loan amount, rate, and number of payments over the term – shown in parentheses above the value expression. The value expression is again a FEEL literal expression. It can only reference the parameters. The formula is just arithmetic – the ** symbol is FEEL’s exponentiation operator – but as you see, it’s complicated.
The decision Mortgage Payment invokes the BKM by supplying values to the parameters p, r, and n, mappings from the decision’s own input, Loan. We could use a literal expression for this, but DMN provides another boxed expression type called Invocation, which is more business-friendly:
In a boxed Invocation, the name of the invoked BKM is below the tab, and below that is a two-column table, with the BKM parameter names in the first column and their value expressions in the second column. Note that because Loan.Rate Pct is expressed as a percent, we need to divide its value by 100 to get r, which is a decimal value, not a percent.
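To see what the BKM computes, here is the standard loan amortization formula sketched in Python. The BKM in the figure may be written in an algebraically different but equivalent form, and the parameter-mapping comment assumes Rate Pct is an annual percentage and payments are monthly:

```python
def payment(p, r, n):
    """Periodic loan payment for principal p, periodic rate r
    (a decimal, not a percent), and n payments.
    This is the standard amortization formula."""
    return p * r / (1 - (1 + r) ** -n)

# Assumed mapping from the decision's input Loan (illustrative names):
#   p = Loan.Amount
#   r = Loan.Rate Pct / 100 / 12   (annual percent -> monthly decimal)
#   n = number of monthly payments over the term
# e.g., a 30-year $100,000 loan at 6% annual:
monthly = payment(100000, 0.06 / 12, 360)  # about $599.55
```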
At this point, our decision model is complete. But we need to test it! I can’t emphasize enough how important it is to ensure that your decision logic runs without error and returns the expected result. So let’s do that now, using the input data values below:
Here Loan Prequalification returns “Possibly approved”, and the supporting decision values look reasonable. We can look at the decision table and see that rule 8 is the one that matches.
So you see, DMN is something subject matter experts who are not programmers can use in their daily work. Of course, FEEL expressions can do more than arithmetic, and like Excel formulas, the language includes a long list of built-in functions that operate on text, numbers, dates and times, data structures, and lists. I’ve discussed much of that in my previous posts on DMN. But learning to use the full power of FEEL and boxed expressions – which you will need in real-world decision modeling projects – generally requires training. Our DMN Method and Style training gives you all that, including 60-day use of Trisotech Decision Modeler, lots of hands-on exercises, quizzes to test your understanding, and post-class certification in which you need to create a decision model containing certain required elements. It’s actually in perfecting that certification model that the finer points of the training finally sink in. And you need that full 60 days to really understand DMN’s capabilities.
DMN lets you accelerate time-to-value by engaging subject matter experts directly in the solution. If you want to see if DMN is right for you, check out the DMN training.
Beginning decision modelers generally assume that the input data supplied at execution time is complete and valid. But that is not always the case, and when input data is missing or invalid the invoked decision service returns either an error or an incorrect result. When the service returns an error result, typically processing stops at the first one and the error message generated deep within the runtime is too cryptic to be helpful to the modeler. So it is important to precede the main decision logic with a data validation service, either as part of the same decision model or a separate one. It should report all validation errors, not stop at the first one, and should allow more helpful, modeler-defined error messages. There is more than one way to do that, and it turns out that the design of that validation service depends on details of the use case.
The first method, which I wrote about in April 2021, uses a Collect decision table with generalized unary tests to find null or invalid input values, as you see below. When I introduced my DMN training, I thought this was the best way to do it, but it’s really ideal only for the simple models I was using in that training. That is because the method assumes that values used in the logic are easily extracted from the input data, and that the rule logic is readily expressed in a generalized unary test. Moreover, because an error in the decision table will usually fail without indicating which rule had the problem, the method assumes a modest number of rules with fairly simple validation expressions. As a consequence, this method is best used when:
The second method, which I wrote about in March 2023, takes advantage of enhanced type checking against the item definition, a new feature of DMN 1.5. Unlike the first method, this one returns an error result when validation errors are present, but it returns all errors, not just the first one, each with a modeler-defined error message. Below you see the enhanced type definition, using generalized unary tests, and the modeler-defined error messages when testing in the Trisotech Decision Modeler. Those same error messages are returned in the fault message when executed as a decision service. On the Trisotech platform, this enhanced type checking can be either disabled, enabled only for input data, or enabled for input data and decisions.
This method of data validation avoids many of the limitations of the first method, but cannot be used if you want the decision service to return a normal response, not a fault, when validation errors are present. Thus it is applicable when:
More recently I have been involved in a large data validation project in which neither of these methods are ideal. Here the input data is a massive data structure containing several hundred elements to be validated, and we want validation errors to generate a normal response, not a fault, with helpful error messages. Moreover, data values used in the rules are buried deep within the structure and many of them are recurring, so simply extracting them properly is non-trivial. Think of a tax return or loan application. Also, even with properly extracted values, the validation rules themselves may be complex conditions involving many variables.
For these reasons, neither of the two methods described in my previous posts fits the bill here. The fact that an element’s validation rule can be a complex expression involving multiple elements rules out the type-checking method and is also a problem with the Collect decision table. Decision tables also add the problem of testing. When you have many rules, some of them are going to be coded incorrectly the first time, and if a rule returns an error the whole decision table fails, so debugging is extremely difficult. You need to be able to tell, when a rule fails to return the expected result, if it is because you have incorrectly extracted the data element value or you have incorrectly defined the rule logic. Your validation method needs to separate those concerns.
This defines a new set of requirements:
The third method thus has a completely different architecture:
While possibly overkill for simple validation services, in complex validation services this method has a number of distinct advantages over the other two:
Let’s walk through this third data validation method. We start with the Extraction service. The input data Complex Input has the structure shown here:
In this case there is only one non-repeating component, containing just two child elements, and one repeating component, also containing just two child elements. In the project I am working on, there are around 10 non-repeating components and 50 repeating components, many containing 10 or more child elements. So this model is much simpler than the one in my engagement.
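As a sketch of that structure: the child names A1, A2, and C1 appear later in this post; the component names and the repeating element Cn are assumptions for illustration.

```feel
// Assumed shape of the input data Complex Input (names partly illustrative):
//
// Complex Input : tComplexInput
// ├─ ComponentA                  (non-repeating, one instance)
// │   ├─ A1 : text
// │   └─ A2 : text
// └─ ComponentC                  (repeating, 0..n instances)
//     ├─ C1 : text
//     └─ Cn : number             (repeating within each instance)
```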
The Extraction DRD has a separate branch for each non-repeating and each repeating component. Repeating component branches must iterate a BKM that extracts the individual elements for that instance.
The decisions ending in “Elements” extract all the variables referenced in the validation rules. These are not identical to the elements contained in Complex Input. For example, element A1 is just the value of the input data element A1, but element A1Other is either the input data element A2, if the value of A1 is “Other”, or null otherwise.
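In FEEL, the extraction decision for the non-repeating branch might be sketched as a boxed context like this. The decision and component names are assumptions; the logic follows the A1Other description above:

```feel
// Hypothetical "ComponentA Elements" decision: a context extracting the
// rule variables from the non-repeating branch of Complex Input.
{
  A1: Complex Input.ComponentA.A1,
  A1Other:
    if Complex Input.ComponentA.A1 = "Other"
    then Complex Input.ComponentA.A2
    else null
}
```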
Repeating component branches must iterate a BKM that extracts the variable from a single instance of the branch.
In this case, we are extracting three variables – C1, AllCn, and Ctotal – although AllCn is just used in the calculation of Ctotal, not used in a rule. The goal of Extraction is just to obtain the values of variables used in the validation rules.
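A sketch of that BKM, with the parameter and component names assumed for illustration:

```feel
// Hypothetical BKM, iterated once per instance of the repeating branch.
// Parameter: instance – one item of Complex Input.ComponentC.
// Invoked as: for i in Complex Input.ComponentC return ExtractC(i)
{
  C1: instance.C1,
  AllCn: instance.Cn,           // used only to compute Ctotal, not in a rule
  Ctotal: sum(instance.Cn)
}
```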
The ExtractAll service will be invoked by the Rules, and again the model has one branch for each non-repeating component and one for each repeating component. Encapsulating ExtractAll as a separate service is not necessary in a model this simple, but when there are dozens of branches it helps.
Let’s focus on Repeating Component Errors, which iterates a BKM that reports errors for a single instance of that branch.
In this example we have just two validation rules. One reports an error if element C1 is null, i.e. missing in the input. The other reports an error if element Ctotal is not greater than 0. The BKM here is a context, one context entry per rule, and all context entries have the same type, tRuleData, with the four components shown here. We could have added a fifth component containing the error message text, but here we assume that is looked up from a separate table based on the RuleID.
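A sketch of that rules BKM follows. The four tRuleData component names (RuleID, Element, Value, isError) are assumptions for illustration, as is the parameter name instance:

```feel
// Hypothetical "Repeating Component Rules" BKM: a context with one entry
// per validation rule, each entry of type tRuleData.
{
  C1 required: {
    RuleID: "R001",
    Element: "C1",
    Value: instance.C1,
    isError: instance.C1 = null
  },
  Ctotal positive: {
    RuleID: "R002",
    Element: "Ctotal",
    Value: instance.Ctotal,
    // null-safe: a missing Ctotal is also an error
    isError: instance.Ctotal = null or instance.Ctotal <= 0
  }
}
```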
So the datatype tRepeatingComponentError is a context containing a context, and the decision Repeating Component Errors is a collection of a context containing a context. And to collect all the errors, we have one of these for each branch in the model.
That is an unwieldy format. We’d really like to collect the output for all the rules – with isError either true or false – in a single table. The decision ErrorTable provides that, using the little-known FEEL function get entries(). This function converts a context into a table of key-value pairs, and we want to apply it to the inner context, i.e. a single context entry of Repeating Component Error.
It might take a minute to wrap your head around this logic. Here fRow is a function definition – basically a BKM as a context entry – that converts the output of get entries() into a table row containing the key as a column. For non-repeating branches, we iterate over each error, calling get entries() on each one. This generates a table with one row per error and five columns. For repeating branches, we need to iterate both over the instances of the branch and over the errors in each instance, an iteration nested in another iteration. That creates a list of lists, so we need the flatten() function to make that a simple list, again one row per error (across all instances of the branch) and five columns. In the final result box, we just concatenate the tables to make one table for all errors in the model.
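As a literal-expression sketch of that logic, with decision and column names assumed to match the earlier sketches:

```feel
// Hypothetical ErrorTable decision. get entries() turns a context into a
// list of {key, value} pairs; fRow turns one pair into a five-column row.
{
  fRow: function(e) {
    Rule: e.key,
    RuleID: e.value.RuleID,
    Element: e.value.Element,
    Value: e.value.Value,
    isError: e.value.isError
  },
  Non-Repeating Rows:
    for e in get entries(ComponentA Errors) return fRow(e),
  Repeating Rows: flatten(
    for inst in Repeating Component Errors return
      for e in get entries(inst) return fRow(e)
  ),
  result: concatenate(Non-Repeating Rows, Repeating Rows)
}
```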
Here is the output of ErrorTable when run with the inputs below:
ErrorTable as shown here lists all the rules, whether an error or not. This is good for testing your logic. Once tested, you can easily filter this table to list only rules for which isError is true.
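That filter is a one-line FEEL expression:

```feel
// Keep only the rows reporting an actual error
ErrorTable[isError = true]
```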
Bottom Line: Validating input data is always important in real-world decision services. We’ve now seen three different ways to do it, with different features and applicable in different use cases.
The FEEL expressions for dates, times, and durations may look strange, but don’t blame DMN for that. They are based on the formats defined by ISO 8601, the international standard for dates and times in many computer languages. In FEEL expressions, dates and times can be specified by using a constructor function, e.g. date(), applied to a literal string value in ISO format, as seen below. For input and output data values, the constructor function is omitted.
A date is a 4-digit year, hyphen, 2-digit month, hyphen, 2-digit day. For example, to express May 3, 2017 as an input data value, just write 2017-05-03. But in a FEEL expression you must write date("2017-05-03").
Time is based on a 24-hour clock, no am/pm. The format is a 2-digit hour, 2-digit minute, 2-digit second, with optional fractional seconds after the decimal point. There is also an optional time offset field, representing the time zone expressed as an offset from UTC, what they used to call Greenwich Mean Time. For example, to specify 1:10 pm and 30 seconds Pacific Daylight Time, which is UTC – 7 hours, you would enter 13:10:30-07:00 as an input data value, or time("13:10:30-07:00") in a FEEL expression. To specify the time as UTC, you can either use 00:00 as the time offset, or the letter Z, which stands for Zulu, the military symbol for UTC. DateTime values concatenate the date and time formats with a capital T separating them, as you see below. In a FEEL expression, you must use the proper constructor function with the ISO text string enclosed in quotes.
It is possible in FEEL to extract the year, month, or day component from a date, time, or dateTime using a dot followed by the component name: year, month, day, hour, minute, second, or time offset. For example, the expression date("2017-05-03").year returns the number 2017. All of these extractions return a number except for the time offset component, which returns not a number but a duration, with a format we’ll discuss shortly.
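For a dateTime the constructor function is date and time(). For example:

```feel
date and time("2017-05-03T13:10:30-07:00")  // May 3, 2017, 1:10:30 pm, UTC-7
date and time("2017-05-03T20:10:30Z")       // the same instant, expressed in UTC
```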
DMN also provides an attribute, weekday – not really a component but also extracted via dot notation – returning an integer from 1 to 7, with 1 meaning Monday and 7 meaning Sunday.
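For example:

```feel
date("2017-05-03").year     // the number 2017
date("2017-05-03").month    // the number 5
date("2017-05-03").weekday  // the number 3, since May 3, 2017 was a Wednesday
```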
The interval between two dates, times, or dateTimes defines a duration. DMN, like ISO, defines two kinds of duration: days and time duration, and years and months duration. Days and time duration is equivalent to the number of seconds in the duration. The ISO format is PdDThHmMsS, where the lower case d, h, m, and s are integers indicating the days, hours, minutes, and seconds in the duration. If any of them is zero, it is omitted along with the corresponding uppercase D, H, M, or S. And they are supposed to be normalized so that the sum of the component values is minimized.
For example, a duration of 61 seconds could be written P0DT61S, but we can omit the 0D, and the normalized form is 1 minute 1 second, so the correct value is PT1M1S. In a FEEL expression, you need to enclose that in quotes and make it the argument of the duration constructor function: duration("PT1M1S").
Days and time duration is the one normally used in calendar arithmetic, but for long durations the alternative years and months duration is available, equivalent to the number of whole months included in the duration. The ISO format is PyYmM, where lower case y and m are again numbers representing the number of years and months in the duration, and again the normalized form minimizes the sum of those component values. So a duration of 14 months would be written in normalized form as P1Y2M, and in FEEL, duration("P1Y2M"). Since months contain varying numbers of days, the precise value of years and months duration is the number of months in between the start date and the end date, plus 1 if the day of the end month is greater than or equal to the day of the start month, or plus 0 otherwise.
As with dates and times, you can extract the components of a duration using a dot notation. For days and time duration, the components are days, hours, minutes, and seconds. For years and months duration, the components are years and months.
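For example:

```feel
duration("P1DT12H").days    // the number 1
duration("P1DT12H").hours   // the number 12
duration("P1Y2M").years     // the number 1
duration("P1Y2M").months    // the number 2
```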
The point of all this is to be able to do calendar arithmetic.
Let’s start with addition of a duration.
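For example, adding a duration to a date or dateTime gives another date or dateTime:

```feel
date("2017-05-03") + duration("P1M")
// date("2017-06-03")
date and time("2017-05-03T13:10:30Z") + duration("PT2H")
// date and time("2017-05-03T15:10:30Z")
```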
A common use of calendar arithmetic is finding the difference between two dates or dateTimes.
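Subtracting one date or dateTime from another returns a days and time duration:

```feel
date("2017-05-03") - date("2017-04-28")
// duration("P5D")
date and time("2017-05-03T12:00:00Z") - date and time("2017-05-01T00:00:00Z")
// duration("P2DT12H")
```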
You can multiply a duration times a number to get another duration of the same type, either days and time duration or years and months duration.
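For example:

```feel
duration("PT1H") * 3   // duration("PT3H"), a days and time duration
duration("P1M") * 3    // duration("P3M"), a years and months duration
```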
You can divide a duration by a number to get another duration of the same type. But the really useful one is dividing a duration by a duration, giving a number. For example, to find the number of seconds in a year, the expression duration("P365D")/duration("PT1S") returns the correct value, the number 31536000. Note this is not the result returned by extracting the seconds component. The expression duration("P365D").seconds returns 0.
Here is a simple example of calendar arithmetic from the DMN training. Given the purchase date and the timestamp of item return, determine the eligibility for a refund, based on simple rules.
The solution, using calendar arithmetic, is shown below:
Here we use a context in which the first context entry computes a days and time duration by subtracting the two dateTimes. The second context entry is a decision table that applies the rules. Note we can use durations like any other FEEL type in the decision table input entries.
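As a sketch of that context, with the names and refund thresholds invented for illustration (the actual rules come from the training exercise), and with the decision table stood in for by an if/else chain:

```feel
// Hypothetical refund eligibility decision as a boxed context.
{
  Elapsed: Return Timestamp - Purchase Date,   // days and time duration
  Result:
    if Elapsed <= duration("P30D") then "Full refund"
    else if Elapsed <= duration("P90D") then "Store credit"
    else "No refund"
}
```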
A second example comes from the Low-Code Business Automation training. It is common for databases and REST services to express dateTime values as a simple number. One common format used is the Unix timestamp, defined as the number of seconds since January 1, 1970, midnight UTC. To convert a Unix timestamp to a FEEL dateTime, you can use the BKM below:
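As a literal-expression sketch of that BKM, with timestamp as the parameter name:

```feel
// BKM: Unix timestamp (number) -> FEEL dateTime
function(timestamp)
  date and time("1970-01-01T00:00:00Z") + duration("PT1S") * timestamp
```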
The BKM multiplies the duration of one second by timestamp, a number, returning a days and time duration, and then adds that to the dateTime of January 1, 1970 midnight UTC, giving a dateTime. And you can perform the reverse mapping with the BKM below:
This time we subtract January 1, 1970 midnight UTC from the FEEL dateTime, giving a days and time duration, and then divide that by the duration one second, giving a number.
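As a literal-expression sketch, with dt as the assumed parameter name:

```feel
// BKM: FEEL dateTime -> Unix timestamp (number)
function(dt)
  (dt - date and time("1970-01-01T00:00:00Z")) / duration("PT1S")
```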
Calendar arithmetic is used all the time in both decision models and Low-Code business automation models. While the formats are unfamiliar at first, FEEL makes calendar arithmetic very easy. It’s all explained in the training. Both the DMN Method and Style course and the Low-Code Business Automation course provide 60-day use of the Trisotech platform and include post-class certification. Check it out!