Background
Object-Centric Process Mining
Standard process mining assumes a single case notion: events (i.e., activity occurrences) are partitioned by case into traces, which are then analyzed and aggregated to perform process mining tasks (e.g., process discovery). A case typically corresponds to a core object instance that the process refers to or manipulates.
In general, processes deal with multiple, inter-related objects that co-evolve, such as the multiple applications processed for a single job vacancy. In this setting of a 1:N relationship (in general N:M, with N or M > 1), the single case notion leads to two problems [vanderaalst2019sefm]:
- divergence problem, where occurrences of activities on the "N" side are referred to a case object on the "1" side, making it impossible to relate the events that concern the same "N" side object;
- convergence problem, where occurrences of activities on the "1" side are referred to a case object on the "N" side, resulting in their multiplication and the impossibility of recognizing the resulting clones as the same event.
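The two problems can be reproduced on a toy log. The following is a minimal sketch, assuming a hypothetical in-memory representation in which each event lists the objects it refers to per object type; the event data and the `flatten` helper are illustrative and are not taken from the cited paper.

```python
from collections import defaultdict

# Hypothetical events: (event id, activity, {object type: [object ids]}).
# One vacancy (the "1" side) receives two applications (the "N" side).
events = [
    ("e1", "submit application", {"vacancy": ["v1"], "application": ["a1"]}),
    ("e2", "submit application", {"vacancy": ["v1"], "application": ["a2"]}),
    ("e3", "review application", {"vacancy": ["v1"], "application": ["a2"]}),
    ("e4", "review application", {"vacancy": ["v1"], "application": ["a1"]}),
    ("e5", "close vacancy", {"vacancy": ["v1"], "application": ["a1", "a2"]}),
]


def flatten(events, case_type):
    """Build one trace per object of case_type, copying each event into every case it refers to."""
    traces = defaultdict(list)
    for event_id, activity, objects in events:
        for case_id in objects.get(case_type, []):
            traces[case_id].append((event_id, activity))
    return dict(traces)


# Case notion on the "1" side: the events of a1 and a2 are interleaved in one trace.
by_vacancy = flatten(events, "vacancy")
# Case notion on the "N" side: the single event e5 is copied into both traces.
by_application = flatten(events, "application")

print([activity for _, activity in by_vacancy["v1"]])
print({case: [event_id for event_id, _ in trace] for case, trace in by_application.items()})
```

With the vacancy as case, the trace contains two submissions followed by two reviews, and nothing in the trace says which review belongs to which submission. With the application as case, the one "close vacancy" event appears once per application, so counting events over the flattened log overstates how often the activity occurred.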
The Object-Centric setting [vanderaalst2019sefm] assumes instead that there may be multiple case notions (rather than a single one) corresponding to different object types, and that an event may refer to many objects and thus to many case notions. The lack of a well-defined case ID requires new types of event logs/models, process models, and process mining techniques.
Object-Centric Behavioral Constraint (OCBC) Model
Object-Centric Behavioral Constraint (OCBC) models [li2017ocbcdiscovery] allow for declaratively describing the data and behavioral aspects of processes in an object-centric setting. OCBC models borrow the UML notation for classes and relations to model the data component, with relationship cardinalities possibly extended with eventually / always temporal operators to cope with object evolution over time. The behavioral part consists of activity boxes that are linked to each other by Declare-like behavioral cardinality constraints, and to object classes by object-centric behavioral constraints. The former constraints specify that if a reference activity occurs (denoted by the dot), then one (single arrow) or at least one (multiple arrows) occurrence of another activity must precede, follow, or either precede or follow (lack of arrows) it, with 'X' negating the constraint. The latter constraints (rendered as dotted lines) either link an activity to an object class, specifying cardinalities (with temporal operators / stereotypes), or link an activity/activity constraint to a class relationship, so that the constrained activities are identified based on their associated objects.
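As an illustration of how such a behavioral constraint is evaluated per object rather than per case, the following is a simplified sketch of the "at least one occurrence must follow" variant. It is not the OCBC semantics of the cited papers: events are correlated simply by sharing at least one object (instead of through a class relationship), and the `Event` type and `violations_of_response` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: int            # position of the event in the log
    activity: str
    objects: frozenset   # identifiers of the objects the event refers to


def violations_of_response(events, reference, target):
    """Return the reference events that have no later target event sharing an object with them."""
    ordered = sorted(events, key=lambda e: e.time)
    violating = []
    for i, ref in enumerate(ordered):
        if ref.activity != reference:
            continue
        followed = any(
            later.activity == target and ref.objects & later.objects
            for later in ordered[i + 1:]
        )
        if not followed:
            violating.append(ref)
    return violating


# Hypothetical log: application a1 is reviewed after submission, a2 never is.
log = [
    Event(1, "submit application", frozenset({"a1"})),
    Event(2, "submit application", frozenset({"a2"})),
    Event(3, "review application", frozenset({"a1"})),
]

unanswered = violations_of_response(log, "submit application", "review application")
print([sorted(e.objects) for e in unanswered])
```

Checking the same constraint on a log flattened by vacancy would report no violation, because the single review would appear to follow both submissions.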
The paper introducing OCBC [li2017ocbcdiscovery] also refers to a possible log format, XOC, which represents a sequence of events, each associated with its activity, attributes, objects, and a snapshot of the object relations at the time the event occurred; unlike XES, it has no case identifier. A process model discovery algorithm is then proposed, where object classes, activities, and their relationships are directly extracted from raw data, and where constraints can be extracted via heuristics. The approach is implemented in an OCBC Model Discovery plugin for ProM.
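The information carried by each event in such a log can be sketched as a small data structure. This is only an in-memory illustration under assumed names (`ObjectRef`, `LoggedEvent`, `new_relations`, the `applies_to` relation, and the timestamps are all hypothetical); it does not reproduce the actual XOC schema or serialization.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectRef:
    object_class: str
    object_id: str


@dataclass
class LoggedEvent:
    activity: str
    attributes: dict
    objects: list                                 # ObjectRef instances the event refers to
    relations: set = field(default_factory=set)   # snapshot of (source, relation name, target) triples


v1 = ObjectRef("Vacancy", "v1")
a1 = ObjectRef("Application", "a1")
a2 = ObjectRef("Application", "a2")

# The log is a plain sequence of events: there is no case identifier.
log = [
    LoggedEvent("submit application", {"timestamp": "2020-01-10T09:00"}, [a1],
                {(a1, "applies_to", v1)}),
    LoggedEvent("submit application", {"timestamp": "2020-01-11T14:30"}, [a2],
                {(a1, "applies_to", v1), (a2, "applies_to", v1)}),
]


def new_relations(log):
    """For each event, return the relations in its snapshot that were absent from the previous snapshot."""
    previous = set()
    added = []
    for event in log:
        added.append(event.relations - previous)
        previous = event.relations
    return added


print(new_relations(log))
```

Because every event carries the full relation snapshot, the relations introduced by an event can be recovered by comparing consecutive snapshots, as `new_relations` does.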
XOC, the eXtensible Object-Centric log format, is better described in [li2018xoc], where the authors describe an approach for extracting an XOC log from a relational DB that also exploits the DBMS REDO log tracking DB changes. The approach is implemented in the XOC Log Generator ProM plugin. Example XOC datasets are available for download here and here.
A formalization of OCBC models based on temporal description logics (DL) is proposed in [artale2019ocbctdl]. It extends existing formalizations of the UML-based data component of OCBC, specifying how to map OCBC behavioral constraints to corresponding temporal DL axioms. The result is a formal semantics for OCBC models, together with the possibility of checking model consistency, activity executability, trace compliance, and the existence of implied properties using temporal DL reasoning services. Reasoning has ExpTime complexity, which reduces to PSpace if constraints involving object class relations are not used in the model.
Graph Data Models for Object-Centric Data
Berti [berti2019bpmphd] considers process mining without a predefined notion of event case, and in that context investigates the extraction of event data from heterogeneous sources (SQL / NoSQL databases), their pre-processing and encoding in an event graph, and process discovery and compliance checking applications over that graph. The nodes of the event graph are events, activities (classes of events), objects, object classes, and process clusters (i.e., sets of object classes closely related by a business process). This graph model generalizes previous work by Berti and van der Aalst [berti2018starstar] on StarStar models, which lack process clusters and where events have arbitrary <key, value> attributes. In a StarStar model, the E2O event-object subgraph is obtained from log data. Based on it, a weighted E2E event-event subgraph is obtained by linking, for each object, the events related to that object that directly follow each other. Finally, a doubly weighted (weight, performance) A2A activity-activity subgraph is obtained by projecting E2E edges to A2A edges between the corresponding activities, aggregated along object classes. These transformations are performed in a ProM plugin operating on XOC logs, OpenSLEX meta-models, and Neo4J databases.
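The E2O, E2E, and A2A derivation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ProM plugin's implementation: the event data are hypothetical, the E2E weight is taken to be the number of objects inducing an edge, and "performance" is taken to be the mean elapsed time along the aggregated edges.

```python
from collections import Counter, defaultdict

# Hypothetical event data: event id -> (timestamp, activity).
events = {
    "e1": (0, "submit application"),
    "e2": (1, "submit application"),
    "e3": (4, "review application"),
    "e4": (5, "review application"),
    "e5": (9, "close vacancy"),
}

# E2O subgraph: event id -> objects, each given as (object class, object id).
e2o = {
    "e1": [("Application", "a1"), ("Vacancy", "v1")],
    "e2": [("Application", "a2"), ("Vacancy", "v1")],
    "e3": [("Application", "a2"), ("Vacancy", "v1")],
    "e4": [("Application", "a1"), ("Vacancy", "v1")],
    "e5": [("Vacancy", "v1")],
}


def build_e2e(events, e2o):
    """Link, for each object, every event to the next event on the same object."""
    per_object = defaultdict(list)
    for event_id, objects in e2o.items():
        for obj in objects:
            per_object[obj].append(event_id)
    e2e = Counter()
    for (object_class, _), ids in per_object.items():
        ids.sort(key=lambda event_id: events[event_id][0])  # order by timestamp
        for source, target in zip(ids, ids[1:]):
            e2e[(source, target, object_class)] += 1
    return e2e


def build_a2a(events, e2e):
    """Project E2E edges onto activities, per object class, as (weight, mean elapsed time)."""
    weights = Counter()
    elapsed = defaultdict(float)
    for (source, target, object_class), weight in e2e.items():
        key = (events[source][1], events[target][1], object_class)
        weights[key] += weight
        elapsed[key] += weight * (events[target][0] - events[source][0])
    return {key: (weights[key], elapsed[key] / weights[key]) for key in weights}


e2e = build_e2e(events, e2o)
a2a = build_a2a(events, e2e)
for edge, (weight, performance) in sorted(a2a.items()):
    print(edge, weight, performance)
```

Keeping the object class in the edge key preserves the distinction between, for example, "submit application" directly followed by "review application" for the same application and for the same vacancy.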
Esser and Fahland propose a generic data model [esser2020multidimensional], derived from a preliminary use case investigation [esser2019bpmgraphdb], to encode object-centric logs as property graphs in a graph database. Nodes are events, event classes, entities, and logs, the latter intended as collections of events (so that multiple logs can be combined in one graph). They test the data model on the log datasets of the BPI Challenges 2014 - 2019, for which they provide the log data and Python code to reshape that data according to the data model and load it into Neo4J.
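The four kinds of nodes can be illustrated with a small in-memory encoding. This is a sketch only: the node labels follow the node kinds listed above, while the relationship type names (`HAS`, `OF_CLASS`, `RELATES_TO`), the property names, and the `to_property_graph` helper are hypothetical and not necessarily those used by the authors.

```python
def to_property_graph(log_name, events):
    """Encode events as labeled nodes with properties and typed relationships."""
    nodes = {("Log", log_name): {"name": log_name}}
    edges = set()
    for event_id, activity, entity_ids in events:
        nodes[("Event", event_id)] = {"activity": activity}
        nodes[("Class", activity)] = {"name": activity}
        edges.add((("Log", log_name), "HAS", ("Event", event_id)))
        edges.add((("Event", event_id), "OF_CLASS", ("Class", activity)))
        for entity_id in entity_ids:
            nodes[("Entity", entity_id)] = {"id": entity_id}
            edges.add((("Event", event_id), "RELATES_TO", ("Entity", entity_id)))
    return nodes, edges


# Hypothetical events: (event id, activity, ids of the entities involved).
events = [
    ("e1", "submit application", ["a1", "v1"]),
    ("e2", "review application", ["a1", "v1"]),
    ("e3", "close vacancy", ["v1"]),
]

nodes, edges = to_property_graph("recruitment", events)
labels = sorted({label for label, _ in nodes})
print(labels, len(nodes), len(edges))
```

Since nodes are keyed by label and identifier, calling the function for a second log and merging the results keeps shared entities and event classes as single nodes, which is how several logs can coexist in one graph.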
Graph Database Languages
A standardized graph query language, named GQL, is currently under development at ISO, integrating features from the existing languages Cypher, PGQL, G-CORE and GSQL. A comparison of the first three languages, used as input to ISO, is reported here.
Cypher, maintained within the openCypher community project, was originally developed for Neo4J and then adopted by other vendors (publicly available implementations: Neo4J and AgensGraph, the latter based on PostgreSQL). The current version is Cypher 9. Development of Cypher 10 was also started, with an implementation for Spark provided by the Morpheus project, but it has slowed down to concentrate contributions on ISO GQL. Cypher 9 queries are not composable, whereas Cypher 10 queries are expected to be composable and to address multiple graphs.
Cypher can be implemented on top of the Gremlin language provided by the graph computing framework Apache TinkerPop, via an adapter. This allows layering Cypher on top of TinkerPop implementations, such as OrientDB.
G-CORE [angles2018gcore] is an academic project characterized by paths as first-class citizens of the data model, and by composable queries. A prototype, research-grade implementation for Spark is being developed.
PGQL is a graph query language for Oracle, which is its only implementation so far. Its syntax is aligned to SQL as much as possible.
GSQL is the query language of the commercial graph database TigerGraph.
References
- [vanderaalst2019sefm] W. M. P. van der Aalst, “Object-Centric Process Mining: Dealing with Divergence and Convergence in Event Data,” in Proc. of Software Engineering and Formal Methods (SEFM), 2019, pp. 3–25, doi: 10.1007/978-3-030-30446-1_1.
- [li2017ocbcdiscovery] G. Li, R. M. de Carvalho, and W. M. P. van der Aalst, “Automatic Discovery of Object-Centric Behavioral Constraint Models,” in Business Information Systems, 2017, pp. 43–58, doi: 10.1007/978-3-319-59336-4_4.
- [li2018xoc] G. Li, E. G. L. de Murillas, R. M. de Carvalho, and W. M. P. van der Aalst, “Extracting Object-Centric Event Logs to Support Process Mining on Databases,” in Lecture Notes in Business Information Processing, Springer International Publishing, 2018, pp. 182–199.
- [artale2019ocbctdl] A. Artale, A. Kovtunova, M. Montali, and W. M. P. van der Aalst, “Modeling and Reasoning over Declarative Data-Aware Processes with Object-Centric Behavioral Constraints,” in Proc. of Int. Conf. on Business Process Management (BPM), 2019, pp. 139–156, doi: 10.1007/978-3-030-26619-6_11.
- [berti2019bpmphd] A. Berti, “Process Mining on Event Graphs: a Framework to Extensively Support Projects,” in Proc. of Dissertation Award, Doctoral Consortium, and Demonstration Track at 17th Int. Conf. on Business Process Management (BPM), Vienna, Austria, 2019, vol. 2420, pp. 60–65.
- [berti2018starstar] A. Berti and W. M. P. van der Aalst, “StarStar Models: Using Events at Database Level for Process Analysis,” in Proc. of 8th Int. Symp. on Data-driven Process Discovery and Analysis (SIMPDA), Seville, Spain, 2018, vol. 2270, pp. 60–64.
- [esser2019bpmgraphdb] S. Esser and D. Fahland, “Storing and Querying Multi-dimensional Process Event Logs Using Graph Databases,” in Proc. of Business Process Management (BPM) Workshops, 2019, pp. 632–644, doi: 10.1007/978-3-030-37453-2_51.
- [esser2020multidimensional] S. Esser and D. Fahland, “Multi-Dimensional Event Data in Graph Databases.” 2020. https://arxiv.org/abs/2005.14552
- [angles2018gcore] R. Angles et al., “G-CORE: A Core for Future Graph Query Languages,” in Proceedings of the 2018 International Conference on Management of Data - SIGMOD 18, 2018, doi: 10.1145/3183713.3190654.