Implementation
Objectives
The primary objectives of the implementation are:
- the addition of a generic mechanism to support custom aggregate functions in RDF4J;
- custom functions for JSON: JSON_ARRAYAGG, JSON_OBJECTAGG, JSON_ARRAY, JSON_OBJECT, JSON_TEMPLATE;
- the handling of those functions within the Ontop OBDA system, leading to their mapping to corresponding SQL functions;
- the implementation of GraphQL queries on top of OBDA, leveraging those new functions.
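A generic mechanism for custom aggregates essentially needs two things: a per-group accumulator contract and a registry keyed by function IRI, which the parser and evaluator can consult. The sketch below shows that shape with JSON_ARRAYAGG and JSON_OBJECTAGG as the first registered functions. All names (AggregateAccumulator, AggregateRegistry, the example.org IRIs) are hypothetical, and argument values are assumed to arrive already encoded as JSON text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical contract: one accumulator instance is created per group of solutions.
interface AggregateAccumulator {
    // Receives the argument values of one solution, already encoded as JSON text.
    void accept(List<String> args);

    // Returns the aggregate result for the group as JSON text.
    String finish();
}

// Hypothetical registry keyed by function IRI; parser and evaluator would both consult it.
final class AggregateRegistry {
    private final Map<String, Supplier<AggregateAccumulator>> factories = new LinkedHashMap<>();

    void register(String iri, Supplier<AggregateAccumulator> factory) {
        factories.put(iri, factory);
    }

    boolean isAggregate(String iri) {
        return factories.containsKey(iri);
    }

    AggregateAccumulator create(String iri) {
        Supplier<AggregateAccumulator> factory = factories.get(iri);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown aggregate function: " + iri);
        }
        return factory.get();
    }

    // Evaluates the aggregate over the argument tuples of a single group.
    String evaluate(String iri, List<List<String>> group) {
        AggregateAccumulator acc = create(iri);
        for (List<String> args : group) {
            acc.accept(args);
        }
        return acc.finish();
    }
}

// JSON_ARRAYAGG(value): collects the values of the group into a JSON array.
final class JsonArrayAgg implements AggregateAccumulator {
    private final List<String> items = new ArrayList<>();

    public void accept(List<String> args) {
        items.add(args.get(0));
    }

    public String finish() {
        return "[" + String.join(",", items) + "]";
    }
}

// JSON_OBJECTAGG(key, value): collects key/value pairs into a JSON object.
// Keys must already be quoted JSON strings; duplicate keys are kept as-is.
final class JsonObjectAgg implements AggregateAccumulator {
    private final List<String> members = new ArrayList<>();

    public void accept(List<String> args) {
        members.add(args.get(0) + ":" + args.get(1));
    }

    public String finish() {
        return "{" + String.join(",", members) + "}";
    }
}
```

For example, registering `JsonArrayAgg::new` under a JSON_ARRAYAGG IRI and evaluating it over the groups `["1"]` and `["\"a\""]` yields `[1,"a"]`. One design point to settle: this sketch returns `[]` for an empty group, whereas SQL implementations such as MySQL return NULL there.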
Secondary objectives:
- support for queries combining a SPARQL Construct/Describe query with a JSON-LD frame;
- a generic mechanism to rewrite custom (aggregate) functions in terms of plain SPARQL (where possible), to allow execution on standard SPARQL endpoints;
- extension of the RDF4J query evaluation code to support the new aggregate functions.
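Where plain SPARQL 1.1 suffices, the JSON aggregates can be rewritten in terms of GROUP_CONCAT and CONCAT. The sketch below illustrates that rewriting for JSON_ARRAYAGG and JSON_OBJECTAGG; the class name and the string-based representation of expressions are hypothetical. It assumes the argument expressions already produce JSON-encoded lexical forms (string values would still need quoting and escaping, which is omitted), and keeps in mind that SPARQL 1.1 leaves the order of GROUP_CONCAT unspecified.

```java
import java.util.List;

// Hypothetical rewriter: maps a custom JSON aggregate call to a plain SPARQL 1.1 expression.
final class PlainSparqlRewriter {
    static String rewrite(String function, List<String> args) {
        switch (function) {
            case "JSON_ARRAYAGG":
                // [v1,v2,...] built from the string forms of the grouped values
                return "CONCAT(\"[\", GROUP_CONCAT(STR(" + args.get(0) + "); SEPARATOR=\",\"), \"]\")";
            case "JSON_OBJECTAGG":
                // {k1:v1,...}; keys are assumed to already be quoted JSON strings
                return "CONCAT(\"{\", GROUP_CONCAT(CONCAT(STR(" + args.get(0) + "), \":\", STR("
                        + args.get(1) + ")); SEPARATOR=\",\"), \"}\")";
            default:
                // e.g. JSON_TEMPLATE: no plain SPARQL equivalent in this sketch
                throw new UnsupportedOperationException("No plain SPARQL rewriting for " + function);
        }
    }
}
```

For instance, `rewrite("JSON_ARRAYAGG", List.of("?v"))` produces `CONCAT("[", GROUP_CONCAT(STR(?v); SEPARATOR=","), "]")`, which can be placed in a SELECT projection with a GROUP BY on any standard endpoint. Functions without a rewriting raise an exception, so the caller can fall back to native evaluation.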
The work is motivated by the existence of standard JSON aggregate functions in SQL and by their support (or support for equivalent forms) in most RDBMSs that may back an OBDA system. Specifically:
- the following RDBMSs support standard JSON aggregate operators (and more) among their aggregate functions: PostgreSQL, MySQL 5.7, MySQL 8, MariaDB, Oracle, IBMi/DB2, H2, MonetDB, and Teiid (whose JSON aggregate functions are listed separately);
- HSQLDB, Denodo, and SAP HANA do not support standard JSON aggregate operators, but provide GROUP_CONCAT-like operators that may be used to simulate them;
- SQL Server also lacks standard JSON aggregate functions, but provides a special FOR JSON clause that can similarly be used to generate JSON directly from SQL queries.
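The split above between native JSON aggregates and GROUP_CONCAT-like operators suggests a per-dialect serialization of the same aggregate symbol. The sketch below covers the two expression-level cases for JSON_ARRAYAGG; SQL Server's FOR JSON works at the query level and is not covered. The enum and class names are hypothetical, and the emulation assumes an HSQLDB-style `GROUP_CONCAT(expr SEPARATOR sep)` plus `||` concatenation.

```java
// Hypothetical dialect families, following the RDBMS grouping above.
enum JsonAggDialect {
    STANDARD,               // native JSON_ARRAYAGG (e.g. MySQL 8, Oracle, H2)
    GROUP_CONCAT_EMULATION  // only a GROUP_CONCAT-like operator (e.g. HSQLDB)
}

final class JsonArrayAggSerializer {
    // Renders JSON_ARRAYAGG(sqlExpr) for the given dialect family.
    // STANDARD: the database encodes sqlExpr values as JSON itself.
    // GROUP_CONCAT_EMULATION: sqlExpr must already yield JSON-encoded text.
    static String serialize(JsonAggDialect dialect, String sqlExpr) {
        switch (dialect) {
            case STANDARD:
                return "JSON_ARRAYAGG(" + sqlExpr + ")";
            case GROUP_CONCAT_EMULATION:
                // Wraps the comma-separated values in brackets using || concatenation.
                return "'[' || GROUP_CONCAT(" + sqlExpr + " SEPARATOR ',') || ']'";
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```

Note the asymmetric contract: the emulated form needs the argument pre-encoded as JSON text, while the standard form must receive the plain SQL value, so the caller has to know which branch applies before building the argument. Dialects that expose the same behavior under another name (such as json_agg in PostgreSQL versions before 16) would need a further case.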
Modules
Ontop modules to be modified:
- model -- define new SPARQL/RDB aggregate function symbols;
- rdb -- map the new RDB aggregate functions to the different SQL dialects (DefaultSelectFromWhereSerializer);
- reformulation-core -- extend the query translator to support the extended RDF4J algebra (RDF4JInputQueryTranslatorImpl);
- endpoint -- extend the HTTP endpoint with support for the GraphQL query language. The graphql-java-spring library provides an example of a Spring REST controller (see classes GraphQLController, GraphQLRequestBody) implementing GraphQL GET/POST queries; it may be adapted here to extract the GraphQL query and delegate it to code in the rdf4j-queryext-graphql module, where the query is mapped to a SPARQL SELECT and evaluated (rather than delegating to graphql-java);
- rdf4j-config-sql -- include GraphQL and JSON among the supported query languages and result formats, respectively;
- cli -- add/revise commands for GraphQL queries.
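For GET requests, the common GraphQL-over-HTTP convention passes the query in a `query` URL parameter (alongside optional `operationName` and `variables`). The framework-free sketch below shows the extraction and delegation step the endpoint would perform; the class name is hypothetical, the `graphqlToSparql` function stands in for the rdf4j-queryext-graphql translation, and POST bodies (which need a JSON parser) are left out.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical GET handling for the endpoint module, independent of Spring.
final class GraphQLGetHandler {
    // Parses an application/x-www-form-urlencoded query string (last value wins).
    static Map<String, String> parseParams(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    // Extracts the mandatory "query" parameter and hands it to the (hypothetical)
    // GraphQL-to-SPARQL translator instead of a graphql-java execution engine.
    static String handle(String rawQuery, Function<String, String> graphqlToSparql) {
        String query = parseParams(rawQuery).get("query");
        if (query == null || query.isBlank()) {
            throw new IllegalArgumentException("Missing 'query' parameter");
        }
        return graphqlToSparql.apply(query);
    }
}
```

In an adapted Spring controller, the same delegation would happen after Spring has bound the request parameters, so only the `handle` step would carry over.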
Ontop modules possibly modified:
- optimization -- add/edit optimizers (no need foreseen so far)
New general-purpose external modules:
- rdf4j-queryext-core - Collector for extensions to RDF4J query support, initially including new algebra nodes for custom aggregate functions, their evaluation, and SPARQL parsing and rendering (further extensions may be added here). The RDF4J SPARQL parser might be reused: it will see custom aggregate calls as simple function calls, so the resulting algebraic expression has to be post-processed. Alternatively, the RDF4J parser could be adapted, possibly leveraging lexical states, or a new parser could be built, e.g., with parboiled, which might make it possible to control the grammar dynamically based on the registered aggregate functions.
- rdf4j-queryext-graphql - Defines a GRAPHQL QueryLanguage for RDF4J, with a parser mapping GraphQL queries to SPARQL SELECT queries that use the JSON functions. It may leverage the GraphQL schema/query/result data model provided by the graphql-java library, but not its execution engine, which is based on field resolvers, whereas here queries are translated to SPARQL.
- rdf4j-queryext-jsonldframe - Handles queries of the form <SPARQL Construct/Describe, JSON-LD frame>, where one of the two components may be missing and autogenerated from the other; internally, such queries are mapped to a SPARQL SELECT with JSON functions (TBD: whether to define a new query language or adopt a different approach).
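The core of rdf4j-queryext-graphql is the mapping from a GraphQL selection to a single SPARQL SELECT built on the JSON functions. The sketch below handles only a root field with scalar sub-fields, e.g. `{ person { name age } }`. Several parts are assumptions: the Field record stands in for graphql-java's AST, fields map to `ex:` predicates by name, the root type is passed in explicitly (a real mapping would take it from the GraphQL schema), and `fn:JSON_OBJECT` is assumed to take alternating key/value arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a parsed GraphQL field; graphql-java's AST could supply this instead.
record Field(String name, List<Field> selection) {
    static Field leaf(String name) {
        return new Field(name, List.of());
    }
}

final class GraphQLToSparql {
    // Hypothetical vocabulary: field "x" maps to predicate ex:x; fn: holds the custom JSON functions.
    static final String PREFIXES =
            "PREFIX ex: <http://example.org/>\n"
            + "PREFIX fn: <http://example.org/fn#>\n";

    // Translates a root field with scalar sub-fields into one SELECT that returns
    // a JSON array with one object per instance of className.
    static String translate(Field root, String className) {
        List<String> objectArgs = new ArrayList<>();
        StringBuilder where = new StringBuilder("  ?s a ex:" + className + " .\n");
        for (Field f : root.selection()) {
            if (!f.selection().isEmpty()) {
                throw new UnsupportedOperationException("Nested objects are not handled in this sketch: " + f.name());
            }
            // Prefixing field variables avoids clashing with the subject variable ?s.
            String var = "?f_" + f.name();
            objectArgs.add("\"" + f.name() + "\"");
            objectArgs.add(var);
            // OPTIONAL so that a missing value does not drop the whole object.
            where.append("  OPTIONAL { ?s ex:").append(f.name()).append(' ').append(var).append(" }\n");
        }
        return PREFIXES
                + "SELECT (fn:JSON_ARRAYAGG(fn:JSON_OBJECT(" + String.join(", ", objectArgs) + ")) AS ?" + root.name() + ")\n"
                + "WHERE {\n" + where + "}";
    }
}
```

Two limitations matter for a real translation: multi-valued properties would produce one object per value combination (nested groupings or JSON_ARRAYAGG per field would be needed), and the JSON rendering of unbound OPTIONAL values has to be defined by the JSON_OBJECT semantics.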
These modules do not depend on Ontop and simply extend RDF4J functionality, so in principle they may be contributed back to RDF4J (see instructions). The contributed enhancements would consist of:
- support for custom aggregate functions (algebra, evaluation, parsing, rendering);
- the rewriting facility for custom functions;
- any custom aggregate functions defined;
- any improvements to the SPARQL rendering code, which lacks some features in RDF4J (see https://github.com/eclipse/rdf4j/issues/71).
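Since a reused RDF4J parser sees custom aggregate calls as plain function calls, the post-processing step amounts to walking the expression tree and turning calls to registered aggregate IRIs into dedicated aggregate nodes, while rendering writes both back with the same call syntax. The sketch below uses a simplified expression tree: Var and FunctionCall only echo the names of RDF4J algebra classes and are not the real types, and CustomAggregate is hypothetical. A real pass would also have to register the lifted aggregate with the enclosing Group node.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified expression tree (not the RDF4J algebra types).
interface Expr {
    String render();

    static String renderArgs(List<Expr> args) {
        return args.stream().map(Expr::render).collect(Collectors.joining(", "));
    }
}

record Var(String name) implements Expr {
    public String render() {
        return "?" + name;
    }
}

// What a stock SPARQL parser yields for <iri>(args): an ordinary function call.
record FunctionCall(String iri, List<Expr> args) implements Expr {
    public String render() {
        return "<" + iri + ">(" + Expr.renderArgs(args) + ")";
    }
}

// Hypothetical algebra node for a custom aggregate; rendered with the same call syntax.
record CustomAggregate(String iri, List<Expr> args) implements Expr {
    public String render() {
        return "<" + iri + ">(" + Expr.renderArgs(args) + ")";
    }
}

final class AggregateLifting {
    // Post-processing pass: rewrites calls to registered aggregate IRIs into
    // CustomAggregate nodes, recursing into arguments; other nodes are kept as-is.
    static Expr liftAggregates(Expr e, Set<String> aggregateIris) {
        if (e instanceof FunctionCall fc) {
            List<Expr> args = fc.args().stream()
                    .map(a -> liftAggregates(a, aggregateIris))
                    .collect(Collectors.toList());
            return aggregateIris.contains(fc.iri())
                    ? new CustomAggregate(fc.iri(), args)
                    : new FunctionCall(fc.iri(), args);
        }
        return e;
    }
}
```

For example, lifting `<...#JSON_ARRAYAGG>(<...#JSON_OBJECT>(?x))` with only the JSON_ARRAYAGG IRI registered yields a CustomAggregate whose argument remains an ordinary FunctionCall, and rendering it reproduces the original call text.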