Commit da46cc7f authored by npedot

updates semint overview

parent 1561c665
@@ -4,6 +4,11 @@ Deep Learning and Logic Reasoning from Data and Knowledge.
[Project Overview](docs/semint-overall.md)
[Install FAQ](docs/install.md)
[DB Repair Notebook](https://nbviewer.jupyter.org/urls/bitbucket.org/semint/spike.ltn/raw/master/notebooks/db-reparing.ipynb)
[Docker files](docker/usage.md)
## References
@@ -11,10 +16,4 @@ Deep Learning and Logic Reasoning from Data and Knowledge.
[Code LTN Reference](https://github.com/logictensornetworks/logictensornetworks)
[Notebook markdown](https://github.com/aaren/notedown)
\ No newline at end of file
# SemInt Overview
In companies that expand by acquiring other companies, in new startups that grow by trying to anticipate and meet user needs, and in the evolution of data archives and their schemas, the integration of new data sources becomes an essential part of a successful business strategy and often a prerequisite for the company's very survival.
Even the normal evolution of a single company that wants to respond to market challenges often involves complicated and costly access to its data sources and their management, because of:
* growing business requirements
* technological alignment with the market
* spaghetti infrastructure
* big data challenges: volume, velocity, and variety (the 3Vs)
* security and audit regulations
* technical debt
Evolving the data schema is a series of trade-offs, made to keep the schema as efficient and readable as possible, that is, trying to:
1. keep technical debt to a minimum, allowing the applications that access the data to evolve rapidly;
2. offer maximum efficiency in accessing and manipulating the data to whoever needs to and is allowed to access it;
3. offer the fastest and most accurate interpretation of the data to all the roles that can benefit from it.
These requirements often conflict with one another.
Here we focus on keeping technical debt low, and we present a curated selection of steps that support this evolution through Ontology-Based Data Integration (OBDI) of structured data sources such as relational databases.
The methodology proposed here is both iterative and interactive.
It is iterative because it follows a pay-as-you-go [] approach, which splits the cost into smaller increments and delivers the benefits of the work done sooner, as opposed to a single-step evolution.
It is interactive because it places the designer at the center of decision-making, as the way to resolve the various choices that cannot be automated.
Steps:
1. for each data source, reverse engineer from the database to the conceptual level, with semantic enrichment using standard vocabularies (e.g. schema.org);
2. map and integrate the many conceptual diagrams into a single overall conceptual model, gaining semantic services;
3. map from the conceptual model to physical structured data sources;
4. run SQL queries on the new virtual or materialized data sources;
5. repeat from step 1 for each new data source to integrate.
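The iterative loop in the steps above can be sketched in Python. This is a minimal illustration under loud assumptions: each data source is reduced to a hypothetical dict of table names to column names, reverse engineering simply lifts each table to an object type, and the designer's merge decisions are modelled as a plain renaming function (`decide`). None of these names come from an existing tool.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: object type name -> set of attribute names.
ConceptualModel = Dict[str, Set[str]]


def reverse_engineer(source: Dict[str, List[str]]) -> ConceptualModel:
    """Step 1 (greatly simplified): lift each table to an object type."""
    return {table.capitalize(): set(columns) for table, columns in source.items()}


def merge(global_model: ConceptualModel, local: ConceptualModel,
          decide: Callable[[str], str]) -> None:
    """Step 2: merge a local model, letting the designer's `decide` map names."""
    for name, attributes in local.items():
        target = decide(name)
        global_model.setdefault(target, set()).update(attributes)


def integrate(sources: List[Dict[str, List[str]]],
              decide: Callable[[str], str]) -> ConceptualModel:
    global_model: ConceptualModel = {}
    for source in sources:  # step 5: repeat for each new source
        merge(global_model, reverse_engineer(source), decide)
        # Steps 3-4 (map to a physical schema, query it) could run here,
        # so the partial model is already usable after each iteration.
    return global_model


hr = {"employee": ["id", "name", "dept"]}
university = {"instructor": ["id", "name", "department"]}
synonyms = {"Instructor": "Employee"}  # hypothetical designer decision
model = integrate([hr, university], lambda name: synonyms.get(name, name))
print(sorted(model["Employee"]))  # ['department', 'dept', 'id', 'name']
print(list(model))                # ['Employee']
```

The leftover `dept`/`department` pair shows why the process stays interactive: matching at the attribute level still needs a designer decision.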
The goal of this process is to gain:
* progressive integration
* live, shareable documentation kept in sync
* no intermediary for low-level access
For the description of the conceptual model we use Object Role Modeling (ORM) [], a notation that is friendly to both the designer and the domain expert thanks to its verbalization property [], while still offering well-founded formal semantics. From an ORM schema an OWL 2 [] description can be extracted, which makes it possible to use services that check consistency and make explicit rules that would otherwise remain implicit, and to make data cleaning more effective by exporting the domain's conceptual constraints [HoloClean].
Each step has been designed to minimize the information loss caused by the semantic poverty of the physical levels compared with the richness of the conceptual ones, highlighting the practical compromises this requires.
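As an illustration of exporting conceptual constraints for data cleaning, the sketch below assumes that an ORM uniqueness constraint (each Instructor works for at most one Department) has been translated into the functional dependency instructor → department, and reports the values that violate it. The row layout and the function `fd_violations` are hypothetical; this is not HoloClean's API, only the kind of constraint such a cleaner can take as input.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

Row = Dict[str, str]


def fd_violations(rows: Iterable[Row], lhs: str, rhs: str) -> Dict[str, Set[str]]:
    """Return each lhs value mapped to more than one rhs value (lhs -> rhs violated)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        seen[row[lhs]].add(row[rhs])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Hypothetical dirty data: instructor 100 appears with two department spellings.
rows: List[Row] = [
    {"instructor": "100", "department": "CS"},
    {"instructor": "100", "department": "C.S."},
    {"instructor": "200", "department": "Math"},
]
print(fd_violations(rows, "instructor", "department"))
```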
## Model top-down: ORM
Database design is a forward-engineering process that systematically transforms a high-level conceptual schema into a relational database schema residing on a physical machine, through a series of tasks: requirements collection and analysis, logical design, normalisation, and the final physical implementation.
In ORM, knowledge is structured into:
* Facts: a fact is a statement, or assertion, about some piece of information within the application domain (e.g. Professor works as Employee for Department).
* Predicates: a predicate is a verb, or verb phrase, that connects the object types in a fact, with one role for each object type (e.g. ... works for ...).
* Roles: each role in a predicate is expressed by a role label and is played by one object type (e.g. Employee).
* Object Types: an object type categorizes data into different kinds of meaningful sets (e.g. Professor).
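The ORM building blocks listed above can be sketched as plain Python data classes. The class names, the `...` placeholder convention in readings, the role labels and the `verbalize` method are illustrative assumptions, not part of any ORM tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectType:
    name: str  # a kind of meaningful set, e.g. "Professor"


@dataclass(frozen=True)
class Role:
    label: str          # role label, e.g. "Employee"
    player: ObjectType  # each role is played by exactly one object type


@dataclass(frozen=True)
class FactType:
    reading: str             # predicate with one "..." per role, e.g. "... works for ..."
    roles: Tuple[Role, ...]

    def __post_init__(self) -> None:
        if self.reading.count("...") != len(self.roles):
            raise ValueError("the reading needs exactly one '...' per role")

    def verbalize(self) -> str:
        """Replace each '...' placeholder, in order, with the player's name."""
        text = self.reading
        for role in self.roles:
            text = text.replace("...", role.player.name, 1)
        return text


professor = ObjectType("Professor")
department = ObjectType("Department")
works_for = FactType("... works for ...",
                     (Role("Employee", professor), Role("Employer", department)))
print(works_for.verbalize())  # Professor works for Department
```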
@@ -54,15 +64,23 @@ application domain. (Professor works as Employee for Department)
The knowledge about the domain is then stated by means of a set of facts.
These facts may be verbalized using sample data, called fact instances.
For example:
* Instructor works for Department (fact type)
* Instructor 100 works for Department “CS” (fact instance)
For a detailed guide, see [Guide to FORML].
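Fact types and fact instances share the same reading, so both verbalizations in the Instructor/Department example can be generated from one template. A minimal sketch, assuming a hypothetical `verbalize` helper where `...` marks each role position, integer values are shown bare and string values in quotes:

```python
from typing import Sequence, Union

Value = Union[int, str]


def verbalize(reading: str, players: Sequence[str],
              values: Sequence[Value] = ()) -> str:
    """Fill each '...' with an object type name, plus its value for instances."""
    if values and len(values) != len(players):
        raise ValueError("a fact instance needs one value per role")
    text = reading
    for i, player in enumerate(players):
        if not values:
            part = player                     # fact type
        elif isinstance(values[i], int):
            part = f"{player} {values[i]}"    # numeric reference value
        else:
            part = f"{player} “{values[i]}”"  # string reference value
        text = text.replace("...", part, 1)
    return text


reading = "... works for ..."
print(verbalize(reading, ["Instructor", "Department"]))
# Instructor works for Department
print(verbalize(reading, ["Instructor", "Department"], [100, "CS"]))
# Instructor 100 works for Department “CS”
```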
## Model bottom-up: Reverse
Database Reverse Engineering (DBRE) is a process typically used to create an
equivalent conceptual schema from an existing relational database. It is a
translation between schemas of different data models and structures.
Applying a reverse engineering process to a relational database is
beneficial for recovering hidden knowledge and representing it in a conceptual
schema that provides a more expressive formulation of the domain.
[Nony]
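A first, purely syntactic pass of DBRE can be automated by reading the database catalog. The sketch below uses Python's built-in `sqlite3` module with SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` to propose candidate fact type readings. The `candidate_fact_types` function and its naive "has"/"references" readings are assumptions; the semantic enrichment (for example, recognizing that `dept` means Department) is left to the designer.

```python
import sqlite3
from typing import List


def candidate_fact_types(conn: sqlite3.Connection) -> List[str]:
    readings: List[str] = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = {row[3]: row[2] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, column, _type, _notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({table})"):
            if pk:
                continue  # primary key: treated as the entity's reference mode
            if column in fks:
                readings.append(f"{table} references {fks[column]}")
            else:
                readings.append(f"{table} has {column}")
    return readings


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE instructor (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    dept TEXT REFERENCES department(code)
);
""")
for reading in candidate_fact_types(conn):
    print(reading)
# department has name
# instructor has name
# instructor references department
```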
## Merge
@@ -78,23 +96,21 @@ Business questions are answered via ORM facts.
* Redundancy: distinct type names with the same constraints
* Derivable: empty types
Merging is very difficult to automate: the schemas may have hundreds or even thousands of entity attributes, and there is no direct mapping between them.
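Although the merge cannot be fully automated, simple checks can point the designer at candidates. A hypothetical sketch, assuming each object type is summarized by a set of constraint strings: types with different names but identical constraints are flagged as possibly redundant, and types with no roles or constraints are flagged as empty (our reading of "derivable, empty types").

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set

# Hypothetical summary: object type name -> its constraints/roles as strings.
Schema = Dict[str, FrozenSet[str]]


def redundant_groups(schema: Schema) -> List[Set[str]]:
    """Groups of distinct type names sharing exactly the same constraints."""
    by_signature: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for name, constraints in schema.items():
        if constraints:
            by_signature[constraints].add(name)
    return [names for names in by_signature.values() if len(names) > 1]


def empty_types(schema: Schema) -> List[str]:
    """Types with no constraints or roles at all."""
    return sorted(name for name, constraints in schema.items() if not constraints)


merged: Schema = {
    "Employee": frozenset({"has Name", "works for Department"}),
    "Staff": frozenset({"has Name", "works for Department"}),
    "Department": frozenset({"has Code"}),
    "Person": frozenset(),
}
print([sorted(group) for group in redundant_groups(merged)])  # [['Employee', 'Staff']]
print(empty_types(merged))                                     # ['Person']
```

Both results are only suggestions; whether `Employee` and `Staff` really denote the same concept remains the designer's call.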
Roles:
* Business Expert, giving business value to the data
* Knowledge Scientist, with competence in ontology design
* IT Developer, providing data access
## Map to ER from conceptual
Once the overall conceptual schema is described in ORM notation, there is a standard, automatable procedure to generate a normalized database schema.
[Halpine]
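The core idea of mapping a conceptual schema to tables can be illustrated with a much-simplified sketch, not the full published procedure: fact types whose uniqueness constraint spans only the entity's role (functional fact types) become columns of that entity's table, while many-to-many fact types get a table of their own. The `FactType` fields and `to_ddl` are hypothetical, every key is assumed to be `TEXT`, and mandatory constraints, subtypes and composite keys are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FactType:
    subject: str             # entity type playing the first role
    role: str                # name used for the resulting column
    target: str              # SQL type of a value type, or an entity type name
    functional: bool         # uniqueness constraint spans the first role only
    is_entity: bool = False  # target is an entity type, so emit a foreign key


def to_ddl(entities: Dict[str, str], facts: List[FactType]) -> List[str]:
    """entities maps each entity type to its reference-mode (primary key) column."""
    columns = {e: [f"{pk} TEXT PRIMARY KEY"] for e, pk in entities.items()}
    link_tables: List[str] = []
    for f in facts:
        col_type = "TEXT" if f.is_entity else f.target
        ref = f" REFERENCES {f.target}({entities[f.target]})" if f.is_entity else ""
        if f.functional:
            # at most one value per entity: group it into the entity's table
            columns[f.subject].append(f"{f.role} {col_type}{ref}")
        else:
            # many-to-many: separate table keyed on both roles
            pk = entities[f.subject]
            link_tables.append(
                f"CREATE TABLE {f.subject}_{f.role} ("
                f"{pk} TEXT REFERENCES {f.subject}({pk}), "
                f"{f.role} {col_type}{ref}, "
                f"PRIMARY KEY ({pk}, {f.role}))"
            )
    tables = [f"CREATE TABLE {e} ({', '.join(cols)})" for e, cols in columns.items()]
    return tables + link_tables


entities = {"Department": "code", "Instructor": "id"}
facts = [
    FactType("Department", "name", "TEXT", functional=True),
    FactType("Instructor", "dept", "Department", functional=True, is_entity=True),
    FactType("Instructor", "skill", "TEXT", functional=False),
]
for statement in to_ddl(entities, facts):
    print(statement)
# CREATE TABLE Department (code TEXT PRIMARY KEY, name TEXT)
# CREATE TABLE Instructor (id TEXT PRIMARY KEY, dept TEXT REFERENCES Department(code))
# CREATE TABLE Instructor_skill (id TEXT REFERENCES Instructor(id), skill TEXT, PRIMARY KEY (id, skill))
```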
## Query
Once exported to a new database, the data can be queried using standard SQL queries and analytical tools.
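For instance, once the schema is materialized, a business question such as "how many instructors work for each department?" becomes an ordinary SQL query. The sketch below uses an in-memory SQLite database through Python's `sqlite3` module as a stand-in for the real target database; the table names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Instructor (id INTEGER PRIMARY KEY, name TEXT,
                         dept TEXT REFERENCES Department(code));
INSERT INTO Department VALUES ('CS', 'Computer Science'), ('MA', 'Mathematics');
INSERT INTO Instructor VALUES (100, 'Ada', 'CS'), (101, 'Alan', 'CS'), (200, 'Emmy', 'MA');
""")

# Business question: how many instructors work for each department?
query = """
SELECT d.name, COUNT(i.id) AS instructors
FROM Department AS d
LEFT JOIN Instructor AS i ON i.dept = d.code
GROUP BY d.code, d.name
ORDER BY d.code
"""
for name, count in conn.execute(query):
    print(name, count)
# Computer Science 2
# Mathematics 1
```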