@@ -67,7 +67,7 @@ Our approach extends that based on operators:

* operators specialized in activities with enrichment of constraints from domain experts

Other works follow this approach in particular we follow the steps of the [HoloClean] cleanup framework which divides into detect and fix,

Other works follow this approach in particular we follow the steps of the [HoloClean](http://holoclean.io/) cleanup framework which divides into detect and fix,

we propose our own solution of fix based on learning with neural networks and fuzzy logic rules.

...

...

@@ -84,11 +84,11 @@ In the configuration of the algorithm the distribution of existing data will be

What we propose here is an approach that is, to our knowledge, unique: a neural network classifier enriched by fuzzy logic constraints.

Statistical Relational Learning (SRL) approaches have been developed for reasoning under uncertainty and learning in the presence of data and rich knowledge.

As presented in [Serafini] LogicTensor Networks (LTNs) are an SRL framework which integrates neural networks with first-order fuzzy logic to allow

As presented in [Serafini](https://sites.google.com/fbk.eu/ltn/tutorial-ijcnn-2018) LogicTensor Networks (LTNs) are an SRL framework which integrates neural networks with first-order fuzzy logic to allow

(i) efficient learning from noisy data in the presence of logical constraints, and

(ii) reasoning with logical formulas describing general properties of the data.

LTNs combine learning in deep networks with relational logical constraints [27]. I

LTNs combine learning in deep networks with relational logical constraints. I

LTN uses a First-order Logic (FOL) syntax interpreted in the real numbers, which is implemented as a deep tensor network. Logical terms are interpreted as feature vectors in a real-valued n-dimensional space. Function symbols are interpreted as real-valued functions, and predicate symbols as fuzzy logic relations. This syntax and semantics, called real semantics, allow LTNs to learn efficiently in hybrid domains, where elements are composed of both numerical and relational information.
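
To make the real-valued interpretation more concrete, here is a minimal sketch in plain Python. It is not the LTN library API. The operator choices (product t-norm, Reichenbach implication, mean as universal quantifier), the predicates `adult` and `worker`, their hand-set weights and the sample feature vectors are all assumptions made for illustration.

```python
import math


def predicate(weights, bias, x):
    """Fuzzy predicate: maps a real feature vector to a truth degree in (0, 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fuzzy_not(a):
    return 1.0 - a


def fuzzy_and(a, b):
    # Product t-norm (one of several possible choices).
    return a * b


def fuzzy_or(a, b):
    # Probabilistic sum, the dual of the product t-norm.
    return a + b - a * b


def fuzzy_implies(a, b):
    # Reichenbach implication: 1 - a + a * b.
    return 1.0 - a + a * b


def forall(truth_degrees):
    # Universal quantifier approximated by the mean truth degree.
    return sum(truth_degrees) / len(truth_degrees)


# Hypothetical individuals as 2-d feature vectors: (age / 100, employed flag).
people = [[0.35, 1.0], [0.10, 0.0], [0.60, 1.0]]


def adult(x):
    # Hand-set weights, roughly "age >= 18" (assumption, not learned here).
    return predicate([20.0, 0.0], -3.6, x)


def worker(x):
    # Hand-set weights, roughly "employed" (assumption, not learned here).
    return predicate([0.0, 8.0], -4.0, x)


# Truth degree of the formula: forall x. worker(x) -> adult(x)
satisfaction = forall([fuzzy_implies(worker(p), adult(p)) for p in people])
```

In an actual LTN the predicate weights would be learned by maximizing the satisfaction of such formulas over the data, instead of being fixed by hand as they are here.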

...

...

@@ -161,7 +161,7 @@ the first tuple explicitly expressed both values as true.

the second tuple has explicitly expressed only the truth for A.

The test result for B (t2) will be true with rapid convergence, as there is no conflicting information.

Case study: [test_ldb_basic](docker/docker-ldb/ldb/test/test_basic.py)

Case study: [test_ldb_basic](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/test/test_basic.py)
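
The behaviour described for the two tuples can be illustrated with a small stand-in model. The sketch below is not the code of the linked test. It assumes that B is predicted from A by a one-feature logistic unit, and the function name `train_b_given_a` is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_b_given_a(examples, steps=200, lr=0.5):
    """Fit B ~ sigmoid(w * A + b) on (A, B) pairs by gradient descent on log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for a, target in examples:
            p = sigmoid(w * a + b)
            w -= lr * (p - target) * a
            b -= lr * (p - target)
    return w, b


# t1 states A=true and B=true; t2 states only A=true, so it adds no B example.
known = [(1.0, 1.0)]
w, b = train_b_given_a(known)
b_for_t2 = sigmoid(w * 1.0 + b)  # truth degree suggested for B on t2
```

Since no example pulls B towards false, the suggested truth degree for B on t2 moves steadily towards 1.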

### Minimal categorical example

...

...

@@ -172,14 +172,14 @@ Mutual exclusion constraints must therefore be introduced for each possible cate

Given a training population, we can ask LDB for the value of a new person's category, based on the values

of other parameters, e.g. worker, adult.

Case study: [test_ldb_category](docker/docker-ldb/ldb/sample/basic_category.py)

Case study: [test_ldb_category](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/sample/basic_category.py)
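
As an illustration of what introducing mutual exclusion constraints for each category involves, the following sketch enumerates one constraint per pair of categories and scores it with fuzzy operators. The category names, the helper names and the product t-norm are assumptions, not taken from the linked sample.

```python
from itertools import combinations


def mutual_exclusion_pairs(categories):
    """One constraint per unordered pair: not (c1(x) and c2(x))."""
    return list(combinations(categories, 2))


def exclusion_satisfaction(degrees, pairs):
    """Mean truth of not(c1 and c2), with product t-norm and standard negation."""
    values = [1.0 - degrees[c1] * degrees[c2] for c1, c2 in pairs]
    return sum(values) / len(values)


# Hypothetical category values for one column.
categories = ["student", "worker", "retired"]
pairs = mutual_exclusion_pairs(categories)

# Truth degrees of each category for one individual.
crisp = {"student": 0.0, "worker": 1.0, "retired": 0.0}
conflicting = {"student": 1.0, "worker": 1.0, "retired": 0.0}
```

With n categories this produces n(n-1)/2 pairwise constraints, so the number of constraints grows quadratically with the number of values in a column.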

### Minimal example of inconsistency

Starting from the previous categorical case, if we add a description that contradicts a constraint, the optimization stops on timeout rather than by minimizing the constraint distances, leaving the value space in a configuration that is not guaranteed to satisfy all constraints.

Case study: [test_ldb_basic_contradiction](docker/docker-ldb/ldb/test/test_basic_contradiction.py)

Case study: [test_ldb_basic_contradiction](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/test/test_basic_contradiction.py)
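
The timeout behaviour can be reproduced on a toy problem. The sketch below is an assumption-laden stand-in, not the linked test. It fits a single truth degree to a list of asserted truth values, and the function `fit_truth`, its tolerance and its step budget are all hypothetical.

```python
import math


def fit_truth(targets, max_steps=2000, lr=1.0, tol=0.01):
    """Fit one truth degree t = sigmoid(z) to a list of target truth values.

    Returns (status, t, loss), where status is "converged" if the mean squared
    distance drops below tol, and "timeout" if the step budget runs out first.
    """
    z = 1.0  # arbitrary starting point
    for _ in range(max_steps):
        t = 1.0 / (1.0 + math.exp(-z))
        loss = sum((t - y) ** 2 for y in targets) / len(targets)
        if loss < tol:
            return "converged", t, loss
        grad_t = sum(2.0 * (t - y) for y in targets) / len(targets)
        z -= lr * grad_t * t * (1.0 - t)  # chain rule through the sigmoid
    t = 1.0 / (1.0 + math.exp(-z))
    loss = sum((t - y) ** 2 for y in targets) / len(targets)
    return "timeout", t, loss


consistent = fit_truth([1.0])          # a single assertion: P(x) is true
contradictory = fit_truth([1.0, 0.0])  # P(x) asserted both true and false
```

In the contradictory case the loss cannot fall below 0.25, so the loop exhausts its step budget and the truth degree settles near 0.5, a value that satisfies neither assertion.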

### Minimal numerical example

...

...

@@ -188,19 +188,19 @@ We translate a simple archive that describes a numerical feature.

Suppose a table (id: string, age: integer (0-100), adult: Boolean)

Given a first sample of values, we can use LDB as a classifier to suggest the best value:

Case study: [test_ldb_numeric](docker/docker-ldb/ldb/sample/basic_numeric.py)

Case study: [test_ldb_numeric](/docker/docker-ldb/ldb/sample/basic_numeric.py)
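
For the (id, age, adult) table, the idea of suggesting a value from a first sample can be sketched with a plain logistic classifier. This is an illustrative substitute for LDB, not its implementation. The sample rows, the scaling of age to [0, 1] and the function names are assumptions.

```python
import math


def train_adult_classifier(samples, steps=5000, lr=1.0):
    """Logistic regression adult ~ sigmoid(w * age/100 + b), fitted by gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for age, adult in samples:
            x = age / 100.0  # scale age from 0-100 to 0-1
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - adult) * x
            grad_b += (p - adult)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def suggest_adult(model, age):
    """Truth degree that a person of the given age is an adult."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * age / 100.0 + b)))


# Hypothetical first sample: (age, adult) with adult encoded as 1.0 / 0.0.
sample = [(5, 0.0), (12, 0.0), (17, 0.0), (18, 1.0), (30, 1.0), (60, 1.0)]
model = train_adult_classifier(sample)
```

Calling `suggest_adult(model, 50)` returns a truth degree above 0.5 and `suggest_adult(model, 5)` one below 0.5. Ages close to the boundary remain uncertain with such a small sample.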

Choosing a two-dimensional metric

we can extend the constraints for example to a Euclidean space.

Case study: [test_ldb_numeric_ecludean](docker/docker-ldb/ldb/code/basic_numeric_euclidean.py)

Case study: [test_ldb_numeric_ecludean](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/code/basic_numeric_euclidean.py)

### Distribution example

To understand how the distribution of values affects the result, let's start from a table in CSV format:

Case study: [test_ldb_105](docker/docker-ldb/ldb/sample/basic_distrib.py)

Case study: [test_ldb_105](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/sample/basic_distrib.py)

### Example of constrained distribution

...

...

@@ -208,7 +208,7 @@ Case study: [test_ldb_105](docker/docker-ldb/ldb/sample/basic_distrib.py)

We verify that the logical constraints affect the result beyond just the distribution.

We then introduce a further constraint on the distribution and observe the changes on the result.

Constraint case study: [test_ldb_constraint](docker/docker-ldb/ldb/sample/basic_distrib_constraint.py)

Constraint case study: [test_ldb_constraint](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/sample/basic_distrib_constraint.py)

### Simplified realistic example

...

...

@@ -217,7 +217,7 @@ Assuming to start from values not in enumerated format but free text value

you will need to apply additional translation steps to the source data.

The first step is the construction of categories for each column.

Case study: [hospital_10](docker/docker-ldb/ldb/sample/fix_hospital_10.py)

Case study: [hospital_10](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/sample/fix_hospital_10.py)

* This highlighted a mapping difficulty: to learn well, the whole record should be mapped to real vectors,

for some types of field this mapping is not straightforward.
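
The category-construction step and the mapping of a record to a real vector can be sketched as follows. The column names, the sample rows and the normalization rule (trim and lowercase) are assumptions for illustration. The linked sample may proceed differently.

```python
def normalize(value):
    """Collapse case and surrounding whitespace so free-text variants match."""
    return value.strip().lower()


def build_categories(rows):
    """Collect the sorted set of distinct normalized values for each column."""
    columns = list(rows[0].keys())
    return {col: sorted({normalize(row[col]) for row in rows}) for col in columns}


def encode(row, categories):
    """One-hot encode a record, column by column, into a flat real vector."""
    vector = []
    for col, values in categories.items():
        cell = normalize(row[col])
        vector.extend(1.0 if cell == v else 0.0 for v in values)
    return vector


# Hypothetical free-text records; column names are illustrative only.
rows = [
    {"city": "Birmingham", "state": "AL"},
    {"city": " birmingham", "state": "al"},
    {"city": "Anchorage", "state": "AK"},
]
categories = build_categories(rows)
vector = encode(rows[0], categories)
```

One-hot encoding works for columns with few distinct values. Fields such as addresses or phone numbers have no obvious small category set, which is one way the mapping difficulty noted above shows up.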

...

...

@@ -227,7 +227,7 @@ for some types of field this activity is not immediate.

Using the entire data set consisting of

HoloClean hospital case study: [hospital_complete](docker/docker-ldb/ldb/sample/fix_hospital.py)

HoloClean hospital case study: [hospital_complete](https://bitbucket.org/semint/spike.ltn/src/master/docker/docker-ldb/ldb/sample/fix_hospital.py)

* This highlighted a scalability difficulty: network training time, application time and effectiveness can still be substantially optimized.