Commit 7ab7986c authored by npedot

small fixes

parent 8f20ff12
@@ -161,7 +161,7 @@ the first tuple explicitly expressed both values as true.
the second tuple has explicitly expressed only the truth for A.
The test result for B (t2) will be true with rapid convergence, as there is no conflicting information.
-Case study: [test_ldb_basic] (docker/docker-ldb/ldb/test/test_basic.py)
+Case study: [test_ldb_basic](docker/docker-ldb/ldb/test/test_basic.py)
### Minimal categorical example
@@ -172,7 +172,7 @@ Mutual exclusion constraints must therefore be introduced for each possible cate
Given a certain training population we can ask LDB the value of a new person, according to the values
assumed by other parameters, eg worker, adult
-Case study: [test_ldb_category] (docker/docker-ldb/ldb/sample/basic_category.py)
+Case study: [test_ldb_category](docker/docker-ldb/ldb/sample/basic_category.py)
### Minimal example of inconsistency
@@ -208,7 +208,7 @@ Case study: [test_ldb_105](docker/docker-ldb/ldb/sample/basic_distrib.py)
We verify that the logical constraints affect the result beyond just the distribution.
We then introduce a further constraint on the distribution and observe the changes on the result.
-Constraint case study: [test_ldb_constraint] (docker / docker-ldb / ldb / sample / basic_distrib_constraint.py)
+Constraint case study: [test_ldb_constraint](docker/docker-ldb/ldb/sample/basic_distrib_constraint.py)
### Simplified realistic example
@@ -217,7 +217,7 @@ Assuming to start from values not in enumerated format but free text value
you will need to apply additional translation steps to the source data.
The first step is the construction of categories for each column.
-Case study: [hospital_10] (docker / docker-ldb / ldb / sample / fix_hospital_10.py)
+Case study: [hospital_10](docker/docker-ldb/ldb/sample/fix_hospital_10.py)
* highlighted mapping difficulty, to get good learning the whole record should be mapped to real vectors,
for some types of field this activity is not immediate.
@@ -227,7 +227,7 @@ for some types of field this activity is not immediate.
Using the entire data set consisting of
-HoloClean hospital case study: [hospital_complete] (docker/docker-ldb/ldb / sample / fix_hospital.py)
+HoloClean hospital case study: [hospital_complete](docker/docker-ldb/ldb/sample/fix_hospital.py)
* highlighted scalability difficulty, network training, application and effectiveness times remain largely optimizable.
@@ -257,13 +257,13 @@ The experiments carried out indicate a positive applicability of the proposal, v
## References
-problema data cleaning
+data cleaning problem
* [Data cleaning is a machine learning problem that needs data systems help!](http://wp.sigmod.org/?p=2288)
* [Detecting Data Errors: Where are we and what needs to be done?](http://www.vldb.org/pvldb/vol9/p993-abedjan.pdf)
* [Ganti and Sarma - 2013 - Data Cleaning A Practical Perspective](https://www.amazon.it/Data-Cleaning-Perspective-Management-2013-09-01/dp/B01JXT5HYW?SubscriptionId=AKIAILSHYYTFIVPWUY6Q&tag=duc01-21&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B01JXT5HYW)
-proposta data cleaning
+data cleaning proposal
* [Trends in Cleaning Relational Data: Consistency and Deduplications, Foundations and Trends in Databases 2015](https://cs.uwaterloo.ca/~ilyas/papers/IlyasFnTDB2015.pdf)
* [A formal framework for probabilistic unclean databases, ICDT, 2019](https://arxiv.org/pdf/1801.06750.pdf)