Case study: [hospital_10](docker/docker-ldb/ldb/sample/fix_hospital_10.py)
* highlighted the mapping difficulty: for effective learning the whole record must be mapped to real-valued vectors,
and for some field types this mapping is not straightforward.
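
As a rough illustration of the mapping step described above, the sketch below turns a record into a real-valued vector by min-max scaling numeric fields and one-hot encoding categorical ones. It is not taken from `fix_hospital_10.py`; the field names (`state`, `score`) and the helpers `build_vocab` and `encode_record` are hypothetical. Free-text fields such as names or addresses have no equally direct encoding, which is one way the mapping can fail to be straightforward.

```python
from typing import Dict, List, Sequence, Tuple


def build_vocab(records: Sequence[Dict[str, str]], field: str) -> List[str]:
    """Collect the sorted set of distinct values seen for a categorical field."""
    return sorted({r[field] for r in records})


def encode_record(
    record: Dict[str, str],
    numeric_ranges: Dict[str, Tuple[float, float]],
    categorical_vocab: Dict[str, List[str]],
) -> List[float]:
    """Map one record to a flat list of floats.

    Numeric fields are min-max scaled to [0, 1] using the given (low, high)
    range; categorical fields are one-hot encoded against their vocabulary.
    """
    vector: List[float] = []
    for name, (low, high) in numeric_ranges.items():
        value = float(record[name])
        # Guard against a degenerate range to avoid division by zero.
        vector.append((value - low) / (high - low) if high > low else 0.0)
    for name, vocab in categorical_vocab.items():
        vector.extend(1.0 if record[name] == v else 0.0 for v in vocab)
    return vector


# Hypothetical sample records; the real data set has different columns.
records = [
    {"state": "AL", "score": "10"},
    {"state": "AK", "score": "20"},
]
vocab = {"state": build_vocab(records, "state")}
ranges = {"score": (10.0, 20.0)}
vectors = [encode_record(r, ranges, vocab) for r in records]
```

One-hot encoding grows with the number of distinct values, so high-cardinality fields would likely need a different representation (for example, learned embeddings).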
...
...
@@ -227,7 +227,7 @@ for some types of field this activity is not immediate.
Using the entire data set consisting of
HoloClean hospital case study: [hospital_complete](docker/docker-ldb/ldb/sample/fix_hospital.py)
* highlighted the scalability difficulty: network training time, application time, and effectiveness all leave considerable room for optimization.
...
...
@@ -257,13 +257,13 @@ The experiments carried out indicate a positive applicability of the proposal, v
## References
data cleaning problem
* [Data cleaning is a machine learning problem that needs data systems help!](http://wp.sigmod.org/?p=2288)
* [Detecting Data Errors: Where are we and what needs to be done?](http://www.vldb.org/pvldb/vol9/p993-abedjan.pdf)
* [Ganti and Sarma, Data Cleaning: A Practical Perspective, 2013](https://www.amazon.it/Data-Cleaning-Perspective-Management-2013-09-01/dp/B01JXT5HYW?SubscriptionId=AKIAILSHYYTFIVPWUY6Q&tag=duc01-21&linkCode=xm2&camp=2025&creative=165953&creativeASIN=B01JXT5HYW)
data cleaning proposal
* [Trends in Cleaning Relational Data: Consistency and Deduplication, Foundations and Trends in Databases, 2015](https://cs.uwaterloo.ca/~ilyas/papers/IlyasFnTDB2015.pdf)
* [A formal framework for probabilistic unclean databases, ICDT, 2019](https://arxiv.org/pdf/1801.06750.pdf)