
# Expected and Experienced Utility of Points of Interest in Tourism Recommender Systems
Here you can find the dataset created and used in the study described in the paper **_Expected and Experienced Utility of Points of Interest in Tourism Recommender Systems_**, published at the __31st ACM Conference on User Modeling, Adaptation and Personalization (UMAP '23)__.

## How to cite this work
```
@inproceedings{expected_experieced_utiltiy_trs,
author = {Katharina Hofschen and David Massimo and Francesco Ricci},
title = {Expected and Experienced Utility of Points of Interest in Tourism Recommender Systems},
year = {2023},
isbn = {978-1-4503-9891-6/23/06},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
doi = {10.1145/3563359.3597405},
pages = {},
numpages = {8},
location = {Limassol, Cyprus},
booktitle = {Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (UMAP '23)}}
```

# Dataset

The data is split into 3 files to allow for easier access and logical grouping.    

## The POI data - `poi_data.csv`
This file contains an edited version of the information on the 450 POIs selected for the experiment. The information was collected from the South Tyrol OpenDataHub (https://databrowser.opendatahub.com/).
    
* poi_id	: The unique identifier for each POI
* title_en	: The English title of the POI
* city_it_de	: City the POI is located in or closest to
* types, types_filter, poi_types, subtypes: metadata on categories of POIs
* personality	: 7 factors for the personality types of each POI

Additional data, e.g., images and some details of the POIs, can be collected by querying the OpenDataHub API.

## The User data - `user_data.csv`
This file contains per-user summaries of how each user engaged with the survey used to collect expected and experienced utilities.

* user_id 		: The unique identifier for each user
* preferred_lang	: The language selected for the survey (italian, deutsch, english)
* region_visited	: Has the user previously been to the region of South Tyrol? (yes, no)
* region_length_stay	: Level of experience the user has with the region ("<1": not very much, "1+": visits frequently, "live": lives in the region)
* minutes_survey	: Minutes the user took from beginning to end of the survey
* num_presel_vis	: Number of POIs marked as visited in the preselection step
* presel_vis_list	: List of POIs marked as visited in preselection
* personality_rec	: 3 POIs recommended based on computed personality type
* pers_rec_feedback	: User feedback on whether these 3 recommendations are liked or not
* date_start		: Timestamp of survey start
* date_end		: Timestamp of survey end, if available
* complete		: Survey completed (true, false)
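
As a minimal sketch of how these columns might be used, the snippet below keeps only users who completed the survey and counts them per `region_length_stay` level. It assumes a comma-separated file with a header row and that `complete` is stored as the text `true`/`false` in any casing; neither detail is confirmed by this README. The function name is hypothetical and the inline sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def completed_users_by_experience(csv_file):
    """Count users who completed the survey, grouped by region_length_stay.

    Assumption: comma-separated input with a header row, and `complete`
    holding the text "true"/"false" (any casing). Adjust if the file differs.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["complete"].strip().lower() == "true":
            counts[row["region_length_stay"]] += 1
    return counts


# Made-up sample rows for illustration only (not taken from the dataset).
sample = io.StringIO(
    "user_id,region_length_stay,complete\n"
    "u1,<1,true\n"
    "u2,live,false\n"
    "u3,1+,true\n"
    "u4,<1,TRUE\n"
)
summary = completed_users_by_experience(sample)
print(dict(summary))
```

To run this on the real file, pass an open file handle instead of the sample, e.g. `open("user_data.csv", newline="", encoding="utf-8")` (the encoding is an assumption).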


## The User x Item data - `ratings.csv`
This file contains all ratings given by users when expressing expected and experienced utilities.

* user_id		: The unique identifier for each user
* poi_id		: The unique identifier for each Point of interest (POI)
* utility_type		: Indicates whether the rating represents an expected or experienced utility
* rating		: The rating, between 1 and 5, that the user gave to the POI
* context		: For an expected utility, true means that the POI is familiar to the user and false that it is unfamiliar. For an experienced utility, the remembered contextual factors are given as a list, e.g., ["solo", "summer"].
* timestamp		: The moment the user completed the rating
* duration		: The time the user spent giving the rating, where it can be computed, calculated as the difference between the timestamp of this rating and that of the previous rating.
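
Because `context` holds either a boolean or a list depending on `utility_type`, it usually needs to be parsed before use. The sketch below shows one possible way to do that and to compute the mean rating per utility type. It assumes a comma-separated file with a header row, booleans written as `true`/`false` in any casing, and lists written as a Python/JSON-like literal; the `utility_type` labels and all rows in the sample are made up, and the function names are hypothetical.

```python
import ast
import csv
import io
from collections import defaultdict


def parse_context(raw):
    """Return a bool (familiarity), a list of contextual factors, or None.

    Assumption: booleans appear as true/false in any casing and lists as a
    literal such as ["solo", "summer"]. Empty cells are mapped to None.
    """
    text = raw.strip()
    if not text:
        return None
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return ast.literal_eval(text)


def mean_rating_by_type(rows):
    """Average the `rating` column separately for each `utility_type` value."""
    totals = defaultdict(lambda: [0.0, 0])  # utility_type -> [sum, count]
    for row in rows:
        acc = totals[row["utility_type"]]
        acc[0] += float(row["rating"])
        acc[1] += 1
    return {utype: total / n for utype, (total, n) in totals.items()}


# Made-up sample rows for illustration only (not taken from the dataset).
sample = io.StringIO(
    'user_id,poi_id,utility_type,rating,context\n'
    'u1,p1,expected,4,True\n'
    'u1,p2,expected,2,false\n'
    'u1,p3,experienced,5,"[""solo"", ""summer""]"\n'
    'u2,p3,experienced,3,"[""family""]"\n'
)
rows = list(csv.DictReader(sample))
means = mean_rating_by_type(rows)
contexts = [parse_context(row["context"]) for row in rows]
print(means)
print(contexts)
```

`ast.literal_eval` is used here rather than `eval` so that only plain literals are accepted; if the lists in the real file turn out to be strict JSON, `json.loads` would work as well.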