
The must-have for future systems: data quality

Artificial intelligence, like any form of intelligence, can only be as good as the data on which its answers and recommendations are based. In machine learning, data quality is both more important and more problematic than ever before: bad data can no longer simply be deleted or swapped out, as is possible in traditional data analysis. In the future, content providers will have to organize their relevant data as Big Data if they want to remain relevant to Alexa, chatbots and similar systems. Because of the sheer volume of data, the origin of (possibly false) data can rarely be identified, and correcting content afterwards (which often comes from a third party) is difficult.

 

Classical tourism marketing as we know it will change rapidly as new technologies change the habits of guests. To achieve long-term success, destinations need to anticipate these new habits and focus on guests' needs. Providing high-quality content is indispensable if future devices (intelligent assistants, voice assistants and chatbots) are to have the right information to answer questions correctly. This fundamentally changes how data is sourced compared with traditional systems, and the change is not an easy one. Future intelligent systems draw their data from gigantic data containers that hold good as well as bad content. As the name implies, intelligent systems are capable of learning by themselves: they search these containers for information with the help of programmed algorithms. The answers the machines generate are therefore only as good as the data provided.

Based on: https://www.slideshare.net/mrjain/predictive-analytics-big-data-artificial-intelligence

 

Important criteria for high-quality data:

 

Accuracy: Are the sources that produce the data (e.g. information, content) reliable and reputable? Especially with tourism content, original sources are vital, because it is far too easy for a third party to distort information or color it with personal opinions. For high data quality it is important to involve tourism partners (e.g. hotels, museums) directly. In addition, data needs to be checked for accuracy by humans before it is made available to automated systems.

 

Validity/Currency: If data is time-related and therefore valid only for a limited period, the content can become obsolete. Constant monitoring or ongoing updating of all data is essential.

 

Completeness: Does the data contain all the content that is important for its target group? Is the information tagged with the correct keywords? High data quality can only be ensured if all information is detailed and understandable. Tourism data in particular is only interesting for guests if it provides detailed information (e.g. public transport connections, opening hours of sights, essential tour equipment).

 

Figure: Incomplete database vs. complete database
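The three criteria above can be turned into simple automated checks that flag records for human review. The following Python sketch is only an illustration under stated assumptions: the record layout (PoiRecord), the list of trusted source types, the 180-day freshness limit and the required fields are all hypothetical and would have to be defined by each destination. The sample records are made up.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


# Hypothetical record layout for a point of interest (illustration only).
@dataclass
class PoiRecord:
    name: str
    source: str                     # who supplied the data
    last_verified: Optional[date]   # when a human last confirmed the content
    opening_hours: Optional[str] = None
    public_transport: Optional[str] = None
    keywords: tuple = ()


# Assumed settings; real values depend on the destination.
TRUSTED_SOURCES = {"hotel", "museum", "tourist_office"}
MAX_AGE = timedelta(days=180)
REQUIRED_FIELDS = ("opening_hours", "public_transport", "keywords")


def quality_issues(record: PoiRecord, today: date) -> list:
    """Return a list of problems found for one record (empty if none)."""
    issues = []
    # Accuracy: prefer data that comes directly from the original partner.
    if record.source not in TRUSTED_SOURCES:
        issues.append("accuracy: source is not an original partner")
    # Validity/Currency: content must have been confirmed recently.
    if record.last_verified is None or today - record.last_verified > MAX_AGE:
        issues.append("currency: not verified recently")
    # Completeness: every required field must be filled in.
    for field_name in REQUIRED_FIELDS:
        if not getattr(record, field_name):
            issues.append(f"completeness: missing {field_name}")
    return issues


# Made-up sample data.
today = date(2024, 6, 1)
complete = PoiRecord(
    "City Museum", "museum", date(2024, 5, 1),
    "Tue-Sun 10:00-18:00", "Tram 3", ("museum", "art"),
)
incomplete = PoiRecord("Lake Trail", "third_party", date(2023, 1, 15))

print(quality_issues(complete, today))    # prints []
for issue in quality_issues(incomplete, today):
    print(issue)
```

Checks like these do not replace the human review described under Accuracy; they only help decide which records a person should look at first.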


Data quality is never a one-time measure; it is a long-term process. Data needs to be constantly analyzed, corrected and monitored. This permanent commitment pays off in high data quality: the content remains relevant and can be reused for future developments.