D4.1 Initially available datasets and usage guidelines

This deliverable is produced in the context of Task 4.1, which aims to organize a meaningful and productive pilot. The task collects, integrates and manages the different data sources.

D4.1 will allow us to use machine learning algorithms and tools while implementing the planned personalisation pilots. These pilots serve as self-contained proof-of-concept field trials. Promising pilots may then be presented to the publishing partners as a basis for further work on personalised content and experiences, which the publishing industry is currently pushing for.

The following data will be made available by the appropriate partners (DW, VRT and DIAS) at the very beginning of the project:

• Data structures of articles and coverage of article metadata

• Summary process diagrams on data collection and relevant management processes

• Content consumer profiles and their relevant contextual data

Beyond this initial material, each broadcasting or publishing partner will have to collect further data to keep the project running smoothly. We foresee a large data collection exercise covering three kinds of data: content (e.g. news articles), user context (e.g. location, device, time of day) and user behaviour.
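As an illustration only, the three kinds of data named above could be modelled along the following lines. This is a minimal sketch, not the CPN data model: every class name, field name, action label and time-of-day bucket below is a hypothetical assumption introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ContentItem:
    """A piece of content, e.g. a news article (hypothetical fields)."""
    item_id: str
    title: str
    publisher: str  # e.g. "DW", "VRT" or "DIAS"


@dataclass
class UserContext:
    """Context of a user at the moment of consumption (hypothetical fields)."""
    device: str
    timestamp: datetime  # time of day is derived from this
    location: Optional[str] = None  # may be unavailable


@dataclass
class BehaviourEvent:
    """One observed user action on one content item (hypothetical fields)."""
    user_id: str
    item_id: str
    action: str  # assumed vocabulary, e.g. "view" or "share"
    context: UserContext


def time_of_day(ts: datetime) -> str:
    """Bucket a timestamp into a coarse label (bucket limits are assumed)."""
    hour = ts.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"
```

Keeping the context inside each behaviour event means a single record carries what the user did, on which content item, and under which circumstances, while still allowing the location to be left out when a partner cannot collect it.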

The broadcasting and publishing partners in the consortium have varying levels of data collection capabilities. The partners will improve their data collection and management competences over the lifetime of the project. In this way, we hope to achieve a seamless transfer of data from the live production environments of broadcasters and publishers to the CPN platform.