Second evaluation round - what did we do and what did we learn?

The second year of the CPN project has come to an end, and we have just handed in our second evaluation report. As the name says, it was all about getting feedback on the latest version of the CPN prototype. But we also took a close look at some of the extra features we have been considering to improve recommendations. Here is what we learned:

Can you tell the difference?

[Screenshot of the CPN prototype app]

Of course we wanted to know from our test users whether they liked the CPN personalisation algorithm - by asking them how informed they felt after using our prototype. But how do you get a proper answer for something so abstract? We tried a dual approach: looking at the usage numbers as well as gathering qualitative feedback through surveys.

So VRT, DIAS and DW, the media partners in the project, reached out to their audiences to form test groups (one per language offer). The users were invited to download the (Android) app onto their mobile phones and follow the news through CPN for a test period of four weeks.

The trick, however, was to have users experience both a personalised and a non-personalised offer over the course of the test. So the users were split into two groups: one group (in each language) started out with just a random selection of articles, while the other received the real recommendations. The usage of the app was monitored and compared, and users were given a short feedback questionnaire at the end of every week. We switched the groups weekly - and checked whether users could tell the difference between the two versions and which one they liked better.
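
In code terms, the weekly crossover looks roughly like this - a minimal sketch with hypothetical names, not the actual CPN implementation:

```python
# Sketch of the crossover design described above (hypothetical names, not
# the actual CPN code). Every user sees both conditions: the two groups
# start in opposite conditions and swap at each weekly boundary.

def condition_for(user_id: int, week: int) -> str:
    """Return which feed variant a user sees in a given test week."""
    group = user_id % 2                 # split users into two groups
    if (group + week) % 2 == 0:
        return "random"
    return "personalised"

# Example: user 7 across the four-week test
print([condition_for(7, week) for week in range(4)])
# -> ['personalised', 'random', 'personalised', 'random']
```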

What did we learn?

Getting the right number of users for such a test - and keeping them motivated throughout - is important, and somewhat difficult. We learned this the hard way. While we had great responses from users in the first year, both on our surveys and through individual tests (e.g. VRT’s experiment to integrate the CPN recommender in its own app), engagement in this evaluation round was much lower.

How often did users click on articles during the evaluation?

The core group that did the test from beginning to end gave us some indications as to the strengths and weaknesses of the CPN app. But the actual user click evaluation, while showing a positive tendency, ended up statistically inconclusive. People did appreciate the recommended results over the random results, felt more informed and gave us positive feedback on the application - but the overall numbers were so low that the differences were too slim to clearly say the CPN app fully convinced the testers.
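
For readers who want a feel for the statistics: in a crossover design like ours, one plausible way to compare the two conditions is a paired test on per-user click counts. The sketch below is purely illustrative - the numbers are placeholders, not our evaluation data, and the exact method in the report may differ:

```python
# Illustrative only: paired comparison of per-user click counts between the
# personalised and the random condition. The numbers are placeholders, not
# the real evaluation data.
from scipy.stats import wilcoxon

clicks_personalised = [12, 5, 9, 14, 3, 8]   # hypothetical per-user totals
clicks_random       = [10, 6, 5,  9, 0, 1]   # same users, random weeks

stat, p_value = wilcoxon(clicks_personalised, clicks_random)
print(f"p = {p_value:.3f}")
```

With only a handful of users, even a fairly consistent preference for the personalised feed can leave the p-value above the usual 0.05 threshold - which is exactly the “positive tendency, but statistically inconclusive” situation described above.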

How informed did users feel? More on the personalised side.

Of course this isn’t what we wanted to hear. A clear “yes” or even a clear “no” would have been much more instructive. But we are also not completely lost. The results from all three language groups tended towards acceptance of the application, and some individual comments clearly pointed out where our work convinced (“results did fit my profile”) and where we still need to invest time and work (“I found it confusing to always get new results”).

How do we move on?

After analysing the evaluation process in more detail, we identified three main issues that we have to address in the third pilot next year:

  1. Clearly, we need a larger number of test users - and one approach is to offer people the news in a familiar environment. People trust news from VRT, DIAS and DW because they know the brands; CPN, however, is completely new to them. So we need to make it more obvious where the news is coming from, by integrating the CPN recommendations into the partners’ native apps and/or making the sources in the CPN app more obvious to the users.

  2. Testing an application remotely over four weeks and keeping people’s motivation high is a challenge when you are not a well-known app company. So we are thinking about improving our approach (and our app) to make it more attractive for people to help us.

  3. The quality of the recommendation system is a key element in testing its acceptance. So we have started several rounds of quality checks to make sure the system runs at a good level (see the sketch below). Furthermore, we are evaluating which extra features could make the recommendations even better and more fitting for users, so we can include them in the third version of the prototype.
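
To make that last point a bit more concrete: a common offline quality check for recommenders is precision@k - the share of the top-k recommended articles that a user actually clicked. This is a minimal sketch under our own assumptions, not the project’s actual test procedure:

```python
# Minimal sketch of an offline quality check (an assumption of how such a
# check could look, not the project's actual procedure).

def precision_at_k(recommended: list[str], clicked: set[str], k: int = 10) -> float:
    """Fraction of the top-k recommendations the user actually clicked."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for article in top_k if article in clicked) / len(top_k)

# Hypothetical usage with article IDs:
recs = ["a17", "a03", "a42", "a99", "a08"]
hits = {"a03", "a08", "a51"}
print(precision_at_k(recs, hits, k=5))   # -> 0.4
```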

The whole consortium is now working on those issues in preparation for the next year - both to run a third, more successful evaluation round with more tangible results, and to have the system ready to test with other media companies that are interested in using a system such as CPN. At the beginning of 2020, we will bring it all together and put it to the test - hopefully with a clearer picture afterwards.

About those extra features

Recommendation is based on prioritising interests and matching them with the available content. So far, so easy. But this concept gets more complicated as you go along. How do you deal with breaking news, for example? Is it subject to your interests, and should thus only be shown when the topic ranks high enough in your personal profile? Or should it always override the algorithm, as many newsrooms and enemies of the filter bubble probably think? We tried to find an approach here through expert interviews and experiments, but have yet to put it to the (user) test.
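
One conceivable direction - purely a sketch of the kind of approach under discussion, not a decision the project has made - is to give breaking items a recency boost that decays over time, so they override personal interests while fresh and fall back into the normal ranking later. All names and parameters below are hypothetical:

```python
# Hypothetical blend of profile relevance and a decaying breaking-news boost.
# A fresh breaking item gets +1.0, enough to outrank any non-breaking item
# (relevance is assumed to lie in 0..1); the boost halves every half_life_s.
import math
import time

def ranking_score(relevance: float, is_breaking: bool,
                  published_ts: float, half_life_s: float = 3600.0) -> float:
    """Combine profile relevance (0..1) with a time-decaying breaking boost."""
    score = relevance
    if is_breaking:
        age = max(0.0, time.time() - published_ts)
        score += math.exp(-age * math.log(2) / half_life_s)
    return score
```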

Possible approach to integrate breaking news in a personalised news offer

Another big issue with recommendations is the ‘filter bubble’, as defined by Eli Pariser in 2011 - an even more complex topic. We worked our way through scientific papers and articles and started drafting several possible approaches to mitigate such effects. The most promising approach we see for CPN as of now is to empower users with a better understanding of what they read and what they don’t read - so they can eventually judge for themselves whether they are drifting in a problematic direction. This as well needs further testing, and we aim to reach a better understanding in the third pilot.
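
As a rough illustration of what such an empowerment feature could compute (again an assumption on our side, not a finished CPN feature), one could summarise a user’s reading across the available topic catalogue and surface the blind spots:

```python
# Sketch: show users which topics they read - and which they never touch.
from collections import Counter

def reading_coverage(read_topics: list[str], all_topics: set[str]) -> dict:
    """Summarise a user's reading across the available topic catalogue."""
    counts = Counter(read_topics)
    total = sum(counts.values()) or 1
    return {
        "share_read": {t: counts[t] / total for t in all_topics},
        "blind_spots": sorted(all_topics - counts.keys()),
    }

topics = {"politics", "sports", "culture", "science", "economy"}
summary = reading_coverage(["politics", "politics", "sports", "politics"], topics)
print(summary["blind_spots"])   # -> ['culture', 'economy', 'science']
```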

Interested in more?

Has this article sparked your interest in our evaluation? If you want to read more, feel free to access the full report, available on the website here. For any questions regarding the results, or if you are interested in testing the application yourself, please go here to contact us.