Posted on 24 September 2018, updated on 20 November 2019
This week we are releasing our new Claims Test tool, which can help you test different product claims and messages. It may be a new tool, but the methodology behind it is based on a proven choice-based technique and was refined through multiple projects for FMCG brands. In this post, we detail how the method works and why we prefer it over MaxDiff or an array of rating scale questions. Specifically, we look into:
The motivation behind the Claims Test tool is not only to make it easier for you to test multiple potential product claims, but also to:
The quest to find an optimal versatile questionnaire flow led us to the following solution:
MaxDiff will work fine in many cases, though in our experience MaxDiff has a few shortcomings that led us to develop the Claims Test:
Using choice-based questions let us build a simple, elegant respondent interface for an optimal respondent experience, and apply the same robust quality checks for disqualifying low-quality respondents that we use in conjoint analysis.
Another benefit of a choice-based approach is the presence of a “None of the above” option. Respondents can “opt-out” when none of the claims appeals to them. We believe this is very important, especially if the true aim is to find out which claim would trigger purchase.
But will results differ from MaxDiff? Not greatly. Across several tests, differences were within the margin of error (and did not vary substantially with the method used to calculate MaxDiff scores). Below is an example chart comparing preference scores for claims obtained through MaxDiff and through our methodology on a trial study on yoghurts (N=150, R2=0.92).
What does differ is the experience of respondents. The median length of interview was 5.4 minutes, versus 6.5 minutes on an equivalent MaxDiff test (after removing speeders). We heard great feedback from respondents, and also observed higher survey satisfaction scores for the new Claims Test (4.4 out of 5 stars vs. 3.8 stars for the MaxDiff equivalent).
Claims Test scores are calculated using a Hierarchical Bayesian multinomial logit model. Preference scores are scale-less values assigned to claims to represent their preference relative to one another, on the same scale as partworth utilities in conjoint analysis.
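To give a feel for how such preference scores behave, here is a minimal sketch of the multinomial logit choice rule that underlies the model: each claim's probability of being chosen in a task is its exponentiated score divided by the sum over all options. The claim names and score values are hypothetical, and the full Hierarchical Bayesian estimation of respondent-level scores is well beyond this sketch.

```python
import math

def choice_probabilities(utilities):
    """Multinomial logit: probability of each option being chosen in a task,
    given its scale-less preference score (utility)."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three claims plus a "None of the above" option
# (conventionally fixed at utility 0).
scores = {"Claim A": 1.2, "Claim B": 0.4, "Claim C": -0.5, "None": 0.0}
probs = choice_probabilities(list(scores.values()))
for claim, p in zip(scores, probs):
    print(f"{claim}: {p:.2f}")
```

Because the scores only enter through differences inside the softmax, they are identified only relative to one another, which is why they are described as scale-less.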
One of the supposed benefits of MaxDiff is its ability to discriminate between items well. That is not disputed, but what we questioned was what it discriminates on: MaxDiff results (to simplify) add counts of “best” picks and subtract counts of “worst” picks for each claim. But when it comes to developing a winning product, the count of “worst” picks is much less relevant than the count of “best” picks.
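The simplified count-based MaxDiff scoring described above can be written in a few lines. The response data here is invented for illustration; real MaxDiff scoring usually uses a logit model rather than raw counts, but the counting version makes the best-minus-worst logic explicit.

```python
from collections import Counter

def maxdiff_count_scores(tasks):
    """Simplified MaxDiff scoring: count of 'best' picks minus count of
    'worst' picks per claim. Each task is a (best, worst) pair recorded
    from one respondent screen."""
    best = Counter(b for b, _ in tasks)
    worst = Counter(w for _, w in tasks)
    claims = set(best) | set(worst)
    return {c: best[c] - worst[c] for c in claims}

# Hypothetical responses from five screens
tasks = [("A", "C"), ("A", "B"), ("B", "C"), ("A", "C"), ("B", "A")]
print(maxdiff_count_scores(tasks))
```

Note how claim A's score is pulled down by a single "worst" pick even though it was picked "best" most often, illustrating why the "worst" counts can muddy the search for a winning claim.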
So we wanted a more relevant way to discriminate between claims. This led us to employ an adaptive experimental design algorithm (modified Thompson sampling). The algorithm listens to what claims respondents prefer and adapts the next set of questions (shown to other respondents) with the aim of clarifying preferences around top claims (see Cavagnaro et al. 2009 for a deeper discussion).
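To illustrate the idea (not Conjointly's actual modified algorithm), here is a toy Beta-Bernoulli Thompson sampler: each claim's "win rate" when shown gets a Beta posterior, a value is sampled from each posterior, and the next choice set is built from the claims with the highest samples. This naturally concentrates questions on likely top claims while still occasionally exploring uncertain ones. All tallies are hypothetical.

```python
import random

def next_choice_set(wins, shows, k=4, rng=random):
    """Toy Thompson sampling for adaptive design: sample a win-rate from
    each claim's Beta(wins+1, shows-wins+1) posterior and show the k
    claims with the highest sampled values."""
    sampled = {
        claim: rng.betavariate(wins[claim] + 1, shows[claim] - wins[claim] + 1)
        for claim in shows
    }
    return sorted(sampled, key=sampled.get, reverse=True)[:k]

# Hypothetical tallies: times shown and times chosen so far.
# Claim E has barely been shown, so its posterior is wide and it may
# still be drawn into the next set despite a modest win rate.
shows = {"A": 40, "B": 40, "C": 40, "D": 40, "E": 5}
wins = {"A": 30, "B": 12, "C": 5, "D": 3, "E": 2}
random.seed(0)
print(next_choice_set(wins, shows))
```

The exploration/exploitation balance is what yields "greater certainty around top claims": poorly performing claims stop consuming respondent attention once their posteriors clearly sit below the leaders.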
The mathematics of this may be complicated, but the outputs are not. Repeated tests of the same materials (with and without our adaptive design algorithm) show the same ranking of top claims, with the adaptive design algorithm yielding greater certainty around the top claims. It is especially useful when testing fifty or more claims. To make use of this feature, select “Identify and zoom in on top claims” when choosing the goal of your experiment:
A pure MaxDiff does not answer another key question: Are any of the tested claims good enough? Both MaxDiff and choice-based questions will show a ranking of claims (relative to one another), but they will not show whether any of the claims hit the mark.
A potential solution is to supplement the study with Likert-scale questions. It’s a great approach, but we found time and again that rating grids are tiresome for respondents and hard to compare across countries because of acquiescence bias (the degree of which varies by country).
To combat this issue, we employ a well-known but underutilised technique: a dual negative-positive scale. For example, to diagnose a claim on its relevance to the consumer, we ask the question in two ways (each respondent will see only one of these questions):
With the help of a little arithmetic, scores are summarised in a digestible colour-coded table, which shows if a claim passed diagnostic questions and can qualify as a good one. Again, results are comparable with an equivalent study using rating scales (N=200, R2=0.65).
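The exact arithmetic is not spelled out above; one plausible way to combine the two wordings, shown here purely for illustration, is to average agreement with the positive wording and disagreement with the negative wording, giving a single net score that is less affected by acquiescence bias than either wording alone.

```python
def dual_scale_score(agree_positive, agree_negative):
    """Illustrative combination of a dual negative-positive scale:
    average the share agreeing with the positively worded question
    and the share disagreeing with the negatively worded one.
    Inputs are agreement rates on a 0-1 scale."""
    return (agree_positive + (1.0 - agree_negative)) / 2.0

# Hypothetical: 70% agree the claim is relevant to them;
# 20% agree with the negative wording that it is irrelevant.
score = dual_scale_score(0.70, 0.20)
print(f"Net relevance score: {score:.2f}")  # 0.75 on a 0-1 scale
```

A yea-saying respondent inflates both agreement rates at once, so the two terms pull in opposite directions and much of the bias cancels out of the combined score.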
This post lists the various methodological advantages of the new Claims Test. But the key advantage is in ease of use, which you can now enjoy.