Multi-collinearity looks more complicated than it is!
Multi-collinearity. It is a big word – and a big mystery to many students of statistics and even practitioners – just like the word, heteroscedasticity!
The existence of multi-collinearity actually makes the world a simpler place in a practical sense.
Are your clients
currently enthused by Balanced Scorecards, Brand Metrics, Net Promoter Score and various other
tools that consist of many apparently independent measures for assessing the
health of a company and/or brand? Well,
stay tuned because it does not have to be that hard, as multi-collinearity will
show!
Multi-collinearity is simply the problem of two (or more) predictor
variables being correlated with one another such that the contribution of each
to the criterion variable is difficult to tease apart.
Imagine trying to predict purchase intentions, and we measure both
‘price’ and ‘value.’ Clearly both are
useful for predicting purchase intentions, but the two are also very likely to
be correlated to one another. This means
that once we know one, the other does not add much to our prediction.
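The price/value example can be sketched numerically. In the simulation below (all names and numbers are illustrative assumptions, not data from the post), 'value' is strongly correlated with 'price' and drives purchase intention; adding the second, correlated predictor barely improves the fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical ratings: 'value' is strongly (negatively) correlated
# with 'price', and purchase intention is driven mainly by 'value'.
price = rng.normal(size=n)
value = -0.9 * price + rng.normal(scale=0.3, size=n)   # corr(price, value) ~ -0.95
intent = 2.0 * value + rng.normal(scale=1.0, size=n)

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

r2_one = r_squared(value.reshape(-1, 1), intent)
r2_both = r_squared(np.column_stack([value, price]), intent)
print(f"R^2, 'value' alone:     {r2_one:.3f}")
print(f"R^2, 'value' + 'price': {r2_both:.3f}")  # barely higher than 'value' alone
```

Once one of the correlated predictors is in the model, the other has almost no unique variance left to contribute.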
Okay,
we understand the problem, but do we understand how often we encounter this
situation? And how often we may be misrepresenting
the results as a consequence of the many correlations among the
predictor variables we report to our clients?
If you are using any kind of multi-attribute rating models (e.g.,
Vroom’s expectancy-valence model, Fishbein & Ajzen’s original
attitude-model, Gale’s Customer Value Analysis model, etc.), then you are likely
encountering this problem. These are the
models where you measure how customers rate various attributes of the brand,
and use these ratings to determine what are the ‘drivers’ of brand purchase.
Typically,
ratings of any brand on these attributes are highly correlated. For instance, if you chose to assess ratings
of ‘price’ and ‘value’ as two separate attributes, they will typically be
highly (negatively) correlated. In a
multiple regression, the result is that one will contribute significantly to the
regression, and the other, because it is highly correlated with the first, will
not.
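One standard way to quantify this problem (not named in the post) is the variance inflation factor, or VIF: regress each predictor on the remaining predictors and compute 1/(1 - R²). Values well above the usual 5-10 threshold are the alarm signal. A minimal sketch with illustrative simulated ratings:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Two highly correlated attribute ratings (illustrative data).
price = rng.normal(size=n)
value = -0.9 * price + rng.normal(scale=0.3, size=n)

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2) from
    regressing column j on the remaining columns (with an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - ((y - A @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1.0 / (1.0 - r2)

X = np.column_stack([price, value])
print(f"VIF(price) = {vif(X, 0):.1f}")  # well above the usual 5-10 alarm level
print(f"VIF(value) = {vif(X, 1):.1f}")
```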
‘Aha,’
you say, ‘but I take explicit measures of the importance ratings.’ Yes, well unfortunately this does not solve
the problem. As most of us know,
respondents will typically tell us that all attributes are pretty
important. You can play games with constant-sum
scales that help differentiate importance, but you are still not dealing with
the problem of what I might call non-statistical multi-collinearity. The problem is that if you ask a respondent
how important is ‘value’ and how important is ‘price’, they will probably give
a fairly equal importance rating to both.
Why wouldn’t they – they are really much the same!
What
is the solution? One suggestion is to retain just one of the multiple correlated items. This is certainly one solution – and it links to a tangential issue about better-quality drafting of questions. If we can anticipate ahead of time that two
attributes are going to be highly correlated, we can consider measuring just
one or the other.
However,
I am also a great believer in combining separate items collected on a
questionnaire as they provide a more stable (reliable) measure than using a
single item. That is, I measure multiple
attributes, even if they are likely to be correlated. Then I examine the
intercorrelations of the various attributes to see if I can simply combine two
or more items into one scale. If I want
to be really sophisticated, I could conduct a factor analysis to guide the
combination of items. This allows for a sophisticated weighting of each
variable in the final ‘scale.’ However, I
generally find that clients (and analysts) find simple, averaged scales much
easier to interpret than factor scores.
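A common reliability check before pooling items into one averaged scale is Cronbach's alpha, a standard coefficient not named in the post; values around 0.8 or above are usually taken to justify combining the items. A sketch with hypothetical ratings driven by a single underlying factor:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Three hypothetical attribute ratings driven by one underlying factor,
# plus item-specific noise -- so they intercorrelate strongly.
factor = rng.normal(size=n)
items = np.column_stack([factor + rng.normal(scale=0.5, size=n)
                         for _ in range(3)])

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total)."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(items)
scale = items.mean(axis=1)   # the simple averaged scale the post recommends
print(f"Cronbach's alpha: {alpha:.2f}")  # high alpha -> the items can be pooled
```

The averaged scale is then used in place of the individual items as a single, more reliable predictor.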
However,
one rather disturbing result that I have found in examining these intercorrelations
among attribute ratings is that many of the ratings are correlated with many of the
others! Even among sophisticated
respondents such as doctors, I find that the ratings they give to a drug in
terms of potency, efficacy, side-effect profile, drug-interactions, cost and
value are all likely to be correlated. More
broadly, I find that on many research projects, many of the intangible
qualities of the brand that we might measure (brand awareness, attribute
ratings, overall evaluations, satisfaction, usage, etc.) are all highly
correlated.
Many
researchers appear to be unaware of or unwilling to acknowledge such
correlations, and will happily make recommendations to tweak a particular quality
of the brand in order to improve overall image, satisfaction, purchase
intentions, etc. However, if all these
attributes are so highly correlated, advice to tweak one or the other is at
best rather meaningless and at worst rather misleading.
However,
to offset the bad news, there is some good news. The good news is that the intercorrelations
among the many predictor variables mean that we do not need to consider a
screed of so-called independent measures to assess the health of a brand. Some clients have had me explore various brand
metrics, and what I find is that oftentimes we need only look at relatively
few numbers rather than many to assess the health of our brand! Why?
Because of multi-collinearity. The
independent measures are often so highly correlated that they can all be
combined into one or at least relatively few scales which capture most of the
important intangibles.
For
instance, in research we have conducted in both pharmaceutical and agricultural
domains, we have found that we can reduce many of the measures of the
intangibles (such as customer attribute ratings of the brand among other things)
down to perhaps three dimensions which operate as very strong predictors of
brand purchase.
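The kind of reduction described here can be sketched with a principal-components decomposition of simulated data. The twelve metrics and three latent dimensions below are illustrative assumptions, not the author's actual (undisclosed) dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_metrics, n_factors = 500, 12, 3

# Twelve hypothetical brand metrics generated from just three latent
# dimensions, plus a little measurement noise.
latent = rng.normal(size=(n, n_factors))
loadings = rng.normal(size=(n_factors, n_metrics))
metrics = latent @ loadings + 0.3 * rng.normal(size=(n, n_metrics))

# Principal components via SVD of the centred data matrix.
centred = metrics - metrics.mean(axis=0)
_, s, _ = np.linalg.svd(centred, full_matrices=False)
explained = (s ** 2) / (s ** 2).sum()
print(f"Variance explained by the first 3 components: {explained[:3].sum():.1%}")
```

When the "independent" measures really are this correlated, a handful of components carries nearly all the information in the full battery of metrics.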
Of
course, you might like to know what those dimensions are, right? Unfortunately, that would be telling! Nevertheless, I have given you the key to
simplifying the intangible assets of the brand.
You can work it out.
And
as to heteroscedasticity, I will leave that to another day.