Why so much “science” used in design is bullshit: Android, Losada and Frankfurt.

It looks like Google’s Android UX team used a now-debunked research paper to guide much of their UX work. Does this mean the Android interface now needs to change? Probably not, and that might be worse. I’ll look at what this means and how we can be more careful when using research to inform our work.

There are few things as pleasurable as reading a truly acerbic academic paper, but Brown, Sokal and Friedman (2013) is just that. In it, they demolish (and I do mean demolish) a Fredrickson and Losada (2005) paper titled “Positive affect and the complex dynamics of human flourishing” as well as a number of previous papers by Losada. It’s important to note that this wasn’t just any paper, but a hugely influential foundational text for the emerging science of positive psychology, cited over 1000 times and much beloved by self-help books everywhere.

Fredrickson and Losada claimed to have empirically demonstrated a mathematical model that predicted the minimum ratio of positive-to-negative emotions humans need over time to “flourish” to be 2.9013. This “positivity ratio” was held as a general theory to explain our emotional needs across time, cultures and contexts. Fans of Dr. Ian Malcolm would be pleased to see terms such as “non-linear dynamic systems”, “chaos theory” and the famous Lorenz attractor being bandied around. Indeed, it was claimed that the positivity ratio could be represented by the Lorenz attractor itself. The authors explain this as:

I made a graph so it must be true!

Lorenz attractor graph, taken from Losada (1999)

The large, dark-gray structure presents the model trajectory derived from the empirical time series of the flourishing, high-performance teams. It reflects the highest positivity ratio (observed ratio = 5.6) and the broadest range of inquiry and advocacy. It is also the most generative and flexible. Mathematically, its trajectory in phase space never duplicates itself, representing maximal degrees of freedom and behavioral flexibility. In the terms of physics and mathematics, this is a chaotic attractor.”
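For readers unfamiliar with the reference: the Lorenz system is a set of three ordinary differential equations, originally derived as a drastically simplified model of convection in fluids, whose trajectories trace the famous butterfly-shaped “chaotic attractor”. Crucially, that shape is a property of the equations and their parameters, not of any emotional data. As a minimal sketch (using the standard textbook parameters sigma = 10, rho = 28, beta = 8/3, and nothing whatsoever from Losada’s work), a few lines of numerical integration reproduce the familiar picture:

```python
# Minimal sketch of the Lorenz system -- the textbook fluid-convection model.
# sigma=10, rho=28, beta=8/3 are the canonical parameters; nothing here is
# derived from emotion data, which is rather the point.

def lorenz_trajectory(x=1.0, y=1.0, z=1.0,
                      sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                      dt=0.01, steps=10_000):
    """Integrate dx/dt = sigma*(y - x), dy/dt = x*(rho - z) - y,
    dz/dt = x*y - beta*z with a simple Euler scheme and return the trajectory."""
    points = []
    for _ in range(steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dx * dt, y + dy * dt, z + dz * dt
        points.append((x, y, z))
    return points

trajectory = lorenz_trajectory()
# Plotting this trajectory shows the familiar butterfly-shaped attractor,
# regardless of what the three variables are claimed to represent.
print(trajectory[-1])
```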

Brown, Sokal and Friedman (2013) showed this to be, as my stepfather would say, a load of old cobblers. Nick Brown was taking a postgraduate course in applied positive psychology (a great profile of his backstory – and that of the paper – can be found here) and found the pedantically accurate ratio of “2.9013” in a field as fuzzy as social psychology to be deeply suspect. Teaming up with the famous Alan Sokal, they found the entire mathematical foundation of the paper – Losada’s (1999) initial work on the dynamics of business teams, which first derived the 2.9013 ratio – to be flawed. And not merely flawed – massively, howlingly riddled with sloppy errors and what can only be described as pseudo-mathematics. Arbitrary values were plugged into equations in order to give a pleasing result in the form of the Lorenz attractor. Other methodological problems existed in the empirical work, such as a failure to adequately explain the methods used or to justify the chosen analyses. Brown, Sokal and Friedman write:

Losada’s (1999) article followed few of the conventions that would normally be expected from a piece of scholarship published in a scientific journal. It presented very little primary data, and the experimental design, construction of models, and interpretation of results were made with little or no justification. Indeed, some aspects of the methods and results of the Capture Lab experiments were described for the first time only in subsequent articles or books, while many other crucial aspects remain obscure to this day.”

Fredrickson and Losada (2005) built upon the 1999 paper, testing these assumptions with a larger sample of students. Not only were its theoretical foundations (from Losada, 1999) shaky, but it continued the trend of plugging arbitrary values into formulae in order to make the data fit the Lorenz attractor. As Brown, Sokal and Friedman note:

One can only marvel at the astonishing coincidence that human emotions should turn out to be governed by exactly the same set of equations that were derived in a celebrated article several decades ago as a deliberately simplified model of convection in fluids, and whose solutions happen to have visually appealing properties.”

The veneer of mathematics and complexity science – a culturally-powerful yet poorly-understood science – allowed many people to believe a very improbable thing: that a simple model from fluid dynamics could explain the influence of love, hate, anger, sadness, grief, joy, culture, time, geography, war, famine, birth, etc. etc. on human behaviour. It also suggests a degree of gullibility in the academy: researchers were willing to accept the claims of the papers because they didn’t understand the maths. Worse, of course, is the implication that people simply don’t critically read the papers that they cite.

Now this is a blog about interaction design and UX, so how does this war in academic heaven relate to our quaint little field? The paper was published last year, and I only just had a chance to read it thoroughly. At the time of its release though, my first thought was back to this interesting little article about the Android design team’s process and philosophy, from a talk they gave at Google I/O. From their slides:

Positivity ratio slide from the Android UX talk, showing the ratio as a seesaw

I made a pretty diagram so it must be true

Look familiar? It looks like they accepted the discredited claims of Losada and Fredrickson uncritically. Note also that whilst the Losada (1999) paper was cited, they cite Fredrickson’s popular book rather than the 2005 paper. The Google UXers (in particular, Helena Roeber, the head of the Android UX research team) discuss these findings as a model for all human emotional experience. More worryingly, this is then (allegedly) applied to design; in Roeber’s own words:

…we need three positive emotions to lift us for every negative emotion that drags us down. Does this apply to design? Absolutely.”

Jars of marbles are then used as an (apparently metaphorical) device to capture this: in the Android interface, for every bad emotion they cause they look to create enough positive emotions to counter-balance them and so allow their users to flourish. This is apparently achieved through the typical interaction design mix of delighters, accelerators, thoughtful design, etc. The Android team proudly and unequivocally claim to have been inspired in their design work by a paper that is, as it now transpires, total cobblers.
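The talk never describes any concrete procedure for this bookkeeping, so what follows is a purely hypothetical sketch of my own: the interaction names and “emotion” labels are invented for illustration, and nothing here comes from the Android team. Taken literally, “applying the positivity ratio” could only ever amount to a running tally of good and bad moments checked against the magic 2.9013 – which, as we’ll see, tells a designer nothing beyond “add good moments, remove bad ones”.

```python
# Purely hypothetical sketch of what "applying the positivity ratio" to a
# design would literally amount to. The interactions and labels below are
# invented for illustration; the Android talk describes nothing this concrete.

CRITICAL_RATIO = 2.9013  # the now-debunked Fredrickson/Losada threshold

def flourishes(interaction_log):
    """interaction_log: list of (event, emotion) pairs, emotion being
    either "positive" or "negative"."""
    positives = sum(1 for _, emotion in interaction_log if emotion == "positive")
    negatives = sum(1 for _, emotion in interaction_log if emotion == "negative")
    if negatives == 0:
        return True
    return positives / negatives >= CRITICAL_RATIO

# Invented log for a single, imaginary checkout flow:
log = [
    ("smooth animation", "positive"),
    ("helpful default", "positive"),
    ("confusing error message", "negative"),
    ("delightful confirmation", "positive"),
]
print(flourishes(log))  # True -- but the only actionable advice either way is
                        # the obvious one: add good moments, remove bad ones.
```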

It’s very tempting to now decry Android’s UX as having been built on a house of sand, and drolly add “that’s why Android has a terrible user experience!”. Should we expect massive redesigns to the Android interface? A retraction of this talk? A mea culpa? After all, we were told that the positivity ratio “applies to design”.

The trouble is, Android doesn’t have a ropey user experience any more. The incremental improvements, from Project Butter to KitKat, mean stock Android is now smooth, intuitive and often delightful (the less said about TouchWiz the better, though). Indeed, the same presentation shows off some really great work the Android team have done. And yet, the apparent foundation of the design philosophy underlying this excellent user experience has been shown to be nonsense. How can this be? I don’t really buy the evasion that they were using the ratio as a metaphor; they clearly discussed the research as if it were true and had influenced their design work, even if the marbles themselves were a metaphor.

The simple truth is that in no real way was the positivity ratio ever being applied to the design of Android. Nothing therefore needs to be changed. How could it have been applied to any design? Even if it were true, the sort of inter-human interactions it could have described (ignoring the degree to which it was being over-generalised by its authors) can only be applied metaphorically to our interactions with an interface. Indeed, it would mean that the designers would have to add a bad experience in for every three good ones in order to get the positivity ratio right!

Second, and more importantly, the positivity ratio was being invoked to truss up the same old design truisms about not annoying users, being mindful of their needs, delighting them, and so forth. The truth or falsity of it was irrelevant – it’s unlikely the Android design team was really looking to get a good 3:1 ratio of positivity, and at no stage was the number 2.9013 utilised. The team sought to find more ways to make the user happy and reduce the things that annoy users, and bully for them. As the talk continues, it’s clear that the positivity ratio really had little or no impact on their design decisions, whereas their design principles did. Indeed, Android’s Design Principles are admirable if unsurprising.

Why then even feel the need to allude to the positivity ratio? It could be that the Android UX team lacked the scientific or mathematical wherewithal to detect it was wrong – but even if it had been true, it didn’t really add anything to their design work, so why mention it at all? It speaks of the status of “science” in public discourse, and the increasing need to back up (often good) design decisions with evidence. That the evidence may be nonsense is irrelevant; no one will check your working, so the only important thing is the sheen of competence – and truthiness – given to your work.

It could be that the Android team cited the positivity ratio to give their design work the sheen of respectability that society affords “science”: as long as there is a published paper to back it up, it must be true. Never mind that a huge number of published articles are probably false, and never mind actually reading the original research. It also helps to give a memorable spin to hoary old design messages that many of us will have heard before.

The use of “science” as a superficial lacquer designed to impress, with scant regard for truth, has only one description: bullshit. I don’t bring up this term simply to pedantically berate the Android team (who do some great work) but to consider a more troubling trend.

In his classic essay On Bullshit, Harry Frankfurt (2005) describes bullshit as differing from lies primarily by virtue of the speaker’s intent and the relationship of what was said to the truth. Liars respect truth values, in the sense that for one to lie they must first acknowledge the truth about which they are trying to deceive us. The bullshitter, on the other hand:

…may not deceive us, or even intend to do so, either about the facts or about what he takes the facts to be. What he does necessarily attempt to deceive us about is his enterprise. His only indispensably distinctive characteristic is that in a certain way he misrepresents what he is up to.”

moreover,

For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.”

Through this definition, it should be clear why the Android team’s use of the positivity ratio could be considered bullshit. It wasn’t out of an intent to deceive about the ratio itself, but out of a desire to make their work appear as something that it was not. Even if they did actually add marbles to jars, it would just be redundantly confirming design decisions they would make anyway. Whether the positivity ratio was true or not was therefore irrelevant to their purpose, which was to make their design decisions appear more grounded in evidence. Precisely the same design decisions would have been made whether or not the ratio was cited. Mentioning it was therefore post-hoc bullshit.

Why then do UX and interaction designers so often feel the need to tart up their work with invidious “science” bullshit? It certainly isn’t just the Android team that has done so: one doesn’t have to look long to find examples of bullshit “science” in UX. Neuroscience in particular is abused, but that’ll be a topic for another post.

Everyone bullshits at some point, but there seems to me to be a particular desire to show design decisions as being based on empirical facts. In some ways this is admirable, and forces designers to justify their design decisions. Analytics have certainly been one cause of this. However, it also means that design decisions that have no theoretical justification now need one. As in the examples above, this usually occurs when designers follow best practices and wheel out insights from cognitive psychology or “neuroscience” in order to justify them.

Such appeals to a scientific basis for design decisions are window dressing. The designer is neither on the side of the true nor of the false, rather they seek to use “science” to present their work in a certain light. The important thing is that the audience for this bullshit may be as much within an organisation as it is for the public or design community.

The nature of the grift then is a system that encourages interaction and UX designers to provide empirical or theoretical basis for their decisions but often fails to examine the quality of that empirical or theoretical basis. It could just be bullshit, but as long as it is internally consistent bullshit then the “backs up designs with facts” box can be ticked. This is also true of badly-designed user tests (of which I’ve witnessed a few in my time) that provide little useful information about the design beyond internally consistent bullshit that ticks the “passed user testing” box.

You might now suggest that I’m being awfully presumptuous about the Android team’s goals and attitudes – perhaps they aren’t bullshitters and perhaps they really did feel the positivity ratio was guiding their work. Even if they believed the positivity ratio was helping their work, I’d argue it probably wasn’t. As I noted, we’re encouraged to provide evidential support to stakeholders for our UX design decisions without necessarily being encouraged to check the validity of this empirical basis. I’m not sure that UX design decisions are typically made as a response to research or according to some system (involving marbles or otherwise). Rather, when asked why we made a design choice, we first fool ourselves by spinning a plausible narrative for it. Often we don’t know why we designed something as we did, since either the design knowledge we have is subconscious and implicit or there was no real reason why. Design principles may work because they provide easily available rationalisations for design decisions, and this rationalisation occurs post-hoc.

Our minds seek this sort of narrative for our actions all of the time. Think, for instance, of why you like your favourite food. You may eventually find a few tenuous reasons – “I like salty things”, “the mix of cardamom and cloves is just right” – but initially it’s very hard to say why. Moreover, you chose those reasons out of what was readily available to your consciousness, namely, superficial properties of the foodstuff. The most important things that may influence your gustatory preferences – such as texture, aroma, what you associate the food with, and so on – are not readily available to consciousness nor easy to verbalise. Your mind creates a story to fill the resulting void with explanations that might have nothing to do with the real reasons for your preferences. As David McRaney (2012) puts this:

Believing you understand your motivations and desires, your likes and dislikes, is called the introspection illusion. You believe you know yourself and why you are the way you are. You believe this knowledge tells you how you will act in all future situations. Research shows otherwise”

McRaney references the work of the psychologist Timothy Wilson, who along with a number of collaborators examined this effect. He found that introspection can result in temporary attitude change, and lead us to make choices that are less satisfying than those made without introspection. Wilson, Lisle and Kraft (1990) argue that we view rational cognitions as the most likely cause of our attitudes, as opposed to the myriad of minor factors that might influence our thoughts. They write:

For example, when explaining why we like various political candidates, we are more likely to call upon such factors as their stance on the issues than such seemingly implausible things as the number of times we have seen their ads on television or whether their party affiliation is the same as our parents’, even though these latter factors have been shown to influence people’s attitudes.”

Wilson et al. (1989) give the example of a study they conducted into choices of posters. Student participants evaluated five posters, one of which they were allowed to take home and keep. Half of the participants were just asked to take a poster; the other half were asked to reflect on why they made their choice. Those asked to just take a poster typically chose artistic posters; those asked to reflect chose more “pop” style posters (the authors’ words, not mine), such as one with a cat and the caption “Gimme a break”. As you might suspect, when the researchers followed up six months later, those that had to reflect were significantly less satisfied with their choices than those who did not. The reasons for choosing a poster that were most available to the participants were not the subjective aesthetic factors behind their attitudes towards the posters, but rational cognitions that did not match their actual preferences [1]. Pronin (2009) explains this as follows:

The former group placed too much weight on the introspections that they generated at that moment in time, and thus lost sight of their more enduring attitudes.”

The really important result, though, was that for subjects who were knowledgeable about art, introspection had little impact on their choices and subsequent satisfaction. If we extend these results to UX, it would at least mean that UX professionals can introspect about why they prefer certain design features without it doing much harm: introspecting may have little impact on either the decisions that we make or our satisfaction with those decisions. I’d argue further, and suggest that many times UX professionals make judgements where they behave like the participants who were not knowledgeable about art, especially in situations where they do not know much about their users.

I’m not arguing that all of our introspection is suspect. User experience work should require that we think about user needs a priori and make design decisions from there. Where this happens, thinking deeply about what to do is just what we should be doing. It’s the times when we have to make a design decision where none of the available options can easily be linked back to what we know about users that we should be wary of our introspections and the rationalisations we try to spin. What we don’t want to do is make decisions based on what is easier to rationalise, as opposed to what we feel is best.

For instance, placing the main navigation on the left “because people read from left to right” (as I am often told) is an example of introspection leading us astray. There is in fact little evidence for any difference in usability between left- and right-hand navigation on websites (Faulkner and Hayton (2011), Kalbach and Bosenick (2006), though these studies were small and may have been under-powered to detect a small effect). The real reason we stick to a left-hand nav is that it is a convention, but alluring rationalisations such as “because people read from left to right” are more available to us than this fact [2]. In doing so, our designs may be worse off and we may miss opportunities to innovate.
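To give a sense of what “under-powered” means here, consider a hypothetical sketch (the success rates and sample size below are invented for illustration, not taken from either study). Suppose a right-hand navigation really did cut task success from 85% to 80%, and we tested it with 20 participants per side. Simulating that experiment many times shows how rarely a difference of that size would ever reach statistical significance:

```python
# Hypothetical power sketch: how often would a small usability difference be
# detected by a small study? The success rates and sample size are invented
# for illustration, not taken from Faulkner & Hayton or Kalbach & Bosenick.
import math
import random

def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (successes_a / n_a - successes_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def estimated_power(p_left=0.85, p_right=0.80, n=20, alpha=0.05, trials=10_000):
    """Fraction of simulated experiments in which the difference comes out 'significant'."""
    hits = 0
    for _ in range(trials):
        left = sum(random.random() < p_left for _ in range(n))
        right = sum(random.random() < p_right for _ in range(n))
        if two_proportion_p_value(left, n, right, n) < alpha:
            hits += 1
    return hits / trials

print(estimated_power())  # typically well under 0.10: a real but small
                          # difference would almost never be detected
```

In other words, a null result from a study of that size tells us very little either way.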

The best case would be for us to always have good empirical evidence to hand, for us to be engaged with that research in a critical way and for that evidence to truly drive our work. Where that isn’t possible, I think it’s better for us to accept that we don’t always have good reasons for our design decisions and that much of our work is intuitive. We may otherwise create false narratives that seem to explain our preferences and choices or, worse, lead us to make poorer design decisions.

This seems to me to be how the Android team may have felt the positivity ratio informed their work – it helped them to provide a narrative for design decisions that they would have made anyway, and this is why the Android interface doesn’t need to change despite the positivity ratio being debunked. Everyone falls prey to biases such as the introspection illusion (and indeed to bullshitting) at times – the UX community certainly seems to be aware of it – and science can provide a particularly alluring set of post-hoc rationalisations. It seems we’re in danger of fooling others or of fooling ourselves when we appeal to science. In my next post I’ll look at how we can avoid these pitfalls.

[1] Since my central argument is that we often fail to judge the quality of research, it’s first worth noting that the poster study comes from Wilson, Lisle and Schooler (1988c) – an unpublished manuscript, which always makes it hard to review the methods and statistics used. Still, subsequent studies – such as Wilson and Schooler (1991) – found a similar effect.

[2] Note that deciding to move the navigation to the right and then finding research to support your decision – one that you would still make without that research – is equally intellectually disingenuous.

References

Brown, N. J. L., Sokal, A. D., & Friedman, H. L. (2013). The Complex Dynamics of Wishful Thinking: The Critical Positivity Ratio. American Psychologist. Advance online publication. doi: 10.1037/a0032850

Frankfurt, H. G. (2005). On bullshit. Princeton: Princeton University Press.

Fredrickson, B.L., & Losada, M.F. (2005). Positive affect and the complex dynamics of human flourishing. American Psychologist, 60, 678–686

Losada, M. (1999). The complex dynamics of high performance teams. Mathematical and Computer Modelling, 30(9–10), 179–192. doi: 10.1016/S0895-7177(99)00189-2

McRaney, D. (2011). You Are Not So Smart: Why You Have Too Many Friends on Facebook, Why Your Memory Is Mostly Fiction, and 46 Other Ways You’re Deluding Yourself. New York: Gotham/Penguin Group.

Pronin, E. (2009). The introspection illusion. In M. P. Zanna (Ed.), Advances in Experimental Social Psychology (Vol. 41, pp. 1-67).

Wilson, T. D., Dunn, D., Kraft, D., & Lisle, D. (1989). Introspection, attitude change, and attitude–behavior consistency: The disruptive effects of explaining why we feel the way we do. Advances in Experimental Social Psychology, 19, 123–205.

Wilson, T.D., Lisle, D. J., & Kraft, D. (1990). Effects of self-reflection on attitudes and consumer decisions. Advances in Consumer Research, 17, 79–85.

Wilson, T. D., & Schooler, J. W. (1991). Thinking too much: Introspection can reduce the quality of preferences and decisions. Journal of Personality and Social Psychology, 60, 181-192

