Is Experimental Philosophy Bad Science?

Wikipedia tells us that Experimental Philosophy (X-Phi) is:

an emerging field of philosophical inquiry that makes use of empirical data—often gathered through surveys which probe the intuitions of ordinary people—in order to inform research on philosophical questions. This use of empirical data is widely seen as opposed to a philosophical methodology that relies mainly on a priori justification, sometimes called "armchair" philosophy by experimental philosophers.

So what makes X-Phi experimental is the use of data rather than (presumably) data-less a priori reasoning. This is confusing. Even when employing 'pure reason', philosophers use data - if only the data of their senses, experience and consciousness. Would anyone deny that Descartes used data when he came up with the Cogito? That it was the data of his own experience doesn't make it less valid qua data.

So if the "armchair" philosopher is using data as well, what's the virtue of the X-Phi approach? It seems that they want to gather multiple data points (intuitions, perceptions) and specifically those of non-indoctrinated-into-philosophy regular shmoes. These can be used either to a) validate an armchair hypothesis or b) replace an armchair hypothesis with an 'experimental' one. The conceit is that consensus via multiple intuitions/perceptions has greater validity than a single POV and that 'untutored' or 'uneducated' intuitions/perceptions have more authenticity than armchair ones.

Let's look at this, starting with the approach. Does X-Phi seek to validate or supplant armchair conclusions? It seems clearly the latter. Validating armchair philosophical hypotheses is called studying and doing philosophy. When philosophers comment on whether a thinker's view accords with their own experience or intuition, they are validating or invalidating it. When a lecturer, teacher, professor, docent or quack like me brings such an idea to a classroom, coffee shop, blog or wherever non-philosophically trained people have access to it and asks whether it accords with their intuitions, she is validating the views of an armchair philosopher king against the yardstick of the great unwashed masses.

I don't see value in the approach of experimental philosophy unless it seeks to supplant armchair hypothesizing with 'crowdsourced' hypothesizing. Which commits them to prioritizing both quantity (many vs. one) and some notion of the authenticity of the common person's POV. Let us grant the latter: once one has gone down the rabbit hole, there is no coming back. Philosophical education and training estranges one from one's own authentic intuitions by drenching them in theoretical constructs (or some other such obfuscation). We want to pursue X-Phi because we want a volume of intuitions/perceptions, a statistically significant representation from the right population.

I'm not going to criticize X-Phi's approach to intuitions.  If you want that, check this out.  I want to look at its experimental approach.  You don't have to be a scientist to understand Design of Experiments (DoE). (We bust that bad boy out in the Six Sigma wild west of business too!) If you want to test the effect of a variable, you set up an experiment to hold all things constant except that variable. You have the null hypothesis, which assumes that there is no measurable difference between state A of the variable and state B. The experiment should be designed to isolate the variable in such a way that the same measure can be taken when state A obtains vs. state B. If a statistically significant difference is indicated, you have a data point against the null hypothesis. Collect enough of those and you have the makings of a theory.
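To make the DoE idea concrete, here is a minimal sketch of the kind of null-hypothesis test such a design calls for: a two-proportion z-test comparing the rate of a yes/no response under state A versus state B of the variable. The counts are invented purely for illustration, not the study's real data.

```python
import math

def two_proportion_ztest(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a, p_b = yes_a / n_a, yes_b / n_b
    # Pooled proportion under the null hypothesis (no A-vs-B difference)
    p_pool = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: 33 of 40 participants answer "yes" under state A,
# 9 of 40 under state B. A tiny p-value is a data point against the null.
z, p = two_proportion_ztest(33, 40, 9, 40)
print(f"z = {z:.2f}, p = {p:.6f}")
```

The point is only the shape of the inference: one isolated variable, the same measure in both conditions, and a p-value that tells you whether the observed difference is plausibly noise.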

So let's take a look at a "celebrated finding" from X-Phi known as the "Knobe effect" after Joshua Knobe. I'll let you watch the video below (and check out this page) for the details, but I encourage you to consider my abstracted formulation first.  I think the content of the study might have an effect on the outcome - over and above the formal structure of the experiment.  Participants are presented with two scenarios in which an agent T goes ahead with an action A knowing in advance that A will i) harm or ii) help X.  The intent of A is something else, Y.  The question is:  in doing A for the purpose Y, if T knows A will harm X, does T harm X intentionally?  Conversely, if T knows that A will help X, does T help X intentionally?

So if that wasn't clear, let me present it this way:  Agent T takes action A to accomplish Y.  In the two scenarios:

  • #1:  T knows that A will harm X
  • #2:  T knows that A will help X

The question posed to participants is:  in #1 did T intentionally harm X and in #2 did T intentionally help X?  The goal of the experiment is to get at people's intuitions around intentionality with respect to harm and help.  And the results of this experiment certainly suggest a difference (statistical analysis aside).  My question is whether this is a well-constructed experiment in the fashion that we understand it.

The first question is whether the same measure is being taken at the end of the study.  There are actually two separate questions being asked:  did T intentionally harm X and did T intentionally help X?  These are not the same measurement.  The experiment seeks to hold everything constant except the words "harm" and "help".  In this case, the null hypothesis would be that there is no difference in outcome when you use the word "harm" vs. the word "help" in the same sentence or situation.  This seems to me to be patently false from the get-go, which should call into question the null hypothesis and the entire experiment.

The second issue I see is that what is being measured is unclear.  Is it participant intuition about intention, foreknowledge, harm or help?  All of the above?  The third is the very, very heavily determined nature of the situation.  Again, I'll leave it to you to follow the details below, but I can imagine other experimental content set up in the same formal structure that would have different outcomes.  If true, what may be learnt from this is something more about how people feel about the specific agent than about harm and help.

Suppose you structure the experiment this way.  T does A for Y.  In scenario #1, T knows that A will harm X and in scenario #2, T does not know that A will harm X.  You then ask the same question in both situations:  did T intentionally harm X?  Your null hypothesis is that knowing and not knowing about harm before taking action don't have any effect on 3rd-party judgments of intention, which you are pretty sure will be falsified (based on your own intuitions).  If you find that people think knowing in advance is correlated with intention, then you have a reasonable claim for saying that foreknowledge is a prerequisite for ascribing intention.  If on the other hand you discover that people think T acts intentionally out of failure to investigate or willful ignorance, your null hypothesis survives and you need to refine the experiment.

Variable selection in an experiment is critical.  This experiment is the equivalent of saying 'T ____ X' where ____ could be just about any verb in the English language.  Because verbs have different meanings, substituting one for another isn't tweaking a variable, it's changing the experiment.  Which is not to say the Knobe effect (whatever it is) isn't interesting; it just isn't an experimental outcome that can be used to validate or invalidate a hypothesis.  It's just an interesting phenomenon.
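The variable-selection point can be made concrete with a small sketch (the field names here are my own, purely illustrative): list out what each scenario fixes and what it varies, and compare the Knobe-style design against the redesign proposed above.

```python
# Knobe-style design: the outcome verb changes AND the question changes
# with it, so two things vary between the scenarios at once.
original_design = {
    "scenario_1": {"foreknowledge": True, "outcome": "harm",
                   "question": "Did T intentionally harm X?"},
    "scenario_2": {"foreknowledge": True, "outcome": "help",
                   "question": "Did T intentionally help X?"},
}

# Proposed redesign: only the foreknowledge variable changes; the same
# question (the same measure) is asked in both conditions.
redesign = {
    "scenario_1": {"foreknowledge": True, "outcome": "harm",
                   "question": "Did T intentionally harm X?"},
    "scenario_2": {"foreknowledge": False, "outcome": "harm",
                   "question": "Did T intentionally harm X?"},
}

def varied_fields(design):
    """Return the set of fields that differ between the two scenarios."""
    s1, s2 = design["scenario_1"], design["scenario_2"]
    return {k for k in s1 if s1[k] != s2[k]}

print(varied_fields(original_design))  # both outcome and question vary
print(varied_fields(redesign))         # only foreknowledge varies
```

Running the comparison shows the asymmetry at a glance: the original design moves two things between conditions, while the redesign isolates exactly one variable against a fixed measure.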




  1. Wayne Schroeder says

    Don’t know if I’m missing something here, but the primary variable seems to be intent as measured by secondary variables of knowledge of a) will harm, b) will help, and with that knowledge the president knows of each outcome of a)/b), but does not care about the outcome of harm or help, and therefore is knowledgeable, responsible and thus intentionally disregards either harming or helping (in favor of profit). I do not see any intuition at play here, just the president’s statement of knowledge of not caring about harm or help and thus I don’t even see this as being an experiment so much as a true/false no-brainer. ???????

  2. dmf says

    might have something to do with the “myths” of so called common-sense, but the larger question is whether or not there are hard sciences of the social (not obviously whether or not there are people paid/trained/authorized to work under the rubric of social-scientists):

  3. says

    Seth, (say I in the hopes of Mr Paskin reading this):

    I’m not sure I agree with the last paragraph and I’d like to get your comments on it. I’m not trained in logic, but I have a PhD in the life sciences, so I’m also familiar with experimental design. Here goes:

    1) Changing the verbs can still be the same experiment. Just like it would be the same experiment if we evaluate food preservation by covering the food in (a) water or (b) acid [this is a nonsense experiment of course]. It’s a different medium just like “it’s a different verb”.

    2) The alternative structure you propose is interesting: what it’s doing is evaluating the effect of “knowing” on the attribution of intentionality. Right? That makes sense. So why do you think it’s fundamentally different to evaluate the effect of “goodness of outcome” on the attribution of intentionality? It’s still tweaking a variable. The variable is made to be binary (harm vs help) but it’s essentially “goodness of outcome of action A”.

    Looking forward to a reply.

    • Seth Paskin says

      Thanks for commenting. The experiment shows that people’s intuitions about the attribution of intention to the actor (the executive) differ depending on whether he has knowledge of the harm or the help he’ll be causing by acting. But it is less an experiment than two different examples meant to show that people think differently about responsibility in the case of knowing in advance the good or bad outcomes of an action.

      Basically what is suggested is that people think you are responsible if you do something that will have negative consequences and you know that it will, even if you do it for other reasons. If you do something with positive consequences, people only hold you responsible (give you credit) if you do it specifically to achieve the positive outcome.

      My point here is that the example is overdetermined by the fact that it’s a corporate executive and money and the environment are involved. You can imagine an identically structured model (actor, foreknowledge, opposite outcomes) that showed something different.

      • says

        Ah! So you are saying that the particular context (executive, money, environmental concerns, etc) confounds the larger point he’s trying to make about attribution of intention as a function of goodness of outcome. It’s a problem of generalizing the experiment.

        Thanks for the reply! Cheers,

