The birth of Psychology as a science is usually dated to 1879, when the physiologist, philosopher and psychologist Wilhelm Wundt founded the first laboratory of Experimental Psychology in Leipzig, Germany. In this sense, Psychology is a fairly young scientific discipline compared with “hard” sciences such as Physics, Chemistry or Biology. Unlike those sciences, its scientific status is often questioned, largely because the variables it investigates, those of complex human behavior, are much harder to control in an experimental setting than the variables of a science such as Physics. This, coupled with the development and influence of a multitude of psychological practices and currents of dubious scientific status, in which philosophy, spirituality, pseudoscience and sometimes even political ideologies are intermingled, has made Psychology a discipline whose claims are not always taken seriously. Added to this is a lack of skeptical attitude and an almost dogmatic acceptance and defense of certain practices and supposed theories, which make the discipline a very particular one compared to other sciences.
In spite of this, Psychology is a scientific discipline with a substantial body of accumulated evidence in multiple fields. The key is to demarcate the scientific part from the pseudoscientific currents.
It is beyond the scope of this essay to describe the totality of pseudoscientific practices in Psychology, but we will mention several examples.
What is science?
As stated by Carl Sagan, one of the greatest popularizers of science, science is more than a body of knowledge; it is a way of thinking. This way of thinking requires the researcher to keep prejudices and presuppositions in check, carefully observe and handle a huge amount of data, question hypotheses, draw conclusions by induction and not accept them as absolute truths, all depending on what the available evidence tells us.
The inductive method plays a fundamental role in empirical research, and yet it has been questioned in philosophical terms as a solid basis for truth. We can say: “the apple is a fruit and it is sweet, the banana is a fruit and it is sweet, the pear is a fruit and it is sweet … all fruits are sweet”. The problem is that even if we accept as valid the observations that the apple, the banana and the pear are sweet, it does not logically follow that all fruits are sweet, or even that every apple, banana or pear is. In spite of this, science extracts generalities from observations: we go from a small sample to statements that involve the whole population.
However, this does not invalidate science in its search for truth. Beyond the fact that we cannot logically make the jump to affirm with certainty that “all fruits are sweet”, one could object to any conclusion whatsoever that, because there may be a case that does not follow the rule, we cannot accept it as true. Newton had a brilliant answer to that problem when he said: “I frame no hypotheses” (hypothesis in this sense refers to any claim that is not based on empirical evidence). This means that every claim that is not based on evidence is irrelevant in science. It would therefore be absurd to question the relevance of induction on the basis of a nonexistent entity, namely a merely possible case that does not follow the rule. The important thing is that we base our claims on evidence, and that is what matters in science (Peikoff, 2012).
But even beyond this issue, we must bear in mind that there is an important asymmetry between proving and falsifying statements that declare something universal. It is one thing to prove that all fruits are sweet: for this we have to examine the whole universe of fruits and check that each one is sweet, which becomes very difficult when populations are huge. However, verifying that the lemon is a fruit and is sour is enough to reject or falsify the conclusion that all fruits are sweet. To prove the conclusion we need many cases, but to falsify it, one is enough. This implies that in science we learn much more from negative cases than from positive cases, and, following Popper, it is this ability to falsify a hypothesis or theory, more than anything else, that gives it its scientific character. According to this author, science can never definitively confirm a hypothesis, due to the problem of verifying all possible cases, as in the previous example. What it can do is definitively refute it, by deducing an observable consequence of the hypothesis or theory and showing that this consequence does not hold. The refutation is thus a reasoning of the modus tollens type (if p then q; not q; therefore not p): the hypothesis p implies the observable consequence q, the observable consequence q is not the case, and therefore the hypothesis p is not the case either. The asymmetry reflects the validity of modus tollens compared to the invalidity of attempts to confirm a hypothesis, since “if p then q; q; therefore p” is the fallacy of affirming the consequent. While refutations take the form of a deductively valid argument, confirmations take the form of a deductively invalid argument.
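The difference between the two argument forms can be checked mechanically. The following is a minimal sketch in Python, not part of Popper’s own exposition, that enumerates every truth assignment of p and q and tests whether an argument form can have true premises together with a false conclusion. The helper names implies and is_valid are illustrative choices.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: 'if a then b' is false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid when no truth assignment makes
    every premise true and the conclusion false."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False  # counterexample found
    return True


# Modus tollens: if p then q; not q; therefore not p.
modus_tollens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not q],
    conclusion=lambda p, q: not p,
)

# Affirming the consequent: if p then q; q; therefore p.
affirming_consequent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
)

print("modus tollens valid:", modus_tollens)
print("affirming the consequent valid:", affirming_consequent)
```

Modus tollens comes out valid because no assignment makes its premises true and its conclusion false, whereas affirming the consequent fails on the assignment where p is false and q is true.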
If a theory is not falsifiable, however consistent its claims may be with one another, we cannot classify it as scientific. In spite of this, academia and scientific publishing (particularly in Psychology) usually, and erroneously, give more importance to studies that show positive or significant effects with respect to a hypothesis than to those that show no effect, which produces the so-called “file drawer effect”. Given the above, we must understand that negative cases, those in which we falsify a hypothesis, are perhaps even more important for the advancement of science than the cases that support something in particular. In turn, in Psychology perhaps more than in any other discipline, non-falsifiable theories abound and undermine the scientific status of other psychological currents.
The scientific status of psychological theories
Following the previous approaches, the empirical character of a theory increases with its degree of falsifiability, and this implies that the theory is more useful, because it says more about the world. The more cases a theory prohibits, the more predictive it is and the more useful it is when applied to reality; in the opposite case, a theory that allows every possible event does not really explain anything. That is why falsifiability is an extremely important criterion for evaluating a scientific theory. It also answers a common argument made by certain pseudosciences, which claim that although their theories cannot be proven at all, they are useful for interpreting reality and generating change. This is simply not the case.
Suppose we have a theory about the weather that predicts that it will rain tomorrow at a certain time, with a margin of plus or minus one hour. If it does not rain tomorrow and it is sunny during that period, we will have falsified the theory: it was incorrect. Now, if the theory’s authors say that this does not refute it, because the theory also predicts that in that period it can turn sunny, or hail, or be foggy, and so on, then the theory is not really falsifiable. It allows any result and nothing refutes it, and with this it also loses its usefulness, since it predicts nothing about the world: it says one thing, but then another happens, or anything at all can happen. The theory thus ends up saying nothing accurate about reality, so we cannot rely on it to explain the world. This is perhaps the most important characteristic of pseudoscience. Think of the multitude of people who claim to predict the future, and observe how often, if not always, the prediction is vague and admits a multitude of possible outcomes. This vagueness is an ally of pseudoscience, since an accurate prediction would be highly falsifiable and could refute the supposed ability to predict the future.
Thus, a current such as Psychoanalysis and its different derivations, although it may be internally coherent in its postulates, admits any result, and is therefore neither falsifiable nor useful for explaining reality. Psychoanalysts may give a detailed explanation of a person’s behavior, but they are not really explaining it. They can predict that the person will engage in behavior A or B for some reason, but if the person engages in C, they quickly turn what would have been a refutation into an explanation, consistent with the theory, of why C occurred. Any result is thus explained by the theory, and there is no scenario in which it could be falsified. Psychoanalysis becomes like the weather theory of the previous example: it does not predict anything in particular, so it does not really explain behavior, just as that theory did not explain the weather, since any result is possible. Following Popper, we can contrast this with the exact delimitation of Einstein’s theories, for example, where it is stated that if light did not bend around the sun (which is a prediction of the theory of general relativity), general relativity would be incorrect. This type of clear and specific demarcation between the confirmation and the refutation of a theory is absent in the case of Psychoanalysis and other psychodynamic currents.
In addition to this, there are theories or currents with little empirical support, in the sense of evidence in favor of the postulates of the theory. Here we find everything from the notion of multiple personalities as popularized in the cinema, which is largely fiction and bears little resemblance to the diagnosis of dissociative identity disorder that manuals such as the DSM-5 and the ICD-11 do recognize, to theories or currents such as bioneuroemoción or family constellations, among many others. The central issue is that while these may be interesting theories with daring hypotheses about how the psyche works, they are disconnected from the evidence, as they lack sufficient data to support them. As indicated by Alonso (2005), family constellations are, despite their popularity, a controversial procedure with practically no information on therapeutic effectiveness and with theoretical principles that have not been demonstrated, although some of its components deserve careful consideration: the approach is interesting and could be a powerful tool to discover significant dynamics in human relationships. In this field we also have the new “Neuro-Theories”, which fallaciously suggest that they are supported by evidence because they use “Neuro” in the name, when in reality they misrepresent and misuse the knowledge of Neuroscience. This would be the case of neurolinguistic programming (NLP), for example. As indicated by Witkowski (2010), the enormous popularity of NLP therapies and training has not been accompanied by knowledge of the empirical foundations of the concept. In the author’s review, 63 studies were selected from among 315 articles, and out of 33 studies, 18.2% show results that support the principles of NLP, 54.5% show results that do not support them and 27.3% present uncertain results.
The qualitative analysis indicates the greater weight of the non-supportive studies and their greater methodological value compared to those that support the principles. Such results contradict the assertion that NLP has an empirical basis.
However, we must bear in mind that a non-significant effect in a particular study does not necessarily mean that the evidence counts against the theory: the data may be insensitive, or we may be failing to take confidence intervals into account properly and misinterpreting the meaning of the p-values.
As Demidenko (2016) points out, there is growing frustration with the concept of p-value in statistics. In addition to having an ambiguous interpretation, the p-value can be made as small as desired by increasing the sample size (n). The p-value is outdated and does not make sense with “big data”, because everything becomes statistically significant. The root of the problem with the p-value is in the comparison of means. The author argues that statistical uncertainty should be measured at the individual level, not the group level. Consequently, the standard deviation (SD), not the standard error (SE), should be used with error bars to graphically present the data in two groups, for example.
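The dependence of the p-value on sample size can be illustrated with a short calculation. The sketch below assumes a two-sided z-test comparing two groups of equal size with a known common SD. The mean difference of 0.05 SD and the sample sizes are hypothetical values chosen for illustration and are not taken from Demidenko (2016).

```python
import math


def standard_error_of_difference(sd: float, n_per_group: int) -> float:
    """Standard error of the difference between two group means (equal n, common SD)."""
    return sd * math.sqrt(2.0 / n_per_group)


def two_sided_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test for the difference between two means."""
    z = mean_diff / standard_error_of_difference(sd, n_per_group)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


MEAN_DIFF = 0.05  # hypothetical, very small difference between groups
SD = 1.0          # hypothetical common standard deviation
sample_sizes = [100, 1_000, 10_000, 100_000]

p_values = [two_sided_p_value(MEAN_DIFF, SD, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    se = standard_error_of_difference(SD, n)
    print(f"n per group = {n:>7}  SD = {SD:.2f}  SE = {se:.4f}  p = {p:.3g}")
```

With the difference held fixed, the SD stays constant while the SE shrinks as n grows, so the same tiny effect moves from clearly non-significant at the smallest sample size to highly significant at the largest.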
The misuse of statistics and the misinterpretation of p-values have generated the idea that psychological studies are unreliable and therefore cannot be replicated (a news story a few years ago reported that more than half of the studies in Psychology did not pass a reproducibility test). This has fostered the idea that the scientific bases of Psychology are not very good. The problem, however, is not the scientific bases themselves, but the way in which we interpret the data and how statistics are used in the studies.
Furthermore, as Ferguson and Heene (2012) mention, publication bias remains a controversial issue in psychological science. The tendency of psychological science to avoid publishing null results undermines the assumption that the science is replicable, since replication cannot be meaningful unless failed replications can also be acknowledged. The authors argue that the field often constructs arguments to block the publication and interpretation of null results, and that null results can be further suppressed through questionable research practices. Given that science depends on the process of falsification, the authors argue that these problems reduce the capacity of psychological science to have an adequate mechanism for the falsification of theories, which results in the promulgation of numerous theories that are ideologically popular but have little empirical basis.
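The consequence of filing away null results can be illustrated with a small simulation. The sketch below assumes a hypothetical literature of 2,000 two-group studies with 20 participants per group and a true standardized effect of 0.2. It approximates the observed effect of each study as a normal draw around the true effect, and “publishes” only the studies that reach significance in the predicted direction. All of these numbers are assumptions chosen for illustration, not figures from Ferguson and Heene (2012).

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the run is repeatable

TRUE_EFFECT = 0.2   # hypothetical true standardized mean difference
N_PER_GROUP = 20    # hypothetical participants per group
N_STUDIES = 2000    # hypothetical number of studies carried out
Z_CRITICAL = 1.96   # conventional cutoff for "significance"

# Approximate standard error of a standardized difference with equal groups.
standard_error = math.sqrt(2.0 / N_PER_GROUP)

# Observed effect of each study: true effect plus sampling error.
observed = [random.gauss(TRUE_EFFECT, standard_error) for _ in range(N_STUDIES)]

# Only studies that are significant in the predicted direction leave the file drawer.
published = [d for d in observed if d / standard_error > Z_CRITICAL]

mean_all = statistics.mean(observed)
mean_published = statistics.mean(published)

print(f"studies run: {len(observed)}, published: {len(published)}")
print(f"mean effect, all studies:       {mean_all:.2f}")
print(f"mean effect, published studies: {mean_published:.2f}")
```

Under these assumptions every published effect must exceed the significance cutoff, so the published literature overstates an effect that the full set of studies shows to be small.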
On the other hand, as Morey, Romeijn and Rouder (2016) point out, a central aspect of science is the use of data to assess the degree to which they provide evidence for competing assertions, hypotheses or theories. Evidence is, by definition, something that should change the credibility of an assertion in the mind of a reasonable person. However, according to these authors, common statistics such as significance tests and confidence intervals have no interface with concepts of belief, and therefore it is not clear how they relate to statistical evidence. Given this problem, many authors are now turning to Bayesian statistics, where statistical evidence can be quantified using the Bayes factor. As mentioned by Wagenmakers et al. (2018), Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p-values, and many of their advantages translate into concrete opportunities for pragmatic researchers. For example, Bayesian hypothesis testing allows researchers to quantify the evidence and monitor its progression as data come in, without needing to know the intention with which they were collected.
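As a minimal sketch of how a Bayes factor quantifies evidence, consider a hypothetical experiment with a number of successes in a series of binary trials. The example compares a point null hypothesis (success probability 0.5) with an alternative that places a uniform prior on the success probability. This choice of prior and the data values are assumptions made for illustration and are not taken from the cited papers.

```python
from math import comb


def bayes_factor_10(successes: int, trials: int) -> float:
    """Bayes factor for H1 (uniform prior on theta) against H0 (theta = 0.5)
    given binomial data."""
    # Marginal likelihood under H1: the binomial likelihood averaged over a
    # uniform prior on theta integrates to 1 / (trials + 1).
    marginal_h1 = 1.0 / (trials + 1)
    # Likelihood under H0: theta is fixed at 0.5.
    likelihood_h0 = comb(trials, successes) * 0.5 ** trials
    return marginal_h1 / likelihood_h0


# Hypothetical data sets, both with 20 trials.
bf_lopsided = bayes_factor_10(15, 20)  # 15 successes out of 20
bf_balanced = bayes_factor_10(10, 20)  # 10 successes out of 20

print(f"BF10 for 15/20: {bf_lopsided:.2f}")
print(f"BF10 for 10/20: {bf_balanced:.2f}")
```

A value above 1 favors the alternative and a value below 1 favors the null. In this sketch the first data set gives a Bayes factor of about 3.2 in favor of an effect, while the second gives about 0.27, that is, the data are roughly 3.7 times more likely under the null. Unlike a non-significant p-value, the second result expresses positive support for the absence of an effect.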
The validity of psychological tests
Psychometric tests go through a series of steps to establish their validity and reliability. Some of these steps are linguistic: if the instrument was written in a different language it must be translated, which can include adapting an item so that its content (semantics) is preserved, and usually requires the collaboration of expert judges. More than one translation option is considered until a majority agreement is reached. There are also statistical steps specific to the test, such as conducting a pilot study with the material assembled into a version that can be presented to users. On this basis, descriptive statistics such as frequencies are analyzed to ensure that responses are spread across the different options, along with the direction of the items with respect to what one is trying to measure and the normality of the distribution, among other aspects. The distributions of the variables are also divided into quartiles, and Student’s t-tests are used to compare the resulting groups and assess the discrimination of the test items. Generally, the items that pass the previous steps are submitted to an internal reliability analysis using Cronbach’s alpha. Correlations and then factor analyses are used to examine the consistency and fit of the dimensions or groupings of the test. These statistical analyses yield indices of reliability and validity that do or do not support the psychometric properties of the test, and indicate the adjustments needed to apply it correctly in a particular population. The core of the methodology for validating an instrument is therefore the use of an extensive database to analyze the construct validity of the scale through exploratory and confirmatory factor analyses and the study of the explained variance, in addition to the descriptive analysis.
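The internal reliability step can be made concrete. Cronbach’s alpha for a test of k items is k/(k − 1) multiplied by one minus the ratio of the sum of the item variances to the variance of the total scores. The sketch below computes it with the Python standard library on a small invented response matrix (five respondents, three items); the data are purely hypothetical.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with one row per respondent
    and one column per item."""
    n_items = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [
        statistics.variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to three items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

In this invented data set the items rise and fall together across respondents, so alpha is high. Items that did not covary would make the sum of the item variances approach the variance of the total scores and push alpha toward zero.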
It is important that psychological tests follow this empirical path so that they can really be used for what they claim to measure. However, there is an important body of evidence indicating that many tests normally used by psychologists are not valid or reliable. This is the case for many of the so-called “projective” tests. Take the example of the human figure drawing test: according to Lilienfeld, Wood and Garb (2000), a multitude of reviewers over the last four decades have converged on a unanimous conclusion, namely that the overwhelming majority of the signs in human figure drawings have negligible or no validity.
Following these authors, although projective techniques continue to be widely used in clinical and forensic settings, their scientific status remains highly controversial. Reviewing the current state of the literature on the psychometric properties (norms, reliability, validity, incremental validity, treatment utility) of three main projective instruments, the Rorschach Inkblot Test, the Thematic Apperception Test (TAT) and human figure drawings, the authors conclude that there is empirical support for the validity of a small number of indexes derived from the Rorschach and the TAT. However, the vast majority of the Rorschach and TAT indexes are not supported empirically. The evidence for the validity of human figure drawings is even more limited. With some exceptions, the projective indexes have not consistently demonstrated incremental validity above and beyond other psychometric data. The authors also indicate that there are “file drawer effects”, since the effect sizes of published studies exceeded those of unpublished studies.
Psychological therapies and pseudoscience
Taking into account the previous problems, it is not unexpected that many psychological therapies have serious problems of empirical support, and may in fact be harmful to people. According to Thomason (2010), not all psychotherapies are equally safe and effective, and a list of treatments to avoid would help both psychotherapists and clients to avoid potentially harmful therapies. It is relatively simple and easy for any counselor, social worker or psychologist to create a new form of psychotherapy, practice it and offer training workshops on it, even if there is little or no evidence of its safety or effectiveness. As the author points out, creativity and innovation should be encouraged, but the creators of new therapies should be expected to conduct appropriate research to demonstrate the safety and efficacy of the new approach.
According to Cummings and O’Donohue (2010), some of the psychotherapies that are untested, probably ineffective and/or potentially harmful include the treatment of post-traumatic stress disorder when applied to civilian situations rather than combat situations, rebirthing (attachment) therapy, eye movement desensitization and reprocessing (EMDR), treatment for dissociative identity disorder, psychoanalysis, psychotherapy to help clients with self-realization, and acceptance and commitment therapy, among others.
References
Alonso, Y. (2005). Las constelaciones familiares de Bert Hellinger: un procedimiento psicoterapéutico en busca de identidad. International Journal of Psychology and Psychological Therapy, 5(1), 85-96.
Cummings, N. A., & O’Donohue, W. T. (2010). Eleven blunders that cripple psychotherapy in America: A remedial unblundering. Routledge.
Demidenko, E. (2016). The p-value you can’t buy. The American Statistician, 70(1), 33-38.
Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science, 7(6), 555-561.
Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1(2), 27-66.
Morey, R. D., Romeijn, J. W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology, 72, 6-18.
Peikoff, L. (2012). The DIM hypothesis: Why the lights of the West are going out. New York: New American Library.
Popper, K. (2005). The logic of scientific discovery. Routledge.
Thomason, T. C. (2010). Psychological Treatments to Avoid. Alabama Counseling Association Journal, 36(1), 39-48.
Wagenmakers, E. J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., … & Matzke, D. (2018). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic Bulletin & Review, 25(1), 35-57.
Witkowski, T. (2010). Thirty-five years of research on Neuro-Linguistic Programming. NLP research data base. State of the art or pseudoscientific decoration? Polish Psychological Bulletin, 41(2), 58-66.