Neuroscience and optical illusions


Leandro Castelluccio


The visual system has been widely studied within neuroscience and the cognitive sciences. It is a fascinating and complex area that yields many insights into cognitive and cerebral functioning in general, opening new possibilities for understanding the nervous system as well as new applications in computational neuroscience. In this essay we give a brief overview of the visual system and of optical illusions, and consider what they imply about what the brain does when it processes sensory stimuli: does it actively predict, or is it passively activated?

The vision system

One of the basic elements of brain functioning is the synapse, the point where electrical signals from one neuron produce electrical effects in another. The initiation of an action potential, which transmits nerve impulses and is the basis of brain functioning at the level of the neuron, requires the opening of sodium ion channels at the axon hillock of the neuron. At this level, the phenomena depend on physical and chemical properties and on the characteristics of the biological structures involved. To trigger an action potential, a firing threshold must be exceeded through graded potentials that summate temporally and spatially, supplied by the connections with other neurons. For those other neurons to be active, the same phenomena must occur in them (Kandel, Schwartz, & Jessell, 2001).
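The temporal and spatial summation described above can be illustrated with a minimal sketch. The resting potential, threshold, decay constant and input amplitudes are illustrative assumptions, not values from the text, and the exponential decay is a simplification of how a graded potential fades.

```python
import math

# Illustrative assumptions (not from the text).
RESTING_MV = -70.0    # resting membrane potential
THRESHOLD_MV = -55.0  # firing threshold at the axon hillock
TAU_MS = 10.0         # how quickly a graded potential fades


def membrane_potential(inputs, t_ms):
    """Sum the graded potentials that have arrived by time t_ms.

    inputs is a list of (arrival_time_ms, amplitude_mv) pairs; positive
    amplitudes are excitatory, negative ones inhibitory. Each input decays
    exponentially after it arrives.
    """
    total = 0.0
    for arrival_ms, amplitude_mv in inputs:
        if arrival_ms <= t_ms:
            total += amplitude_mv * math.exp(-(t_ms - arrival_ms) / TAU_MS)
    return RESTING_MV + total


def reaches_threshold(inputs, t_ms):
    """True if the summed potential is at or above the firing threshold."""
    return membrane_potential(inputs, t_ms) >= THRESHOLD_MV


# Two excitatory inputs close together in time summate and reach threshold;
# the same two inputs spread far apart in time do not.
close_together = [(0.0, 9.0), (1.0, 9.0)]
far_apart = [(0.0, 9.0), (20.0, 9.0)]
print(reaches_threshold(close_together, 1.0))
print(reaches_threshold(far_apart, 20.0))
```

In this toy model, inputs from different synapses simply add (spatial summation), while the decay term makes the timing of the inputs matter (temporal summation).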

Synapse (image taken from: link)

The process of vision is a good example for understanding what perception means at the level of the sense organs and of brain processing. In the retina of the eye there are two types of photoreceptors, cones and rods. Photoreceptors are light-sensitive cells that transduce light into an electrical signal. Cones are the photoreceptors that operate in photopic (daylight) conditions. There are three types: S (sensitive to short wavelengths), M (sensitive to medium wavelengths) and L (sensitive to long wavelengths). Rods, on the other hand, are the photoreceptors that operate in scotopic (dark) conditions. Rods contain a visual pigment called rhodopsin, which has two parts: a protein portion located in the membrane of the rod, and retinal, a derivative of vitamin A. Retinal has two different isomeric configurations, 11-cis and all-trans. Light changes the configuration of retinal from 11-cis to all-trans, and this starts a cascade of events that ends in the visual experience we have (Kandel, Schwartz, & Jessell, 2001).

Another key concept is the fovea, an area in the central retina, at the center of gaze, where photoreceptors are most densely packed and visual acuity is highest. The retina is a neural surface onto which the eye projects an image. It contains several layers of processing that end in the ganglion cells. Bipolar cells are processing cells of the retina that carry the electrical signal from the photoreceptors to the ganglion cells. Horizontal cells are lateral processing cells that underlie the center-surround organization of the receptive fields of bipolar and ganglion cells. Amacrine cells are also lateral processing cells, of which there are more than 40 anatomically distinct types. A key concept here is the receptive field: the receptive field of a visual neuron is the area of the retina that, when stimulated by light, causes a response in the neuron. Finally, the ganglion cells are the neurons whose axons leave the retina and connect through the optic nerve to the LGN (lateral geniculate nucleus), which relays visual information from the retina to the cortex. Their receptive fields have a center-surround structure. There are more than 30 different types, each tuned to different visual information. Midget cells, for example, are ganglion cells specialized for high acuity and “red-green” color. Parasol cells, on the other hand, are specialized for high temporal frequencies and low spatial frequencies. Another example is the small bistratified cells, retinal ganglion cells specialized for “blue-yellow” color (Kandel, Schwartz, & Jessell, 2001).

(Image taken from: link)

Following these authors, two key concepts are the receptive field of a visual neuron, which is the area of the retina that causes a response in the neuron when stimulated by light, and the notion of on-center and off-center cells. On-center cells are those in which light falling on the center of the receptive field is excitatory and light falling on the surround is inhibitory. Off-center cells are those in which light falling on the center of the receptive field is inhibitory and light falling on the surround is excitatory.
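The on-center and off-center organization described above can be sketched with a toy model. The 3x3 patch, the equal weighting of the surround and the linear subtraction are assumptions made for illustration, not a description of real ganglion cells.

```python
def on_center_response(patch):
    """Response of a toy on-center cell to a 3x3 patch of light intensities.

    The middle value is the center of the receptive field (excitatory);
    the eight values around it form the surround (inhibitory).
    """
    center = patch[1][1]
    surround = [patch[r][c] for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    return center - sum(surround) / len(surround)


def off_center_response(patch):
    """A toy off-center cell has the opposite sign: center inhibits, surround excites."""
    return -on_center_response(patch)


spot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # light on the center only
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]  # light on the surround only
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # uniform light

for name, patch in (("spot", spot), ("ring", ring), ("flat", flat)):
    print(name, on_center_response(patch), off_center_response(patch))
```

In this sketch the on-center cell responds most to the spot, the off-center cell responds most to the ring, and uniform light leaves both at zero because excitation and inhibition cancel.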

(Image taken from: link)

We now move to a higher level of processing. At the cortical level, visual processing is organized into two pathways leaving the occipital lobes: the dorsal pathway, directed towards the posterior parietal cortex, and the ventral pathway, directed towards the inferior temporal cortex. The dorsal pathway processes elements such as the depth, movement and spatial location of objects. The ventral pathway processes elements such as shape and color. Each processing area in the brain responds to particular aspects of the captured image: a specific group of neurons is activated by one aspect of the image, such as a certain color, but not by another, such as its shapes or contours. Visual processing uses sparse coding, in which the presence of a stimulus is encoded by the activity of a scattered network of neurons.

Two further key concepts are the magnocellular and parvocellular systems. The magnocellular system, which originates in the parasol cells tuned to high temporal and low spatial frequencies, is believed to give rise to the “where” system. The parvocellular system, on the other hand, is a visual stream that originates in the midget cells and is believed to give rise to the “what” route. The “where” pathway carries information from the primary visual cortex (an area containing cells that prefer lines of different orientations and different spatial frequencies, as well as color-sensitive regions called “blobs”) through other visual areas, including MT (a visual area specialized for movement), to the parietal lobe, and is related to location and movement (the aforementioned dorsal pathway). The “what” route carries information from the primary visual cortex through other visual areas, including V4 (specialized for color), to the temporal lobe, and is related to shape and color (the ventral pathway) (Kandel, Schwartz, & Jessell, 2001).

Hierarchical organization of the visual system, figure taken from Kafaligonul (2014).

The figure above shows a schematic example of the organization of the visual system. However, the system is much more complex than classically assumed, with interconnections between many processing areas, as exemplified by the scheme of Felleman and Van Essen (1991).


Following Kandel, Schwartz and Jessell (2001), within the visual cortex, simple cells are cells of V1 (primary visual cortex) whose receptive fields prefer oriented edges and can be mapped into fixed excitatory and inhibitory zones. Complex cells, found in V1 and V2 (secondary visual cortex), also respond to oriented edges but have a degree of spatial invariance: they respond to a light pattern of a given orientation regardless of exactly where it falls in the receptive field. A complex cell therefore also responds to a moving stimulus and can be directionally selective for movement. “End-stopped” cells are neurons in V1 and V2 that respond to oriented lines of a particular length. Both simple and complex cells can have end-stopped receptive fields, which means that the response is reduced for edges longer than a certain length. More complex receptive fields are created by wiring together cells that have simpler receptive fields.
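The idea that a complex cell can be built by wiring together simple cells can be sketched as follows. This is a one-dimensional toy, so orientation is not represented, and pooling the simple cells with a maximum is an assumption made for the sketch rather than something stated in the text.

```python
def simple_cell(signal, position):
    """Toy simple cell: responds to a dark-to-light edge at one fixed position."""
    return max(0.0, signal[position + 1] - signal[position])


def complex_cell(signal):
    """Toy complex cell: pools simple cells tuned to every position.

    Taking the maximum over positions is an assumption made for this sketch.
    """
    return max(simple_cell(signal, p) for p in range(len(signal) - 1))


edge_left = [0, 0, 1, 1, 1, 1, 1, 1]   # edge between indices 1 and 2
edge_right = [0, 0, 0, 0, 0, 1, 1, 1]  # edge between indices 4 and 5

# The simple cell tuned to position 1 only detects the first edge,
# while the complex cell responds wherever the edge falls.
print(simple_cell(edge_left, 1), simple_cell(edge_right, 1))
print(complex_cell(edge_left), complex_cell(edge_right))
```

The pooled cell shows the spatial invariance described above: its response does not depend on where the edge lies within its receptive field.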


When lesions occur at the level of the visual cortex, different syndromes arise, such as achromatopsia or blindsight. Achromatopsia (cortical color blindness) occurs when the world is perceived completely achromatically although the cones function normally. It is caused by a localized brain injury that often includes V4 and/or V8. Blindsight, on the other hand, arises in cortical blindness caused by a lesion in the primary visual cortex. Some visual perception, particularly of movement, remains possible through a subcortical pathway, so patients with blindsight may have vision without awareness.

Optical illusions: why do they happen?

A key factor in understanding optical illusions is the notion of constancy. Size constancy, for example, refers to perceiving the size of an object as constant regardless of its distance or location. Color constancy, in turn, is the constant perception of the color of an object despite changes in the color of the illumination and, therefore, in the color of the light the object reflects. These are only two of a series of constancies, which also include shape and roughness, for example. The constancies allow us to objectify our reality: they let us see the qualities of objects as stable despite changing conditions, which yields a clear picture of the external world and of the kinds of entities that make up reality (Gilchrist, 2010).

The visual system can be divided into three levels of information processing: low-level vision, which concerns the physiological mechanisms of the retina and the resulting neural signals; high-level vision, which is cognitive and involves prior knowledge; and a mid level that is sometimes associated with Gestalt psychology and its emphasis on the organizational structures of perception (Adelson, 2000). Both low-level and high-level processing are associated with common optical illusions. Low-level processing, in the retina and early cortical stages, provides an underlying mechanism for lightness constancy, for example: our visual system is less sensitive to gradual changes in luminance than to sudden local contrasts, and uniform regions are “filled in”. In certain cases this mechanism generates an optical illusion. However, some low-level explanations have failed to account for these illusions, and high-level explanations have been formulated. At the high level, perception can be considered the result of unconscious inferences. Lightness constancy, for example, would be achieved by the visual system inferring and then discounting the illuminant. The illusions in these cases result from a failure in our inductive inferences about what is in the world: using premises based on sensory evidence, perceived lighting conditions and previous experience, our visual system reaches a false conclusion. However, we must also point out that there seems to be no empirical evidence for the existence of unconscious inferences, and that modern theories of lightness perception incorporate both high-level and low-level mechanisms to explain our experience (Kingdom, 2011).

A classic optical illusion is the Ames room, in which two figures appear to be of different sizes:

Ames room (image taken from: link)

As this video explains, one person is farther from the observer than the other, but the cues in the room are arranged so that both appear to be at the same distance. Size constancy therefore fails to correct the perceived size of the more distant person. In this example, the observer is deceived by altering the normal characteristics a room would have and by playing with the angle of the camera that captures the image of the room:

(Image taken from: link)

Another example is the Hermann grid illusion, a grid of dark squares separated by bright lines in which illusory dark spots are perceived at the intersections of the lines:

(Image taken from: link)

One explanation of this phenomenon appeals to low-level visual processing and the visual receptive field. As mentioned above, photoreceptors in the center of the field excite the ganglion cell when they detect an increase in luminance, while photoreceptors in the surround inhibit it. Since a point at an intersection is surrounded by more light than a point in the middle of a line, the intersection appears darker because of the greater inhibition.
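The receptive-field account above can be checked with a small numerical sketch. The grid dimensions, the square surround and its size are assumptions chosen for illustration; real receptive fields are not square and vary in size across the retina.

```python
STREET = 3   # width of the bright lines, in pixels (assumed)
PERIOD = 12  # spacing between successive lines, in pixels (assumed)


def luminance(x, y):
    """1.0 on a bright line of the grid, 0.0 inside a dark square."""
    return 1.0 if (x % PERIOD) < STREET or (y % PERIOD) < STREET else 0.0


def on_center_response(x, y, half_width=3):
    """Center pixel minus the mean luminance of a square surround around it."""
    surround = [
        luminance(x + dx, y + dy)
        for dx in range(-half_width, half_width + 1)
        for dy in range(-half_width, half_width + 1)
        if (dx, dy) != (0, 0)
    ]
    return luminance(x, y) - sum(surround) / len(surround)


at_intersection = on_center_response(13, 13)  # middle of a crossing
on_a_line = on_center_response(19, 13)        # middle of a line, between crossings
print(at_intersection, on_a_line)
```

Both points are equally bright, but the toy cell centered on the intersection receives more surround inhibition and so responds less than the cell centered on the line, which is the low-level explanation of the darker spot.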


However, if we apply a Fourier analysis, in which any function can be expressed as a sum of sinusoids, and transform the grid accordingly, the illusion disappears even though the same low-level visual processing is still present. This indicates that a high-level visual process is probably required to explain the illusion.
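The principle behind Fourier analysis mentioned above, that a function can be expressed as a sum of sinusoids, can be illustrated with a minimal sketch. It approximates a square wave (similar to the luminance profile across a line of the grid) and does not reproduce the transformation of the grid itself; the function name and the number of terms are choices made for this example.

```python
import math


def square_wave_partial_sum(x, n_terms):
    """Sum of the first n_terms sinusoids of the Fourier series of a square wave.

    The target is a square wave of amplitude 1 and period 2*pi that equals
    +1 on (0, pi) and -1 on (-pi, 0).
    """
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1  # only odd harmonics contribute
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total


# The more sinusoids are added, the closer the sum gets to the value 1.
for n in (1, 5, 200):
    print(n, square_wave_partial_sum(math.pi / 2, n))
```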


On the other hand, we have the checker shadow illusion, in which a difference in lightness is perceived between two identical surfaces. The illusion seems to be a product of lightness constancy:

(Image taken from: link)

As the following image shows, squares A and B in fact have the same shade of gray:

(Image taken from: link)

Lightness constancy is achieved by inferring and then discounting the illuminant. Although a shadow appears to fall across the checkerboard, there is no actual shadow: the illusion contains only a picture of a shadow, and that is what would cause the illusion. This illusion would exemplify the result of a failure in our inductive inferences.
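The idea of inferring and then discounting the illuminant can be written as a simple calculation. The sketch assumes the common simplification that luminance equals reflectance times illumination, and the numbers are hypothetical; they are not measurements of the actual image.

```python
def inferred_reflectance(luminance, estimated_illumination):
    """Surface lightness inferred by discounting the illuminant.

    Assumes the simple model luminance = reflectance * illumination.
    """
    return luminance / estimated_illumination


# Hypothetical numbers: both squares send the same luminance to the eye,
# but square B is judged to lie in shadow (less illumination).
LUMINANCE = 0.3
square_a = inferred_reflectance(LUMINANCE, estimated_illumination=1.0)
square_b = inferred_reflectance(LUMINANCE, estimated_illumination=0.5)
print(square_a, square_b)
```

Under these assumptions the same luminance is attributed to a lighter surface when it is believed to be in shadow, which is why square B looks lighter than square A.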

An interesting illusion described by Tangen, Murphy and Thompson (2011) shows a new face distortion effect, which results from the rapid presentation of faces aligned at the eyes. These authors describe the experience as “each face becoming a caricature of itself, some faces appearing highly deformed, even grotesque” (Tangen, Murphy, & Thompson, 2011, p. 628). Apparently, the degree of distortion is greater for faces that deviate from the others in the set along a particular dimension: if a person has a large forehead, it looks particularly large; if someone has a thin nose, it looks noticeably thin; and if a person has a large jaw, it looks particularly large, almost like an ogre’s. See the following video for an example of this illusion.

According to these authors, relative coding seems to drive the effect. The observer is forced to encode each face in light of the others, and aligning the faces at the eyes makes it much easier to compare their shapes and the relative locations of their features, making the differences between them more obvious. The fast and constant presentation rate may also encourage this relative coding, as shown by the fact that the effect decreases if the faces are not aligned at the eyes or are presented too slowly or too quickly. In addition, if a short gap is inserted in the sequence, the effect disappears almost completely (Tangen, Murphy, & Thompson, 2011).

A similar illusion, the face distortion aftereffect, could indicate that adaptation is key to generating the distortion effect (Webster & MacLin, 1999). In a study conducted by these authors, observers matched or classified faces before or after viewing distorted images of faces. Prior adaptation strongly influenced face perception, making the original face appear distorted in the direction opposite to the adapting distortion. In their experiment, the aftereffects were weaker when the adapting and test faces had different orientations, from which they concluded that the aftereffects depend on which images are distorted and not simply on the type of distortion introduced. In addition, according to Hills, Holland and Lewis (2010), children (6 to 12 years of age) showed greater aftereffects than adolescents (13 to 18 years of age), with aftereffects of similar magnitude for asymmetric and symmetric distortions, whereas adolescents showed aftereffects only for symmetric distortions. These authors propose that children may have a more flexible facial norm and neural responses that allow a wider range of adapted states than those of adolescents, which further implicates adaptation as a mechanism for the effect. This would indicate that the visual system relies on previously learned information and on current flexibility to process visual stimuli, which could support the notion of vision as predictive coding.

According to Tangen et al. (2011), the distortion effect seems to depend on the dimensions along which the images in the set differ, and these dimensions do not seem to be limited to facial features or configurations: if one photograph is very bright compared to the others, it appears overexposed. In general, however, the mechanism by which the effect occurs is not fully understood.

Predictive coding?

As highlighted in a previous essay (“Applying Bayes’ concepts to the understanding of consciousness-Some errors”), many optical illusions could suggest that the conscious aspect of our experience is the brain’s inference about what is outside rather than an exact reflection of reality (it never really is an exact reflection; we speak here of the specific errors that occur in certain optical illusions). One illusion that can serve as an example is the rotating mask: link.

This illusion is based on the processing of the human face. The issue here has nothing to do with the direction of rotation but with what seems to be an inability to perceive hollow faces. The brain seems to be conditioned to see normal faces, which is very interesting, because this illusion, among others, could be taken as an argument for consciousness as a Bayesian product: what we consciously perceive would not be fully objective perceptions of what is in front of us, so to speak, but the brain’s predictions of what it should be seeing (predictive coding). In the aforementioned checker shadow illusion, likewise, the brain assumes that there is a shadow, and that conditions the perception of square B. In the hollow face illusion, when the mask turns and its hollow side is exposed, the brain seems not to process the hollow face and inverts the image so that the face looks normal.

But the interesting thing about these illusions in relation to predictive coding is that their specifically visual character, which could by itself explain them, is not taken into account; this has been overlooked in discussions of consciousness as a product of inference. What do I mean by this? If such visual illusions speak of a predictive nature of the brain, then other types of perception, which ought to confirm the rule, are being ignored: why are there no similarly diverse illusions in sensory modalities such as touch or smell? Is it not more likely that these illusions are being misinterpreted when they are taken to involve predictive coding?

As mentioned in the previous essay about Bayes, think of sitting in a stationary bus or train next to another one: when the other begins to move, it can seem that it is we who are moving. We could ask: is the brain conditioned or predisposed to see oneself moving rather than other things? The truth is that sometimes it seems one way and sometimes the other. In such a case the brain does not expect that it is oneself that moves; in fact it contemplates both possibilities. The illusion exists even though that “prior” does not. The illusion would therefore depend on perceptual factors that go beyond a purely predictive functioning of the brain. And again, if the brain worked by predictive coding in absolute terms, why can we not correct the prediction errors in visual illusions by incorporating new perceptual information? Would this not be more adaptive in evolutionary terms? If these optical illusions were a Bayesian-type product, they should change once we have new information about their erroneous character, as in the checker shadow illusion or the hollow mask illusion. But this does not happen.

As indicated by Ramachandran and Hirstein (1997), qualia (our conscious subjective experience) obey three laws. Human beings exhibit certain processes that are irrevocable, and the quale would be one of them. The first law states that qualia are irrevocable: once one has identified a sensation, one cannot reject it; once one perceives an object as blue, one cannot see it as another color, no matter how much one consciously wants to. Experience thus has a definite character that simply cannot be changed. This contradicts the possibility that we can adjust our direct perceptions of things, certain qualia, given new information that bears on our predictions or the “prior”. So however much we would like to see the two squares of the checker shadow illusion as the same shade of gray, we cannot, even though we have seen that they have the same color. This would indicate that consciousness goes beyond prediction: a prediction can be adjusted, which is part of the Bayesian model, but qualia involve something that cannot be adjusted.
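The claim that a Bayesian prediction can be adjusted can be made concrete with a minimal sketch of repeated Bayesian updating, applied to the hollow face as an example. The prior and the likelihoods are hypothetical numbers chosen for illustration, and treating each new observation as independent evidence is a simplifying assumption.

```python
def update(prior_convex, like_convex, like_hollow):
    """One application of Bayes' rule for two hypotheses: convex vs hollow face."""
    evidence = prior_convex * like_convex + (1.0 - prior_convex) * like_hollow
    return prior_convex * like_convex / evidence


# Hypothetical numbers: a strong prior for convex faces, and sensory
# evidence that is four times more likely if the face is actually hollow.
belief = 0.99
history = []
for _ in range(5):
    belief = update(belief, like_convex=0.2, like_hollow=0.8)
    history.append(belief)
print(history)
```

With these numbers the belief in a convex face stays high after one observation but falls below one half after repeated contrary evidence. That is the adjustment a Bayesian model would allow, and it is the adjustment that, according to the argument above, does not take place in the illusion.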

One could argue that no new information is actually available to readjust the predictions and so perceive things differently. In that case, I would say we enter a complicated area where we risk being unable to falsify the theory of the predictive brain on the basis of its own postulates about what should happen. If the brain indeed cannot adjust its “prior” because of some perceptual impossibility of a biological nature, would we be talking about an unmodifiable genetic “prior”, for example? Or is the theory incorrect, and are there simply things that the brain can perceive and things that it cannot?

In any case, the predictive process would belong to the phylogeny of the species rather than to the ontogeny and development of the organism (although it is true that many aspects of vision are learned during development). It is one thing for the brain to have certain genetically encoded patterns for processing visual stimuli, and quite another for the brain to engage actively in predictive coding; this is an important conceptual difference. It could be argued that the brain does not predict that one should see a normal face instead of a hollow one; the area responsible for seeing faces is simply activated by the visual stimulus, and therefore we do not see a hollow face. In fact, there is a defined area for this, the fusiform face area (FFA), an area on the ventral surface of the temporal lobe containing cells that respond to faces, which reflects the importance of face processing in human beings. To say that this phylogenetic quality is a Bayesian process also presents problems, since the main engine that made us perceive things in a particular way is natural selection, which has led us to capture what is out in the world more accurately or in more detail. We wrongly project our own psychology here: in evolution nothing is expected, and there is no “prior” that is then fitted to the evidence; there are simply characteristics that are better adapted to the environment, and those that are not disappear. We can hardly say that the brain expects light to come from above (the sun); rather, the characteristics that responded to this fact were consolidated in the evolution of the species, so that things are processed in this way. One cannot adjust the perception to make it more accurate, because these are structural aspects that cannot be modified. If we speak of a Bayesian process, the “prior” must be modifiable, so that our perception would change, and perhaps this kind of optical illusion would not exist.

Therefore, as indicated in the previously mentioned essay, we must be cautious about the conclusions we draw from this type of optical illusion. We could even say that the brain does not predict at all, and that what is taken for prediction is the simple activation of certain neural patterns. That is, certain things will activate certain patterns, which correspond to conscious experience, with higher probability, and the previous activation of a pattern makes it more likely to be activated again. This does not imply a constant predictive act based on what is perceived moment by moment; rather, one perception conditions the next. Predicting would be something different, a further process. Constantly making predictions based on what is perceived would also be inefficient, since the other process is enough to explain perception. At the same time, there are things that we are not able to perceive, and if this were the case for hollow faces, that isolated fact would explain the illusion without any need for Bayesian concepts. And again, let us not forget the practical absence of illusions in other sensory modalities; I would say such illusions would be expected under the hypothesis of the brain and consciousness as something predictive, and would lend it greater support.


Adelson’s Checker-Shadow Illusion. (n.d.). Retrieved from

Adelson, E. H. (2000). “Lightness perception and lightness illusions” in The New Cognitive Neurosciences, M. Gazzaniga (Ed), 2nd ed. pp. 339–352, MIT Press: Cambridge MA.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1), 1-47.

Gilchrist, A. L. (2010). “Lightness Constancy” in Goldstein, E. B. (Ed.), Encyclopedia of perception (Vol. 1). Sage.

Hills, P. J., Holland, A. M., & Lewis, M. B. (2010). Aftereffects for Face Attributes with Different Natural Variability: Children Are More Adaptable than Adolescents. Cognitive Development, 25(3), 278-289.

Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2001). Principios de Neurociencia. Madrid: McGraw-Hill.

Kafaligonul, H. (2014). Vision: A Systems Neuroscience Perspective.

Kingdom, F. A. (2011). Lightness, brightness and transparency: A quarter century of new ideas, captivating demonstrations and unrelenting controversy. Vision Research, 51(7), 652-673.

Ramachandran, V. S., & Hirstein, W. (1997). Three Laws of Qualia: What Neurology Tells Us about the Biological Functions of Consciousness, Qualia and the Self. Journal of Consciousness Studies, 4, 429-458.

Tangen, J., Murphy, S., & Thompson, M. (2011). Flashed Face Distortion Effect: Grotesque Faces from Relative Spaces. Perception, 40(5), 628-630.

Webster, M. A., & MacLin, O. H. (1999). Figural aftereffects in the perception of faces. Psychonomic Bulletin & Review, 6(4), 647-653.
