From Human Vision To Neural Networks
This article is part one of a two-part series discussing how research from the 1960s onward brought us to complex neural networks that can now analyze pictures, videos, audio, and much more. Machine learning and deep learning have required decades of research to get where we are today: from recording the first snap, crackle, and pop of stimulated neurons to developing programs that can describe what is occurring in a scene.
Vision, From Human To Robot
Vision is a complex task. We take for granted processes such as depth perception, object tracking, and edge detection, among the many other features our brains keep track of. Scanning the environment and localizing where we are in space is something our brains do constantly. At one point in the past, researchers may never have thought it possible to create systems that could perform tasks similar to those of our own brains. Yet, in the last 50 years, we have gone from what might seem like small steps in neuroscience to computers being able to describe the scenes in pictures.
Neuroscience courses use many anecdotes and examples to help students understand how the brain operates. One example is the story of Phineas Gage, who survived a railroad rod destroying his left frontal cortex; another is Britten’s paper on when the brain can detect a signal in a chaotic mess of moving dots. Such insights are like puzzle pieces, allowing us glimpses of how the brain operates.
Hubel and Wiesel are two remarkable researchers who gave a great deal to the field of human vision. They were awarded the Nobel Prize in Physiology or Medicine in 1981 for their groundbreaking “discoveries concerning information processing in the visual system.” By connecting an electrode to a neuron, they were able to listen to the neuron responding to the stimulus of a bar of light.
The researchers gained an understanding of how neurons in the primary visual cortex (which can be seen in the image above) operated. In particular, they discovered that there were three different types of neurons that responded to different stimuli.
They classified the cells into three groups: simple cells, complex cells, and hypercomplex cells.
A simple cell responds to a stimulus if the angle of the bar of light lines up with the cell’s excitatory region. This can be seen in the picture below, in figure b.
A complex cell not only requires the correct angle, but also requires the stimulus to be moving. This is the most prevalent cell type in the primary visual cortex, also known as V1.
Finally, hypercomplex cells have the same qualities as the previous two cell types, with one more requirement: hypercomplex cells respond to a specific orientation, movement, and direction of movement. So, a hypercomplex cell might respond to a stimulus at 90 degrees moving from left to right, but not right to left.
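The selectivity described above can be sketched in code. Below is a toy model (not the actual biophysics, and the tuning-curve shape and bandwidth are assumptions for illustration) in which a simple cell's response falls off as the bar's angle departs from the cell's preferred angle, and a hypercomplex cell adds the movement and direction requirements:

```python
import numpy as np

def simple_cell_response(stimulus_angle, preferred_angle, bandwidth=15.0):
    """Toy orientation tuning: response decays with angular distance
    from the cell's preferred angle (a simplified simple-cell model)."""
    # Smallest angular difference, given the 180-degree symmetry of a bar.
    diff = abs((stimulus_angle - preferred_angle + 90) % 180 - 90)
    return np.exp(-(diff / bandwidth) ** 2)

def hypercomplex_cell_response(angle, preferred_angle, moving, direction,
                               preferred_direction):
    """Adds the extra requirements: the bar must move, and must move the
    preferred way (e.g. left-to-right but not right-to-left)."""
    if not moving or direction != preferred_direction:
        return 0.0
    return simple_cell_response(angle, preferred_angle)

# A cell tuned to 90-degree bars moving left to right:
print(hypercomplex_cell_response(90, 90, True, "left-to-right", "left-to-right"))  # 1.0
print(hypercomplex_cell_response(90, 90, True, "right-to-left", "left-to-right"))  # 0.0
```

A bar at 45 degrees would produce almost no response from this 90-degree cell, mirroring what Hubel and Wiesel heard through the electrode.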
The video below depicts how neurons in the brain respond only to bars of light in specific locations and at certain angles. As the bar of light moves, there is a crackle: you are hearing a neuron in a cat's brain respond to the stimulus.
With this experiment, the researchers demonstrated how several types of neurons were activated only under certain stimulation. Another fascinating feature was the fact that the cells seemed to naturally map for different angles. In other words, each section of V1 would contain a very specific set of neurons that mostly responded to bars of light with a specific angle.
These results led to the theory that, by creating a sort of “bottom-up image” of the world, the human brain can “draw a picture” of what’s going on around us.
Fast forward nearly 30 years to B. A. Olshausen and D. J. Field, two researchers in computational neuroscience, the study of how the brain encodes and decodes information. Instead of focusing on single bars of light, this team focused on how a statistical model could recognize edges and other low-level features in a natural image.
Natural images have predictive features such as edges, shadows, and other low-level features that help our brains discover depth, e.g., where one object ends and another begins. Being able to locate these features helps our brains make sense of the world around us.
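To make "locating an edge" concrete, here is a minimal sketch (the image, kernel, and convolution routine are all illustrative choices, not anything from the papers discussed): a small vertical-edge filter slid over a synthetic image whose right half is bright, as conv-net-style filtering does.

```python
import numpy as np

# Synthetic 6x6 image: dark left half, bright right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A Sobel-like kernel that responds to left-to-right brightness changes.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def filter2d(img, k):
    """Minimal 'valid' 2-D filtering (cross-correlation, as in conv nets)."""
    h, w = img.shape
    kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

edges = filter2d(image, kernel)
print(edges)  # large values only in the columns straddling the edge
```

The filter output is zero over the flat regions and large only where brightness changes, which is exactly the "where one object ends and another begins" signal described above.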
One of their seminal papers, Natural Image Statistics and Efficient Coding, was published back in 1996. Its purpose was to discuss the failures of Hebbian learning models in image recognition, specifically those built on principal component analysis.
Now science had gone from detecting bars of light with a cat's neurons to a mathematical model of a network that outputs actual features from images.
The last line of the 1996 paper stands out: “An important and exciting future challenge will be to extrapolate these principles into higher cortical visual areas to provide predictions.” This was the challenge: to take the edge-detecting neurons that computational research scientists were modeling and build from them a bottom-up network that could actually predict the content of an image.
The outputs of the Olshausen and Field model were similar to the image above.
If you are a deep learning fan, this matrix of low-level output features will look familiar. It is similar to the set of features learned by the early layers of a convolutional neural network.
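Filters like those in such a matrix can be sketched as oriented Gabor filters, a sinusoid under a Gaussian window. This is an illustrative construction (the size, sigma, and wavelength values are arbitrary choices), but filters of this shape resemble both the simple-cell receptive fields learned by sparse-coding models like Olshausen and Field's and the first-layer filters a trained convolutional network typically develops:

```python
import numpy as np

def gabor(theta, size=11, sigma=2.5, wavelength=5.0):
    """Build one oriented Gabor filter: a cosine grating at angle theta,
    windowed by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # rotate coordinates by theta
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength)
    return envelope * carrier

# A small "filter bank": one filter per preferred orientation,
# loosely analogous to the orientation map Hubel and Wiesel found in V1.
bank = [gabor(theta) for theta in np.linspace(0, np.pi, 8, endpoint=False)]
print(len(bank), bank[0].shape)  # 8 (11, 11)
```

Each filter in the bank responds most strongly to edges at its own orientation, which is why a grid of them looks so much like the feature matrices in this article.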
The next article will discuss the jump from detecting features of images, to the classification of objects using convolutional neural networks.
If you would like to read more about data science and machine learning, feel free to check out the articles below: