Humans take in a great deal of information through vision. If computers are given images and videos as inputs, will they be able to understand the world the way humans do? Associate Professor Takao Yamanaka of the Faculty of Science and Technology pursues this question through research in computer vision.

The human brain understands the world by processing information obtained through vision. Light signals are received by receptors in the retina: some react to the three primary colors of light (red, green, and blue), while others detect the intensity of light. Computer vision seeks to reproduce in computers, using visual information given as inputs, the same kind of information processing that takes place in the human brain.

Machine learning programs that make decisions using methods modeled on the brain's neural circuits are known as neural networks. In 2012, a deep learning approach, which extends neural networks to many layers, was demonstrated at an image recognition competition and significantly improved on previous image recognition performance.
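For reference, the 2012 result refers to the AlexNet network at the ImageNet Large Scale Visual Recognition Challenge. The sketch below shows, in numpy, not that network but the basic building block every neural network shares: layers of weighted sums followed by a nonlinearity. All sizes and weights here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One "layer" of artificial neurons: each output is a weighted sum
    # of the inputs followed by a nonlinearity (here, ReLU), loosely
    # mimicking how a neuron fires once its input crosses a threshold.
    return np.maximum(0.0, x @ w + b)

x = rng.standard_normal(784)             # e.g. a flattened 28x28 image
w1, b1 = rng.standard_normal((784, 64)), np.zeros(64)
w2, b2 = rng.standard_normal((64, 10)), np.zeros(10)

hidden = layer(x, w1, b1)                # intermediate features
scores = hidden @ w2 + b2                # one score per class
print(scores.argmax())                   # predicted class index
```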

Neural networks learn from data how to extract the information that is useful for recognition. Previously, such extraction methods had to be designed by hand through trial and error; with large amounts of image data and neural networks, the extraction can now be learned automatically. Deep learning has since become the standard approach to image recognition.
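As a minimal sketch of this idea, here is a small convolutional network in PyTorch. The convolution filters play the role of the hand-designed feature extractors of the past; in a real system their weights would be learned from large amounts of image data. All layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A miniature convolutional network: the convolution filters are the
# "information extractors" that used to be designed by hand; here
# their weights would instead be learned from data during training.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)                # learned feature maps
        return self.classifier(h.flatten(1))

model = TinyConvNet()
batch = torch.randn(4, 3, 32, 32)           # four dummy 32x32 RGB images
print(model(batch).shape)                   # torch.Size([4, 10])
```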

Predicting the overall state of the surroundings using a single landscape photograph

Omni-directional image generation is one theme I am currently working on. From a single photograph, such as one taken on a grassy plain, this technology automatically generates an omni-directional image, as if the full 360 degrees around that same location had been photographed.

Omni-directional images can be obtained with a normal camera, with some effort. For example, Google's smartphones have a function that generates a 360-degree omni-directional image by stitching together several photographs taken in designated directions at one location. There are also special omni-directional cameras that capture such photographs in one shot. Either way, obtaining omni-directional images is not easy: it requires either effort or a special camera.
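For context, omni-directional images are commonly stored as equirectangular panoramas, where the horizontal axis spans the full 360 degrees of longitude and the vertical axis spans 180 degrees of latitude. Below is a minimal sketch of that standard mapping; it is illustrative code, not code from the research.

```python
import numpy as np

def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to a pixel in an equirectangular panorama.

    yaw_deg:   longitude, -180..180 (0 = forward, positive = right)
    pitch_deg: latitude,   -90..90  (0 = horizon, positive = up)
    """
    u = (yaw_deg / 360.0 + 0.5) * width      # 360 deg across the width
    v = (0.5 - pitch_deg / 180.0) * height   # 180 deg down the height
    return int(u) % width, min(max(int(v), 0), height - 1)

# The pixel straight ahead, and the pixel directly behind the camera,
# in a 2048x1024 panorama:
print(direction_to_pixel(0, 0, 2048, 1024))    # (1024, 512)
print(direction_to_pixel(180, 0, 2048, 1024))  # (0, 512): wraps around
```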

Therefore, I am proposing a method that uses deep learning to generate such omni-directional images from a single photograph. Just as humans can infer something about their surroundings from one photograph, the method uses the cues in that photograph to generate an omni-directional image that looks as natural as possible.
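The article gives no implementation details, but the overall setup can be sketched under some assumptions: embed the single photograph into an otherwise empty equirectangular canvas, and mark the remaining area with a mask for a trained generative model to fill in. The generator mentioned at the end is hypothetical, and in practice the photo would first be warped into equirectangular coordinates.

```python
import numpy as np

def make_canvas_and_mask(photo, pano_h=512, pano_w=1024):
    """Place one photo at the center of an empty equirectangular
    canvas; the mask marks what must be generated. (In practice the
    photo would first be warped into equirectangular coordinates.)"""
    canvas = np.zeros((pano_h, pano_w, 3), dtype=photo.dtype)
    mask = np.ones((pano_h, pano_w), dtype=bool)   # True = unknown
    h, w = photo.shape[:2]
    top, left = (pano_h - h) // 2, (pano_w - w) // 2
    canvas[top:top + h, left:left + w] = photo
    mask[top:top + h, left:left + w] = False
    return canvas, mask

photo = np.random.randint(0, 256, (256, 384, 3), dtype=np.uint8)  # dummy input
canvas, mask = make_canvas_and_mask(photo)

# A trained generative network would now synthesize plausible scenery
# for the masked region, e.g.:
# pano = generator(canvas, mask)   # hypothetical trained model
```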

Applications include using old film footage to generate a 360-degree film that offers an omni-directional view, and combining several photographs of tourist attractions taken at one location into a single omni-directional image that contains all of them.

Understanding the brain’s activity to estimate a person’s gaze

I also conduct research on estimating where a person's gaze tends to fall when looking at an image, a problem known as saliency prediction. Such information is useful in video compression, for example, where only the regions likely to attract the gaze need to be stored at high resolution. It can also inform the design of advertisements, among other uses.
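Modern saliency prediction uses deep networks, which are too large to show here, but a classic hand-designed baseline, the spectral residual method of Hou and Zhang (2007), illustrates what such a gaze map is. A minimal sketch assuming numpy and scipy:

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """Saliency map via the spectral residual method (Hou & Zhang, 2007).

    gray: 2-D float array (a grayscale image, ideally downscaled small).
    Returns a map whose bright regions tend to attract the gaze.
    """
    spectrum = np.fft.fft2(gray)
    log_amp = np.log1p(np.abs(spectrum))   # log1p for numerical safety
    phase = np.angle(spectrum)
    # The "residual": what remains after removing the smooth trend of
    # the log-amplitude spectrum; it highlights unusual structure.
    residual = log_amp - uniform_filter(log_amp, size=3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(saliency, sigma=3)

gray = np.random.rand(64, 64)              # dummy image for illustration
sal = spectral_residual_saliency(gray)
print(sal.shape, sal.max())
```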

Through research in computer vision using these deep learning techniques, I seek to develop technologies that can be applied generically across many applications. The field is expanding year after year, and as deep learning develops further, it may become possible to reproduce in computers the function the human brain uses to recognize objects.

The brain operates on extremely little energy; if we can unravel its mechanisms, we may be able to build low-power computers and computer vision applications based on them. These are just a few examples: the potential of this research field continues to grow beyond our imagination.

The book I recommend

“Konpyuta Bijon Saizensen” (The Forefront of Computer Vision)
Journal series published by Kyoritsu Shuppan

This series is suitable for undergraduate students in their final year. It introduces cutting-edge technologies such as computer vision, generative artificial intelligence, and machine learning. It is the best informational magazine for learning about the latest technologies.

Takao Yamanaka

  • Associate Professor
    Department of Information and Communication Sciences
    Faculty of Science and Technology

Graduated from the Department of Electrical and Electronic Engineering, Tokyo Institute of Technology, and completed the master’s program in Electrical and Electronic Engineering at the university’s Graduate School of Engineering. Worked at Canon Inc. before obtaining his Ph.D. in Engineering upon completing the doctoral program of the Department of Physical Electronics, Graduate School of Engineering, Tokyo Institute of Technology. Held several positions, including research fellow (DC2) at the Japan Society for the Promotion of Science, postdoctoral fellow at the Department of Computer Science, Texas A&M University, and Assistant Professor at the Department of Electrical and Electronics Engineering, Faculty of Science and Technology, Sophia University, before assuming his current position in 2008.


Interviewed: July 2024
