Researchers from UCLA Samueli School of Engineering and Stanford have demonstrated a computer system that can discover and identify the real-world objects it “sees” based on the same method of visual learning that humans use.
The system is an advance in a type of technology called “computer vision,” which allows computers to read and identify visual images. It is an important step towards general artificial intelligence systems — computers that learn on their own, are intuitive, make choices based on reasoning and interact with humans in a more human-like way. Although current AI computer vision systems are increasingly powerful and capable, they are task-specific, meaning their ability to identify what they see is limited by how much they have been trained and programmed by humans.
Even today’s best computer vision systems cannot create a full picture of an object after seeing only certain parts of it — and the systems can be fooled by viewing the object in an unfamiliar setting. Engineers are aiming to make computer systems with those capabilities — just like humans can understand that they are looking at a dog, even if the animal is hiding behind a chair and only the paws and tail are visible. Humans, of course, can also easily intuit where the dog’s head and the rest of its body are, but that ability still eludes most artificial intelligence systems.
Current computer vision systems are not designed to learn on their own. They must be trained on precisely what to learn, usually by reviewing thousands of images in which the objects they are trying to identify are labeled for them.
Computers, of course, also cannot explain their rationale for determining what an object in a picture represents: AI-based systems do not build an internal picture or a common-sense model of learned objects the way humans do.
The engineers’ new method, described in the Proceedings of the National Academy of Sciences, shows a way around these shortcomings.
The approach is made up of three broad steps. First, the system breaks up an image into small chunks that the researchers call “viewlets.” Second, the computer learns how these viewlets fit together to form the object in question. And finally, it looks at what other objects are in the surrounding area, and whether information about those objects is relevant to describing and identifying the primary object.
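The three steps can be illustrated with a minimal sketch. All function names and the patch-and-co-occurrence representation below are illustrative assumptions, not the authors' published implementation:

```python
# Hypothetical sketch of the three-step pipeline: (1) split an image into
# "viewlets", (2) learn how viewlets fit together via co-occurrence counts,
# (3) use surrounding context to score an identification.
import numpy as np

def extract_viewlets(image, size=4):
    """Step 1: break a grayscale image into small non-overlapping chunks."""
    h, w = image.shape
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

def cooccurrence(viewlet_labels, n_labels):
    """Step 2: learn how parts fit together by counting which viewlet
    labels appear together within the same image."""
    counts = np.zeros((n_labels, n_labels))
    for a in viewlet_labels:
        for b in viewlet_labels:
            counts[a, b] += 1
    return counts

def context_score(object_label, scene_labels, counts):
    """Step 3: score a candidate object by how often it has co-occurred
    with the other objects seen in the surrounding area."""
    return sum(counts[object_label, s] for s in scene_labels)
```

For example, an 8×8 image splits into four 4×4 viewlets, and an object that frequently co-occurs with the labels present in a scene receives a higher context score — a stand-in for the contextual reasoning the article describes.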
To help the new system “learn” more like humans, the engineers decided to immerse it in an internet replica of the environment humans live in.
“Fortunately, the internet provides two things that help a brain-inspired computer vision system learn the same way humans do,” said Vwani Roychowdhury, a UCLA professor of electrical and computer engineering and the study’s principal investigator. “One is a wealth of images and videos that depict the same types of objects. The second is that these objects are shown from many perspectives — partially obscured, bird’s eye, up-close — and they are placed in many kinds of environments.”
To develop the framework, the researchers drew on insights from cognitive psychology and neuroscience.
“Starting as babies, we learn what something is because we see many examples of it, in many contexts,” Roychowdhury said. “That contextual learning is a key feature of our brains, and it helps us build robust models of objects that are part of an integrated worldview where everything is functionally connected.”
The researchers tested the system with about 9,000 images, each showing people and other objects. The system was able to build a detailed model of the human body without external guidance and without the images being labeled.
The engineers ran similar tests using images of motorcycles, cars and airplanes. In all cases, their system performed as well as or better than traditional computer vision systems that had been developed with many years of training.