How Google plans to improve web searches with multimodal AI
In an event livestreamed today, Google detailed how it uses AI techniques – specifically a machine learning model called the Multitask Unified Model (MUM) – to improve web search experiences across languages and devices. Starting early next year, Google Lens, the company’s image recognition technology, will be able to find items such as clothing based on photos and high-level descriptions. Around the same time, Google Search users will start to see an AI-curated list of things they should know about certain topics, like acrylic paint materials. They’ll also see suggestions for refining or broadening searches based on the topic in question, as well as related topics surfaced in videos discovered through Search.
The upgrades are the result of a multiyear effort by Google to improve Search’s understanding of intent and of the relationship between language and visuals on the web. Google vice president of search Pandu Nayak says MUM, which Google detailed at a developer conference last June, could help better connect users to businesses by surfacing products and reviews, and could improve all kinds of language understanding, whether in customer service or in a research setting.
“The power of MUM lies in its ability to understand information at a broad level. It’s inherently multimodal, meaning it can handle text, images, and videos at the same time,” Nayak told VentureBeat in a phone interview. “Its promise is that we can pose very complex queries and break them down into a set of simpler components, where you can get results for the different, simpler queries and then put them together to figure out what you really want.”
Google does a lot of testing in Search to hone the results that users ultimately see. In 2020 – a year in which the company launched more than 3,600 new features – it ran more than 17,500 traffic experiments and more than 383,600 quality audits, Nayak said.
Yet given the complexity of language, problems arise. For example, a search several years ago for “Is sole good for kids” – “sole” referring to the fish, in this case – brought up web pages comparing children’s shoes.
In 2019, Google set out to tackle the problem of linguistic ambiguity with a technology called Bidirectional Encoder Representations from Transformers, or BERT. Building on the company’s research into the Transformer model architecture, BERT forces models to consider the context of a word by examining the words that precede and follow it.
Since its introduction in 2017, the Transformer has become the architecture of choice for natural language tasks, demonstrating an ability to summarize documents, translate between languages, and analyze biological sequences. According to Google, BERT helped Search better understand 10% of U.S. English queries – especially longer, more conversational searches where prepositions like “for” and “to” matter a lot to the meaning.
For example, Google’s previous search algorithm wouldn’t understand that “2019 Brazilian traveler to the United States needs a visa” is about a Brazilian traveling to the United States, not the other way around. With BERT recognizing the significance of the word “to” in context, Google Search returns more relevant results for the query.
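That bidirectional idea is easy to see with a public BERT checkpoint: the model predicts a hidden word using the words on both sides of it. Below is a minimal sketch using Hugging Face’s fill-mask pipeline and the open bert-base-uncased model – an illustration only, not Google’s production search stack.

```python
# Minimal sketch of BERT's bidirectional context: the model fills in a masked
# word using the words on both sides of it. Uses the public bert-base-uncased
# checkpoint via Hugging Face Transformers, not Google's production Search model.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Words on both sides of [MASK] – including the preposition "to" – shape the prediction.
query = "a 2019 brazilian traveler to the united states needs a [MASK]."
for prediction in fill_mask(query):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```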
“BERT started to understand some of the subtleties and nuances of language, which was quite exciting, because language is filled with nuance and subtlety,” Nayak said.
But BERT has its limits, which is why researchers in Google’s AI division developed a successor in MUM. MUM is approximately 1,000 times more powerful than BERT and was trained on a dataset of web documents, with content such as explicit, hateful, and abusive images and text, as well as misinformation, removed. It can answer questions in 75 languages, including questions such as “I want to hike Mount Fuji next fall – what should I do to prepare?”, and understand that “prepare” could include things like physical training as well as the weather.
MUM can also draw on context from images and dialogue. Given a photo of hiking boots and asked “Can I use these to hike Mount Fuji?”, MUM can understand the content of the image and the intent behind the question, advising the asker that hiking boots would be appropriate and pointing them to relevant content in a Mount Fuji blog.
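MUM itself isn’t publicly available, but the general mechanics of pairing an image with text can be sketched with an open multimodal model. The example below uses OpenAI’s CLIP – a far smaller model than MUM, standing in only as an illustration – via Hugging Face Transformers to score a local photo (the file name is hypothetical) against candidate descriptions, which is the rough idea behind matching a hiking-boot photo to an intent-laden question.

```python
# Rough illustration of image-text matching with an open multimodal model (CLIP),
# standing in for the general idea; MUM itself is not publicly available.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("hiking_boots.jpg")  # hypothetical local photo
candidates = ["hiking boots", "running shoes", "a bicycle sprocket"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # similarity of the image to each text

probs = logits.softmax(dim=-1)[0]
for label, p in zip(candidates, probs):
    print(f"{label}: {p:.2f}")
```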
MUM, which can transfer knowledge between languages and doesn’t need to be explicitly taught how to perform specific tasks, has helped Google engineers identify more than 800 variations of COVID-19 vaccine names in more than 50 languages. With just a few examples of official vaccine names, MUM was able to find the cross-lingual variations in seconds, compared with the weeks it might have taken a human team.
“MUM gives you generalization from languages with a lot of data to languages like Hindi and so on, which have little data in the corpus,” Nayak explained.
After internal pilots in 2020 to see which types of queries MUM might be able to resolve, Google says it’s expanding MUM to other corners of Search.
Soon, MUM will allow users to take a photo of an object with Lens – say, a shirt – and search the web for another object – say, socks – with a similar pattern. MUM will also allow Lens to identify an object unfamiliar to a searcher, such as the rear sprockets on a bicycle, and return search results based on a query about it. For example, given an image of the sprockets and the query “How can I fix this?”, MUM will surface instructions on how to repair the bicycle’s sprockets.
“MUM can understand that what you are looking for are repair techniques and what that mechanism is,” Nayak said. “That’s the sort of thing the multimodal Lens promises, and we plan to launch it early next year, hopefully.”
Relatedly, Google unveiled “Lens mode” for iOS users in the United States, which adds a new button in the Google app to make all images on a web page searchable through Lens. Also new is Lens in Chrome, available worldwide in the coming months, which will let users select images, videos, and text on a website with Lens to see search results in the same tab, without leaving the page they’re on.
In Search, MUM will power three new features: things to know, refine and broaden, and related topics in videos. Things to know takes a broad query, like “acrylic paints,” and highlights web resources such as step-by-step instructions and painting styles. Refine and broaden surfaces narrower or more general topics related to a query, such as “painting styles” or “famous painters.” As for related topics in videos, it picks out topics mentioned in videos, such as “acrylic painting materials” and “acrylic techniques,” based on the audio, text, and visual content of those videos.
“MUM has a whole host of specific applications,” Nayak said, “and they’re starting to impact a lot of our products.”
A growing body of research shows that multimodal models are susceptible to the same types of bias as language and computer vision models. The diversity of questions and concepts involved in tasks such as visual question answering, as well as the lack of high-quality data, often prevent models from learning to “reason,” leading them to make educated guesses by leaning on dataset statistics. For example, in a study involving seven multimodal models and three bias reduction techniques, the coauthors found that the models failed to answer questions involving infrequent concepts, suggesting there’s work to be done in this area.
“[Multimodal] models, which are trained at scale, result in emergent capabilities, making it difficult to understand their biases and failure modes. Yet the business incentives are to deploy this technology throughout society as a whole,” Percy Liang, professor of computer science at Stanford HAI, said in a recent email.
Doubtless looking to avoid a negative news cycle, Google says it has taken care to mitigate bias in MUM – primarily by training the model on “high quality” data and by having humans evaluate MUM’s search results. “We use [an] evaluation process to look for bias issues in any set of applications we launch,” Nayak said. “When we launch potentially risky things, we go the extra mile to be very careful.”