Google AI Forum 8th Round: AI Innovation and Natural Language Processing

On December 5, 2017, Google hosted the 'Google AI Forum' on the theme of 'AI Innovation and Natural Language Processing' in the conference room of Google Korea in Gangnam-gu, Seoul. At the forum, Google presented methods and examples of improving the user experience through natural language processing based on machine learning.

Google has long conducted research on natural language processing (NLP), focusing on algorithms that can be applied directly across a variety of languages and domains. These systems are used in many ways across Google products and services to improve the user experience. Google's work covers the full range of traditional NLP tasks, with a strong interest in algorithms that run efficiently in highly scalable, distributed environments, including general-purpose syntactic and semantic algorithms that support more specialized systems.

Google's syntactic system predicts the morphological features of each word in a given sentence, such as part-of-speech tags, gender, and number (singular or plural), and classifies each word by its role, such as subject, object, or modifier. Google is also focusing on efficient algorithms that make use of large amounts of unlabeled data, and has recently introduced neural network technology. On the semantic side, Google has recently focused on improving text analysis by incorporating knowledge and information from a variety of sources and by applying frame semantics at the noun phrase, sentence, and document level.

▲ Director Hadar Shemtov from Google Research Team

Hadar Shemtov, Director of the Google Research Team, identified "mobile" as the driving force behind the change in the user environment, noting that more than half of today's queries come from mobile devices. As a result, he said, search results need to deliver an immediate "answer" rather than a "link," and interaction is noticeably shifting toward a conversational style. Google's core work recently has been to recognize speech input, convert it into text, understand it, and output the result as speech.

Voice queries tend to be longer and closer to natural language. Sequential queries, which take a conversational form and refer back to elements of a previous question, were also introduced as an important feature of voice queries. Meanwhile, as the technology for answering these queries by voice evolves, answers need to be shorter and sound fluent to the user. Accordingly, Google has been focusing on two NLP elements: a way to take a long sentence and reduce it to a short one, and a way to achieve high-quality voice synthesis.

To present a result that focuses on the "answer," a long, natural-language question has to be restructured into a short and effective form. Google first searches for documents related to the long question, then narrows the search down to the paragraphs and sentences within those documents that relate to the answer, and outputs only the relevant answer. Because an additional search takes place inside the document, this can be seen as "search in search".
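
The narrowing from document to paragraph to sentence can be illustrated with a minimal sketch. It assumes plain word overlap as the relevance score, which is only a stand-in for whatever ranking Google actually uses; the function names and sample texts are hypothetical.

```python
import re
from typing import List, Set


def words(text: str) -> Set[str]:
    """Lowercased word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question: str, text: str) -> int:
    """Assumed relevance score: number of words shared with the question."""
    return len(words(question) & words(text))


def best(question: str, candidates: List[str]) -> str:
    """Return the candidate that shares the most words with the question."""
    return max(candidates, key=lambda c: overlap(question, c))


def search_in_search(question: str, documents: List[str]) -> str:
    """Pick a document, then a paragraph inside it, then a sentence inside that."""
    document = best(question, documents)
    paragraph = best(question, document.split("\n\n"))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    return best(question, sentences)


# Hypothetical mini-corpus built from facts in this article.
documents = [
    "WaveNet is a voice synthesis technology.\n\n"
    "It learns from digitized speech samples.",
    "The forum was held on December 5, 2017. It took place at Google Korea in Seoul.\n\n"
    "The theme was AI innovation and natural language processing.",
]
answer = search_in_search("When was the forum held?", documents)
print(answer)
```

A production system would replace the overlap score with learned ranking, but the three nested selection steps are the point of the sketch.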

The NLP system identifies the grammatical relations and groupings among the words in a sentence. What matters at this point is how to find, in a simple way, the core of the sentence that contains the desired answer. Google groups the words through this process and then uses statistics gathered from many examples and cases to determine the single node most likely to fit the context. In addition, by building models with machine learning, it can produce an answer that is grammatically correct while preserving the essence of the sentence.

Moreover, shortening a sentence requires deciding whether to keep or discard each word in it. Every word in the sentence is classified, and a sequence-to-sequence model using LSTM is built from the feature values and examples of many sentences to make these decisions. Removing the unnecessary parts then yields a simple sentence that contains only the core. In this way, the NLP system can summarize sentences and derive short, accurate results that include only the essentials.
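
The keep-or-discard decision can be sketched as follows. This is a minimal illustration, not Google's implementation: the per-word keep probabilities are hand-written placeholders standing in for the output of a trained LSTM sequence-to-sequence model, and the function name and threshold are assumptions.

```python
from typing import List, Tuple


def compress(tokens: List[str], keep_probs: List[float],
             threshold: float = 0.5) -> Tuple[List[int], str]:
    """Label each token 1 (keep) or 0 (discard) and join the kept tokens.

    keep_probs is assumed to come from a trained model; here it is supplied by hand.
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    labels = [1 if p >= threshold else 0 for p in keep_probs]
    kept = [tok for tok, label in zip(tokens, labels) if label == 1]
    return labels, " ".join(kept)


tokens = ["The", "forum", "which", "was", "held", "in", "Seoul", "introduced", "WaveNet"]
# Placeholder scores, not real model output.
keep_probs = [0.9, 0.95, 0.2, 0.1, 0.15, 0.1, 0.3, 0.9, 0.85]

labels, short_sentence = compress(tokens, keep_probs)
print(labels)          # [1, 1, 0, 0, 0, 0, 0, 1, 1]
print(short_sentence)  # The forum introduced WaveNet
```

Because the output is formed only by deleting words, the shortened sentence keeps the original word order, which helps it stay grammatical.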

▲ WaveNet has multiple layers between input and output and improves quality by combining multiple factors.

In Google Assistant, the quality of voice output is very important because the assistant uses only a voice-based interface. Existing text-to-speech techniques recorded units such as syllables separately, then classified and recombined them as needed, which limited quality. By contrast, WaveNet, a new probability-based voice synthesis technology introduced by Google, uses digitized speech samples to capture the waveform information of speech, then builds and trains a model on it. The model is then applied to new text to produce high-quality results.

On the voice side, WaveNet identifies the linguistic characteristics of recorded speech and its transcription based on the waveform information, and carries out voice synthesis through the model built from them. When a new text is given, the model combines it with the learned linguistic characteristics to work out the new phonetic form and produce new speech. In addition, the algorithm has several layers between its input data and output data, and various factors are combined to improve the quality of the result.

He emphasized that although voice processing is computationally expensive, the computation pays off in a higher level of quality than traditional voice synthesis techniques. Moreover, by digitizing the waveform, which is a feature of the analog domain, and plotting the sound wave through per-millisecond prediction, it became possible to produce sound output close to an actual voice.
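
The two ideas in this passage, digitizing the waveform and predicting it one step at a time, can be sketched as below. The mu-law quantization to 256 levels follows the published WaveNet design but was not described at the forum, so treat it as an assumption here. The predictor is a hypothetical stand-in (linear extrapolation), not a neural network.

```python
import math
from typing import Callable, List, Sequence


def mu_law_encode(x: float, mu: int = 255) -> int:
    """Map an analog sample in [-1, 1] to an integer level in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1) / 2 * mu))


def mu_law_decode(level: int, mu: int = 255) -> float:
    """Map an integer level in [0, mu] back to a sample in [-1, 1]."""
    compressed = 2 * level / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


def toy_predictor(history: Sequence[int]) -> int:
    """Hypothetical stand-in for the trained model: extrapolate the last two levels."""
    if len(history) < 2:
        return history[-1]
    return max(0, min(255, 2 * history[-1] - history[-2]))


def generate(predict_next: Callable[[Sequence[int]], int],
             seed: Sequence[int], steps: int, context: int = 2) -> List[int]:
    """Autoregressive loop: each new sample is predicted from the samples before it."""
    if not seed:
        raise ValueError("seed must contain at least one sample")
    samples = list(seed)
    for _ in range(steps):
        samples.append(predict_next(samples[-context:]))
    return samples


levels = generate(toy_predictor, seed=[128, 130], steps=3)
print(levels)  # [128, 130, 132, 134, 136]
print([round(mu_law_decode(v), 3) for v in levels])
```

The loop shows why this style of synthesis is expensive: every output sample requires its own prediction, and each prediction depends on the samples already generated.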

▲ Choe Hyunjeong, Lead of Google Computational Linguistics Team (NLU)

According to Choe Hyunjeong of the Google Computational Linguistics Team, Google is putting a lot of effort into internationalization and has introduced the Assistant in about 15 countries, although the devices offered differ by country. Google has also introduced an Assistant available on Android in Korea. To launch the Assistant quickly in many countries, 'scalability' is important: building a solid system and taking full advantage of data-driven machine learning make it easier to expand into more languages.

As for the process of globalizing the Assistant, Google first defines and designs the functions to be implemented, implements the basic NLP system in English, and then expands it to other languages, strengthening the language system as a whole. Most of the systems that make up the Assistant use machine learning, and recently deep learning with neural network models has also been used for problems that are difficult to solve with conventional rule-based methods or machine learning, such as voice synthesis, speech recognition, and conversation model construction.

For both machine learning and deep learning, data is central to training, and high-quality data collected for the specific purpose is essential. Moreover, because Google Assistant is a conversational model, there are more points to consider in the data. The characteristics of the data change depending on whether the conversation is between two humans or between a human and a machine. Data also shows different patterns by domain, such as spoken versus written language, search queries, news, and blog data. Choe also mentioned that parallel data in multiple languages is necessary for extending to various languages.

▲ 'Implicit Mention Detector', which restores omitted parts so that they fit the context

Korean is one of the most difficult languages for data acquisition and modeling. In English, conversation between a human and a machine is not much different from conversation between humans, but Korean is different. In Korean dialogue, subjects and predicates are frequently omitted, which makes the context hard to understand. On top of that, honorific expressions and forms are diverse and complex, and there are subtleties of spacing and prosody. These points are therefore very difficult for a machine to understand and model. Google is addressing these difficulties with a knowledge-based model.

Google explained that it uses the machine-learning-based 'Implicit Mention Detector' to handle the omission of sentence elements that is common in Korean conversation: it recognizes the omitted parts of a sentence and reconstructs a complete sentence. The system finds and marks all predicates and restores implicit pronouns. All subjects then appear in restored form, and all words referring to the same entity are grouped using a 'Co-Reference' model. In this way, many omitted subjects and objects are restored and used for training.

In addition, to understand human language, Google uses 'Query Matcher' to recognize that different expressions can have similar meanings. It uses deep learning to convert inputs into vectors, judges similarity of meaning by calculating the distance between those vectors, and finally places similar expressions in a single group. Beyond this, to implement prosody, Google is developing a model that can understand phrasing and prosody and render them in the proper form.
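
The vector-distance grouping described here can be sketched as follows. The three-dimensional vectors are hand-written placeholders standing in for embeddings that a deep learning model would produce. The cosine distance, the greedy grouping rule, and the threshold are all assumptions for illustration, not details given at the forum.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; small values mean the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length vector")
    return 1.0 - dot / norm


def group_queries(vectors: Dict[str, Sequence[float]],
                  max_distance: float = 0.1) -> List[List[str]]:
    """Greedy grouping: join the first group whose first member is close enough."""
    groups: List[List[str]] = []
    for query, vec in vectors.items():
        for group in groups:
            if cosine_distance(vec, vectors[group[0]]) <= max_distance:
                group.append(query)
                break
        else:
            groups.append([query])
    return groups


# Placeholder embeddings, not real model output.
toy_vectors = {
    "what's the weather today": [0.9, 0.1, 0.0],
    "how is the weather now": [0.85, 0.15, 0.05],
    "play some music": [0.05, 0.1, 0.95],
    "put on a song": [0.0, 0.2, 0.9],
}

groups = group_queries(toy_vectors)
for group in groups:
    print(group)
```

With these placeholder vectors the two weather questions fall into one group and the two music requests into another, even though the wording in each pair differs.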
