Amazon and SK Telecom Are Working on a Korean Natural Language Processor
Amazon Web Services is working with South Korean tech giant SK Telecom to build a Korean natural language processor for voice assistants, chatbots, and other products. The new Korean-language Generative Pre-trained Transformer-2 model, known as KoGPT-2, is meant to improve how machine learning systems handle Korean, a language known for being challenging to incorporate into AI platforms.
To build a Korean natural language processor, people from the Amazon Machine Learning Solutions Lab at AWS teamed up with the Conversational AI Team from the SK telecom AI Center. Amazon and SK Telecom’s KoGPT-2 is built on the Generative Pre-trained Transformer-2 (GPT-2) model, a language model released last year by OpenAI. GPT-2 is trained to predict how someone will finish a sentence or a longer piece of writing. AWS compared it to an advanced version of the text prediction system on a smartphone. Researchers fed the KoGPT-2 model 125 million sentences, more than 1.6 billion words in Korean, generating a pre-trained open-source model for Korean chatbots and voice assistants.
“We needed a lot of computing power to train the model,” said Amazon senior data scientist Muhyun Kim in a news release. “We used 64 GPUs (graphics processing units) for one week. Before that, though, we did a lot of experimentation to find the right configuration for analyzing the data and to troubleshoot possible errors. Without human expertise, however, nothing can happen. Our experience helped us work with SK telecom to refine their models and speed up training.”
Korean, spoken by around 80 million people in the world, is unrelated to any other language spoken today. That has made it notoriously tough to teach to virtual assistants and has largely limited the availability of such AI to domestic creations. Google began selling Google Home and Google Home Mini smart speakers in South Korea in the fall, right after adding Korean to Google Assistant’s language options. Adoption of Google Assistant devices has reportedly been slow despite the widespread use of the Android operating system in South Korea. Amazon’s Alexa voice assistant doesn’t speak Korean at all, and Echo smart speakers are not available in the country. Any Alexa devices in South Korea are not localized and use English.
SK Telecom plans to use KoGPT-2 to make its AI more human-like when conversing with people. That AWS was involved might suggest the model will be used by Alexa, but that’s not necessarily true. Work Amazon does in one arena isn’t always used by other branches of the company. Also, the open-source nature of the model doesn’t fit the proprietary approach to Alexa’s other language modules. The benefit goes more to AWS specifically, which SK Telecom uses for much of its cloud computing.
“We wanted to help scale out SK telecom’s burgeoning natural language efforts by training the state-of-the-art KoGPT-2 model,” SK Telecom conversational AI leader Kim Tae Yoon said. “Open source and contribution back to the growing Korean NLP community are core values of our team, so it was only natural to open source this model.”