Microsoft and Nvidia Unveil Enormous Language Model With 530B Parameters
Microsoft and Nvidia today unveiled a new natural language model they claim is larger and more powerful than any previous contender. The new Megatron-Turing Natural Language Generation model (MT-NLG) merges elements from models developed by both companies and uses 530 billion parameters to set records for accuracy, reading comprehension, reasoning, and other aspects of natural language processing.
Parameters are the values a language model learns during training, so more parameters, combined with more data, generally make for a better-tuned AI. MT-NLG's training set incorporated 270 billion tokens (units of text), mostly from the 825GB collection known as the Pile, created by the open-source AI researchers at EleutherAI. Microsoft and Nvidia filtered the Pile and combined it with data from Common Crawl's collection of online data, including social media and news stories. After training on 560 Nvidia DGX A100 servers, each packed with eight Nvidia A100 80GB GPUs, MT-NLG came out ahead of any competition, according to the developers. The AI can infer answers to questions beyond the initial statement and discern what is written even when the letters and symbols are messy.
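The scale of that hardware is easier to appreciate with some back-of-envelope arithmetic. The sketch below uses only the figures reported above; the 2-bytes-per-parameter assumption (16-bit weights) is illustrative, not a detail confirmed by Microsoft or Nvidia:

```python
# Back-of-envelope scale of the MT-NLG training setup described above.
servers = 560                  # Nvidia DGX A100 servers
gpus_per_server = 8            # A100 80GB GPUs per server
gpu_memory_gb = 80             # memory per GPU

total_gpus = servers * gpus_per_server
total_gpu_memory_tb = total_gpus * gpu_memory_gb / 1000

params = 530e9                 # 530 billion parameters
bytes_per_param = 2            # assumption: 16-bit (fp16) weights
weights_tb = params * bytes_per_param / 1e12

print(f"Total GPUs: {total_gpus}")                            # 4480
print(f"Aggregate GPU memory: {total_gpu_memory_tb:.1f} TB")  # 358.4 TB
print(f"Weights alone at fp16: {weights_tb:.2f} TB")          # 1.06 TB
```

Even under these rough assumptions, the weights alone outgrow any single GPU's memory many times over, which is why training a model this size requires spreading it across thousands of accelerators.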
“We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters,” Nvidia senior director Paresh Kharya and Microsoft Turing group program manager Ali Alvi explained in a blog post. “It is the result of a joint effort between Microsoft and NVIDIA to advance the state of the art in AI for natural language generation. As the successor to Turing NLG 17B and Megatron-LM, MT-NLG has 3x the number of parameters compared to the existing largest model of this type and demonstrates unmatched accuracy in a broad set of natural language tasks.”
Sizing Up GPT-3
The model whose parameter count MT-NLG claims to triple is GPT-3, with its 175 billion parameters. Microsoft acquired exclusive licensing rights to GPT-3 over a year ago, but it's not the company's only AI project under development, as the Nvidia partnership indicates. GPT-3's scale and design make it well suited to some conversational AI applications and reduce the time and effort needed to train it. The attraction for Microsoft is how GPT-3 can augment its commercial offerings, which prompted it to invest $1 billion in OpenAI to develop new ideas. The MT-NLG model is impressive but doesn't appear ready for commercial implementation. That could change soon as AI development continues to accelerate, according to Nvidia and Microsoft.
“We live in a time where AI advancements are far outpacing Moore’s law. We continue to see more computation power being made available with newer generations of GPUs, interconnected at lightning speeds. At the same time, we continue to see hyperscaling of AI models leading to better performance, with seemingly no end in sight,” Kharya and Alvi wrote. “The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language. The innovations of DeepSpeed and Megatron-LM will benefit existing and future AI model development and make large AI models cheaper and faster to train.”