Microsoft Builds The Largest Language Generation Model With 17 Billion Parameters

  • Microsoft introduces Turing Natural Language Generation (T-NLG), the largest language model published to date, with 17 billion parameters.
  • It generates abstractive summaries of text documents, direct answers to questions, and words to complete sentences. 
  • The model aims to respond as accurately, directly, and fluently as humans can in different situations.

Large-scale deep learning language models (like GPT-2 and BERT), with hundreds of millions to billions of parameters trained on vast amounts of text from the internet, have improved performance on various natural language processing (NLP) tasks, such as document understanding, conversational agents, and question answering.

It has been observed that larger models with more diverse and comprehensive pretraining data perform better, even with fewer training samples. Thus, it’s more efficient to train a massive centralized model and share its features across different tasks instead of training a new model for each task individually.

Following this trend, researchers at Microsoft have introduced Turing Natural Language Generation (T-NLG), the largest language model published to date, with 17 billion parameters. It outperforms existing state-of-the-art models on different language modeling benchmarks.

T-NLG can generate words to complete unfinished sentences, summaries of input documents, and direct answers to questions. Unlike other NLP systems that rely on extracting content from documents to create a summary or answer questions, the new generative model is designed to respond as accurately, directly, and fluently as humans can in different situations.

Instead of copying a passage, T-NLG directly answers the question with a complete sentence.

Training T-NLG

Since a single GPU (even one with 32 GB of memory) cannot hold a model with billions of parameters, you need to parallelize the model itself, breaking it into slices that are trained across multiple GPUs.
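A rough back-of-the-envelope estimate shows why one 32 GB GPU is not enough. The sketch below assumes 16-bit weights and, for training, roughly 16 bytes per parameter for mixed-precision Adam (16-bit weights and gradients plus 32-bit weights, momentum, and variance). These byte counts are illustrative assumptions rather than figures reported for T-NLG, and memory for activations is ignored.

```python
# Rough memory estimate for a 17-billion-parameter model.
# Byte counts are illustrative assumptions, not figures reported for T-NLG.

NUM_PARAMS = 17_000_000_000
GPU_MEMORY_GB = 32  # the GPU size mentioned in the text

BYTES_FP16_WEIGHTS = 2           # assumed: half-precision weights only
BYTES_MIXED_PRECISION_ADAM = 16  # assumed: fp16 weights + fp16 gradients
                                 # + fp32 weights, momentum, and variance


def memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the memory footprint in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


weights_gb = memory_gb(NUM_PARAMS, BYTES_FP16_WEIGHTS)
training_gb = memory_gb(NUM_PARAMS, BYTES_MIXED_PRECISION_ADAM)

print(f"fp16 weights alone: {weights_gb:.0f} GB")
print(f"weights + gradients + optimizer state: {training_gb:.0f} GB")
print(f"fits on one {GPU_MEMORY_GB} GB GPU: {training_gb <= GPU_MEMORY_GB}")
```

Under these assumptions, even the weights alone exceed the memory of a single 32 GB card, before any gradients or optimizer state are stored.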

In this study, researchers leveraged the NVIDIA DGX-2 hardware setup (to make communication between GPUs faster) and tensor slicing (to break the model across four NVIDIA V100 GPUs). Using the DeepSpeed library and its ZeRO optimizer, they were able to train T-NLG efficiently with fewer GPUs.
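The idea behind tensor slicing can be illustrated with a toy example. The sketch below splits a weight matrix into column blocks, lets each simulated "device" multiply the input by its own block, and concatenates the partial results. It is a simplified, pure-Python illustration and not the DeepSpeed or T-NLG implementation; the function names are hypothetical.

```python
# Toy illustration of tensor slicing (a form of model parallelism).
# Each "device" is simulated in plain Python; real systems place each
# slice on a separate GPU. All function names here are hypothetical.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w, num_slices):
    """Split w into num_slices column blocks (width must divide evenly)."""
    width = len(w[0]) // num_slices
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(num_slices)]


def sliced_matmul(x, w, num_slices):
    """Compute x @ w by giving each device one column slice of w."""
    partial_outputs = [matmul(x, w_slice)
                       for w_slice in split_columns(w, num_slices)]
    # Concatenate the partial outputs along the column dimension.
    return [sum((part[r] for part in partial_outputs), [])
            for r in range(len(x))]


x = [[1, 2], [3, 4]]              # activations: 2 x 2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # weights: 2 x 4, sliced across 4 devices
print(sliced_matmul(x, w, num_slices=4))
```

Each device only needs to store its own slice of the weights, which is what lets a model too large for one GPU be spread over several. The sliced result matches the ordinary matrix product.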

Performance against standard tasks 

They then compared the performance of the pre-trained T-NLG against other powerful transformer language models on two standard tasks: LAMBADA next-word prediction accuracy (higher is better) and WikiText-103 perplexity (lower is better). In both cases, T-NLG performed better.
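For readers unfamiliar with these two metrics, here is a minimal sketch of how they are typically computed. The probabilities and predicted words are made-up values for illustration, not T-NLG outputs, and benchmark-specific details (such as tokenization) are left out.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the average negative natural-log probability."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def last_word_accuracy(predictions, targets):
    """Fraction of passages whose predicted final word matches the target."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical probabilities a model assigned to each correct token.
log_probs = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
print(f"perplexity: {perplexity(log_probs):.3f}")

# Hypothetical final-word predictions for three passages.
predicted = ["door", "coffee", "moon"]
expected = ["door", "tea", "moon"]
print(f"accuracy: {last_word_accuracy(predicted, expected):.3f}")
```

A model that assigns higher probability to the correct tokens gets a lower perplexity, which is why lower is better for that metric, while accuracy simply counts correct final-word guesses.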

Reference: Microsoft | GitHub 

Performance in question answering 

To assess qualities such as grammatical and factual correctness, the researchers sought help from human annotators. They compared the new model with an LSTM model similar to CopyNet.

Performance in abstractive summarization

T-NLG can write human-like abstractive summaries for a variety of text documents (including Word documents, blog posts, emails, PowerPoint presentations, and even Excel sheets), but how good is it compared to other existing NLP models?

To make the new model more versatile so that it can summarize all kinds of text, researchers trained it on publicly available summarization datasets. They then compared it with PEGASUS, another large transformer-based language model, and with earlier state-of-the-art results. This time, they reported the ROUGE score, a set of metrics used for evaluating automatic summarization in natural language processing.
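As a rough illustration of what ROUGE measures, the sketch below computes ROUGE-N as the n-gram overlap between a generated summary and a reference summary. It is a simplified version that assumes whitespace tokenization and a single reference; published results normally come from a standard ROUGE toolkit, and the example sentences are invented.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap between two texts."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped matching n-gram count
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented example sentences, not taken from any benchmark.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

ROUGE-1 counts overlapping single words and ROUGE-2 counts overlapping word pairs, so a summary that shares more wording with the reference scores higher.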

Applications

Microsoft describes T-NLG as a breakthrough in conversational artificial intelligence. In the coming years, the company plans to integrate T-NLG into the Microsoft Office suite, which would not only save users time by summarizing emails and documents but also offer writing assistance and answer questions that readers may ask about the content.

Moreover, the findings pave the way for more accurate, fluent digital assistants and chatbots, helping businesses with sales and customer relationship management.

Written by
Varun Kumar
