Highlights:

  • Microsoft researchers modeled Phi-3 Mini on Llama 2, the popular LLM series created by Meta Platforms Inc.
  • Microsoft claims that Phi-3 Mini outperformed Llama 2 on the MMLU neural network evaluation, a roughly 16,000-question benchmark covering 57 subjects.

Microsoft has open-sourced Phi-3 Mini, a small language model with 3.8 billion parameters that can outperform neural networks more than ten times its size.

According to the company, Phi-3 Mini is small enough to run on a 2022-era iPhone. The largest LLMs available, by contrast, are often too complex to fit on even the most advanced data center graphics card.

Phi-3 Mini is built on the decoder-only Transformer, a popular language model architecture. A Transformer is a type of neural network that interprets the meaning of a word by examining its context. Standard Transformer models typically do this by looking at the text both before and after the target word.

The decoder-only Transformer is a variant of the architecture that makes its judgments with less contextual information: it examines only the text that comes before a word, not the text that follows it. Compared with standard Transformer models, decoder-only models often perform better at text generation tasks and require less hardware to run.
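A decoder-only model enforces this "look only backward" rule with a causal attention mask. The short NumPy sketch below is an illustration of that mechanism, not Microsoft's code: positions that lie in the future are masked out before the softmax, so each token can only draw on the tokens that precede it.

```python
import numpy as np

def causal_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Apply a causal mask so each token attends only to itself and earlier tokens.

    `scores` is a (seq_len, seq_len) matrix of raw attention scores.
    """
    seq_len = scores.shape[0]
    # Upper-triangular positions (future tokens) are masked with -inf,
    # so they receive zero weight after the softmax.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns the remaining scores into attention weights.
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Example: with 4 tokens, token 0 can only attend to itself,
# while token 3 can attend to all four positions.
weights = causal_attention_weights(np.random.rand(4, 4))
print(np.round(weights, 2))
```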

Microsoft researchers modeled Phi-3 Mini on Llama 2, the popular LLM series created by Meta Platforms Inc. They reused Llama 2's tokenizer, the component that converts text into a format a language model can process. Because the designs are similar, open-source tools built for Llama 2 can also be used with Phi-3 Mini.
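In practice, that compatibility means the model's tokenizer can be loaded with standard open-source tooling. The minimal sketch below assumes the Hugging Face transformers library and the publicly released "microsoft/Phi-3-mini-4k-instruct" checkpoint, neither of which is named in the article, and shows how a sentence is broken into tokens.

```python
# A minimal sketch of loading the shared tokenizer with Hugging Face transformers.
# The model ID "microsoft/Phi-3-mini-4k-instruct" is an assumption based on
# Microsoft's public release, not something the article specifies.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

text = "Small language models can punch above their weight."
tokens = tokenizer.tokenize(text)   # text split into sub-word tokens
ids = tokenizer.encode(text)        # tokens mapped to the integer IDs the model reads

print(tokens)
print(ids)
```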

The underlying architecture is not the reason Phi-3 Mini outperforms larger LLMs. Instead, "the innovation lies entirely in our dataset for training," said the Microsoft researchers who developed the model.

The dataset is an expanded version of the one the company used to train Phi-2, its previous-generation small language model. Phi-3 Mini's training dataset contains 3.3 trillion tokens of data. A token is a unit of data consisting of a few letters or digits.

Phi-3 Mini was trained on heavily filtered web data. Microsoft says its researchers kept only data that could improve the model's reasoning abilities and removed everything else from the dataset, including web pages that offered some useful information but not enough to make the AI learning process efficient.
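The sketch below gives a flavor of that kind of curation. The scoring heuristic is invented for the example; Microsoft has not published its exact filtering rules.

```python
# A hypothetical illustration of quality-filtering a web corpus for reasoning content.
def reasoning_quality_score(document: str) -> float:
    """Crude proxy for 'educational value': favor documents with reasoning cues."""
    cues = ("because", "therefore", "for example", "step", "proof", "theorem")
    text = document.lower()
    words = text.split()
    if not words:
        return 0.0
    hits = sum(words.count(cue) for cue in cues if " " not in cue)
    hits += sum(text.count(cue) for cue in cues if " " in cue)
    return hits / len(words)

def filter_corpus(documents: list[str], threshold: float = 0.01) -> list[str]:
    """Keep only documents whose score clears the threshold."""
    return [doc for doc in documents if reasoning_quality_score(doc) >= threshold]

corpus = [
    "Buy cheap watches now!!! Click here.",
    "The proof proceeds in three steps, because each lemma builds on the last.",
]
print(filter_corpus(corpus))  # only the second document survives
```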

Microsoft trained Phi-3 Mini in two phases. First, the model was fed the filtered dataset its researchers had collected from the public web. It was then trained on synthetic data (training data generated by an AI) together with an even more heavily filtered subset of the data from the first phase.
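A simplified, hypothetical sketch of that two-phase curriculum is below. The datasets and the train() helper are stand-ins; Microsoft's actual training pipeline is not public.

```python
# Hypothetical two-phase training curriculum (stand-in data and helpers).
def train(model_state: dict, dataset: list[str], phase: str) -> dict:
    """Pretend training step: record which data the model saw in each phase."""
    print(f"{phase}: training on {len(dataset)} documents")
    model_state.setdefault("seen", []).append((phase, len(dataset)))
    return model_state

# Phase 1: heavily filtered public web data.
filtered_web_data = ["web doc 1", "web doc 2", "web doc 3"]
# Phase 2: an even more tightly filtered subset of that data plus synthetic,
# AI-generated examples.
refined_subset = filtered_web_data[:2]
synthetic_data = ["synthetic reasoning example"]

model = {}
model = train(model, filtered_web_data, phase="phase 1")
model = train(model, refined_subset + synthetic_data, phase="phase 2")
```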

Microsoft compared Phi-3 Mini's performance against two larger open-source language models. One of the baselines was a version of Meta's Llama 2 with 70 billion parameters. Microsoft claims Phi-3 Mini outperformed Llama 2 on the MMLU neural network evaluation, a roughly 16,000-question benchmark covering 57 subjects.
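For readers unfamiliar with MMLU-style benchmarks, the toy example below shows how such a multiple-choice test is scored. The question and the pick_answer() stand-in are invented for illustration; a real evaluation queries the model on each of the thousands of questions and reports the overall accuracy.

```python
# Toy illustration of scoring an MMLU-style multiple-choice benchmark.
questions = [
    {
        "question": "Which planet is known as the Red Planet?",
        "choices": ["A) Venus", "B) Mars", "C) Jupiter", "D) Mercury"],
        "answer": "B",
    },
]

def pick_answer(question: dict) -> str:
    """Stand-in for the model under test: here it simply always guesses 'B'."""
    return "B"

correct = sum(pick_answer(q) == q["answer"] for q in questions)
accuracy = correct / len(questions)
print(f"MMLU-style accuracy: {accuracy:.1%}")  # models are ranked on this percentage
```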

Phi-3 Mini delivered the better results despite requiring far less hardware than Meta's model. During testing, Microsoft researchers were able to run it on an iPhone 14.

In the paper describing Phi-3 Mini, the researchers also previewed two larger, as-yet-unreleased variants of the model, with 7 billion and 14 billion parameters. Both outperformed Phi-3 Mini on the MMLU test, scoring six and nine percentage points higher, respectively.