Highlights:

  • Meta says it will use the new clusters to train newer, more powerful AI systems, including Llama 3, the anticipated successor to Llama 2, and to improve its existing models.
  • Despite having the same number of GPUs, each connected through 400-gigabit-per-second endpoints, the two clusters have distinct network designs.

Meta Platforms Inc., commonly known as Meta, has launched two immensely powerful clusters of graphics processing units (GPUs) that it says will help train the next generation of generative artificial intelligence models, including the forthcoming Llama 3.

According to Meta engineers Kevin Lee, Adi Gangidi, and Mathew Oldham, the two 24,576-GPU, data center-scale clusters were built to accommodate far larger and more sophisticated generative AI models than those previously released, such as the company’s well-known open-source model Llama 2, which competes with OpenAI’s ChatGPT and Google LLC’s Gemini. The engineers said the clusters will also support ongoing AI research and development.

Each cluster is packed with thousands of Nvidia Corp.’s most powerful H100 GPUs, making the pair significantly larger than Meta’s previous large clusters, which held about 16,000 of Nvidia’s older A100 GPUs.

The company has reportedly been buying up thousands of Nvidia’s latest chips, and a recent report claimed it has become one of the chipmaker’s biggest customers.

Meta says it will use the additional clusters to train newer, more powerful AI systems, including Llama 3, the anticipated successor to Llama 2, and to improve its existing ones. Although it was widely assumed that Meta was working on Llama 3, the blog post marks the first time the company has officially confirmed it. The engineers said Llama 3 is still under development but gave no indication of when it might be announced.

Longer term, Meta wants to develop artificial general intelligence (AGI) systems that will be far more human-like in their creativity than today’s generative AI models, and the blog post said the new clusters will help it scale toward those ambitions. Meta also disclosed that it is evolving its PyTorch AI framework so it can support much larger GPU counts.
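Meta did not publish the PyTorch changes themselves, but the general shape of scaling training across many GPUs is visible in PyTorch’s public distributed API. The sketch below is purely illustrative: the model, sizes, and launch assumptions (a torchrun-style launcher setting RANK and LOCAL_RANK) are placeholders, not details from Meta’s post.

```python
# Illustrative multi-GPU data-parallel training with PyTorch's public
# distributed API. Model, sizes, and hyperparameters are placeholder
# assumptions, not details from Meta's post.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # A launcher such as torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")  # NCCL carries GPU-to-GPU traffic
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # stand-in model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # toy training loop
        x = torch.randn(32, 4096, device=local_rank)
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across every GPU
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same script scales from one node to many; the engineering Meta describes lies in keeping the collectives and the network fabric efficient as the GPU count grows.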

Beneath the Surface

Despite having the same number of GPUs, each connected through 400-gigabit-per-second endpoints, the two clusters have distinct network designs. One provides remote direct memory access over converged Ethernet (RoCE) and is built on Arista Networks Inc.’s Arista 7800 with Wedge400 and Minipack2 OCP rack switches. The other is constructed with Nvidia’s proprietary Quantum2 InfiniBand network fabric.
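In software, the choice of fabric largely surfaces as configuration for the communication library. As a rough illustration only, NCCL exposes environment variables that steer traffic over InfiniBand or RoCE; the device name, interface name, and GID index below are common community settings, not values from Meta’s post.

```python
# Rough illustration of pointing NCCL at one fabric or the other.
# Device names, interface names, and the GID index are hypothetical
# examples, not Meta's settings.
import os

def configure_fabric(fabric: str) -> None:
    if fabric == "infiniband":
        os.environ["NCCL_IB_HCA"] = "mlx5_0"       # hypothetical HCA name
    elif fabric == "roce":
        os.environ["NCCL_IB_GID_INDEX"] = "3"      # RoCE v2 GID, a common choice
        os.environ["NCCL_SOCKET_IFNAME"] = "eth0"  # hypothetical interface
    else:
        os.environ["NCCL_IB_DISABLE"] = "1"        # fall back to plain TCP

configure_fabric("roce")  # must be set before NCCL initializes
```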

Both clusters were built with Grand Teton, Meta’s open GPU hardware platform designed to handle demanding AI workloads. Grand Teton is said to offer twice the compute and data network bandwidth, twice the power envelope, and four times the host-to-GPU bandwidth of its predecessor, the Zion-EX platform.

According to Meta, the clusters use its latest Open Rack power and rack infrastructure architecture, which is designed to give data center designers greater flexibility. The engineers said Open Rack v3 allows power shelves to be mounted anywhere inside the rack rather than bolted to the busbar, enabling more flexible configurations.

Furthermore, the number of servers per rack can be adjusted, allowing throughput capacity to be distributed more efficiently across servers. Meta says this has allowed it to reduce the total rack count somewhat.

For storage, the clusters use the Linux Filesystem in Userspace (FUSE) application programming interface, backed by Tectonic, Meta’s distributed storage platform. Meta also partnered with the startup Hammerspace Inc. to develop a brand-new parallel network file system for the clusters.
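Tectonic and the Hammerspace file system are proprietary, but the FUSE API they sit behind is a standard Linux interface. A minimal read-only example using the community fusepy bindings (a generic stand-in, not Meta’s client) shows the shape of that API:

```python
# Minimal read-only FUSE filesystem via the community fusepy bindings.
# A generic illustration of the FUSE API; it has no relation to
# Tectonic or Hammerspace internals.
import errno
import stat
import sys
from fuse import FUSE, FuseOSError, Operations

class HelloFS(Operations):
    """Exposes a single in-memory file, /hello.txt."""
    DATA = b"hello from a toy FUSE filesystem\n"

    def getattr(self, path, fh=None):
        if path == "/":
            return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
        if path == "/hello.txt":
            return {"st_mode": stat.S_IFREG | 0o444,
                    "st_nlink": 1, "st_size": len(self.DATA)}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello.txt"]

    def read(self, path, size, offset, fh):
        return self.DATA[offset:offset + size]

if __name__ == "__main__":
    FUSE(HelloFS(), sys.argv[1], foreground=True)  # mount point from argv
```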

Finally, the engineers explained that the clusters are built on the YV3 Sierra Point server platform with the company’s most advanced E1.S solid-state drives. The team also said it tuned the clusters’ network topology and routing architecture and deployed the Nvidia Collective Communications Library (NCCL), a set of communication routines optimized for Nvidia’s GPUs.
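NCCL itself is a C library, but in PyTorch-based stacks its collectives are usually reached through torch.distributed. The core primitive behind data-parallel training is all-reduce, sketched below under the same launcher assumptions as the earlier example; the tensor size is arbitrary.

```python
# Sketch of NCCL's core collective, all-reduce, via torch.distributed.
# Tensor size and launch setup are illustrative assumptions.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # NCCL supplies the GPU collectives
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)

# Each rank contributes its own tensor; after all_reduce, every rank
# holds the element-wise sum across all GPUs in the job.
t = torch.full((1024,), float(rank), device=rank)
dist.all_reduce(t, op=dist.ReduceOp.SUM)

dist.destroy_process_group()
```

Topology and routing tuning of the kind Meta describes is about making collectives like this one traverse a 24,576-GPU fabric with as little contention as possible.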

More GPUs on the Way

Meta stated that it remains fully committed to fostering open innovation within its AI hardware stack. The engineers noted that the company is part of the recently formed AI Alliance, which aims to build an open ecosystem that improves transparency and trust in AI development and ensures everyone can benefit from its advances.

The engineers wrote, “As we look to the future, we recognize that what worked yesterday or today may not be sufficient for tomorrow’s needs. That’s why we are constantly evaluating and improving every aspect of our infrastructure, from the physical and virtual layers to the software layer and beyond.”

Additionally, Meta disclosed that it plans to acquire over 350,000 H100 GPUs from Nvidia by the end of the year and will keep adding to its inventory. It will use these chips to continue expanding its AI infrastructure, and even more powerful GPU clusters should arrive in the not-too-distant future.