“The updated models and open weights in Llama 3.1 give enterprises an option to avoid or reduce the use of closed, proprietary LLMs, but they will also create confusion for solution providers who build and sell LLMs,” said Tobias Zwingmann, partner at AI prototyping service platform Rapyd.AI.
Benefits for Businesses and Developers
According to experts, the advantages of the Llama 3.1 models for enterprises and developers lie in their open weights and strong benchmark performance. “By providing language models with open weights, enterprises can build customized AI solutions for their use cases without the huge licensing costs associated with using proprietary models,” said Paul Nashawaty, principal analyst at The Futurum Group. “Enterprises benefit from reduced costs and increased flexibility, while developers can accelerate innovation by leveraging a solid foundation.”
IDC research vice president Arnal Dayaratna said enterprises can fine-tune Llama 3.1’s open models using their own proprietary data without worrying about that data or their intellectual property being shared with other vendors, which also helps them avoid vendor lock-in.
According to Bradley Shimmin, senior analyst at Omdia, the massive parameter count of 405 billion is also important to enterprises, which can now access a free LLM that matches the performance and intelligence of models like GPT-4, Gemini, and Claude.
In a blog post, Meta stated that the 405B Llama 3.1 model outperformed models like Nemotron-4 340B Instruct, GPT-4, and Claude 3.5 Sonnet on benchmarks such as MMLU, MATH, GSM8K, and ARC Challenge, and performed on par with GPT-4o. For reference, on the MMLU benchmark, GPT-4o scored 88.7 while Llama 3.1 405B scored 88.6. MMLU, MATH, GSM8K, and ARC Challenge are benchmarks that test LLMs on general knowledge, mathematics, and reasoning.
Meta also said that the smaller Llama 3.1 models, the 8B and 70B, have been updated with larger context windows and multilingual support, performing as well as or close to proprietary LLMs in benchmark tests.
In April, Meta released their predecessors, the Llama 3 8B and Llama 3 70B models, which feature architectural improvements over Llama 2, including a standard decoder-only transformer architecture, techniques such as Grouped Query Attention (GQA), and higher-quality training datasets.
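For readers unfamiliar with GQA, the idea is that groups of query heads share a single key/value head, which shrinks the key/value cache without giving up multi-head queries. Below is a minimal, illustrative sketch; the dimensions and names are made up for the example and do not reflect Llama's actual configuration.

```python
# Minimal sketch of Grouped Query Attention (GQA). Illustrative only;
# dimensions do not reflect Llama's real configuration.
import torch

def gqa_attention(q, k, v):
    """q: (batch, n_q_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim)."""
    group_size = q.shape[1] // k.shape[1]  # query heads per shared KV head
    # Expand each KV head so every query head in its group can attend to it.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

# 8 query heads share 2 KV heads: the KV cache is 4x smaller than full
# multi-head attention while the output shape stays the same.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = gqa_attention(q, k, v)  # (1, 8, 16, 64)
```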
Anton McGonnell, director of product at SambaNova Systems, a specialist in generative AI platforms, said the 405B version of Llama 3.1 could deliver better accuracy for general-purpose tasks, enabling enterprises to accelerate improvements in employee-facing and customer-facing use cases.
“We expect developers to use techniques like speculative decoding, where less complex models handle the bulk of the processing and then call in larger models to verify their work and correct errors when needed,” McGonnell said, adding that this could be an efficient way to run AI models, opening up new avenues for optimizing compute resources and improving responsiveness in real-time applications.
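As a concrete illustration of the pattern McGonnell describes, Hugging Face's transformers library exposes speculative decoding as "assisted generation": a small draft model proposes tokens cheaply and the large model verifies them in a single forward pass. The sketch below assumes access to two Llama 3.1 checkpoints with a shared tokenizer; the model IDs shown are the public Hugging Face names, but any compatible pair would do.

```python
# A minimal sketch of speculative (assisted) decoding with Hugging Face
# transformers. Assumes access to the gated Llama 3.1 checkpoints; swap in
# any draft/target pair that shares a tokenizer.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-70B-Instruct")
target = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B-Instruct", device_map="auto"
)
# The smaller "draft" model handles the bulk of the token proposals.
draft = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct", device_map="auto"
)

inputs = tokenizer("Summarize the quarterly report:", return_tensors="pt").to(target.device)
# assistant_model turns on assisted generation: the draft model drafts
# several tokens at a time and the target model accepts or corrects them.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```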
IDC’s Dayaratna also noted that the Llama 3.1 405B model can be used for knowledge distillation—transferring knowledge from a larger model to smaller ones—in addition to synthetic data generation. This capability, Dayaratna added, allows enterprises to run additional analytics workstreams.
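A common recipe for the distillation Dayaratna describes is to train a small student model to match the teacher's softened output distribution. The following PyTorch sketch shows that loss term only; it is not Meta's actual pipeline, and the temperature, weighting, and vocabulary size are illustrative.

```python
# Minimal sketch of a knowledge-distillation loss: the student is nudged
# toward the teacher's (e.g. a 405B model's) output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence between the teacher's and the student's distributions.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2

# Usage inside a training loop (shapes illustrative: batch of 4, vocab 32k).
student_logits = torch.randn(4, 32000)
teacher_logits = torch.randn(4, 32000)
loss = distillation_loss(student_logits, teacher_logits)
# In practice this is combined with the usual next-token loss, e.g.:
# total = ce_loss + alpha * loss
```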
LLM deployment is still expensive
Experts say that while Llama 3.1 is more intelligent than previous versions, deploying the model may be too expensive for small and medium-sized businesses.
Dylan Patel, principal analyst at semiconductor research firm SemiAnalysis, noted that the Llama 3.1 405B version is very expensive to run because the unmodified model requires two Nvidia H100 server nodes (16 GPUs in total). Previous models required a single low-end GPU (Llama 8B) or two high-end GPUs (Llama 70B).
“It would be prohibitively expensive for a small business to host its own Llama 3.1 405B deployment, as leasing two H100 servers for a year would cost upwards of $300,000,” Patel explained.
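For rough context, at an illustrative on-demand rate of about $2 per GPU-hour, 16 H100s running around the clock come to roughly 16 × $2 × 8,760 hours ≈ $280,000 per year, in line with Patel's estimate; actual rates vary by provider and contract.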
The increased infrastructure cost stems from the larger parameter count that enables more accurate results. Typically, LLM vendors release several model variants so that companies can trade off latency against accuracy depending on their use cases. Models with more parameters can be more accurate, while models with fewer parameters require less computation and respond faster, making them cheaper to run.
However, Patel noted that most large enterprises will likely use the Llama 3.1 405B model to fine-tune and train other models, or for production use cases such as chatbots. “Large enterprises may find the 405B model not that expensive to run, given the intelligence and usability it provides,” Patel said.
There are also other ways to reduce the cost of running larger models. Several large cloud service providers and other model-serving providers are already working to offer the new model to enterprises.
“Most enterprises will rely on cloud APIs to leverage Llama 3.1 405B. All the major cloud providers offer this model,” Patel explained. Using APIs lets enterprises access the compute resources they need on a pay-as-you-go basis, reducing their up-front investment.
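In practice, many of these providers expose OpenAI-compatible endpoints, so calling the hosted 405B model can look like the hedged sketch below; the base URL and model identifier vary by provider, and the values shown here are illustrative placeholders.

```python
# Illustrative pay-as-you-go access to a hosted Llama 3.1 405B model via an
# OpenAI-compatible API. The base_url is hypothetical and the model name is
# provider-specific; check your provider's docs for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # provider-specific ID
    messages=[{"role": "user", "content": "Draft a reply to a customer about a delayed order."}],
)
print(response.choices[0].message.content)
```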
Meta announced partnerships with Accenture, AWS, AMD, Anyscale, Cloudflare, Databricks, Dell, Deloitte, Fireworks.ai, Google Cloud, Groq, Hugging Face, IBM watsonx, Infosys, Intel, Kaggle, Microsoft Azure, Nvidia DGX Cloud, OctoAI, Oracle Cloud, PwC, Replicate, Sarvam AI, Scale AI, Snowflake, Together AI, and the UC Berkeley vLLM Project to make the Llama 3.1 models easier and more convenient to use.
While cloud service providers like AWS and Oracle offer the latest models, partners like Groq, Dell, and Nvidia are helping developers use synthetic data generation and advanced retrieval-augmented generation (RAG). Meta added that Groq has optimized low-latency inference for cloud deployments, and Dell has achieved similar optimizations for on-premises systems.
Other large models such as Claude, Gemini, and GPT-4o are also available via API.
McGonnell also predicted that the release of Llama 3.1 will spark competition among AI cloud service providers and model-serving providers to offer the most efficient and cost-effective API solutions for Llama 3.1 405B deployments. According to Patel, Meta’s latest model distribution partners, Together AI and Fireworks.ai, are delivering innovative inference optimizations that can significantly reduce costs.
Impact on competing LLM providers
Experts say the launch of open LLMs that perform as well as or better than closed, proprietary LLMs will pose a significant challenge to competing LLM providers, regardless of their size. “Cohere, Aleph Alpha, and similar startups developing proprietary LLMs will either not exist in the next year or will survive in a much smaller, niche, more expensive form,” said Rapyd.AI’s Zwingmann. “It’s like betting on Solaris when the rest of the world is focused on Windows, Mac, and Linux.”
McGonnell also noted that as LLMs become commoditized through open models, proprietary vendors like OpenAI will have to compete by either reducing costs or improving performance. “The fact that OpenAI has started releasing cheaper versions of GPT-4 suggests that they are focused on reducing costs,” McGonnell said.
Additionally, just 24 hours after Meta released the Llama 3.1 update, OpenAI announced on X (formerly Twitter) that it would launch a free tier allowing customers to fine-tune the GPT-4o mini model.
Rapyd.AI’s Zwingmann believes this competition between open and proprietary LLMs will benefit businesses, explaining, “We can expect to see LLM token costs come down even further. No one is brave enough to charge significantly more than the market average anymore.” Tokens are a measure of the amount of text an LLM API processes when handling a user’s query.
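As a quick illustration of what gets counted, the snippet below tallies tokens with OpenAI's tiktoken library; every model family uses its own tokenizer, so counts (and therefore billed tokens) differ across providers.

```python
# Illustrative token counting with OpenAI's tiktoken library. Each model
# family has its own tokenizer, so counts differ across providers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "How many tokens does this sentence use?"
print(len(enc.encode(text)))  # prints the token count for this sentence
```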
According to Omdia’s Shimmin, the Llama model family already dominates the LLM landscape, including against proprietary LLMs from Google, Anthropic, and OpenAI. Omdia’s research is based on data collected from job postings seeking skills related to Meta’s LLM product line.
Open weights and open source
Meta and CEO Mark Zuckerberg describe the latest Llama model as open source, but some analysts disagree, arguing that Meta’s model is not truly open source as defined by the Open Source Initiative.
“None of the Llama models are actually open source in the way you might see with MIT- or Apache-licensed software,” explained Omdia’s Shimmin. “Rather, I would say it’s an open and permissive community license that provides everything an AI practitioner needs to build AI artifacts for commercial use.”
Shimmin also pointed out that while Meta provides model weights for all its LLMs, it does not provide full transparency into the data used to pre-train them. The biggest problem, according to experts, is that there is currently no agreed-upon definition of what an open-source LLM is or should be.
Source: www.itworld.co.kr