“Smaller Is Better”: Why the Size of Generative AI Models Matters

As businesses look to adopt generative AI tools and platforms to increase efficiency and productivity, concerns about the high cost and complexity of AI technology are growing.

Generative AI is built on language models: the algorithms and neural networks that power chatbots such as OpenAI’s ChatGPT and Google’s Bard. The most popular and widely used models today are large language models (LLMs).


LLMs are defined by their massive scale. They are trained on vast amounts of diverse data and contain billions, sometimes trillions, of parameters (or variables), which can leave them imprecise and unspecific for domain tasks or vertical industries.

This is why small language models (SLMs) are gaining attention; some even argue they have already become mainstream enterprise technology. SLMs are designed for simpler tasks and are easier for resource-constrained enterprises to adopt. They can run entirely within a self-managed environment, which makes them inherently more secure, they can be fine-tuned for specific domains and data-security requirements, and they are less expensive to operate than LLMs.

Ritu Jyoti, group vice president of AI research at IDC, explains that SLMs suit enterprises that want to build applications running locally on devices rather than in the cloud, and “where the task does not require extensive inference or requires fast responses.” Conversely, LLMs are better suited to applications that require orchestration of complex tasks involving advanced inference, data analysis, and a deeper understanding of context.

Because SLMs can be built from scratch with open-source AI frameworks, businesses can create highly customized AI tools for any purpose without asking anyone’s permission: they can study how the system works, examine its components, and modify it to suit their needs, including changing its output.

Open source and free to customize

SLMs are seeing increasing adoption because they can be fine-tuned or tailored to specific tasks, and “sometimes perform as well as much larger LLMs on narrow tasks,” says Dhiraj Nambiar, CEO of AI prototype developer Newtuple Technologies.

For example, there are SLMs that “perform very well” on OCR-style tasks or text-to-SQL tasks. “Some of the open-source ones show performance similar to LLMs,” Nambiar added.
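To make that concrete, the sketch below loads a small open-weight model locally with the Hugging Face transformers library and points it at a narrow text-to-SQL request. The model identifier, prompt, and table schema are illustrative assumptions, not something the article’s sources recommend.

```python
# Minimal sketch: running a small open-weight model locally on a narrow
# text-to-SQL task. The model ID is an illustrative assumption; any small
# instruction-tuned model with a permissive license could be swapped in.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",  # assumed small open model
    device_map="auto",                         # uses a GPU if one is available
)

prompt = (
    "Translate the question into SQL for a table "
    "orders(id, customer_id, total, created_at).\n"
    "Question: What was the total revenue in March 2024?\nSQL:"
)

result = generator(prompt, max_new_tokens=80, do_sample=False)
print(result[0]["generated_text"])
```

Because everything runs on local hardware, no prompt or proprietary data leaves the machine, which is the security argument made above.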

IDC’s Jyoti also says that the most popular SLMs today are actually open source. Major open-source SLMs include:

  • Meta Llama 3
  • Microsoft Phi-3
  • Google Gemma
  • Mistral AI Mixtral 8x7B
  • Apple OpenELM

The most popular commercial SLMs are:

  • DeepSeek Coder
  • Microsoft Phi-2
  • Microsoft Orca-2

“These models are typically used within a specific organization or provided as part of a commercial service, offering advanced capabilities while maintaining control over deployment and usage,” Jyoti explained.

AI models infer from their inputs how to generate outputs such as predictions, content, recommendations, or decisions that can affect physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptability after deployment.

In simple terms, an SLM is a lightweight generative AI model. “Small” here refers to the size of the model’s neural network, its number of parameters, and the amount of data it is trained on, according to Rosemary Thomas, a senior technical fellow in the AI Lab at Version 1, a management consulting and software development firm. Thomas says that while some SLM implementations still require significant compute and memory, there are SLMs that can run on a single GPU and have on the order of 5 billion parameters or more, including Google’s Gemini Nano, Microsoft’s Orca-2-7b and Orca-2-13b, and Meta’s Llama-2-13b.
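Thomas’s yardsticks for “small,” the parameter count and the memory the weights occupy, are easy to inspect directly. Below is a minimal sketch assuming a PyTorch checkpoint loaded through transformers; the model identifier is an illustrative choice, and any smaller checkpoint will load faster.

```python
# Minimal sketch: inspecting the "size" Thomas refers to, i.e. the parameter
# count and the rough memory footprint of the weights, for a model loaded
# via transformers. The model ID is an illustrative assumption.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",          # assumed checkpoint; any other works the same way
    torch_dtype=torch.float16,  # half precision, 2 bytes per parameter
)

n_params = sum(p.numel() for p in model.parameters())
approx_gb = n_params * 2 / 1024**3  # 2 bytes per fp16 parameter

print(f"parameters: {n_params / 1e9:.1f}B")
print(f"approx. weight memory (fp16): {approx_gb:.1f} GB")
```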

The need for more efficient models and faster training and setup speeds are driving the adoption of SLMs. “SLMs are gaining popularity due to practical considerations such as compute resources, training time, and specific application requirements,” Thomas explains. “Over the past few years, SLMs have become increasingly relevant, especially in scenarios where sustainability and efficiency are critical.”

The main difference from LLMs is scale. Large models are trained on massive data from diverse sources, allowing them to capture a wide range of language patterns, while SLMs are trained on smaller, often proprietary datasets, which allows for faster training and inference.

LLMs also demand more computational resources and longer training times, which makes SLMs a more viable choice for applications that are resource-constrained or need to be deployed quickly. According to Thomas, LLMs shine at tasks such as content generation, language translation, and complex query understanding, but smaller models can achieve similar performance when fine-tuned properly. “SLMs are particularly efficient for domain-specific tasks because of their small size and fast inference times,” Thomas added.

Choosing between building your own and buying a service

Businesses looking to build their own AI models from scratch using open source frameworks should understand that fine-tuning existing models can be costly and time-consuming.

“There are several ways to build your own AI model, either from scratch or by fine-tuning an existing open-source model,” Nambiar said. “The former requires sophisticated configurations of GPUs, TPUs, access to a lot of data, and a huge amount of expertise. The software and hardware stacks needed for this are available, but the biggest hurdle will be the remaining components.”

He continued, “For domain-specific use cases, it is best to fine-tune an existing SLM or LLM rather than building one from scratch. There are many open-source SLMs available, many with very permissive licenses. This is currently the practical way to build your own model, and it applies broadly to all transformer models.”
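A common way to do the kind of fine-tuning Nambiar describes is parameter-efficient adaptation such as LoRA, which trains small adapter weights on top of a frozen open-weight model. The outline below is a minimal sketch using the Hugging Face transformers, datasets, and peft libraries; the base model, the domain_pairs.jsonl file, and the hyperparameters are assumptions for illustration only.

```python
# Minimal LoRA fine-tuning sketch on an existing open-weight SLM, in the
# spirit of adapting rather than building from scratch. Base model, data
# file, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed small open base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token      # Llama-style tokenizers may lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach small trainable LoRA adapters; the base weights stay frozen.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
               task_type="CAUSAL_LM"),
)

# Hypothetical JSONL file of domain-specific prompt/response pairs.
data = load_dataset("json", data_files="domain_pairs.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["prompt"] + "\n" + ex["response"],
                                     truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finetune", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
    # mlm=False sets labels from input_ids for causal language modeling.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("slm-finetune/adapter")  # saves only the LoRA adapter weights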

Andrew Brown, Red Hat’s senior vice president and chief revenue officer, says an SLM strategy shouldn’t be all or nothing. For one thing, training a single general-purpose AI model takes enormous resources. “Some of the largest models can require 10,000 GPUs, and those models may already be outdated,” Brown says. “In fact, research suggests that by 2026, the cost of AI training will be $22 trillion, the equivalent of the U.S. GDP. The average CIO doesn’t have an IT budget the size of the U.S. GDP, and they don’t have thousands of GPUs to spare. So what’s the answer? Specialized, small-scale AI models powered by open-source innovation.”

One of the big challenges in comparing costs across AI service providers is that they use different billing units: OpenAI prices by tokens, Google by characters, and Cohere by a mix of “generations,” “classifications,” and “summary units,” says Nambiar, whose company builds AI for business automation. Nambiar settled on “price per thousand tokens” to compare the different offerings.
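Normalizing those units is mostly arithmetic. The sketch below converts hypothetical per-character and per-token prices to a single price-per-1,000-tokens figure; the prices and the four-characters-per-token rule of thumb are placeholders, not real vendor rates.

```python
# Minimal sketch of normalizing heterogeneous AI pricing units to a single
# "price per 1,000 tokens" figure. All prices and the characters-per-token
# ratio are hypothetical placeholders, not real vendor rates.
PLACEHOLDER_OFFERS = [
    {"vendor": "provider_a", "unit": "1k_tokens", "price_usd": 0.002},
    {"vendor": "provider_b", "unit": "1k_characters", "price_usd": 0.001},
]

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text; an assumption

def price_per_1k_tokens(offer: dict) -> float:
    """Convert an offer's native billing unit to USD per 1,000 tokens."""
    if offer["unit"] == "1k_tokens":
        return offer["price_usd"]
    if offer["unit"] == "1k_characters":
        # 1,000 tokens is roughly 4,000 characters under the rule of thumb above.
        return offer["price_usd"] * CHARS_PER_TOKEN
    raise ValueError(f"unknown billing unit: {offer['unit']}")

for offer in PLACEHOLDER_OFFERS:
    print(offer["vendor"], f"${price_per_1k_tokens(offer):.4f} per 1k tokens")
```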

Tailoring an LLM to business objectives typically means relying on an AI service provider to host the infrastructure. Nambiar said companies should plan for a two- to four-month project covering both infrastructure and staffing, with costs typically starting at $50,000 or more.

If a company hosts an open-source model itself, fine-tuning an SLM is typically more expensive, since it requires not only GPU or TPU infrastructure but also the fine-tuning effort and specialized personnel. “Assume it is more expensive than an LLM,” Nambiar added.

Delivering reliable results with clean data

Whether you build your own or use a cloud-based SLM, data quality is critical to accuracy. Like LLMs, small models can still hallucinate, producing incorrect or misleading information because of flaws in the training data or algorithms. However, SLMs are easier to fine-tune and more likely to be grounded in the company’s proprietary data.

As with LLMs, retrieval-augmented generation (RAG) techniques can be used to ground models, improving the accuracy of responses and reducing the likelihood of hallucination.
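A minimal sketch of that RAG pattern, assuming a sentence-transformers embedding model and a few made-up in-house documents, looks like this: retrieve the passages closest to the user’s question and prepend them to the prompt so the model answers from company data rather than from memory.

```python
# Minimal retrieval-augmented generation (RAG) sketch: embed proprietary
# documents, retrieve the closest matches for a query, and ground the
# model's answer in them. The embedding model ID and the document snippets
# are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Refunds are processed within 14 days of the return being received.",
    "Enterprise support contracts include a 4-hour response SLA.",
    "The on-prem deployment requires Kubernetes 1.27 or later.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

query = "How fast do refunds get processed?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this grounded prompt is then sent to the SLM
```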

At the same time, SLMs are less likely than LLMs to capture broad linguistic patterns because of their smaller size and training data, which can make them less effective outside their niche. And while SLMs can be fine-tuned for specific tasks, LLMs tend to handle more complex and loosely defined queries better because they can draw on far more data. “In short, SLMs provide a more efficient and cost-effective alternative for specific domains and tasks, especially when fine-tuned to exploit their full potential,” Thomas emphasized.

Adam Kentosh, field CTO for North America at Digital.ai, emphasized the importance of cleaning data and fine-tuning data repositories for SLMs in order to improve performance and sustainability and to reduce business risk and bias. Many AI initiatives, Kentosh said, are falling into a “trough of disillusionment” that could be avoided by addressing data quality issues.

Gartner predicts that by 2028, more than 50% of enterprises building LLMs from scratch will abandon their efforts due to the cost, complexity, and technical debt of implementation. “One of the big challenges that existing customers continue to face is the heterogeneity of data sources, even in the common areas of software development,” Kentosh explains. “For example, most enterprises have more than one agile planning solution. There is also little consistency when it comes to software releases. So data preprocessing is critical, and it’s something that many enterprises have not done well.”

Obtaining well-curated, domain-specific data suitable for fine-tuning models is never easy, Nambiar notes: “Transformer models require a specific kind of prompt-response pair data that is difficult to procure.”
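Such prompt-response pairs are typically stored as one JSON object per line (JSONL). The sketch below writes and validates a couple of hypothetical pairs in that shape; the file name matches the one assumed in the fine-tuning outline above.

```python
# Minimal sketch of the prompt-response pair format commonly used for
# fine-tuning transformer models. The example pairs are hypothetical.
import json

pairs = [
    {"prompt": "Summarize the Q3 incident report in two sentences.",
     "response": "Two outages occurred in Q3, both traced to expired "
                 "certificates. Automated renewal has since been enabled."},
    {"prompt": "List customers with overdue invoices as SQL.",
     "response": "SELECT customer_id FROM invoices "
                 "WHERE due_date < CURRENT_DATE AND paid = FALSE;"},
]

with open("domain_pairs.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        # One JSON object per line (JSONL), the shape most fine-tuning
        # tooling expects for instruction data.
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")

# Quick validation pass: every line must parse and contain both fields.
with open("domain_pairs.jsonl", encoding="utf-8") as f:
    for line_no, line in enumerate(f, 1):
        record = json.loads(line)
        assert {"prompt", "response"} <= record.keys(), f"bad record on line {line_no}"
```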

Furthermore, once a company decides to fine-tune its own SLM, it must keep investing to stay level with the benchmarks set by the latest models. “As each new SLM is released, the bar for inference capability is raised, so if you build your own fine-tuned SLM, you need to keep raising that model’s inference capability as well, or there will be no more use cases for it,” Nambiar said.

Open-source AI models are no longer uncommon; Meta opened up its Llama models earlier this year. “This is great news for enterprises because it offers many benefits, including avoiding vendor lock-in, a broad partner ecosystem, and cost-effectiveness,” Brown said. “Unfortunately, none of this is going to work if you don’t have the data scientists to use the model.”

Brown likens data scientists to unicorns: rare, near-mythical creatures who command correspondingly mythical salaries. Most companies can afford only a handful of data scientists at best, because the talent is both scarce and expensive to hire. This, Brown adds, “creates a bottleneck in effectively training and tuning models.”

AI is a hybrid workload

Brown points out that CIOs have been moving away from monolithic technologies for a long time, starting with the transition from Unix to Linux in the early 2000s. He believes AI is at a similar turning point and argues that a hybrid strategy, much like hybrid cloud, is the best way to deploy AI models. While sprawling general-purpose LLMs are getting the attention today, he expects the future IT environment to be split roughly 50% applications and 50% SLMs.

“Data is everywhere—on-premises, in the cloud, at the edge. So data is inherently hybrid, and AI has to run where the data is, so AI has to be hybrid,” Brown said. “In fact, we tell our customers and partners that AI is the ultimate hybrid workload.”

He also said, “Basically, CIOs will have as many AI models as they have applications. That means they need to learn faster, tune faster, and reduce costs. The key to this challenge is open source,” emphasizing that “just as open source democratized computing, it will do the same for AI, and it is already doing so.”

Source: www.itworld.co.kr