AI For All: How a More Cost-Effective Model and Hybrid Technology Transform AI Development and Deployment

Phison Technology’s Michael Wu offers insights on how a cost-effective model and hybrid technology transform AI development and deployment. This article originally appeared on Solutions Review’s Insight Jam, an enterprise IT community enabling the human conversation on AI.

Whether it is Chromebooks supplanting traditional laptop PCs in the education market, open-source software displacing proprietary software for everyday applications, or cloud services replacing conventional IT infrastructure for business processing, we have seen time and again how more affordable, functionally acceptable options upend incumbent technology approaches. We are now seeing the same dynamic, with similar impact, in the artificial intelligence (AI) industry.

The AI sector recently witnessed its own disruption: a significant breakthrough by the Chinese AI startup DeepSeek, which unveiled its open-source large language model (LLM), DeepSeek-R1. The model demonstrates exceptional reasoning capabilities, excelling in mathematics, coding, and natural language reasoning tasks, and in some areas surpassing OpenAI’s o1 model. Developed at a reported cost of approximately $5.6 million, using a stated 2,048 AI accelerators over a development period of less than two months, it presents a cost-effective and time-efficient alternative to other large-scale models.
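A rough back-of-the-envelope calculation shows how a figure of that order could plausibly follow from the accelerator count and timeframe above. The sketch below is an illustration only: the roughly $2-per-GPU-hour rental rate is an assumed number, not a reported one.

```python
# Back-of-the-envelope check of the reported training cost.
# Accelerator count and timeframe come from the article; the ~$2/GPU-hour
# rental rate is an illustrative assumption, not a reported figure.
accelerators = 2_048
days = 60                      # "less than two months", rounded for illustration
hours_per_day = 24
rate_per_gpu_hour = 2.00       # assumed cloud rental rate in USD

gpu_hours = accelerators * days * hours_per_day
estimated_cost = gpu_hours * rate_per_gpu_hour

print(f"GPU-hours: {gpu_hours:,}")                  # ~2.9 million GPU-hours
print(f"Estimated cost: ${estimated_cost:,.0f}")    # ~$5.9M, near the reported $5.6M
```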

That reported figure compares with the $100 million to $500 million typically estimated for LLMs from the giant incumbents. DeepSeek-R1 is offered as an open-source solution under the MIT license, allowing developers worldwide to freely use, adapt, and modify it, promoting the widespread adoption and advancement of artificial intelligence technology. Because it is cost-disruptive to incumbent approaches, it will have long-term benefits, bringing down the price of AI and making it affordable for a broader swath of the market to deploy. This will, in turn, spawn further innovation in AI.

Whereas DeepSeek lowered the cost of AI development by employing older, less powerful GPUs and completing LLM training in record time, additional technologies can make deploying AI infrastructure budget-friendly and secure.

Less expensive infrastructure approaches demand more resourcefulness, which often inspires fresh ingenuity. When the outcomes are comparable to those of longstanding approaches but come at a more affordable price, the market shifts to the innovators. Customers lead the adoption. Incumbents must evolve to continue to compete, accept lower profit margins, or find less contested markets.

Specifically, GPU suppliers and High-Bandwidth Memory (HBM) vendors must create more affordable price/performance options. Otherwise, they risk replacement in existing applications or will need to find new applications with less price pressure.

Now, there is no longer a need for the most expensive GPUs to create performance-competitive LLMs. Furthermore, cloud service providers (CSPs) and Fortune 500 buyers of the most costly GPUs will reconsider which GPU offerings they procure and look for processors with better price/performance. They will even consider buying alternative processors, such as more AI-specific Neural Processing Units (NPUs) from other silicon suppliers.

Given the premium prices and the limited number of suppliers of HBM high-speed memory, customers will also look for new options based on affordable flash memory. If CSPs and Fortune 500 buyers go “down market” for their future GPU purchases, pursuing alternative fast-growing applications may prove the easiest path for GPU vendors to recover revenue growth.

A key opportunity exists in AI at the edge, in IoT, robotics, and smartphone applications. Future GPU growth will depend on this shift toward AI at the edge, with compound annual growth rates (CAGR) projected in the 20 to 30 percent range over the next decade.

Furthermore, with lower AI infrastructure costs, more buyers in the Global 2000 class of customers can afford to take advantage of AI processing to enhance their product functionality and competitiveness. Outfitted with cost-efficient memory capacity and midrange GPUs, AI PCs in the $3,000-$4,000 range can readily handle training of LLMs in the 1-8 billion parameter range, while AI servers in the $50,000-$100,000 price band can train LLMs of 405 billion parameters and beyond. This is about one-tenth the cost of an all-HBM solution running on the highest-end GPUs.
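To see why memory capacity, rather than raw compute, is often the binding constraint at these model sizes, consider a rough sizing exercise. The sketch below assumes mixed-precision training with the Adam optimizer, roughly 16 bytes of state per parameter, and deliberately ignores activations and overhead; it is an illustration, not a vendor specification.

```python
# Rough memory-footprint estimate for mixed-precision training with Adam:
# fp16 weights (2 B) + fp16 gradients (2 B) + fp32 master weights (4 B)
# + Adam momentum (4 B) + Adam variance (4 B) = ~16 bytes per parameter.
# Activations, KV caches, and framework overhead are deliberately excluded.
BYTES_PER_PARAM = 16

def training_footprint_gb(params_billions: float) -> float:
    """Approximate memory needed just for model weights and optimizer state."""
    return params_billions * 1e9 * BYTES_PER_PARAM / 1e9

for size in (1, 8, 405):
    print(f"{size:>4}B parameters -> ~{training_footprint_gb(size):,.0f} GB of state")
# 1B -> ~16 GB, 8B -> ~128 GB, 405B -> ~6,480 GB: far beyond the HBM on any
# single GPU card, which is why pooling in lower-cost flash capacity matters.
```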

Thus, the incumbent GPU and HBM vendors must offer more compelling price/performance to sustain revenue growth with their traditional CSP and Fortune 500 customers and to attract new buyers in the Global 2000. Pursuing the growing opportunity in edge computing will also unleash new applications and new buyers. To grow their opportunity even further, GPU and HBM vendors should look to team up with cost-efficient providers in the AI infrastructure stack and bring more optimal prices to market to more quickly attract these Global 2000 and edge computing buyers.

At Phison, our engineers developed a hybrid hardware and software platform that integrates SSDs with GPUs, expanding memory capacity beyond the costly VRAM (i.e., HBM) on the GPU card by pooling it with affordable, high-capacity NAND flash memory. This added memory capacity dramatically increases the size of the AI models that can be trained while significantly reducing costs compared to the all-VRAM approach.

The solution, called aiDAPTIV+, is targeted primarily as an on-premises infrastructure for training LLMs. Companies and public sector organizations that require their proprietary training data to remain private can rest assured that their sensitive data stays secure and compliant with data governance needs.
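aiDAPTIV+ is Phison’s own integrated hardware and software stack, and its interfaces are not reproduced here. As a rough, analogous illustration of the general memory-tiering idea, the open-source DeepSpeed library can offload optimizer and parameter state from GPU memory to NVMe SSDs; the configuration sketch below uses placeholder paths and is not the aiDAPTIV+ API.

```python
# Illustrative DeepSpeed ZeRO-Infinity configuration that spills optimizer and
# parameter state from GPU memory (HBM) to NVMe flash -- the same general idea
# as pooling GPU memory with SSD capacity. Paths and sizes are placeholders;
# this is an analogous open-source technique, not Phison's aiDAPTIV+ API.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer state
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}
# Typically passed to deepspeed.initialize(model=model, config=ds_config, ...)
# so state streams between NVMe, host DRAM, and GPU HBM during training.
```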

The aiDAPTIV+ approach not only enables cost-effective training of LLMs but also enhances inference performance. It improves the user experience with a faster time to first token, and it delivers more accurate results through the greater context enabled by support for longer token lengths. The result is improved efficiency and effectiveness of AI applications across both LLM training and inference.
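Time to first token (TTFT) is simply the latency between submitting a prompt and receiving the first generated token. A minimal sketch of how one might measure it with the open-source Hugging Face transformers library appears below; the model name is a placeholder, and the example illustrates the metric itself rather than any particular vendor’s stack.

```python
# Minimal sketch of measuring time-to-first-token (TTFT) for a local LLM using
# the open-source Hugging Face transformers library. The model name is a
# placeholder; this illustrates the metric, not any specific vendor's stack.
import time
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "meta-llama/Llama-3.1-8B-Instruct"   # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

inputs = tokenizer("Summarize the quarterly report:", return_tensors="pt").to(model.device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

start = time.perf_counter()
Thread(target=model.generate, kwargs={**inputs, "streamer": streamer, "max_new_tokens": 64}).start()
first_chunk = next(iter(streamer))                 # blocks until the first token arrives
print(f"Time to first token: {time.perf_counter() - start:.2f}s ({first_chunk!r})")
```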

The future of AI stands to be proven in real-world use cases within diverse industries. For instance, a single-card system in a desktop PC can serve as a machine for teaching LLM development, providing affordable AI education solutions for academic institutions worldwide. More richly configured workstations are suitable for various industries benefitting from on-premises deployments of LLMs. These include financial modeling, product development, and medical research applications.

Server-grade systems employing these cost-efficient solutions offer medium to large enterprises a way to build on-site AI infrastructure in a “closed loop” configuration of centralized LLM training with inferencing performed by client devices on the in-house network. Lastly, IoT and robotic devices will benefit from use at the edge, such as real-time applications in autonomous vehicles, healthcare monitoring, and industrial inspection systems, particularly when there is limited or no internet connectivity.

As we have seen in past technological shifts driven by new classes of affordability, such as open-source software and cloud services, AI is on the cusp of a similar shift driven by more cost-efficient innovations. As set forth in Clay Christensen’s business leadership book, “The Innovator’s Dilemma”, small, initially unrecognized entrants can displace established market leaders. However, as also asserted in his follow-on book, “The Innovator’s Solution”, incumbents can survive and even thrive through technological change if they are willing to incorporate the disruptive technology or partner with disruptive vendors.
