A GPU farm is a system that processes large amounts of data by connecting multiple graphics processing units (GPUs), which are the core of generative AI operations. Building an ultra-high-performance GPU farm is not just about GPUs. It is also essential to design a network that guarantees fast transmission and a storage system that stably stores and optimizes data.
Against this backdrop, Jeon Sang-ho, Managing Director of IBM Korea, explained how to design a storage system for an AI GPU farm at the ‘TNBT Korea 2024’ event hosted by Foundry Korea on June 19. IBM gave the presentation jointly with Kolon Benit, its distributor in Korea. Kolon Benit (CEO Kang I-goo), the IT services company of Kolon Group, recently launched the ‘AI Alliance’ with about 53 companies specializing in AI technology, solutions, and infrastructure, expanding its position in the AI market and building sustainable competitiveness.
NVIDIA: “The Core of GPU Farm Storage is Performance, Integration, and Cost”
According to the ‘DGX SuperPod’ white papers that NVIDIA, the current leader in the AI and GPU market, released in 2020 and 2023, the storage considerations when building a GPU farm boil down to three. The first is ‘performance’: storage must not introduce delays in machine learning/deep learning (ML/DL). The second is ‘integration’ with other systems, so that data collection, analysis, storage, and distribution management can be automated. The third is ‘cost’ that is balanced against performance and capacity.
Managing Director Jeon Sang-ho said, “The reason we use GPU farms is because we need high performance. As we use high computing power, it is important to prevent the power from going idle. To do this, we need high-performance storage because the supply of data must be continuous.”
“IBM SSS 6000, Single File System Expands to 8YB”
IBM, which offers a range of Storage Scale System (SSS) products, launched the SSS 6000, a platform dedicated to GPU farms, in January. IBM Storage Scale System, which currently holds a 70-80% market share in Korea, is used in weather and air quality forecasting, in earthquake, volcano, and forest fire prevention systems, in ground station systems for domestically launched satellites, and in the HPC centers of large corporations.
The SSS 6000 adds a file system and network services to the controller of IBM’s FS9500 high-end SAN storage. The controller bay supports 48 NVMe disks, and when nine tiers of 102 NL-SAS disks each are installed on one building block, the maximum usable capacity exceeds 12PB (petabytes). In addition, up to 8,192 units can theoretically be scaled out in parallel, allowing a single file system to grow to 8YB (yottabytes).
Managing Director Jeon Sang-ho introduced the SSS 6000 in terms of performance, integration, and cost, the three storage considerations emphasized in the NVIDIA white papers. First, in terms of performance, the SSS 6000 showed an effective read performance of 310GB/s per array when files were read through an actual application. He explained, “Considering that the second-best commercially available device delivers 115GB/s, this performance is a big advantage. Since the SSS 6000 is a scale-out device, that figure is for a single array; read performance reaches 600GB/s with two arrays, 900GB/s with three, and 1.2TB/s with four.”
Another feature of the SSS 6000 that IBM emphasizes is performance at a single analysis node. When a PCIe 5.0 server is connected over one InfiniBand/Ethernet (200GbE) link, single-node read performance is at least 20GB/s, and with GPUDirect Storage it reaches up to 95GB/s.
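The scale-out figures above can be sanity-checked with a few lines of arithmetic. The following is a minimal sketch, not an IBM sizing tool: it assumes perfectly linear scaling from the quoted 310GB/s per array (the multi-array figures quoted in the article are slightly below that), and the function names are hypothetical.

```python
import math

# Figures quoted in the article; linear scaling is an assumption of this sketch.
PER_ARRAY_READ_GBPS = 310   # effective read throughput of one SSS 6000 array (GB/s)
NODE_READ_GBPS_MIN = 20     # single analysis node, quoted minimum (GB/s)
NODE_READ_GBPS_GDS = 95     # single analysis node with GPUDirect Storage (GB/s)


def aggregate_read_gbps(arrays: int, per_array: float = PER_ARRAY_READ_GBPS) -> float:
    """Ideal aggregate read throughput if every array contributes its full rate."""
    if arrays < 1:
        raise ValueError("arrays must be at least 1")
    return arrays * per_array


def nodes_to_saturate(arrays: int, per_node: float) -> int:
    """Smallest number of analysis nodes whose combined reads reach the aggregate."""
    if per_node <= 0:
        raise ValueError("per_node must be positive")
    return math.ceil(aggregate_read_gbps(arrays) / per_node)


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} array(s): {aggregate_read_gbps(n)} GB/s (ideal linear estimate)")
    print("nodes to saturate one array at 95 GB/s each:",
          nodes_to_saturate(1, NODE_READ_GBPS_GDS))
```

Under these assumptions, about four analysis nodes reading at the quoted GPUDirect Storage rate would be enough to consume the full read throughput of a single array, which illustrates why per-node and per-array performance have to be sized together.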
‘Spectrum Scale’ Supports Data Integration and Automation
Managing Director Jeon Sang-ho also said that IBM systems integrate well with other systems. The SSS 6000 supports most data access protocols: it uses NooBaa, a high-performance S3 implementation, for S3 cloud-native applications, and also provides full functionality for HDFS, NFS, SMB, and CSI. Integration can be automated by attaching GPU servers and protocol servers to the storage InfiniBand network, and protocol servers can scale out to 32 for NFS, 16 for SMB, and 4 for S3. He cited IBM’s acquisition of Red Hat as another strength of IBM’s solutions, explaining, “After the acquisition of Red Hat, IBM is pursuing a policy of directly applying the code developed upstream by NFS-Ganesha, SMB, and NooBaa developers to IBM storage, and, when improvements are discovered, contributing them back upstream. This allows us to provide better functions.”
According to Managing Director Jeon Sang-ho, IBM’s SSS 6000 also offers cost advantages. He said, “While many companies offer data tiering that combines flash drives and general drives, IBM can configure a single file system that combines tapes, storage servers with a large number of internal drives, and even external object storage.” He explained that because data can be kept in a variety of repositories and storage administrators can assign repositories by policy, data can be retained cost-effectively and transparently to applications.
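The idea of assigning repositories by policy can be illustrated with a small rule table. This is a minimal sketch only: it is not the Storage Scale policy language, and the pool names, age thresholds, and the `place` function are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_gb: float
    days_since_access: int


@dataclass(frozen=True)
class Rule:
    name: str
    target_pool: str
    matches: Callable[[FileInfo], bool]


# Hypothetical pools and thresholds; rules are evaluated in order, first match wins.
RULES: List[Rule] = [
    Rule("hot", "flash", lambda f: f.days_since_access <= 7),
    Rule("warm", "nl-sas", lambda f: f.days_since_access <= 90),
    Rule("cold", "tape", lambda f: f.days_since_access <= 365),
]
DEFAULT_POOL = "object"  # anything older falls through to external object storage


def place(file: FileInfo, rules: List[Rule] = RULES) -> str:
    """Return the pool of the first matching rule, or the default pool."""
    for rule in rules:
        if rule.matches(file):
            return rule.target_pool
    return DEFAULT_POOL


if __name__ == "__main__":
    for days in (3, 30, 200, 1000):
        sample = FileInfo(f"/data/sample_{days}", 1.0, days)
        print(days, "days since access ->", place(sample))
```

The application keeps using the same path in either case; only the pool chosen by the administrator's rules changes, which is the sense in which tiering is transparent to applications.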
The direction of IBM’s storage systems is summed up in the name of the solution, ‘Spectrum Scale’. As the name suggests, Spectrum Scale is a strategy of supporting a wide ‘spectrum’: it combines flash drives, tapes, internal drives, and even external object storage into a single file system that can be accessed simultaneously through protocols such as S3, NFS, and CSI, so that application requirements are met optimally.
“You can also cache and map external storage by using Active File Management (AFM), which is one of the advantages of IBM storage,” said Managing Director Jeon Sang-ho, adding that providing a full set of functions for overall data management is the core of IBM Storage Scale System. yuseong.kim@foundryco.com
Source: www.itworld.co.kr