POWERCHALLENGEARRAY Impressing Researchers

by Ginny Hudak-David and John Towns

"SGI's cluster strategy will be important for the long term," said SGI's Brian Totty in an early April meeting with NCSA scientists and researchers. Totty, a member of the Performance Engineering technical staff, spent a day at NCSA along with SGI's Horst Simon, research market development manager of SGI's Advanced Systems Division. NCSA's SGI cluster, called a POWERCHALLENGEARRAY, offers users networked servers with high-speed interconnects, an enhanced operating system and optimization features, and support utilities and libraries. NCSA's array is already impressing researchers.

The POWER CHALLENGE series is designed by Silicon Graphics Inc., headquartered in Mountain View, CA. SGI is a leading manufacturer of high-performance visual computing systems. MIPS Technologies, Inc., an SGI subsidiary, designs the RISC microprocessor technologies used in the POWER CHALLENGE series hardware.

NCSA's SGI POWERCHALLENGEARRAY (Photo by Wilmer Zehr)

High-performance strategies

SGI is relatively new to high-performance computing, but it is enthusiastic about its contributions to this arena. The company is committed to a cost-effective, short-term scalability path. The solid performance of its individual systems, combined with their price-for-performance ratio, makes the systems attractive to buyers. Because the systems can be reconfigured readily, SGI's Totty says arrays can be "supercomputers at night and workstations or servers by day."

NCSA's high-performance strategy is to reduce the cost-performance ratio in supercomputing by using shared-memory, microprocessor-based technology. One path to realizing this strategy depends on Silicon Graphics technology. The Center has a long-standing relationship with Silicon Graphics. In 1989, the company was instrumental in the development of the Renaissance Experimental Laboratory that houses many of the more than 85 SGI workstations at the Center.

Hardware installed

In April 1994, SGI shipped a 32-processor CHALLENGE machine to NCSA. Dubbed Loki by the time it hit the machine room floor, NCSA's CHALLENGE system is a symmetric multiprocessor system based on the 32-bit MIPS R4400 microprocessor. It runs SGI's IRIX 5.3 operating system.

When the 16-processor POWER CHALLENGE arrived in October 1994, Loki was reduced to 12 processors. The POWER CHALLENGE, called Odin, is also a symmetric multiprocessor system, but it is based on the 64-bit MIPS R8000 microprocessor. Each processor has a theoretical peak speed of 300 megaflops, giving the system a total peak speed of 4.8 gigaflops. The POWER CHALLENGE is binary compatible with 32-bit SGI workstations, and it runs IRIX 6.0 (the 64-bit extension of IRIX 5.3).

"The R8000 RISC processor is the most powerful CPU in any microprocessor-based supercomputer," said Forest Baskett, senior vice president of research and development and chief technology officer of Silicon Graphics, in a press release last year announcing NCSA's purchase. "The 64-bit computing environment of the POWER CHALLENGE ensures that NCSA will not outgrow the addressing capabilities of the system."

Odin and Loki were opened to friendly users in December 1994. Loki serves as the front-end to the system. Users attack Grand Challenge-class problems with the POWER CHALLENGE system [see access, Spring 1995]. More than 250 users have accounts on the system.

In March 1995, the Center added four 8-processor POWER CHALLENGE systems to the machine room floor. Together with Loki and Odin, all the SGI "boxes" combine to form the NCSA POWERCHALLENGEARRAY, creating a tightly coupled set of systems with single user logins and a single allocation process. A HIPPI (high-performance parallel interface) connection provides cost-effective, high-performance communication between the POWER CHALLENGE systems (named Thor, Freya, Magna, and Sif). The resulting array has a peak performance rating of over 14 billion floating point operations per second (more than 14 Gflops) and offers 8 gigabytes of memory.
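
The peak rating quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it counts the R8000-based systems (Odin plus the four 8-processor machines) at the 300-megaflop per-processor peak given earlier, and it leaves out Loki, whose R4400 peak rate is not stated in this article. The variable names are illustrative.

```python
# Back-of-the-envelope peak estimate for the R8000-based systems in the array.
# Assumption: Loki (R4400-based) is left out because its per-processor
# peak rate is not given in the article.
R8000_PEAK_MFLOPS = 300  # theoretical peak per R8000 processor

# Odin (16 processors) plus the four 8-processor POWER CHALLENGE systems.
r8000_processor_counts = [16, 8, 8, 8, 8]

total_processors = sum(r8000_processor_counts)
array_peak_gflops = total_processors * R8000_PEAK_MFLOPS / 1000

print(f"{total_processors} R8000 processors, "
      f"{array_peak_gflops:.1f} Gflops theoretical peak")
```

Under these assumptions the estimate comes to 48 processors and 14.4 Gflops, which is consistent with a rating of "over 14" Gflops.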

NCSA processes batch jobs on the array with the job manager lsbatch. Built on top of the Load Sharing Facility (LSF) by Platform Computing, lsbatch has been a useful addition to the array environment. The Network File System (NFS) cross-mounts file systems to provide shared file systems. Ultimately, NCSA hopes to queue jobs from SGI workstations at the Center.

Silicon Graphics is also donating four POWER Onyx graphics supercomputers to the Center: two at NCSA and two at UIC's EVL. Linked to the POWERCHALLENGEARRAY, these additional systems will be part of the NII/Wall that will premiere at Supercomputing '95.

A satisfied user

One of the earliest users of NCSA's POWER CHALLENGE system is Hsinchun Chen, a faculty member in the University of Arizona's Department of Management Information Systems. Part of the NSF/ARPA/NASA Digital Library Initiative (DLI) [see access, Spring 1995], Chen is working on a concept space approach to the information retrieval vocabulary problem. Online, networked information retrieval is hampered by information overload, scattered data, and different vocabularies (ones that change over time, over domains, and with individuals).

Professor Hsinchun Chen at the University of Arizona, who utilizes NCSA's POWER CHALLENGE in his research on information retrieval vocabulary, presented his findings at the first annual meeting of the Digital Library Initiative held at the Beckman Institute. (Photo by Fran Bond, NCSA Publications)

Concept space is defined as a network of terms and weighted associations that represents the concepts and their semantic relationships contained within underlying documents in a database. More simply put, developing a concept space is akin to a human generating a thesaurus of domain-specific terms (except that the concept space approach is algorithmic and computational).

With the system-generated concept space, a user is prompted with associations that he might not have considered and thus can discover more information to satisfy the queries. Chen's objective is to understand cross-domain term association patterns and whether conjoined automatic thesauri across different domains can help researchers bridge vocabulary differences. Concept space is generated from a large-scale, domain-specific document collection (e.g., a database of abstracts, full-text articles, product descriptions) using object filtering, automatic indexing, and cluster analysis techniques. Once a concept space is created, a searcher can then perform associative, concept-based retrieval based on the terms and their strengths of associations.
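
To make the idea of "terms and weighted associations" concrete, here is a minimal sketch of a concept space built from term co-occurrence. It is not Chen's algorithm: the article does not give his formulas, so the weighting used here (the fraction of documents containing term a that also contain term b) is an assumed stand-in, and the function names and toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_concept_space(documents):
    """Return {term: {other_term: weight}} from lists of index terms.

    Assumption: weight(a -> b) = docs containing both / docs containing a.
    This is a simple stand-in, not the weighting used in Chen's work.
    """
    doc_count = defaultdict(int)    # term -> number of documents containing it
    pair_count = defaultdict(int)   # sorted (a, b) pair -> co-occurrence count
    for terms in documents:
        unique = sorted(set(terms))
        for term in unique:
            doc_count[term] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    space = defaultdict(dict)
    for (a, b), n in pair_count.items():
        space[a][b] = n / doc_count[a]   # asymmetric: a -> b
        space[b][a] = n / doc_count[b]   # and b -> a
    return space

def suggest(space, term, top_n=3):
    """Associative lookup: the terms most strongly linked to `term`."""
    neighbors = space.get(term, {})
    return sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy, made-up index terms standing in for automatically indexed documents.
docs = [
    ["gene", "mutation", "embryo"],
    ["gene", "mutation"],
    ["gene", "protein"],
    ["embryo", "development"],
]
space = build_concept_space(docs)
print(suggest(space, "gene"))
```

In this toy collection, a search on "gene" is prompted first with "mutation", the term it co-occurs with most often. The weights are deliberately asymmetric, so the association from "mutation" to "gene" is stronger than the reverse; a real system would add the filtering, indexing, and clustering steps described above.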

In a recent NSF National Collaboratory project generating worm and fly concept spaces for molecular biologists, Chen used a document collection of abstracts, conference proceedings, newsletters, and scanned reference books covering a 10-year period. The 4- and 6-megabyte concept spaces he developed for the worm and fly communities were used to assist in cross-domain information retrieval (e.g., fly biologists used fly terms to retrieve worm documents). Initially, when he ran his C programs on a DEC Alpha 2100 system, the concept space generation process took 1.5 hours. On the 512-node CM-5, the run time dropped to 25 minutes. On the NCSA 16-node POWER CHALLENGE, he was able to produce the same concept space in approximately 20 minutes.
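
The run times above translate into speedup factors as follows. This is a simple illustration using only the figures reported in the article, with the 20-minute POWER CHALLENGE time treated as exact even though it is described as approximate; the variable names are illustrative.

```python
# Concept space generation times reported in the article, in minutes.
runtimes_min = {
    "DEC Alpha 2100": 90,            # 1.5 hours
    "512-node CM-5": 25,
    "16-node POWER CHALLENGE": 20,   # "approximately 20 minutes"
}

baseline = runtimes_min["DEC Alpha 2100"]
speedups = {name: baseline / minutes for name, minutes in runtimes_min.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x relative to the DEC Alpha 2100")
```

Relative to the DEC Alpha 2100, this works out to a speedup of 3.6 on the CM-5 and roughly 4.5 on the POWER CHALLENGE.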

Chen is happy with those results. At an April 1995 DLI meeting held at UIUC, he reported that his techniques are extremely robust and promising for solving the vocabulary problem in large-scale, networked information access on the Internet and in digital libraries. "The POWER CHALLENGE is the only way I can do my research," he declared.

What's ahead?

Chen can now do in one day on the POWER CHALLENGE what he formerly did in one month on a high-end UNIX workstation. He recently completed a computer engineering concept space generation experiment (funded by NSF/ARPA/NASA DLI) that initially took 23 hours on NCSA's 16-node POWER CHALLENGE to generate a concept space of 400 megabytes, comprising 270,000 computer engineering terms and 20 million weighted relationships. Compared to the INSPEC computer engineering thesaurus generated by human professionals (about 40 subject experts and indexers over a period of 25 years), the automatic concept space appears to be more precise and to contain richer associations. Reducing the run time is paramount, and Chen is looking to NCSA's POWERCHALLENGEARRAY to make the improvements possible.

NCSA continues to push the limits of technology in high-performance computing as well as in information serving, retrieval, and analysis. For SGI, the future holds continued improvements in the array environment and the next-generation R10000 processor.

Other new and upgraded computational resources at NCSA are listed below.

Ginny Hudak-David coordinates and edits NCSA's user guides and is a member of the Publications Group. John Towns leads NCSA's National Consulting Office and is a researcher in relativity.


access / Summer 1995 / NCSA