It's enough to ruin any computational scientist's day. You've done
the backbreaking labor of building a mathematical model. To run the
model, you've written an enormous system of linear equations that will
actually do the calculations and give you results. You click the ON button
and the supercomputer begins to crank through it. Then, days later, the
unspeakable happens: the mathematical software grinds to a halt.
Who are you gonna call?
Randall Bramley,
a computer scientist at Indiana University in Bloomington. Bramley is an
expert in numerical linear algebra who is often called on for assistance
in such situations. Bramley and a team of scientists and students at
Indiana University have created a software tool that will help scientists
to help themselves. Called the Linear Systems Analyzer, it will enable scientists, engineers, and students
to find solutions to large systems of linear equations with the help of
distributed computing resources -- all from a desktop computer.
Large systems of linear equations, Bramley says, lie at the
heart of many of the types of computing tasks that Alliance members
encounter routinely. "This is something that permeates many of the
computations the Alliance partners are carrying out," he says. The systems
include tens to hundreds of thousands of variables. Linear systems are
what models have to solve when they "do the math" of a simulation.
A software tool called a solver repeatedly works through
the matrix of coefficients and finds the set of numbers that satisfies
the equations. But no one solver will fit all problems. According to
Bramley's rough estimate, probably no single solver will work more than 25
percent of the time across the spectrum of tasks in scientific computing.
The calculations may fail outright, and then the researcher has to analyze
the problem and find a new solution. "Except for a few unimportant cases,
there's nothing that will tell you which method will work and which one
will fail," Bramley explains. "You have to experiment with it."
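The kind of experimentation Bramley describes can be illustrated on a toy scale. The sketch below is not part of the LSA; it is a minimal, assumed example in plain Python in which two textbook iterative solvers, Jacobi and Gauss-Seidel, are tried on the same small system. The matrix, tolerances, and function names are all chosen here for illustration.

```python
def residual_norm(A, x, b):
    """Largest absolute entry of b - A x (max-norm of the residual)."""
    return max(abs(bi - sum(aij * xj for aij, xj in zip(row, x)))
               for row, bi in zip(A, b))


def jacobi(A, b, tol=1e-10, max_iter=500):
    """Jacobi iteration: every unknown is updated from the previous iterate."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        r = residual_norm(A, x, b)
        if r < tol:
            return x, True
        if r > 1e12:  # residual is blowing up: give up early
            return x, False
    return x, False


def gauss_seidel(A, b, tol=1e-10, max_iter=500):
    """Gauss-Seidel iteration: each update uses the newest values available."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        r = residual_norm(A, x, b)
        if r < tol:
            return x, True
        if r > 1e12:
            return x, False
    return x, False


def try_solvers(solvers, A, b):
    """Run every solver on the same system and record whether it converged."""
    return {name: solve(A, b) for name, solve in solvers.items()}


# Illustrative system (an assumption, not from the article); exact solution is [1, 1, 1].
A = [[1.0, 0.8, 0.8],
     [0.8, 1.0, 0.8],
     [0.8, 0.8, 1.0]]
b = [2.6, 2.6, 2.6]

results = try_solvers({"jacobi": jacobi, "gauss_seidel": gauss_seidel}, A, b)
for name, (x, converged) in results.items():
    print(name, "converged" if converged else "failed")
```

On this particular matrix the Jacobi iteration diverges while Gauss-Seidel converges, even though both are standard methods applied to the same three equations. Real systems with hundreds of thousands of variables are far harder to predict, which is why trying several methods matters.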
With large systems, that experimentation may demand much more
working memory than a single workstation can offer. Also, the
solvers a researcher could try out are scattered across the computational
science world on different machines. That's where the Linear Systems
Analyzer (LSA) comes in. Bramley conceived of it in May 1997. Then he and
his graduate students teamed up with Indiana's Dennis Gannon, a member of the Alliance's Enabling Technologies
Distributed Computing team, and Gannon's students to develop the idea into an
application for networked computing.
The LSA enables users to select and "wire together" self-contained software
building blocks on a distributed computing system without having to compile code
or perform other labor-intensive tasks every time they try out a new solution.
"You're wiring together ready-to-go components," Bramley says. "It's an environment
that lets you quickly experiment with many different solution methods in a
distributed fashion."
The solvers are the heart of the analyzer. The user chooses from a
menu of machines connected to the network. The LSA then displays which
solvers and other components are resident on that machine. Then the user
simply clicks mouse buttons on a series of menus to connect components --
each indicated by a rectangle on the computer monitor -- into a graphical
display resembling a flowchart. It's also possible to run multiple
strategies simultaneously.
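The idea of wiring ready-made components into a flowchart, and of running several strategies at once, can be sketched in a few lines of Python. This is a hypothetical illustration only: the LSA's actual component interfaces are not described here, so the names `wire`, `scale_rows`, `gaussian_elimination`, and `run_strategies` are all assumptions, and threads on one machine stand in for components on networked machines.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_rows(system):
    """Hypothetical preprocessing component: divide each row by its largest
    absolute entry (assumes no row is entirely zero)."""
    A, b = system
    factors = [max(abs(v) for v in row) for row in A]
    A_scaled = [[v / f for v in row] for row, f in zip(A, factors)]
    b_scaled = [bi / f for bi, f in zip(b, factors)]
    return A_scaled, b_scaled


def gaussian_elimination(system):
    """Solver component: Gaussian elimination with partial pivoting."""
    A, b = system
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def wire(*components):
    """Connect components so the output of each feeds the next."""
    def pipeline(system):
        data = system
        for component in components:
            data = component(data)
        return data
    return pipeline


def run_strategies(strategies, system):
    """Run several wired-together strategies concurrently on the same system."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(p, system) for name, p in strategies.items()}
        return {name: f.result() for name, f in futures.items()}


strategies = {
    "direct": wire(gaussian_elimination),
    "scaled_direct": wire(scale_rows, gaussian_elimination),
}
system = ([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
for name, x in run_strategies(strategies, system).items():
    print(name, x)
```

Because each component takes in and hands back plain data, swapping one solver for another means changing one entry in `strategies` rather than rewriting and recompiling a program.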
With Alliance support, Bramley and Gannon plan to extend
the capabilities of the LSA. Future versions will, for instance, enable
multiple users to collaborate in real time on the same problem from
different locations. The LSA may also include "intelligent guidance" that
can keep track of the solutions that work best for a particular user's
type of application.
Bramley also wants to build fault tolerance into the LSA. Right
now, the program is still in R&D mode, Bramley says, so when something
goes wrong in the circuit of components, the LSA shuts down completely.
That has allowed the researchers to debug the program. But that will
change as the LSA evolves into a mature application, Bramley says. "Each
machine has a copy of the linear system at a certain stage," he says. "So
in principle, if one component drops out of the network you could plug in
another one without losing all the other components that are still
running."
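A rough sketch of that plug-in-a-replacement idea, again in Python and again purely hypothetical: each stage keeps the last good copy of the data, and if a component fails, an interchangeable alternative resumes from that copy. The function `run_with_fallback` and the toy components are assumptions made for illustration, not the LSA's planned design.

```python
def run_with_fallback(stages, system):
    """Run stages in order. Each stage is a list of interchangeable
    components, primary first. If one fails, the next alternative is
    tried on the last good checkpoint instead of restarting everything."""
    checkpoint = system  # last successfully produced data
    used = []
    for alternatives in stages:
        for component in alternatives:
            try:
                checkpoint = component(checkpoint)
            except Exception:
                continue  # this component dropped out: try the next one
            used.append(component.__name__)
            break
        else:
            raise RuntimeError("every alternative for a stage failed")
    return checkpoint, used


def unreachable_scaler(values):
    """Stand-in for a component on a machine that has dropped off the network."""
    raise ConnectionError("component unavailable")


def local_scaler(values):
    """Replacement component doing the same job: scale by the largest entry."""
    top = max(abs(v) for v in values)
    return [v / top for v in values]


def total(values):
    """Final-stage component: sum the scaled values."""
    return sum(values)


result, used = run_with_fallback(
    [[unreachable_scaler, local_scaler], [total]],
    [2.0, 4.0, 8.0],
)
print(result, used)
```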
Faisel Saied, who leads NCSA's Performance Engineering and Computational Methods group, notes that the Linear Systems
Analyzer can offer researchers more than just a way to restart a balky computer
model. Because the LSA allows experimentation, it could help researchers identify
bottlenecks in their calculations and perhaps find a way to boost the
performance of their code. "If they could plug into the LSA," Saied says,
"they could try a dozen solvers and select the one with the highest
performance."
That's one less phone call for help that researchers will need to make.
The LSA is part of PSEware, a joint research project funded by the
National Science Foundation.