

source link: https://nhigham.com/2021/09/14/can-we-solve-linear-algebra-problems-at-extreme-scale-and-low-precisions/comment-page-1/

Can We Solve Linear Algebra Problems at Extreme Scale and Low Precisions?

The Fugaku supercomputer that tops the HPL-AI mixed-precision benchmark in the June 2021 TOP500 list. It solved a linear system of order 10^7 using IEEE half precision arithmetic for most of the computations.
The largest dense linear systems being solved today are of order $n = 10^7$, and future exascale computer systems will be able to tackle even larger problems. Rounding error analysis shows that the computed solution satisfies a componentwise backward error bound that, under favorable assumptions, is of order $nu$, where $u$ is the unit roundoff of the floating-point arithmetic: $u \approx 1.1 \times 10^{-16}$ for double precision and $u \approx 6.0 \times 10^{-8}$ for single precision. This backward error bound cannot guarantee any stability for single precision solution of today's largest problems and suggests a loss of half the digits in the backward error for double precision.
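To make these orders of magnitude concrete, here is a minimal Python sketch that evaluates the bound $nu$ for $n = 10^7$ in both precisions:

```python
n = 10**7
u_double = 2.0**-53   # unit roundoff of IEEE double precision, about 1.1e-16
u_single = 2.0**-24   # unit roundoff of IEEE single precision, about 6.0e-8

# Worst-case backward error bound of order n*u in each precision.
print(f"double precision: n*u = {n * u_double:.1e}")  # about 1.1e-9: half the digits lost
print(f"single precision: n*u = {n * u_single:.1e}")  # about 6.0e-1: no stability guarantee
```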
Half precision floating-point arithmetic is now readily available in hardware, in both the IEEE binary16 format ($u \approx 4.9 \times 10^{-4}$) and the bfloat16 format ($u \approx 3.9 \times 10^{-3}$), and it is increasingly being used in machine learning and in scientific computing more generally. For the computation of the inner product of two $n$-vectors the backward error bound is again of order $nu$, and this bound exceeds $1$ for $n \ge 2048$ for both half precision formats, suggesting a potential complete loss of numerical stability. Yet inner products with $n \ge 2048$ are successfully used in half precision computations in practice.
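As a rough illustration of the gap between the worst-case bound and typical behavior, here is a minimal NumPy sketch, assuming binary16 accumulation can be simulated by rounding after every operation (bfloat16 is not covered); it measures the backward error of an inner product with $n = 4096$, for which $nu = 2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096                      # n*u = 2 for binary16 (u = 2^-11)
x = rng.standard_normal(n).astype(np.float16)
y = rng.standard_normal(n).astype(np.float16)

# Simulate a pure binary16 inner product: round after every operation.
s = np.float16(0)
for i in range(n):
    s = np.float16(s + np.float16(x[i] * y[i]))

exact = np.dot(x.astype(np.float64), y.astype(np.float64))
denom = np.dot(np.abs(x).astype(np.float64), np.abs(y).astype(np.float64))
print(f"worst-case bound n*u    = {n * 2.0**-11:.2f}")
print(f"observed backward error = {abs(float(s) - exact) / denom:.2e}")  # typically far below 1
```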
The error bounds I have referred to are upper bounds and so bound the worst-case over all possible rounding errors. Their main purpose is to reveal potential instabilities rather than to provide realistic error estimates. Yet we do need to know the limits of what we can compute, and for mission-critical applications we need to be able to guarantee a successful computation.
Can we understand the behavior of linear algebra algorithms at extreme scale and in low precision floating-point arithmetics?
To a large extent the answer is yes if we exploit three different features to obtain smaller error bounds.
Blocked Algorithms
Many algorithms are implemented in blocked form. For example, an inner product of two $n$-vectors $x$ and $y$ can be computed as

$s_i = x((i-1)b+1:ib)^T y((i-1)b+1:ib), \quad i = 1\colon k,$

$s = s_1 + s_2 + \cdots + s_k,$

where $n = kb$ and $b$ is the block size. The inner product has been broken into $k$ smaller inner products of size $b$, which are computed independently then summed. Many linear algebra algorithms are blocked in an analogous way, where the blocking is into submatrices with $b$ rows or $b$ columns (or both). Careful error analysis shows that a blocked algorithm has an error bound about a factor $b$ smaller than that for the corresponding unblocked algorithm. Practical block sizes for matrix algorithms are typically $128$ or $256$, so blocking brings a substantial reduction in the error bounds.
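In the spirit of the experiment in the figure below, here is a minimal NumPy sketch (illustrative only, not the code behind the figure) that compares straightforward single precision accumulation with a blocked version using $b = 256$, with the reference value computed in double precision:

```python
import numpy as np

def dot_seq_f32(x, y):
    """Inner product with element-by-element float32 accumulation."""
    s = np.float32(0)
    for xi, yi in zip(x, y):
        s = np.float32(s + np.float32(xi * yi))
    return s

def dot_blocked_f32(x, y, b=256):
    """Blocked inner product: k = n/b partial sums of size b, then summed."""
    partial = [dot_seq_f32(x[i:i+b], y[i:i+b]) for i in range(0, len(x), b)]
    s = np.float32(0)
    for p in partial:
        s = np.float32(s + p)
    return s

rng = np.random.default_rng(1)
n = 2**16
x = (rng.standard_normal(n) - 0.25).astype(np.float32)  # elements of the form -0.25 + randn
y = (rng.standard_normal(n) - 0.25).astype(np.float32)

exact = np.dot(x.astype(np.float64), y.astype(np.float64))
denom = np.dot(np.abs(x).astype(np.float64), np.abs(y).astype(np.float64))
for name, f in [("sequential", dot_seq_f32), ("blocked", dot_blocked_f32)]:
    print(f"{name:10s} backward error = {abs(float(f(x, y)) - exact) / denom:.2e}")
```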

Backward errors for the inner product of two vectors with elements of the form -0.25 + randn, computed in single precision in MATLAB with block size 256.
In fact, one can do even better than an error bound of order $(b + n/b)u$. By computing the sum $s = s_1 + s_2 + \cdots + s_k$ with a more accurate summation method the error constant is further reduced to about $bu$, essentially independent of $n$ (this is the FABsum method of Blanchard et al. (2020)).
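Here is a sketch in the spirit of FABsum, assuming Kahan compensated summation as the "more accurate" method for the $k$ block sums and NumPy's float32 dot (whose accumulation order is library dependent) for the block sums; it is an illustration, not the authors' implementation:

```python
import numpy as np

def fabsum_like_dot(x, y, b=256):
    """Blocked inner product: plain float32 block sums, compensated
    (Kahan) summation across the k block sums."""
    partial = [np.float32(np.dot(x[i:i+b], y[i:i+b])) for i in range(0, len(x), b)]
    s = np.float32(0)
    c = np.float32(0)                            # running compensation term
    for p in partial:
        yv = np.float32(p - c)
        t = np.float32(s + yv)
        c = np.float32(np.float32(t - s) - yv)   # lost low-order part of the add
        s = t
    return s
```

With this organization the cross-block summation contributes almost nothing to the error, leaving a constant of about $bu$ from the within-block sums.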
Architectural Features
Intel x86 processors support an 80-bit extended precision format with a 64-bit significand, which is compatible with that specified in the IEEE standard. When a compiler uses this format with 80-bit registers to accumulate sums and inner products it is effectively working with a unit roundoff of $2^{-64}$ rather than $2^{-53}$ for double precision, giving error bounds smaller by a factor up to $2^{11} = 2048$.
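The 80-bit registers are not accessible from NumPy, so the sketch below mimics the idea one precision level down, an assumption of the illustration rather than the x87 mechanism itself: float32 data are accumulated in float64 and rounded once at the end.

```python
import numpy as np

rng = np.random.default_rng(2)
x = (rng.standard_normal(10**6) - 0.25).astype(np.float32)

# Accumulate in the working precision (float32), rounding after every add.
s_work = np.float32(0)
for v in x:
    s_work = np.float32(s_work + v)

# Accumulate in a wider format and round once at the end, which is the
# effect of keeping the running sum in an extended precision register.
s_wide = np.float32(np.sum(x, dtype=np.float64))

exact = np.sum(x.astype(np.float64))
print(f"float32 accumulation: relative error = {abs(float(s_work) - exact) / abs(exact):.2e}")
print(f"wide accumulation:    relative error = {abs(float(s_wide) - exact) / abs(exact):.2e}")
```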
Some processors have a fused multiply–add (FMA) operation, which computes a combined multiplication and addition with one rounding error instead of two. This results in a reduction in error bounds by a factor $2$.
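NumPy exposes no FMA primitive, so the sketch below models one, relying on the fact that the product of two float32 numbers is exact in float64: evaluating $ab + c$ in float64 and rounding once to float32 behaves like a single-rounding FMA, whereas the unfused version rounds twice.

```python
import numpy as np

a, b, c = np.float32(1.2345678), np.float32(7.654321), np.float32(-9.45)

# Unfused: two roundings, one after the multiply and one after the add.
unfused = np.float32(np.float32(a * b) + c)

# Modelled FMA: the float32 product is exact in float64, so round only once.
fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

exact = np.float64(a) * np.float64(b) + np.float64(c)
print(f"unfused error   = {abs(float(unfused) - exact):.2e}")
print(f"fma-style error = {abs(float(fused) - exact):.2e}")
```

The worst-case bound improves only by a factor $2$, but when the addition cancels most of the product, as in this example, the single rounding can help far more.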
Mixed precision block FMA operations $D = C + AB$, with matrices $A$, $B$, $C$, and $D$ of fixed size, are available on Google tensor processing units, NVIDIA GPUs, and in the ARMv8-A architecture. For half precision inputs these devices can produce results of single precision quality, which can give a significant boost in accuracy when block FMAs are chained together to form a matrix product of arbitrary dimension.
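As a crude model of chained block FMAs (an assumption for illustration: binary16 inputs with all products and sums held in float32, standing in for the actual hardware), the sketch below compares this with an all-binary16 matrix product:

```python
import numpy as np

def matmul_fp16(A, B):
    """All-binary16 reference: round every product and partial sum to float16."""
    m, k = A.shape
    n = B.shape[1]
    C = np.zeros((m, n), dtype=np.float16)
    for i in range(m):
        for j in range(n):
            s = np.float16(0)
            for p in range(k):
                s = np.float16(s + np.float16(A[i, p] * B[p, j]))
            C[i, j] = s
    return C

def matmul_block_fma_model(A, B):
    """Model of chained block FMAs: float16 inputs, float32 products and sums."""
    return A.astype(np.float32) @ B.astype(np.float32)

rng = np.random.default_rng(3)
k = 4096
A = rng.standard_normal((4, k)).astype(np.float16)
B = rng.standard_normal((k, 4)).astype(np.float16)
exact = A.astype(np.float64) @ B.astype(np.float64)

for name, C in [("all float16", matmul_fp16(A, B)),
                ("block FMA model", matmul_block_fma_model(A, B))]:
    err = np.max(np.abs(C.astype(np.float64) - exact)) / np.max(np.abs(exact))
    print(f"{name:15s} max relative error = {err:.2e}")
```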
Probabilistic Bounds
Worst-case rounding error bounds suffer from the problem that they are not attainable for most specific sets of data and are unlikely to be nearly attained. Stewart (1990) noted that
To be realistic, we must prune away the unlikely. What is left is necessarily a probabilistic statement.
Theo Mary and I have recently developed probabilistic rounding error analysis, which makes probabilistic assumptions on the rounding errors and derives bounds that hold with a certain probability. The key feature of the bounds is that they are proportional to $\sqrt{n}\,u$ when a corresponding worst-case bound is proportional to $nu$. In the most general form of the analysis (Connolly, Higham, and Mary, 2021), the rounding errors are assumed to be mean independent and of mean zero, where mean independence is a weaker assumption than independence.
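A quick Monte Carlo sketch (illustrative; NumPy's float32 dot delegates to BLAS, whose accumulation order is implementation dependent) shows how far typical backward errors sit below the worst-case bound $nu$ and how they compare with the $\sqrt{n}\,u$ scale:

```python
import numpy as np

rng = np.random.default_rng(4)
n, u = 10**5, 2.0**-24          # single precision unit roundoff
worst_observed = 0.0

for _ in range(20):
    x = (rng.standard_normal(n) - 0.25).astype(np.float32)
    y = (rng.standard_normal(n) - 0.25).astype(np.float32)
    s = np.float32(np.dot(x, y))                      # single precision inner product
    exact = np.dot(x.astype(np.float64), y.astype(np.float64))
    denom = np.dot(np.abs(x).astype(np.float64), np.abs(y).astype(np.float64))
    worst_observed = max(worst_observed, abs(float(s) - exact) / denom)

print(f"worst observed backward error = {worst_observed:.2e}")
print(f"worst-case bound n*u          = {n * u:.2e}")           # 6.0e-3
print(f"sqrt(n)*u reference           = {np.sqrt(n) * u:.2e}")  # 1.9e-5
```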
Putting the Pieces Together
The different features we have described can be combined to obtain significantly smaller error bounds. If we use a blocked algorithm with block size $b$ then in an inner product the standard error bound of order $nu$ reduces to a probabilistic bound of order $(\sqrt{b} + \sqrt{n/b})u$, which is a significant reduction. Block FMAs and extended precision registers provide further reductions.
For example, for a linear system of order $n = 10^7$ solved in single precision with a block size of $b = 256$, the probabilistic error bound is of order $10^{-5}$, versus $nu \approx 6 \times 10^{-1}$ for the standard worst-case bound. If FABsum is used then the bound is further reduced.
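The orders of magnitude in this example can be reproduced in a few lines; the expressions below omit modest constant factors:

```python
import numpy as np

n, b = 10**7, 256
u = 2.0**-24                                       # single precision unit roundoff

worst_case    = n * u                              # standard worst-case bound, order n*u
blocked       = (b + n / b) * u                    # worst-case bound for the blocked algorithm
probabilistic = (np.sqrt(b) + np.sqrt(n / b)) * u  # probabilistic analogue

print(f"n*u                   = {worst_case:.1e}")     # ~6.0e-1
print(f"(b + n/b)*u           = {blocked:.1e}")        # ~2.3e-3
print(f"(sqrt(b)+sqrt(n/b))*u = {probabilistic:.1e}")  # ~1.3e-5
```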
Our conclusion is that we can successfully solve linear algebra problems of greater size and at lower precisions than the standard rounding error analysis suggests. A priori bounds will always be pessimistic, though. One should compute a posteriori residuals or backward errors (depending on the problem) in order to assess the quality of a numerical solution.
For full details of the work summarized here, see Higham (2021).
References
- Pierre Blanchard, Nicholas J. Higham, and Theo Mary, A Class of Fast and Accurate Summation Algorithms, SIAM J. Sci. Comput. 42(3), A1541–A1557, 2020.
- Pierre Blanchard, Nicholas J. Higham, Florent Lopez, Theo Mary, and Srikara Pranesh, Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores, SIAM J. Sci. Comput. 42(3), C124–C141, 2020.
- Michael P. Connolly, Nicholas J. Higham, and Theo Mary, Stochastic Rounding and Its Probabilistic Backward Error Analysis, SIAM J. Sci. Comput. 43(1), A566–A585, 2021.
- Nicholas J. Higham, Numerical Stability of Algorithms at Extreme Scale and Low Precisions, MIMS EPrint 2021.14, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, September 2021.
- Nicholas J. Higham and Theo Mary, A New Approach to Probabilistic Rounding Error Analysis, SIAM J. Sci. Comput. 41(5), A2815–A2835, 2019.
- G. W. Stewart, Stochastic Perturbation Theory, SIAM Rev. 32(4), 579–610, 1990.