Abstract
In this paper, we derive upper bounds that characterize the rate of convergence of the SOR method for solving a linear system of the form $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is a real symmetric positive semidefinite matrix. The bounds are given in terms of the condition number of $A$, which is the ratio $\kappa = \lambda_1/\lambda_r$, where $\lambda_1$ is the largest eigenvalue of $A$ and $\lambda_r$ is the smallest nonzero eigenvalue of $A$. Let $M$ denote the related iteration matrix. Then, since $A$ has a zero eigenvalue, the spectral radius of $M$ equals 1, and the rate of convergence is determined by the size of $\gamma$, the largest eigenvalue of $M$ whose modulus differs from 1. The bound has the form $$|\gamma|^2 \le 1 - \frac{\omega(2-\omega)}{\kappa\,\phi(\omega)},$$ where $\omega$ denotes the relaxation parameter and $\phi(\omega) = (1 + \omega\|L\|_2)^2/\lambda_1$, with $L$ the strictly lower triangular part of the normalized matrix. The main consequence of this bound is that a small condition number forces fast convergence, while a large condition number allows slow convergence.
1. Introduction
The SOR method is one of the basic iterative algorithms for solving a large sparse linear system of the form $$Ax = b, \qquad (1)$$ where $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$, and $x \in \mathbb{R}^n$ denotes the vector of unknowns. That is, we need to solve a system of $n$ linear equations with $n$ unknowns. The abbreviation SOR stands for "successive overrelaxation." For a detailed description and discussion of this method, see [1–41]. In this paper, we investigate the SOR rate of convergence in the special case when $A$ is a real symmetric positive semidefinite matrix. This means that $A$ has at least one zero eigenvalue and that the system (1) can be inconsistent. It is also assumed that all the diagonal entries of $A$ are positive. This has two justifications. First, it is not possible to apply the SOR iteration without this assumption. Second, since $A$ is positive semidefinite, a zero diagonal entry implies that the corresponding row and column are null and can be deleted. The last assumption enables us to express $A$ in the form $$A = D - L - L^T, \qquad (2)$$ where $L$ is a strictly lower triangular matrix and $D$ is a positive definite diagonal matrix. The $k$th SOR iteration, $k = 0, 1, 2, \dots$, starts with $x_k$ and computes $x_{k+1}$ by the rule $$(D - \omega L)\,x_{k+1} = \big[(1-\omega)D + \omega L^T\big]x_k + \omega b. \qquad (3)$$
The matrix $M = (D - \omega L)^{-1}\big[(1-\omega)D + \omega L^T\big]$ is called the SOR iteration matrix, and $\omega$ is a relaxation parameter that satisfies $$0 < \omega < 2. \qquad (4)$$
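To make the discussion concrete, here is a minimal NumPy sketch of iteration (3); the function name `sor_sweep` and the random test problem are our own illustration, not part of the paper.

```python
import numpy as np

def sor_sweep(A, b, x, omega):
    """One SOR step, eq. (3): (D - omega*L) x_new = ((1-omega)D + omega*L^T) x + omega*b,
    where D = diag(A) and L is minus the strict lower triangle of A, as in (2)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    B = D - omega * L            # lower triangular, nonsingular since diag(A) > 0
    return np.linalg.solve(B, ((1.0 - omega) * D + omega * L.T) @ x + omega * b)

# demo: a random consistent positive semidefinite system
rng = np.random.default_rng(0)
n, r = 8, 5
U = rng.standard_normal((n, r))
A = U @ U.T                      # symmetric positive semidefinite of rank r < n
b = A @ rng.standard_normal(n)   # consistent right-hand side
x = np.zeros(n)
for _ in range(500):
    x = sor_sweep(A, b, x, omega=1.2)
print(np.linalg.norm(A @ x - b))  # residual tends to 0 although A is singular
```

In production code the lower triangular system would be solved by forward substitution (e.g., scipy.linalg.solve_triangular) rather than a general solve; the dense solve keeps the sketch dependency-free.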
Remark 1. In this paper, we study the asymptotic rate of convergence of the SOR iteration (3). Thus, when using simpler terms like “fast rate of convergence” or “fast convergence,” we always refer to asymptotic behavior.
The role of $\rho(M)$, the spectral radius of $M$, is seen from the following known observation. Assume for a moment that the linear system (1) has a solution $x^*$. Then, the iteration (3) satisfies $$x_{k+1} - x^* = M(x_k - x^*) = M^{k+1}(x_0 - x^*),$$ where $x_0$ denotes the starting point. The last equality reveals the importance of $\rho(M)$. If $\rho(M) < 1$, then the sequence $\{x_k\}$ converges toward $x^*$, and the rate of convergence depends on the size of $\rho(M)$: the smaller $\rho(M)$ is, the faster is the convergence. Otherwise, when $\rho(M) > 1$, the SOR iteration diverges. Yet, when $A$ has a zero eigenvalue, then $\rho(M) = 1$ and the situation depends on the consistency of (1). If the SOR iteration attempts to solve an inconsistent system, it diverges. Otherwise, when the system to solve is consistent, it converges. The question discussed in this paper is how the spectral properties of $A$ affect the rate of convergence in this case.
The treatment of the positive semidefinite case becomes easier by noting the relation with Kaczmarz's method. For a detailed description and discussion of Kaczmarz's method, see [4–6, 8, 10, 11, 13, 15, 20, 21, 31, 33, 34, 36] and the references therein. Let $r$ denote the rank of $A$. Then, $r < n$ and $A$ has $n - r$ zero eigenvalues. Moreover, using the spectral decomposition, it is possible to express $A$ in the form $$A = UU^T,$$ where $U \in \mathbb{R}^{n \times r}$ has orthogonal columns. Let the sequence $y_k$, $k = 0, 1, 2, \dots$, be generated by Kaczmarz's method for solving the linear system $$Uy = b,$$ where $y \in \mathbb{R}^r$ denotes the vector of unknowns. Then, the following observation is well known, e.g., [4, 8]. If the starting points satisfy $y_0 = U^Tx_0$, then the equalities $$y_k = U^Tx_k, \quad k = 0, 1, 2, \dots, \qquad (11)$$ hold for all $k$. This relation implies that Kaczmarz's method obeys the rule $$y_{k+1} = Ky_k + g,$$ where $K$ is the corresponding iteration matrix and $g$ is a fixed vector. The role of $K$ will be clarified in the coming discussions.
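The coupling (11) is easy to check numerically. The sketch below is our own illustration: the rows of $U$ are merely normalized to unit length (anticipating the unit-diagonal normalization of Section 2), which is all the sweep-by-sweep argument needs; the factor produced by the spectral decomposition additionally has orthogonal columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, omega = 7, 4, 1.3
U = rng.standard_normal((n, r))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit rows, hence diag(A) = I
A = U @ U.T                                      # symmetric PSD of rank r
b = A @ rng.standard_normal(n)

def sor_sweep(A, b, x, omega):
    x = x.copy()                 # componentwise SOR for unit-diagonal A
    for i in range(len(b)):
        x[i] += omega * (b[i] - A[i] @ x)
    return x

def kaczmarz_sweep(U, b, y, omega):
    y = y.copy()                 # one relaxed Kaczmarz sweep over the rows of U y = b
    for i in range(U.shape[0]):
        y += omega * (b[i] - U[i] @ y) * U[i]
    return y

x = rng.standard_normal(n)
y = U.T @ x                      # starting points coupled as in (11)
for _ in range(5):
    x = sor_sweep(A, b, x, omega)
    y = kaczmarz_sweep(U, b, y, omega)
print(np.linalg.norm(U.T @ x - y))   # ~1e-15: y_k = U^T x_k holds sweep by sweep
```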
The literature on the SOR method is huge and includes various types of convergence results. However, many of the early results are derived under certain assumptions on $A$, such as being an "M-matrix" or "consistently ordered," e.g., [1, 17–19, 35, 37, 41]. One of the first results without such assumptions is Ostrowski's theorem [29, 37], which ensures that $\rho(M) < 1$ whenever $A$ is positive definite. Yet, it took about forty years until Oswald [30] established an explicit upper bound on $\rho(M)$. See also Axelsson [1] (pp. 241-242) for a similar bound. The current paper extends Oswald's bound to the positive semidefinite case. The difficulty here is that $\rho(M) = 1$. So we need to see what dominates the rate of convergence in this case.
The interest in the semidefinite case was initiated in the work of Keller [22]. Then, the surprising behavior of the SOR method when solving inconsistent linear systems was studied in a small number of papers, e.g., [2, 7, 8, 28]. It is shown there that if the linear system (1) is inconsistent, then although the SOR sequence diverges, it obeys the rule $$x_k = h_k + k\,d,$$ where $\{h_k\}$ is a converging sequence and $d$ is a fixed vector that belongs to $\mathrm{Null}(A)$, the null space of $A$. Otherwise, when the linear system is consistent, $d = 0$ and the SOR sequence converges. This explains why the related Kaczmarz sequence (11) always converges: since $d \in \mathrm{Null}(A)$ implies $U^Td = 0$, the drift term vanishes after multiplication by $U^T$.
The convergence properties of iterative methods for solving consistent positive semidefinite linear systems have attracted the attention of several authors. See, for example, [2, 3, 7, 8, 14, 22–26, 40], and the references therein. In particular, if $A$ is positive semidefinite as above, then the SOR iteration matrix, $M$, is known to be "semiconvergent". Let $\mu_i$, $i = 1, \dots, n$, denote the eigenvalues of $M$ and assume that they are sorted to satisfy $$|\mu_1| \ge |\mu_2| \ge \cdots \ge |\mu_n|,$$ which implies that $\rho(M) = |\mu_1|$. Then "semiconvergent" means that $\rho(M) = 1$ and that any eigenvalue of $M$ that satisfies $|\mu_i| = 1$ must equal 1 and has a $1 \times 1$ Jordan block. Furthermore, since $A$ has $n - r$ zero eigenvalues, the eigenvalues of $M$ satisfy $$\mu_1 = \cdots = \mu_{n-r} = 1 > |\mu_{n-r+1}| \ge \cdots \ge |\mu_n|. \qquad (16)$$
Consequently, the Jordan canonical form of $M$ shows that the rate of convergence is determined by the size of $\gamma = \mu_{n-r+1}$, which is sometimes called the "convergence factor," e.g., [26]. This situation means that we need an upper bound on $|\gamma|$.
The bound is gained in two stages. First, we show that $$|\gamma| = \rho(K). \qquad (17)$$ Then, we establish the inequality $[\rho(K)]^2 \le \|K\|_2^2$ and derive a bound on $\|K\|_2^2$.
The plan of the paper is as follows. We start by exploring the relations between the eigenvalues of $M$ and $K$, showing that the two matrices share all the eigenvalues that differ from 1, which proves (17). Then, we study the relations between the iteration matrices of the symmetric SOR (SSOR) method and the symmetric Kaczmarz method and use these relations to simplify the expression for $K^TK$. The bound on the spectral radius of this matrix is derived in the third section. The bound has the form $$[\rho(K)]^2 \le 1 - \frac{\omega(2-\omega)}{\kappa\,\phi(\omega)},$$ where $\kappa$ denotes the condition number of $A$ and $$\phi(\omega) = (1 + \omega\|L\|_2)^2/\lambda_1.$$
The condition number is defined as the ratio $\kappa = \lambda_1/\lambda_r$, where $\lambda_1$ is the largest eigenvalue of $A$ and $\lambda_r$ is the smallest nonzero eigenvalue of $A$. The bound shows that a small condition number forces a fast rate of convergence, while a large condition number allows slow convergence. However, as this is an upper bound, a large condition number does not force slow convergence. Hence, it is worthwhile to have a close look at the reasons behind slow convergence. This issue is discussed in Section 4. It is shown there that small nonzero eigenvalues of $A$ are likely to cause a slow asymptotic rate of convergence. Finally, in the last section, we compare our approach with former attempts to derive such bounds.
2. Iteration Matrices and Their Eigenvalues
The assumption that $A$ has positive diagonal entries allows us to make the following simplification. Consider the SOR iteration for solving the normalized system $\tilde A\tilde x = \tilde b$, where $\tilde A = D^{-1/2}AD^{-1/2}$, $\tilde x = D^{1/2}x$, and $\tilde b = D^{-1/2}b$. Then, the related iteration matrix is similar to $M$. Thus, when studying the rate of convergence of the SOR method, it is possible to replace (1) with its normalized form. That is, there is no loss of generality in assuming that $\mathrm{diag}(A) = I$. Hence, from now on, we assume that $A$ has the form $$A = I - L - L^T, \qquad (20)$$ where $I$ denotes the $n \times n$ identity matrix and $L$ is a strictly lower triangular matrix. As before, $r$ denotes the rank of $A$ and $r < n$. Consequently, $A$ can be factorized in the form $$A = UU^T, \qquad (21)$$ where the matrix $U \in \mathbb{R}^{n \times r}$ has orthogonal columns. Moreover, let $u_i^T$ denote the $i$th row of $U$. Then, (21) implies $$\|u_i\|_2^2 = u_i^Tu_i = a_{ii} = 1, \quad i = 1, \dots, n. \qquad (22)$$ That is, the rows of $U$ have unit length.
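As a quick sanity check of the similarity claim made at the start of this section, the sketch below builds the SOR iteration matrix of a system with a general positive diagonal and of its normalized form, and verifies that the two are related by the diagonal similarity $D^{1/2}MD^{-1/2}$. The helper `sor_matrix` and the test matrix are our own.

```python
import numpy as np

def sor_matrix(A, omega):
    # M = (D - omega*L)^{-1} ((1-omega)D + omega*L^T), with A = D - L - L^T as in (2)
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    return np.linalg.solve(D - omega * L, (1.0 - omega) * D + omega * L.T)

rng = np.random.default_rng(2)
n, omega = 6, 1.4
G = rng.standard_normal((n, n + 2))
A = G @ G.T                            # positive definite, arbitrary positive diagonal
d = np.sqrt(np.diag(A))
A_tilde = A / np.outer(d, d)           # D^{-1/2} A D^{-1/2}: unit diagonal

M, M_tilde = sor_matrix(A, omega), sor_matrix(A_tilde, omega)
print(np.allclose(M_tilde, np.diag(d) @ M @ np.diag(1.0 / d)))  # similar matrices
```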
The SOR iteration splits $\omega A$ in the form $$\omega A = B - C, \qquad (24)$$ where $$B = I - \omega L \quad \text{and} \quad C = (1-\omega)I + \omega L^T.$$
Recall that $\omega$ is a given relaxation parameter that satisfies $0 < \omega < 2$. The $k$th SOR iteration, $k = 0, 1, 2, \dots$, starts with $x_k$ and ends with $x_{k+1}$, which is computed by solving the linear system $$Bx_{k+1} = Cx_k + \omega b.$$
In other words, $x_{k+1}$ is obtained from $x_k$ by the rule $$x_{k+1} = Mx_k + \omega B^{-1}b,$$ where $M$ is the related iteration matrix, and $$M = B^{-1}C.$$
Observe that (24) enables us to express $M$ in the form $$M = B^{-1}(B - \omega A) = I - \omega B^{-1}A. \qquad (28)$$
Multiplying (28) by $U^T$ and using (11) gives $$y_{k+1} = U^Tx_{k+1} = U^TMx_k + \omega U^TB^{-1}b,$$ while substituting $UU^T$ instead of $A$ shows that $$U^TM = U^T - \omega U^TB^{-1}UU^T = \big(I - \omega U^TB^{-1}U\big)U^T.$$
This means that the iteration matrix of Kaczmarz's method has the form $$K = I - \omega U^TB^{-1}U. \qquad (34)$$
Note that $M$ is an $n \times n$ matrix while $K$ is an $r \times r$ matrix. However, as the next theorem shows, these matrices share several eigenvalues.
Theorem 2. Let $\theta$ be a nonzero eigenvalue of the matrix $B^{-1}UU^T$; then, $\theta$ is also an eigenvalue of the matrix $U^TB^{-1}U$. Conversely, let $\theta$ be a nonzero eigenvalue of $U^TB^{-1}U$; then, $\theta$ is also an eigenvalue of $B^{-1}UU^T$.
Proof. Let $v$ be a unit eigenvector of $B^{-1}UU^T$ that corresponds to a nonzero eigenvalue $\theta$. Then, the equality $$B^{-1}UU^Tv = \theta v$$ implies $U^Tv \ne 0$, and multiplying this equality by $U^T$ gives $$\big(U^TB^{-1}U\big)\big(U^Tv\big) = \theta\,U^Tv,$$ which means that $\theta$ is an eigenvalue of $U^TB^{-1}U$.
The converse direction is proved in a similar way. Let $z$ be a unit eigenvector of $U^TB^{-1}U$ that corresponds to a nonzero eigenvalue $\theta$. Then, the equality $$U^TB^{-1}Uz = \theta z$$ implies $B^{-1}Uz \ne 0$, and multiplying this equality by $B^{-1}U$ gives $$\big(B^{-1}UU^T\big)\big(B^{-1}Uz\big) = \theta\,B^{-1}Uz,$$ which means that $\theta$ is an eigenvalue of $B^{-1}UU^T$.☐
We have seen that the eigenvalues of $M$ satisfy (16). The next theorem shows that $|\mu_{n-r+1}|$ equals the spectral radius of $K$.
Theorem 3. The eigenvalues of $M$ satisfy (16) and (17) with $\gamma = \mu_{n-r+1}$.
Proof. Recall that $M = I - \omega B^{-1}UU^T$ and $K = I - \omega U^TB^{-1}U$, where the matrix $B$ is nonsingular. Now the last theorem implies that the largest eigenvalue of $K$ equals the largest eigenvalue of $M$ whose modulus differs from 1.☐
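Theorems 2 and 3 are easy to probe numerically. Below is a sketch under the assumptions of this section (unit diagonal, rank $r < n$); the construction of the test matrix is ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, omega = 8, 5, 1.5
U = rng.standard_normal((n, r))
U /= np.linalg.norm(U, axis=1, keepdims=True)     # unit rows: A = I - L - L^T, rank r
A = U @ U.T
L = -np.tril(A, -1)
B = np.eye(n) - omega * L

M = np.eye(n) - omega * np.linalg.solve(B, A)         # eq. (28)
K = np.eye(r) - omega * U.T @ np.linalg.solve(B, U)   # eq. (34)

mu = np.linalg.eigvals(M)
print(np.sum(np.isclose(mu, 1.0)))                    # n - r eigenvalues equal to 1, cf. (16)
gamma = np.max(np.abs(mu[~np.isclose(mu, 1.0)]))
print(gamma, np.max(np.abs(np.linalg.eigvals(K))))    # |gamma| = rho(K), cf. (17)
```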
The bounds which are derived in the next section use the close relations between the Kaczmarz-SOR method and its symmetric version. The symmetric iteration is composed of two parts. The first one is the usual ("forward") iteration, while the second is a "backward" iteration in which the rows of the linear system are approached in reverse order. See, for example, [1, 13, 14, 26, 33, 34, 37, 41]. The iteration matrix of the backward SOR method has the form $$\hat M = I - \omega B^{-T}UU^T,$$ where $B^{-T}$ denotes the matrix $(B^T)^{-1}$. This implies that the iteration matrix of the backward Kaczmarz method is $$\hat K = I - \omega U^TB^{-T}U = K^T.$$
Consequently, the iteration matrix of the symmetric SOR method (SSOR in brief) has the form $$S = \hat MM,$$ while that of the symmetric Kaczmarz method is $$\hat KK = K^TK.$$
The next assertion expresses these matrices in a useful form.
Theorem 4. The iteration matrix of the SSOR method has the form $$S = I - \omega(2-\omega)\big(BB^T\big)^{-1}UU^T,$$ while that of the symmetric Kaczmarz method is $$\hat KK = I - \omega(2-\omega)\,U^T\big(BB^T\big)^{-1}U. \qquad (45)$$
Proof. The second equality is a direct consequence of the first one, which is derived from the following identities. First, (20) gives $$B + B^T = 2I - \omega\big(L + L^T\big) = (2-\omega)I + \omega A,$$ and hence $B + B^T - \omega A = (2-\omega)I$. Therefore, $$I - \hat MM = \omega B^{-T}A + \omega B^{-1}A - \omega^2B^{-T}AB^{-1}A = \omega B^{-T}\big(B + B^T - \omega A\big)B^{-1}A = \omega(2-\omega)\big(BB^T\big)^{-1}A,$$ and substituting $A = UU^T$ completes the first equality. Multiplying it by $U^T$, as in the proof of Theorem 2, yields the second one.☐
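A direct numerical check of the first identity of Theorem 4 (our own test setup):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, omega = 8, 5, 0.9
U = rng.standard_normal((n, r))
U /= np.linalg.norm(U, axis=1, keepdims=True)
A = U @ U.T
I, L = np.eye(n), -np.tril(A, -1)
B = I - omega * L

M_fwd = I - omega * np.linalg.solve(B, A)       # forward SOR iteration matrix
M_bwd = I - omega * np.linalg.solve(B.T, A)     # backward SOR iteration matrix
S = M_bwd @ M_fwd                               # SSOR iteration matrix

print(np.allclose(S, I - omega * (2 - omega) * np.linalg.solve(B @ B.T, A)))
```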
The importance of the last theorem is that it gives a better insight into the eigenvalues of these matrices. In particular, by following the proof of Theorem 2, we obtain the following conclusions.
Theorem 5. Let $\theta$ be a nonzero eigenvalue of the matrix $\big(BB^T\big)^{-1}UU^T$; then, $\theta$ is also an eigenvalue of the matrix $U^T\big(BB^T\big)^{-1}U$ and vice versa. Moreover, since the last matrix is symmetric and positive definite, both matrices share the same positive eigenvalues. The other $n - r$ eigenvalues of $\big(BB^T\big)^{-1}UU^T$ are zeros.
Corollary 6. Let $\sigma_r$ denote the smallest eigenvalue of the matrix $U^T\big(BB^T\big)^{-1}U$. Then $$\rho\big(\hat KK\big) = 1 - \omega(2-\omega)\sigma_r.$$
The next section uses these results to derive upper bounds on $\rho(K)$ and $\rho\big(\hat KK\big)$.
3. Upper Bounds on the Spectral Radius
Let $Q$ be an arbitrary square matrix, and let $\|Q\|_2$ denote the spectral norm of $Q$. Then, it is well known that $$\|Q\|_2 = \sigma_{\max}(Q) = \big[\rho\big(Q^TQ\big)\big]^{1/2}, \qquad (47)$$ where $\sigma_{\max}(Q)$ denotes the largest singular value of $Q$. It is also well known that the spectral radius of $Q$ cannot exceed its spectral norm. That is, $$\rho(Q) \le \|Q\|_2 \quad \text{and} \quad [\rho(Q)]^2 \le \rho\big(Q^TQ\big).$$
Combining these relations with Corollary 6 yields the following useful observation.
Theorem 7. The Kaczmarz iteration matrix, $K$, and the symmetric Kaczmarz iteration matrix, $\hat KK$, satisfy the relations $$[\rho(K)]^2 \le \|K\|_2^2 = \rho\big(K^TK\big) = \rho\big(\hat KK\big) = 1 - \omega(2-\omega)\sigma_r. \qquad (50)$$
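The relations (50) can be verified numerically as follows; the random test matrix is again our own sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, omega = 9, 6, 1.1
U = rng.standard_normal((n, r))
U /= np.linalg.norm(U, axis=1, keepdims=True)
A = U @ U.T
L = -np.tril(A, -1)
B = np.eye(n) - omega * L
K = np.eye(r) - omega * U.T @ np.linalg.solve(B, U)

rho_K = np.max(np.abs(np.linalg.eigvals(K)))
norm_K = np.linalg.norm(K, 2)
sigma_r = np.min(np.linalg.eigvalsh(U.T @ np.linalg.solve(B @ B.T, U)))
print(rho_K**2 <= norm_K**2 + 1e-12)                             # rho(K)^2 <= ||K||_2^2
print(np.isclose(norm_K**2, 1 - omega * (2 - omega) * sigma_r))  # = 1 - w(2-w) sigma_r
```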
The rest of this section is aimed at deriving an "effective" upper bound on the right-hand side of (50). In particular, we are looking for a bound that shows how the condition number of $A$ affects the rate of convergence. The first step is to establish a lower bound on the value of $\sigma_r$.
Let $\lambda_r$ denote the smallest nonzero eigenvalue of $A$. Then, $\lambda_r$ is also the smallest eigenvalue of the matrix $U^TU$. Note that the smallest eigenvalue of the matrix $U^T\big(BB^T\big)^{-1}U$ is $\sigma_r$. Now from (47), we see that $$\sigma_r \ge \frac{\lambda_r}{\|B\|_2^2} \ge \frac{\lambda_r}{\big(1 + \omega\|L\|_2\big)^2}, \qquad (51)$$ where the last inequality follows from the triangle inequality for the matrix $B = I - \omega L$. Observe that $\|L\|_2$ is not expected to be much larger than $\|A\|_2$. Indeed, using induction on $n$, one can verify that $$\|L\|_2 \le \|A\|_2\log_2(2n). \qquad (52)$$
For a detailed proof of this assertion, see Oswald [30]. Let $\lambda_1$ denote the largest eigenvalue of $A$. Then, since $\|A\|_2 = \lambda_1$, (52) can be rewritten as $$\|L\|_2 \le c\,\lambda_1, \qquad (53)$$ where $$c = \log_2(2n).$$
Combining (51) with (53) gives $$\sigma_r \ge \frac{\lambda_r}{\big(1 + \omega c\lambda_1\big)^2},$$ and from (50), we obtain that $$[\rho(K)]^2 \le \rho\big(\hat KK\big) \le f(\omega) = 1 - \frac{\omega(2-\omega)\lambda_r}{\big(1 + \omega c\lambda_1\big)^2}$$ for any $\omega$ from the interval $(0, 2)$. A further improvement is gained by noting that the bound function $f(\omega)$ has a unique minimizer in this interval. Computing the derivative of $f$ and eliminating $\omega$ from the equality $f'(\omega) = 0$ shows that the minimizer lies at the point $$\omega^* = \frac{1}{1 + c\lambda_1}.$$
It is also easy to verify that $$f(\omega^*) = 1 - \frac{\lambda_r}{1 + 2c\lambda_1},$$ while the assumption that $U$ has unit rows implies $\mathrm{trace}(A) = n$ and $$\lambda_1 \ge n/r \ge 1.$$
In other words, for $\omega = \omega^*$, the spectral radii of $K$ and $\hat KK$ satisfy the inequalities $$[\rho(K)]^2 \le \rho\big(\hat KK\big) \le 1 - \frac{\lambda_r}{1 + 2c\lambda_1}, \qquad (61)$$ where $c = \log_2(2n)$ and $\kappa = \lambda_1/\lambda_r$ denotes the condition number of $A$. The bound on the right-hand side can be simplified by using the inequality $1 + 2c\lambda_1 \le (1 + 2c)\lambda_1$, which shows that $$[\rho(K)]^2 \le \rho\big(\hat KK\big) \le 1 - \frac{1}{(1 + 2c)\kappa}. \qquad (64)$$
Below we write $K_\omega$ and $\hat K_\omega$ to indicate the dependence on $\omega$. Let $\omega_s$ be a value of $\omega$ for which $\rho\big(\hat K_\omega K_\omega\big)$ attains its smallest value. That is, $\omega_s$ is an optimal relaxation parameter for the symmetric Kaczmarz method. Then, clearly, $$\rho\big(\hat K_{\omega_s}K_{\omega_s}\big) \le \rho\big(\hat K_{\omega^*}K_{\omega^*}\big),$$ and the inequalities (61) and (64) remain valid when $\omega_s$ replaces $\omega^*$. Similarly, let $\omega_{opt}$ denote the optimal relaxation parameter for Kaczmarz's method. Then, the inequality $$\rho\big(K_{\omega_{opt}}\big) \le \rho\big(K_{\omega^*}\big)$$ implies that (61) and (64) remain valid when $\omega_{opt}$ replaces $\omega^*$.
Since $\lambda_1$ is often considerably larger than 1, the point $\omega^* = 1/(1 + c\lambda_1)$ is often much smaller than 1. On the other hand, in many cases, $\omega_{opt}$ is larger than 1, and the function $\rho\big(K_\omega\big)$ is decreasing in the interval $(0, \omega_{opt})$, which implies that the bound (64) is likely to hold for all $\omega \in [\omega^*, \omega_{opt}]$, including $\omega = 1$.
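The following sketch compares the bound with the actual convergence factor on a random test problem; the helper names `rho_K` and `f` are ours, and $c = \log_2(2n)$ is taken from (53).

```python
import numpy as np

rng = np.random.default_rng(6)
n, r = 10, 7
U = rng.standard_normal((n, r))
U /= np.linalg.norm(U, axis=1, keepdims=True)
A = U @ U.T
lam = np.linalg.eigvalsh(A)
lam1, lamr = lam[-1], lam[n - r]          # largest / smallest nonzero eigenvalue
kappa = lam1 / lamr
c = np.log2(2 * n)
omega_star = 1.0 / (1.0 + c * lam1)       # minimizer of the bound function f

def rho_K(omega):
    L = -np.tril(A, -1)
    B = np.eye(n) - omega * L
    K = np.eye(r) - omega * U.T @ np.linalg.solve(B, U)
    return np.max(np.abs(np.linalg.eigvals(K)))

f = lambda w: 1 - w * (2 - w) * lamr / (1 + w * c * lam1) ** 2
for w in (omega_star, 0.5, 1.0, 1.5):
    print(f"omega={w:.3f}  rho(K)^2={rho_K(w)**2:.4f}  f(omega)={f(w):.4f}")
print("simplified bound (64):", 1 - 1 / ((1 + 2 * c) * kappa))
```

On such well-conditioned random examples the actual $[\rho(K)]^2$ falls far below $f(\omega)$, in line with the tightness remarks below.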
The main consequence of these bounds is that a small condition number forces a fast rate of convergence, while a large condition number allows slow convergence. Yet the bounds are not tight, in the sense that the actual rate of convergence is often considerably faster than the implied rate. This behavior is due to a number of reasons. First, in many symmetric positive semidefinite matrices, the ratio $\|L\|_2/\lambda_1$ is considerably smaller than $c$. Second, as noted above, $\rho(K)$ is expected to be considerably smaller than $\|K\|_2$, so the rate of convergence for $K$ (or $M$) is expected to be much faster. Third, let $P$ be an arbitrary permutation matrix and consider the SOR method for solving the linear system $$\big(PAP^T\big)(Px) = Pb.$$
Then, since the iteration matrix of $PAP^T$ is not necessarily similar to that of $A$, we might get a different rate of convergence, e.g., [27, 32, 37]. On the other hand, since $PAP^T$ has the same eigenvalues as $A$, both matrices share the same upper bound. This shows that the bound holds for the worst possible ordering.
Finally, we note that the above treatment of the positive semidefinite case is easily adapted to the positive definite case. In the latter case, $U$ is an invertible $n \times n$ matrix, and $K = U^TMU^{-T}$ is similar to $M$, so the bounds on $\rho(K)$ apply to $\rho(M)$.
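In the positive definite case the similarity $K = U^TMU^{-T}$ can be checked directly; the sketch below uses our own test matrix.

```python
import numpy as np

rng = np.random.default_rng(7)
n, omega = 6, 1.2
U = rng.standard_normal((n, n))                 # square, invertible: A = UU^T is PD
U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit rows keep diag(A) = I
A = U @ U.T
L = -np.tril(A, -1)
B = np.eye(n) - omega * L
M = np.eye(n) - omega * np.linalg.solve(B, A)
K = np.eye(n) - omega * U.T @ np.linalg.solve(B, U)
print(np.allclose(K, U.T @ M @ np.linalg.inv(U.T)))   # K = U^T M U^{-T}, so rho(K) = rho(M)
```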
4. Slow Rate of Convergence
The bounds derived in the former section indicate that a slow rate of convergence is possible only when $A$ has a large condition number. On the other hand, the assumption that $A$ has unit diagonal implies that the largest eigenvalue of $A$ satisfies $1 \le \lambda_1 \le n$. Consequently, a large condition number occurs whenever $A$ has small positive eigenvalues. These observations raise the question of whether small positive eigenvalues are the reason behind slow convergence. Indeed, as explained below, a small positive eigenvalue may cause a slow rate of convergence. The first two lemmas provide the tools for proving this claim.
Lemma 8. Define $H = \tfrac{1}{2}\big(B + B^T\big)$. Then, $H$ satisfies the equality $$H = \frac{2-\omega}{2}I + \frac{\omega}{2}A. \qquad (71)$$
Proof. The definition of $B$ implies that $$B + B^T = 2I - \omega\big(L + L^T\big)$$ and $$L + L^T = I - A.$$ Now the equality $$2I - \omega(I - A) = (2-\omega)I + \omega A$$ shows that (71) holds.☐
Lemma 9. Let $\lambda$ denote a nonzero eigenvalue of $A$ and let $v$ denote the corresponding unit eigenvector. That is, $Av = \lambda v$, $\lambda > 0$, and $\|v\|_2 = 1$. Then $$v^TBv = \frac{2-\omega}{2} + \frac{\omega}{2}\lambda \ge \frac{2-\omega}{2} \quad \text{and} \quad \big\|B^{-1}v\big\|_2 \le \frac{2}{2-\omega}. \qquad (74)$$
Proof. Using (71) and the fact that the matrix $A$ is positive semidefinite, we obtain the inequalities $$v^TBv = v^THv = \frac{2-\omega}{2} + \frac{\omega}{2}\lambda \ge \frac{2-\omega}{2}$$ and, for every nonzero $x \in \mathbb{R}^n$, $$\|Bx\|_2\|x\|_2 \ge x^TBx = x^THx \ge \frac{2-\omega}{2}\|x\|_2^2,$$ so that $\|Bx\|_2 \ge \tfrac{2-\omega}{2}\|x\|_2$. Setting $x = B^{-1}v$ gives $\big\|B^{-1}v\big\|_2 \le \tfrac{2}{2-\omega}$.☐
Theorem 10. Let $v$ be an eigenvector of $A$ as above, and let $w$ and $\tau$ be defined by the equalities $$w = B^{-1}v \quad \text{and} \quad \tau = v^TB^{-1}v = v^Tw.$$ Then $$v^TMv = 1 - \omega\lambda\tau,$$ where $\tau$ satisfies $$\tau = w^THw = \frac{2-\omega}{2}\|w\|_2^2 + \frac{\omega}{2}w^TAw > 0 \qquad (79)$$ and $$1 - \omega\lambda\tau \ge 1 - \frac{2\omega\lambda}{2-\omega}. \qquad (80)$$
Proof. A further use of the equality (28) gives $$v^TMv = 1 - \omega\lambda\,v^TB^{-1}v = 1 - \omega\lambda\tau,$$ while the equality $\tau = (Bw)^Tw = w^THw$ together with (71) proves (79). From (74) we see that $$\tau = v^Tw \le \|v\|_2\|w\|_2 \le \frac{2}{2-\omega}.$$ Finally, the inequality $\omega\lambda\tau \le 2\omega\lambda/(2-\omega)$ proves (80).☐
One consequence of (79) is that small $\lambda$ implies a small value of $\omega\lambda\tau$, while (80) shows that for small $\lambda$ the error component in the direction of $v$ decays slowly. That is, small $\lambda$ leads to slow error decay. Another consequence is that a small value of $2 - \omega$ (which means $\omega$ close to 2) may compensate for the slowing effect of small $\lambda$.
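The sketch below illustrates this effect: it plants one tiny nonzero eigenvalue, restores the unit diagonal, and evaluates $v^TMv = 1 - \omega\lambda\tau$ together with the lower bound (80) for two values of $\omega$. The construction and the chosen spectrum are our own.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * np.array([1.5, 1.2, 1.0, 0.8, 1e-3, 0.0])) @ Q.T  # one tiny positive eigenvalue
d = np.sqrt(np.diag(A))
A = A / np.outer(d, d)                                      # restore unit diagonal

lam_A, V = np.linalg.eigh(A)
i = int(np.argmin(np.where(lam_A > 1e-8, lam_A, np.inf)))   # smallest nonzero eigenvalue
v, lam = V[:, i], lam_A[i]
L = -np.tril(A, -1)
for omega in (1.0, 1.9):
    B = np.eye(n) - omega * L
    M = np.eye(n) - omega * np.linalg.solve(B, A)
    print(omega, lam, v @ M @ v, 1 - 2 * omega * lam / (2 - omega))  # eq. (80)
```

For $\omega = 1$ the factor $v^TMv$ sits just below 1, so the component of the error along $v$ is almost frozen; pushing $\omega$ toward 2 loosens the lower bound (80), so faster decay along $v$ becomes possible, in line with the compensation remark above.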
5. Concluding Remarks
The SOR method and Kaczmarz’s method have been intensively studied for many years. Thus, naturally, some of the mentioned results can be found elsewhere in different forms. In particular, the Kaczmarz iteration matrix (34) and the symmetric Kaczmarz iteration matrix (45) both easily come out as special cases of a more general iteration, see [13] (prop. 4 and 10). Also, the relation between the spectral radius of Kaczmarz and symmetric Kaczmarz (first part of Theorem 7) is already observed in [13] (§4).
Estimates of the rate of convergence in the semidefinite case are derived in a series of papers by Lee et al. [23, 24, 40]. However, these estimates have a different flavor, as they do not use the eigenvalues of $A$ or its condition number.
More recently, Oswald and Zhou [31] have used the concept of stable Hilbert space splittings to develop a unified approach for studying the convergence of multiplicative Schwarz methods. This approach was used in [31] to derive upper bounds on the rate of convergence of Kaczmarz's method, and later, in [32], it was modified to bound the SOR convergence in the semidefinite case.
The current treatment of the semidefinite case is quite different. It is based on direct arguments from linear algebra, such as the Jordan canonical form of $M$ and the relations between the eigenvalues of $M$ and $K$. This simplifies the proofs and adds important insight into the semidefinite case.
The upper bounds on the convergence factor explain why a small condition number ensures fast convergence. Another related question is whether, and why, a large condition number leads to slow convergence. The analysis in Section 4 provides a convincing explanation. Yet, as this is a first attempt to resolve this enigma, there may be further ways to answer this question.
The relation between the condition number and the rate of convergence stands behind the "Kaczmarz anomaly" phenomenon [10, 11]. The Kaczmarz-SOR method is often considered a prototype of more sophisticated methods from the families of row-action methods [5, 6, 9], projection methods [6, 33, 36], column-action methods [8, 12, 38], and coordinate descent algorithms [39]. This suggests that other members of these families may share similar asymptotic behavior. Examples that illustrate this connection are described in [9].
Data Availability
No data were used to support this study.
Conflicts of Interest
The author declares that there are no conflicts of interest.