Abstract

Robust estimation has proved to be a valuable alternative to the least squares estimator for cases where the dataset is contaminated with outliers. Many robust estimators have been designed to be minimally affected by outlying observations and to produce a good fit for the majority of the data. Among them, the redescending estimators have demonstrated the best estimation capabilities. It is little known, however, that the success of a robust estimation method depends not only on the robust estimator used but also on the way the estimator is computed. In the present paper, we show that for complicated cases, the predominant method of computing the robust estimator by means of an iteratively reweighted least squares scheme may result in a local optimum of significantly lower quality than the global optimum attainable by means of a global optimization method. Further, the sequential use of the proposed global robust estimation is shown to successfully solve the problem of M-split estimation, that is, the determination of the parameters of the different functional models implicit in the data.

1. Introduction

The statistical methods used for the adjustment of observations rely on several assumptions. Most commonly, the observed data are assumed to follow a normal distribution (also called Gaussian distribution) and the functional model relating the observed data with the unknown system parameters is assumed to hold exactly. Under these assumptions, it can be proved that the least squares (LS) estimator is the best linear unbiased estimator (Gauss–Markov theorem [1, 2]).

It is not uncommon, however, that the data collected contain one or several atypical observations, called outliers. They deviate from the general data pattern for some reason: they may be gross errors (observation values incorrectly noted down, for example) or pieces of data belonging to distributions other than the one of the main bulk of data (for example, due to the occasional unexpected observation of a different phenomenon concurrent with the main outcome of the experiment designed). It can also be that the functional model used to encapsulate the relationship between the unknown parameters and the observables is incomplete or simply approximate. In these cases, namely, the appearance of outliers or the existence of systematic effects in the observations or the functional model, the LS estimator may be very adversely influenced and produce a clearly unacceptable solution for the majority of the data.

While procedures to eliminate gross errors are routinely used (recent examples in geosciences include [3–5]), unfortunately, on certain occasions, some undesired errors may remain undetected and slip into the adjustment phase. Robust estimation techniques were devised in the last decades to attain a solution for the adjustment problem that is minimally affected by the appearance of outliers in the data or by systematic effects either in the data or in the model (inexactness of the functional model), while producing a solution close to the LS solution in the case of a correct functional model and the absence of outliers and systematic errors in the data. The reader may consult [6] for a general reference.

Many different robust estimators have been proposed over the last decades, some showing better performance than others in coping with undesired errors of different sizes and distributions. Little attention has been paid, however, to the question of how to compute the solution once the robust estimator has been selected, so that the Iteratively Reweighted Least Squares (IRLS) scheme is almost universally indicated as the procedure to follow (e.g., [6–8]). This may be due to the simplicity of recasting the new estimation techniques in an iterative procedure based on the familiar LS scheme; but, as we will see, this choice comes at the cost of critically undermining the capabilities of the robust estimator, which are essential to successfully cope with the most complicated cases.

After a quick review of the types and properties of robust estimators, which will be needed to develop the subsequent sections, we summarize the IRLS method to compute a robust estimator and propose a solution for this computation problem based on a Global Optimization (GO) algorithm. We then argue that a GO solution of the robust estimator is always equal to or better than the solution by IRLS and can be referred to as global robust estimation, in contrast with the local robust estimation achieved by IRLS. The differences will be evident in the application section, where a novel result shows that global robust estimation efficiently solves the problem of M-split estimation, so far resolved in a completely different way (see, for instance, [9]).

2. Types and Properties of Robust Estimators

We deal now with the so-called maximum likelihood estimators, or M-estimators, that is, the type of estimators that maximize the corresponding likelihood function over the parameter space. As Huber [10] proposed, the estimation problem entails the minimization over the entire parameter space of a function f of the parameters x and the adjustment residuals ν_i:

$$\min_{\mathbf{x}} f = \sum_{i=1}^{m} \rho(\nu_i) \qquad (1)$$

The function f, which is to be minimized, can be called the objective function. It is obtained as the sum of a certain function ρ computed for every residual. This ρ function needs to have some desirable properties for the solution to be robust against outliers and systematic errors. The derivative of this ρ function, sometimes simply referred to as the influence function, can be denoted by

$$\psi(\nu) = \frac{d\rho(\nu)}{d\nu} \qquad (2)$$

The computation of the robust estimation problem has been routinely done by means of an IRLS scheme: from early seminal works [11] and early computing software packages, MIT ROSEPACK [12], to the most recent proposals, e.g., [13–15], including the general references on robust estimation, e.g., [6]. According to the IRLS scheme, in every iteration, the residuals of the previous iteration are used to form weights that are introduced in a new LS adjustment to obtain a new solution. The weight function needed to solve the robust estimation problem by means of IRLS is easily obtained from the influence function as follows:

$$\omega(\nu_i) = \frac{\psi(\nu_i)}{\nu_i} \qquad (3)$$

For the case of the LS estimator, the objective ρ, influence ψ, and weight ω functions are as follows:

$$\rho(\nu) = \nu^2 \qquad (4)$$
$$\psi(\nu) = 2\nu \qquad (5)$$
$$\omega(\nu) = 2 \qquad (6)$$

that is, a constant weight, equivalent to unit weights in the adjustment.

The least L1-norm or least absolute deviation estimator is significantly more robust against outliers and systematic errors. Its corresponding functions are as follows:

$$\rho(\nu) = |\nu| \qquad (7)$$
$$\psi(\nu) = \operatorname{sign}(\nu) \qquad (8)$$
$$\omega(\nu) = \frac{1}{|\nu|} \qquad (9)$$

It may be worth mentioning that if the estimation problem consists of obtaining the best value of a single quantity that has been repeatedly measured, the LS estimator is the arithmetic mean, and the least L1-norm estimator is the median.
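As a minimal numeric illustration of this point (a sketch with made-up measurement values, not data from the paper), compare the two estimates for a sample containing one gross error:

```python
import numpy as np

# Five repeated measurements of one quantity; the last value is a gross error.
obs = np.array([10.02, 9.98, 10.01, 9.99, 14.70])

print(np.mean(obs))    # LS estimate (arithmetic mean): 10.94, dragged by the outlier
print(np.median(obs))  # least L1-norm estimate (median): 10.01, essentially unaffected
```

The single outlier shifts the mean by almost one unit, while the median stays with the majority of the data.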

In general, we can choose values for p other than 1 and 2 and use the corresponding Lp-norm estimator, whose objective ρ, influence ψ, and weight ω functions are as follows:

$$\rho(\nu) = |\nu|^p \qquad (10)$$
$$\psi(\nu) = p \operatorname{sign}(\nu)\,|\nu|^{p-1} \qquad (11)$$
$$\omega(\nu) = p\,|\nu|^{p-2} \qquad (12)$$

Noninteger exponents may be used, for example, p = 1.5, and some authors have even proposed to compute the best value of p adapted to the sample [16], as a function of the number of degrees of freedom (dof) and the residuals of an initial adjustment.

Special combinations with some of these norms can also be used to obtain robust results (see, e.g., [17, 18]).

The L1-norm estimator (all Lp-norms, in fact) and many others that have been proposed and used relatively extensively (e.g., Huber's estimator) lack, however, a desirable property that may be key to success in complicated cases: they are not redescending estimators.

The theory of redescending estimators originated in the work by Hampel et al. [19]. These are estimators having ψ functions that are not only zero at the origin (the correct solution) but also decrease back toward zero far from the origin. They offer increased robustness toward large outliers and have a high breakdown point (in simple words, they resist the appearance of a very large proportion of outliers in the sample).

Some widely used redescending estimators are as follows:

(i) Geman–McClure estimator [20]:

$$\rho(\nu) = \frac{\nu^2}{1+\nu^2} \qquad (14)$$
$$\psi(\nu) = \frac{2\nu}{(1+\nu^2)^2} \qquad (15)$$
$$\omega(\nu) = \frac{2}{(1+\nu^2)^2} \qquad (16)$$

(ii) Welsch estimator [21]:

$$\rho(\nu) = \frac{c^2}{2}\left(1 - e^{-(\nu/c)^2}\right) \qquad (17)$$
$$\psi(\nu) = \nu\, e^{-(\nu/c)^2} \qquad (18)$$
$$\omega(\nu) = e^{-(\nu/c)^2} \qquad (19)$$

where the constant c normally takes a value around 1 to 5.

(iii) Tukey's biweight estimator [22]:

$$\rho(\nu) = \begin{cases} \dfrac{c^2}{6}\left[1 - \left(1 - (\nu/c)^2\right)^3\right] & |\nu| \le c \\[4pt] \dfrac{c^2}{6} & |\nu| > c \end{cases} \qquad (20)$$
$$\psi(\nu) = \begin{cases} \nu\left(1 - (\nu/c)^2\right)^2 & |\nu| \le c \\ 0 & |\nu| > c \end{cases} \qquad (21)$$
$$\omega(\nu) = \begin{cases} \left(1 - (\nu/c)^2\right)^2 & |\nu| \le c \\ 0 & |\nu| > c \end{cases} \qquad (22)$$

where the threshold constant c normally takes a value around 1 to 5. Note that in the following section, we propose an equivalent redefinition of these functions that formally avoids piecewise definitions.
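The three ρ functions above translate directly into code. The following NumPy sketch (the function names and the default value of c are our own illustrative choices) will also serve for the GO computations later on:

```python
import numpy as np

def rho_geman_mcclure(v):
    """Geman-McClure objective function, equation (14)."""
    return v**2 / (1.0 + v**2)

def rho_welsch(v, c=2.0):
    """Welsch objective function, equation (17); c around 1 to 5."""
    return (c**2 / 2.0) * (1.0 - np.exp(-(v / c)**2))

def rho_tukey(v, c=2.0):
    """Tukey biweight objective function, equation (20); c around 1 to 5."""
    return np.where(np.abs(v) <= c,
                    (c**2 / 6.0) * (1.0 - (1.0 - (v / c)**2)**3),
                    c**2 / 6.0)
```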

Figure 1 shows the ψ functions of all the above-mentioned robust estimators. As can be seen, only the Geman–McClure, Welsch, and Tukey estimators are redescending.

In the last years, redescending estimators have been extensively proved to outperform nonredescending estimators in the most complicated cases (see, e.g., [23]) but, as they are solved by IRLS, the initial solution used "must be robust in order to ensure convergence to a 'good' solution" ([6], p. 40). Otherwise, the solution achieved is suboptimal; that is, it is not the best possible solution (global optimum) but only the best in the vicinity of the initial solution used for the IRLS. How to start with a good enough initial solution is still a matter of research, as exemplified by the recent work [24]. The underlying problem, however, is indeed the use of IRLS, an iterative process strongly relying on the goodness of the starting solution due to the implicit dependence of the procedure on the least squares estimator.

We propose to abandon altogether the solution by IRLS in favor of a GO method, which eliminates the requirement of having a “good enough” initial solution. While the differences in computing the robust estimator by one procedure or the other (IRLS or GO) may be negligible in relatively easy problems (e.g., the case of levelling networks, [25]), in really complicated problems, one can definitely see a significant difference in favor of GO, which provides the global optimum and not simply a local optimum close to the initial solution, as in IRLS.

Global robust estimation, that is, robust estimation where the estimator is computed by a GO method guaranteeing that the global optimum is attained, succeeds where the use of IRLS (local robust estimation) fails, as will be shown. This applies to complicated examples in terms of large contamination by outliers and includes the possibility of successfully performing M-split estimation with no other tool than global robust estimation.

3. Methods

Let us solve the overdetermined system of observation equations

$$\mathbf{A}\mathbf{x} = \mathbf{k} + \boldsymbol{\nu} \qquad (23)$$

where x is the vector of unknown parameters, A the coefficient matrix, k the vector of independent terms, and ν the vector of adjustment residuals. The system of equations is assumed to have been previously reduced to unit weight.

3.1. Robust Estimation by IRLS

The LS solution for the system of equations above, equation (23), is as follows:

$$\mathbf{x} = (\mathbf{A}^{T}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{k} \qquad (24)$$

and the adjustment residuals can be computed by introducing the x vector obtained into

$$\boldsymbol{\nu} = \mathbf{A}\mathbf{x} - \mathbf{k} \qquad (25)$$

These residuals can be used to obtain the weights for computing every robust estimator as an IRLS problem, using the corresponding previous equations (9), (12), (16), (19), or (22). The new solution for the system of weighted equations, having included the weight matrix W with the weights ωi in its diagonal, is as follows:

$$\mathbf{x} = (\mathbf{A}^{T}\mathbf{W}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{W}\mathbf{k} \qquad (26)$$

and the new residuals

$$\boldsymbol{\nu} = \mathbf{A}\mathbf{x} - \mathbf{k} \qquad (27)$$

are used to compute new weights and new adjustment solutions in an iterative procedure that is stopped when no significant difference between iterations is found.
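The whole IRLS scheme just described fits in a few lines of NumPy (a sketch under our own naming; `weight_fn` is any of the ω functions, e.g., equation (19) for the Welsch estimator):

```python
import numpy as np

def irls(A, k, weight_fn, tol=1e-10, max_iter=100):
    """Iteratively Reweighted Least Squares for the system A x = k + v."""
    x = np.linalg.lstsq(A, k, rcond=None)[0]      # initial LS solution, eq. (24)
    for _ in range(max_iter):
        v = A @ x - k                              # residuals, eq. (25)
        w = weight_fn(v)                           # robust weights (diagonal of W)
        # Weighted normal equations (A^T W A) x = A^T W k, eq. (26)
        x_new = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * k))
        if np.max(np.abs(x_new - x)) < tol:        # no significant difference: stop
            return x_new
        x = x_new
    return x

# Example weight function: Welsch, eq. (19), with c = 2
welsch_weights = lambda v, c=2.0: np.exp(-(v / c)**2)
```

Note that the solution returned is the local optimum closest to the initial LS solution, which is precisely the limitation discussed in the next subsection.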

3.2. Robust Estimation by GO

Alternatively to the use of IRLS for computing the robust estimator, we propose the use of a GO method, which, as shown in the next section, will produce a global optimum, sometimes significantly different from the mere local optimum achievable by IRLS for complicated estimation problems.

The GO method searches in a prescribed orderly way for the x vector minimizing the value of the objective function equation (1) using the corresponding expression ρ for the estimator (note this is also applicable to the LS estimator), say equations (4), (7), (10), (14), (17), or (20).

In the robust estimator function definitions above, note that Tukey's biweight estimator is piecewise-defined in terms of the residual achieved in the previous iteration. We want a unique expression and prefer not to call values from previous iterations, so we propose the redefinition of equation (20) as follows:

$$\rho(\nu_i) = 1 - \left(1 - (\nu_i/c)^2\right)^3 H(c - \nu_i)\, H(c + \nu_i) \qquad (28)$$

where we have omitted the constant factor c²/6 for simplicity, since it has no influence on the final result, and we have introduced the Heaviside step function H(x), which returns 0 for x < 0 and 1 for x > 0 and is customarily included in standard computation software (e.g., Matlab). With this definition, if either νi > c (then c − νi < 0 and H(c − νi) = 0) or νi < −c (then c + νi < 0 and H(c + νi) = 0), the second summand vanishes as required.
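In NumPy, for instance, the redefined function becomes a single branch-free expression (a sketch; `np.heaviside(x, 1.0)` fixes the immaterial value at x = 0 to 1):

```python
import numpy as np

def rho_tukey_heaviside(v, c=2.0):
    """Tukey biweight via Heaviside factors, eq. (28); constant c^2/6 omitted."""
    H = lambda x: np.heaviside(x, 1.0)             # 0 for x < 0, 1 for x > 0
    return 1.0 - (1.0 - (v / c)**2)**3 * H(c - v) * H(c + v)
```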

Now, we describe an algorithm for computing any robust estimator by using the Simulated Annealing (SA) method, a GO method that has recently found many successful applications in the field of geosciences, e.g., [25–29]. The method is based on the analogy with the process of self-construction of crystalline networks. The interested reader should consult more detailed references (e.g., [30, 31]); here, we only show a particular adaptation to solve the problem at hand.

Step 1. A search domain should be defined and a point in it taken as the starting point. To this end, we can make use, for example (although this is not important), of the initial LS solution of the problem x0 as the starting solution and define the search domain as

$$[\mathbf{x}_0 - k\boldsymbol{\sigma},\; \mathbf{x}_0 + k\boldsymbol{\sigma}] \qquad (29)$$

with k being an arbitrarily large value that guarantees (with a large degree of certainty) that the global optimum is inside the search domain and σ a vector with the standard deviations of every parameter, which are obtained as the square roots of the values in the diagonal of the covariance matrix

$$\mathbf{C}_x = \frac{\boldsymbol{\nu}^{T}\boldsymbol{\nu}}{dof}\,(\mathbf{A}^{T}\mathbf{A})^{-1} \qquad (30)$$

where dof denotes the number of degrees of freedom in the adjustment (equal to the number of equations, m, minus the number of unknowns, n).
It is essential to note that, unlike the case of IRLS, where the initial solution causes the final solution to be the closest local optimum, in the SA method, the starting point may be completely arbitrary since the final solution obtained will be independent of the initial point used. Also, we can take k = 20 or as large a value as we may desire, at the only cost of increasing the computational burden. In other words, the use of the initial LS solution (or any other sensible alternative) to define both the starting value and the search domain is completely noncritical.
An initial movement amplitude, σ0, needs to be defined. It can be taken as a fraction of the search space half-width mentioned above, so that the entire search space can be comfortably explored in a few iterations, for example,

$$\boldsymbol{\sigma}_0 = \frac{k\boldsymbol{\sigma}}{5} \qquad (31)$$

This initial movement amplitude σ0 is taken as the current movement amplitude σcurrent.

Step 2. The initial solution x0 is taken both as the current solution xcurrent and the best solution xbest. Using these values, we compute the corresponding adjustment residuals ν, equation (25), and the corresponding value of the objective function

$$f = \sum_{i=1}^{m} \rho(\nu_i) \qquad (32)$$

To this end, we make use of the desired form of the ρ function: equation (4) if the LS estimator is desired, equation (7) if the minimum L1-norm estimator is desired, etc. The value initially obtained is stored both as the best (fbest) and the current (fcurrent) value of the f function.

Step 3. The current solution vector and the current objective function value are stored as the corresponding previous values: xcurrent → xprevious, fcurrent → fprevious.
A new tentative solution x1 is generated from the previous solution xprevious and the current movement amplitude σcurrent, drawn from the Normal distribution

$$\mathbf{x}_1 \sim N(\mathbf{x}_{previous},\, \boldsymbol{\sigma}_{current}) \qquad (33)$$

We can do this computation element by element; that is, for the first element of vector x1, we draw a value from a normal distribution whose mean is the first element of xprevious and whose standard deviation is the first element of σcurrent, and analogously for the rest of the elements.
If the solution x1 obtained is located inside the search domain, equation (29), it is a feasible solution. Otherwise, it is discarded and a new solution is drawn from the Normal distribution, equation (33).
The feasible solution may be accepted or not as the current solution for subsequent iterations, depending on certain conditions (akin to the process observed in nature by which crystalline networks are self-constructed) which are studied in the following step.

Step 4. We compute the residuals ν corresponding to solution x1 and evaluate therein the objective function f, analogously as done before in Step 2. This becomes the current value of the f function, fcurrent.

(i) If the solution is better than the previous one, that is, if fcurrent < fprevious, the solution is accepted, that is, x1 is stored as xcurrent: x1 → xcurrent. It must also be checked whether the current value of the objective function is the best value so far, and the variables renamed if this is the case (fcurrent → fbest and xcurrent → xbest).

(ii) If the solution is not better than the previous one, that is, if fcurrent ≥ fprevious, the solution is only accepted with a very low probability p, so as to emulate the very rare cases observed in nature where worse energy states are occasionally accepted in the process of crystalline network self-construction. In short, if a number drawn at random from the uniform distribution [0, 1] is below, say, 0.01 (which only happens with 1% probability), then x1 → xcurrent; otherwise, xcurrent is kept from the previous iteration (and also fprevious → fcurrent).

Step 5. Cooling process. The amplitude of movement is reduced, emulating the cooling process during which the crystalline network is created. The cooling equation can be as simple as

$$\boldsymbol{\sigma}_{current} = \boldsymbol{\sigma}_0\, \beta^{\,iterNo} \qquad (34)$$

where σ0 is the initial movement amplitude, iterNo the iteration number, and β a number sufficiently close to unity (e.g., β = 0.99999), taking into account that the closer to unity, the slower the cooling and the corresponding computational process.
If the current movement amplitude can be considered negligible for the problem at hand, the algorithm stops. Otherwise, we return to Step 3.
It can be demonstrated that the SA method converges to the global optimum in probabilistic terms [30], for which a sufficiently slow cooling scheme is needed (β sufficiently close to one). Although there can be no complete certainty that the solution attained is indeed the global optimum, a pair of hints reinforce the credibility of the solution: first, the final current solution must coincide with the best solution attained along the iterations; second, and important to check, subsequent executions of the algorithm (which use different random numbers due to the intrinsic random nature of the algorithm) should produce the same result: the expected global optimum. In addition, we can also execute the algorithm with initial solutions other than the LS solution to verify that it also converges to the same result.
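Steps 1 to 5 can be condensed into the following sketch (our own minimal implementation of the adaptation described, with the example parameter values from the text; `rho` is any objective function, such as `rho_welsch` above):

```python
import numpy as np

def simulated_annealing(A, k, rho, K=20.0, beta=0.99999, p_accept=0.01, seed=None):
    """Global robust estimation of A x = k + v by the SA scheme of Steps 1-5."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    f = lambda x: np.sum(rho(A @ x - k))               # objective function, eq. (32)
    # Step 1: starting point and search domain from the initial LS solution.
    x0 = np.linalg.lstsq(A, k, rcond=None)[0]
    v0 = A @ x0 - k
    Cx = (v0 @ v0 / (m - n)) * np.linalg.inv(A.T @ A)  # covariance matrix, eq. (30)
    sigma = np.sqrt(np.diag(Cx))
    lo, hi = x0 - K * sigma, x0 + K * sigma            # search domain, eq. (29)
    s0 = K * sigma / 5.0                               # initial amplitude, eq. (31)
    s = s0.copy()
    # Step 2: initial current and best solutions and objective values.
    x_cur, f_cur = x0, f(x0)
    x_best, f_best = x_cur, f_cur
    while np.max(s / s0) > 1e-4:                       # stop when amplitude negligible
        # Step 3: tentative solution around the current one, eq. (33).
        x1 = rng.normal(x_cur, s)
        if np.any(x1 < lo) or np.any(x1 > hi):
            continue                                   # infeasible: discard and redraw
        # Step 4: accept if better, or with a very low probability if worse.
        f1 = f(x1)
        if f1 < f_cur or rng.random() < p_accept:
            x_cur, f_cur = x1, f1
            if f_cur < f_best:
                x_best, f_best = x_cur, f_cur
        s = s * beta                                   # Step 5: cooling, eq. (34)
    return x_best, f_best
```

Rerunning the function with different seeds, per the check suggested above, should return the same `x_best` up to the final movement amplitude.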

4. Application

4.1. Robust Estimation in the Presence of Outliers

As said before, the computation of the robust estimator as an IRLS problem is the procedure customarily found in the literature, also for application to geodetic problems: examples of different types include [32–34].

Depending on the problem complexity, the solution obtained by IRLS may already be the global optimum (thus coincident with the one obtained by GO). This is the case when the initial solution used to start the IRLS process, the LS solution, is not extremely affected by the existing undesired errors and is relatively close to the correct solution. As shown in [25], the case of levelling networks appears to be one such problem, where robust estimation in the presence of outliers is equally well solved by IRLS and by GO, since both are chiefly limited by the relatively low redundancy of the problem.

We turn our attention now to a problem where IRLS and GO behave very differently. It is a linear adjustment taken from Maronna et al. ([6], p. 147, Table 5.6), based, in turn, on a work by Jalali-Heravi and Konouz [35]. The experiment consists of the observation, for different chemical compounds, of two quantities (heat and Krafft point) that are in theory linearly related, where a significant proportion of outlying observations exists. In fact, some of the outliers correspond to a different structure than the principal one, whereas others may be gross errors.

Figure 2 and Tables 1 and 2 show the results obtained by the estimators presented before (those whose ψ functions were depicted in Figure 1). For GO, they were obtained by the SA method with the following parameters: cooldown factor β = 0.99999, low acceptance probability p = 0.01, and, for each unknown: search space half-width kσ with σ from the LS estimation and k = 300, initial movement amplitude σ0 = kσ/5 (with the same values for σ and k), and final movement amplitude σ0/10000. The same values for all the decimal places shown were obtained after several runs (indeed with negligible discrepancies of the order of the tuneable final movement amplitude only). For the starting solution, we used the LS estimation values, although, as expected, it was checked in several test runs that the starting values do not affect the final results either.

The LS estimator computed by IRLS is simply the standard computation of LS, since the equivalent weights are 1; recall equation (6). This result completely coincides with the computation of LS by GO, as can be seen in the corresponding columns of Table 1. This is also the reason why only a solid line can be seen in the LS subplot in Figure 2 (the discontinuous line is superimposed on the solid one). As expected, the LS estimator is strongly affected by the existing outliers and is unable to produce a good fit for the main bulk of the data.

The situation is quite similar for the minimum L1 and minimum L1.5 norms: in this problem, they are unable to provide a satisfactory solution for the majority of the data, producing a compromise solution instead. In both cases, the IRLS and the GO methods yield the same results, apart from minimal, negligible discrepancies. The LS, minimum L1, and minimum L1.5 norms produce relatively similar solutions, all incapable of distinguishing the main bulk of the data from the outlying observations.

The situation changes with the Geman–McClure estimator. We can see, first, that the solution is completely different if computed by IRLS (discontinuous line in Figure 2) or by GO (solid line). The Geman–McClure estimator computed by IRLS is not really different from LS or the previous estimators: it fails to fit the line to the majority of the data. The Geman–McClure estimator computed by GO, however, is markedly more sensitive to the main linear substructure. Looking at the corresponding best values of the Geman–McClure objective function f obtained by IRLS and GO, 14.245 and 13.009, respectively, it is clear that the IRLS method has produced a suboptimal result, that is, only a local optimum in the neighborhood of its starting solution, whereas the GO method has produced a better result (presumably the global optimum).

In the case of the Welsch estimator, we also see that the solution is poor if computed by IRLS (only a local optimum of the objective function f: 23.755), whereas it is very satisfactory if computed by GO (global optimum of the objective function f: 20.899).

This is analogous to the case of Tukey's estimator, where the robust estimator successfully captures the main linear substructure in the data, but only if properly computed by GO, the IRLS scheme producing only a local optimum of Tukey's objective function.

In summary, we have seen that, under a relatively high degree of contamination of the observations (by gross errors or minor substructures in the data), only the Geman–McClure (to a moderate extent), Welsch, and Tukey estimators succeed in finding a good fit to the majority of the data, and only as long as they are computed by GO (they fail if computed by IRLS). Recall from Figure 1 that these are redescending estimators, having better estimation capabilities, as expected, but also strongly dependent on the starting solutions of their IRLS schemes.

Consequently, for cases where there is a large proportion of outliers with respect to the main bulk of the data, we propose not only the use of the best robust estimators, that is, redescending robust estimators like Welsch and Tukey, but also their computation in a GO scheme, inasmuch as their computation by means of IRLS may yield a local optimum only.

4.2. Sequential Robust Estimation to Solve M-Split Estimation

After having examined the previous example, one may wonder whether the main structure could be eliminated from the data after its successful identification, and additional searches for minor substructures then performed by the same procedure. This suggests the possible use of global robust estimation for what is known as M-split estimation: the estimation of the parameters of two or more unknown functional models from the observation of some existing mixed data (see, e.g., [9, 36]).

We study an example of possible interest in geomatics, especially in the processing of terrestrial laser scanning data: the adjustment of data to several circles whose number, centers, and radii are unknown. The idea, as well as the size and location of the circles, has been taken from [37], but the data generated are different in our example since they include their own particular random errors. The interested reader can find the data used in the Supplementary Information file.

These data correspond to the (x, y) coordinates in meters of 1050 points belonging to 7 circles with the same centers and radii as those in [37]. Here, the number of points computed for every circle has been selected so that it is proportional to the circumference length and, in order to make the problem a bit more complicated, the points in every circle are not equally spaced but taken at random azimuths, with the corresponding radius modified by a Gaussian noise of zero mean and 0.04 m standard deviation. Figure 3 shows the generated data points and Table 3 the corresponding 7 circles.

All points (xi, yi) belonging to a circumference of center (xc, yc) and radius r satisfy the following equation:

$$(x_i - x_c)^2 + (y_i - y_c)^2 = r^2 \qquad (35)$$

which can be transformed into the equivalent equation

$$x_i^2 + y_i^2 = P_1 x_i + P_2 y_i + P_3 \qquad (36)$$

with

$$P_1 = 2x_c \qquad (37)$$
$$P_2 = 2y_c \qquad (38)$$
$$P_3 = r^2 - x_c^2 - y_c^2 \qquad (39)$$

where the circumference is now described by its parameters (P1, P2, P3) instead of (xc, yc, r), with the advantage that the model equation (36) is linear in the parameters to estimate.
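In code, the linear system of equation (36) and the back-conversion of equations (37)-(39) read as follows (a sketch with our own function names, to be combined with the SA routine above):

```python
import numpy as np

def circle_system(x, y):
    """Design matrix A and vector k for eq. (36): x^2 + y^2 = P1*x + P2*y + P3."""
    A = np.column_stack([x, y, np.ones_like(x)])
    k = x**2 + y**2
    return A, k

def params_to_circle(P1, P2, P3):
    """Recover center (xc, yc) and radius r from (P1, P2, P3), eqs. (37)-(39)."""
    xc, yc = P1 / 2.0, P2 / 2.0
    r = np.sqrt(P3 + xc**2 + yc**2)
    return xc, yc, r
```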

We perform the adjustment of the 1050 data points to the model equation (36) in order to obtain the parameters of the circle (or circles) by means of the same estimators as in the previous example and obtain the results displayed in Tables 4 and 5.

As with the previous example, the LS, minimum L1-norm, and minimum L1.5-norm estimators, computed both by IRLS and by GO, fail to produce a result that fairly represents any of the existing circles while discarding the rest of the observations. Only the redescending estimators (Geman–McClure, Welsch, and Tukey) produce a successful result, and only as long as they are computed by GO (they also fail otherwise): they yield the parameters of circle No. 2, presumably because it is the substructure with the most data points.

It is important to note that, since the data points belonging to the circles were generated using the theoretical radius plus Gaussian noise of standard deviation 0.04 m, the parameters expected are not exactly those in Table 3 but others relatively close to them. We must therefore be satisfied with the P1, P2, P3 values in Table 5 obtained by GO for the Geman–McClure, Welsch, and Tukey estimators, which, in this case, correspond to circle No. 2. From the P1, P2, P3 parameters obtained for circle No. 2, we can easily get the corresponding center (xc, yc) and radius r via equations (37)–(39). The comparison can also be made in terms of these xc, yc, r values, as will also be provided in subsequent tables.

Now we propose to discard all points in the dataset that can be ascribed to circle No. 2. It is worth noting that, due to the existence of a large number of redundant observations, we will prefer to delete a few points that possibly do not belong to circle No. 2 rather than leave points of this circle that could not be correctly ascribed to any of the remaining circles in subsequent computations.

To this end, for any point (xi, yi) in the dataset, we analyze the following condition, derived from equation (35), for the values of center (xc, yc) and radius r previously obtained:

$$\left|\sqrt{(x_i - x_c)^2 + (y_i - y_c)^2} - r\right| < 10\sigma \qquad (40)$$

where the standard deviation σ is 0.04 m, as before. If this condition is fulfilled, the point is assumed to belong to the circumference whose parameters have been obtained and is eliminated from the dataset to be analyzed in subsequent computations. Recall that, as explained above, we want to eliminate all points belonging to the first circle at the risk of eliminating others too, which constitutes no problem due to the large number of redundant observations. This is the reason for using as high a threshold as 10σ in equation (40). For lower values, like 3σ or 5σ, we observed that a very small percentage of the data points underwent the entire process without ascription to any solution.

For all the remaining points in the dataset, the process is repeated; that is, a system of equations of the type of equation (36) is again formed and solved by the redescending estimators (Geman–McClure, Welsch, and Tukey) by means of GO, as sketched in the code below. Obviously, the other estimators computed either by GO or IRLS, as well as the redescending estimators computed by IRLS, are no longer considered, since they were already unable to yield the parameters of the first circle. This computation is iterated until the dataset is empty or contains a very low number of points (say, less than 0.1% of the original total number of points).
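The sequential procedure then amounts to the following loop (a sketch reusing the hypothetical helpers defined above; `rho` would be one of the redescending objective functions):

```python
import numpy as np

def sequential_msplit(x, y, rho, sigma=0.04, min_frac=0.001):
    """Peel off one circle per iteration until (almost) no points remain."""
    circles = []
    n_total = len(x)
    while len(x) > min_frac * n_total:
        A, k = circle_system(x, y)                         # linear model, eq. (36)
        (P1, P2, P3), _ = simulated_annealing(A, k, rho)   # robust estimation by GO
        xc, yc, r = params_to_circle(P1, P2, P3)
        circles.append((xc, yc, r))
        # Remove the points ascribed to this circle, condition (40).
        misfit = np.abs(np.hypot(x - xc, y - yc) - r)
        keep = misfit >= 10.0 * sigma
        x, y = x[keep], y[keep]
    return circles
```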

For the Geman–McClure estimator, the results obtained in the subsequent iterations are shown in Table 6, and the misfits, left-hand side of equation (40), for the corresponding datasets in each iteration are shown in Figure 4. As expected, in every iteration, the solution produces close-to-zero misfits for a group of observations, those of the corresponding circle.

As can be seen, the Geman–McClure estimator computed by GO produces, along the sequential adjustments, the circles No. 2, 4, 7, 1, 5, 3, and 6. Quite expectedly, they are obtained in order from the highest to the lowest number of corresponding points in the dataset.

The results are quite similar for the Welsch estimator by GO, Table 7 and Figure 5. Although the circles sequentially obtained are now circles No. 2, 7, 4, 1, 5, 3, and 6, they are still obtained from the highest to the lowest number of points in the dataset (circles 4, 7, 1, and 5 have the same number of points).

Finally, the results obtained for the Tukey estimator are shown in Table 8 and Figure 6. Although the parameters obtained for the circles are very similar to the ones obtained by the previous two estimators, the order in which they are sequentially obtained is now, for reasons unknown, circles No. 2, 1, 7, 6, 4, 3, and 5, which does not strictly correspond to the highest-to-lowest number of points in the dataset. This has little importance, however, inasmuch as the seven functional models implicit in the data have been revealed.

In summary, only the three redescending estimators (Geman–McClure, Welsch, and Tukey) computed by GO have succeeded in revealing the different functional models implicit in the data, that is, the seven circles, yielding their corresponding parameters within their expected accuracy of the order of a few cm.

5. Conclusions

In the present paper, we have shown that the success of a robust estimation method depends not only on the robust estimator used but also on the way the estimator is computed. We have shown that in problematic cases, the predominant method of computation by means of IRLS may result in a local optimum of poor quality.

This problem, which is a current topic of research, has been addressed in the literature by requiring a robust initial solution for the first iteration of the IRLS method. We have proposed here a different alternative: the computation of the robust estimator by a GO method. A particular implementation of the SA method has been given for this purpose.

For cases where there is a large proportion of outliers with respect to the majority of the data, we proposed the use of the best robust estimators, redescending estimators like the Geman–McClure, Welsch, and Tukey estimators, along with their computation in a GO scheme, inasmuch as their computation by means of IRLS yielded local optima only. We also proposed an original, equivalent redefinition of some estimators, like Tukey's biweight estimator, to avoid piecewise definitions for the sake of an easier implementation in a GO method.

The sequential use of these three redescending estimators (Geman–McClure, Welsch, and Tukey) computed by GO was proved to successfully solve the M-split estimation problem, that is, the determination of the parameters of the different functional models implicit in the data. These estimators failed if computed by IRLS, in the same way that the nonredescending robust estimators (least L1 and least L1.5 norms), as well as the LS estimator, irrespective of the computation method, also failed when asked to determine any of the different functional models underlying the data sample.

In short, for an adjustment problem where the dataset is highly contaminated with outliers or where several functional models underlie the observations, we propose, as an original, novel alternative not found in the literature so far, the use of redescending estimators computed by GO as the best way to attain a good fit to the majority of the data. If desired, sequential adjustments may also reveal the corresponding parameters of the different existing functional models. As future work, the effectiveness of this technique should be further analyzed in more challenging problems, also with the application of GO methods other than SA.

Data Availability

The data used for this study can be found in Maronna et al. ([6], p.147 Table 5.6) for Section 4.1 and the Supplementary Information file for Section 4.2.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials

The x_y_7_circles.txt ASCII file contains the x and y coordinates of the points used for the example of 7 circles in Section 4.2.