Abstract
We revisit some problems arising in the context of multiallelic discrete-time evolutionary dynamics driven by fitness. We consider both the deterministic and the stochastic setups and for the latter both the Wright-Fisher and the Moran approaches. In the deterministic formulation, we construct a Markov process whose Master equation identifies with the nonlinear deterministic evolutionary equation. Then, we draw the attention on a class of fitness matrices that plays some role in the important matter of polymorphism: the class of strictly ultrametric fitness matrices. In the random cases, we focus on fixation probabilities, on various conditionings on nonfixation, and on (quasi)stationary distributions.
1. Introduction
Population genetics aims at elucidating the fate of genotype frequencies undergoing the basic evolutionary processes when various driving “forces” such as fitness, mutation, or recombination are at stake in the gene pool. This requires to clarify the updating mechanisms of the gene frequency-distributions over time. Another important additional driving source is the genetic drift whose nature is exclusively random. The corresponding field of interest is the statistical theory arising from this aspect of the gene replacement processes and it requires some use of the Markov chain theory (see [1]).
In this note, we revisit the basics of both the deterministic and stochastic dynamics arising in discrete-time asexual multiallele evolutionary genetics driven only by fitness. We do not touch at all upon other important mechanisms such as mutation. We start with the haploid case with alleles before switching to the more interesting diploid case.
Let us summarize and comment the material developed in Section 2. In the deterministic haploid case, a vector of fitness is attached to the alleles. The interest is on the evolution of the allelic frequency distribution over time. The updates of the allele frequency distributions are driven by the relative fitnesses of the alleles, ending up in a state where only the fittest monomorphic state will survive. This state is also an extremal point of the simplex over which the dynamics takes place. From this dynamics, it appears that the mean fitness increases as time passes by, the rate of increase being the variance in relative fitness (a well-known particular incarnation of the Fisher theorem of natural selection). We recall the background and the evolution equations.
The haploid replicator dynamics, as a nonlinear updating mapping from the simplex to the simplex, may be viewed as the discrete-time nonlinear Master equation of some random Markov process. We supply a construction for this process thereby giving a stochastic interpretation to the original deterministic formulation of the dynamics.
In the diploid case, there is a similar deterministic updating dynamics but now on the full array of the genotype frequencies. It involves the fitness matrix attached to the genotypes. When mating is random so that the Hardy-Weinberg law applies, we may look at the induced marginal allelic frequencies dynamics. The updating dynamics looks quite similar to the one occurring in the haploid case except that the mean fitness is now the mean fitness quadratic form in the current frequencies whereas marginal fitnesses are no longer constant but affine functions in these frequencies. As for the haploid case, it is possible to construct a Markov process whose nonlinear Master equation coincides with the diploid replicator dynamics. We supply this construction which we believe is new.
In the diploid context, the Fisher theorem still holds true but, as a result of the fitness landscape being more complex, there is a possibility for a polymorphic equilibrium state to emerge. Due to its major evolutionary interest, our subsequent concern is to identify examples of diploid dynamics leading to a unique polymorphic state on the simplex, either unstable or stable. We start with the unstable case and draw the attention on a class of fitness matrices leading to a unique unstable polymorphic equilibrium state: the class of strictly potential matrices. Strictly ultrametric matrices are particular instances of strictly potential matrices [2] which therefore will display unstable polymorphism as well. There is a useful canonical representation of strictly ultrametric matrices due to [3] which we recall which helps giving specific examples of strictly ultrametric matrices. When dealing with the class of strictly potential fitness matrices, the mean fitness quadratic form is definite-positive; we derive a related class of fitness matrices leading to a definite-negative mean fitness quadratic form. For this class of matrices, there will also be a unique polymorphic equilibrium state for the diploid dynamics and it will be stable. We also draw the attention on a subclass of the latter: the so-called “antistrictly ultrametric matrices.” For such matrices, among other things, the fitness of each homozygote should not exceed the ones of all the heterozygotes carrying the allele of the homozygote. To the best of our knowledge, the large class of potential fitness matrices as natural candidates for polymorphic states to emerge was not discussed in the literature.
Section 3 is devoted to the stochastic version of these considerations when the transitions in the constitutive allelic population sizes are given by a -dimensional Wright-Fisher model with total constant-size (see [1, 4]). It takes into account an additional important driving source of evolution, namely, the genetic drift, whose nature is exclusively random. Under fitness only and in particular in the absence of mutations, the multiallelic Wright-Fisher model is a transient Markov chain on a -dimensional state-space whose absorbing states are the monomorphic states. We give an expression for the fixation probabilities of this process. Then, we develop four conditioning problems: conditioning on fixating in a given monomorphic state, conditioning on avoiding the extremal states before the current instant, conditioning on nonfixation at each transition time, and conditioning on avoiding the extremal states in the remote future. For the last three conditioned processes, the equilibrium structure of the fitness mechanism shows up.
Finally, we run into similar considerations for the Moran model. When dealing with the fixation probabilities in this Moran context, we suggest a new mean-field approximation of these probabilities which is based on a well-known explicit formula for the 2-alleles case [1]. It concerns the case of multiplicative fitnesses only. Finally, we consider the Moran model conditioned on nonfixation at each transition time. We exploit the reversible character of this process to derive a new explicit product formula for its invariant probability measure.
2. Deterministic Evolutionary Dynamics
We start with the haploid case before moving to the more interesting diploid case.
2.1. Single Locus: Haploid Population with Alleles
Consider alleles attached to a single locus. Suppose that the current time- allelic frequency distribution is given by the column vector . (Throughout, a boldface variable, say , will represent a column-vector so that its transpose, say , will be a line-vector.) We therefore have , the -simplex as a convex subset of with dimension . Let denote the absolute fitnesses of the alleles Let be the mean fitness of the population at time . We will also need the variance in absolute fitness, and the variance in relative fitness .
2.1.1. Dynamics
From the deterministic evolutionary genetics point of view, the discrete-time update of the allele frequency distribution on the simplex is given by (the symbol is a common and useful notation to denote the updated frequency.) The quantity therefore interprets as the frequency-dependent Malthus growth rate parameter of As required, the vector , maps into . In vector form, with , the nonlinear deterministic evolutionary dynamics reads or, with , the increment of Avoiding the trivial case where fitnesses are all equal, without loss of generality, we can assume that either or Thus that allele or has largest fitness.
2.1.2. Mean Fitness Increase
According to the dynamical system (2.4), unless the equilibrium state is attained, the absolute mean fitness increases: The mean fitness is maximal at equilibrium. The rate of increase of is which is the variance in relative fitness defined in (2.3). These last two facts are sometimes termed the 1930s Fisher fundamental theorem of natural selection. The equilibria of (2.4) are the extremal states (-faces) of the boundary of To make it simple, if there is an allele whose fitness is strictly larger than the ones of the others, the deterministic evolutionary dynamics (2.4) will attain an equilibrium where only the fittest will survive; starting from any initial state of which is not an extremal (or monomorphic) point, the haploid trajectories will converge to this fittest state.
2.1.3. A Stochastic Interpretation of the Deterministic Dynamics (2.4)
A vector of can be thought of as a probability vector. The dynamical equation (2.4), as a nonlinear update mapping from to may be viewed as the discrete-time nonlinear Master equation of some Markov process whose construction we now give. Suppose that we have a population of haploid individuals each of which can be of one among types or colors (carrying one among the possible alleles). We will need to introduce an extra color-state, say which will be absorbing for the process we will now construct Let be the random color distribution of the population at time , therefore with enlarged state-space Assume that the individuals are indistinguishable leading to the exchangeability property: (equality in distribution). Let be i.i.d. driving sequences of uniformly distributed random variables on , independent of Let . To decide the allele carried by the individual number at time , with being the indicator function of the event , consider the random Markovian dynamical system: Here and is the first time that hits the absorbing state As a result of , we naturally assume For each our model therefore attributes a positive probability that for all . Although, in principle, there is a possibility that the type of the th particle is the one of the fictitious unobservable allele as a result of (2.9), the sample paths of leading to this are ruled out because focus is on the observable states only.
In words, for the dynamics (2.9), the observable event is realized (together then with ) if the proportion at of type- individuals, weighted by the corresponding scaled fitness , is large enough (compared to ) and of course if the whole process was not absorbed at in the previous step. Taking first the expectation with respect to the driving noise in (2.9) we get Putting recalling and taking the expectation with respect to , we get an unnormalized version of (2.4): We have geometrically fast Defining the normalized conditional probabilities we obtain the normalized haploid dynamics (2.4): It now may be viewed as the nonlinear Master equation of some stochastic Markov process. Let us make some miscellaneous remarks.
Clearly this construction makes also sense if (a single particle). When is finite, we should stress that the initial condition can be chosen to be deterministic, say with , for some sequence of integers satisfying () and quantifying the initial population sizes It could also be chosen to be random, with defining the initial probability distribution of the alleles. This occurs in the large limit if so that The latter choice may therefore be interpreted as a large limit of the former one. In the stochastic interpretation (2.9) of the deterministic dynamics (2.4), can be interpreted either as the probability that the random allele carried by a typical individual is or like the expected proportion of the individuals of type within the whole population (a frequentist point of view). The appeal to the coffin state was a necessary step to understand the normalization Even though is exchangeable, it is not true that, with any two distinct individuals their random labels and are independent. The random algorithm allowing to update the joint types of and could be written down but is much more involved.
2.2. Single Locus: Diploid Population with Alleles
We now run into similar considerations but with diploid populations.
2.2.1. Joint Evolutionary Dynamics
Let stand for the absolute fitness of the genotypes attached to a single locus Assume ( being proportional to the probability of an surviving to maturity, it is natural to take ). Let then be the symmetric fitness matrix with -entry .
Assume that the current frequency distribution at time of the genotypes is given by Let be the frequencies array with -entry . The joint evolutionary dynamics in the diploid case is given by the updating: where the mean fitness is now given by The relative fitness of the genotype is . The joint dynamics takes the matrix form: where stands for the (commutative) Hadamard product of matrices.
Let be the flat matrix whose entries are all . Then We will also let stand for the genotypic variance in absolute fitness and will stand for the diploid variance in relative fitness.
Consider the problem of evaluating the increase of the mean fitness. We have with a relative rate of increase: This is the full diploid version of the Fisher theorem.
2.2.2. Marginal Allelic Dynamics
Assuming a Hardy-Weinberg equilibrium, the frequency distribution at time say , of the genotypes is given by where is the marginal frequency of allele in the whole genotypic population The whole frequency information is now enclosed within For instance, the mean fitness is now given by the quadratic form: with being the transposed line vector of the column vector ( the unit -vector). We will also let stand for the genotypic variance in absolute fitness and will stand for the diploid variance in relative fitness.
Consider now the update of the allelic marginal frequencies themselves. If we first define the frequency-dependent marginal fitness of by , the marginal dynamics is given as in (2.4) by: This dynamics involves a multiplicative interaction between and , the th entry of the image of by . In (2.22) there is a normalization by the quadratic form In vector form (2.22) reads where maps into . Iterating, the time- frequency distribution is
Except for the fact that the mean fitness in (2.22) is now a quadratic form in and that the marginal fitness of is now frequency-dependent, depending linearly on , as far as the marginal frequencies are concerned, the updating formalism (2.22) in the diploid case looks very similar to the one in (2.4) describing the haploid case.
In the diploid case, assuming fitnesses to be multiplicative, say with then and the dynamics (2.22) boils down to (2.4). However, the mean fitness in this case is and not as in the haploid case.
2.2.3. A Stochastic Interpretation of the Deterministic Dynamics (2.22)
As for the haploid case, there is a Markov chain governed by the Master equation (2.22). Consider a population of diploid individuals. The number of alleles in this population is therefore twice the number of genes. Each allele can be of one among types or colors (carrying one among the possible alleles). As before, we introduce an extra color-state, say which is absorbing for the process to be constructed For , let be two independent copies of the random color distribution of the allelic population at time Let . Assume that the alleles are indistinguishable within each sample, leading to . For let be two mutually independent i.i.d. driving -sequences of uniformly distributed random variables on and independent of To decide the type of the random allele at time , consider now the Markovian dynamical system: Here are the first hitting times of each of the absorbing state and for any matrix norm say, for example, . We assume
In words, for this new dynamics, the observable event is seen to be realized together with if two natural independent conditions are now satisfied which can be read from the two indicator functions in the right-hand side of (2.25):(i)first, the proportion at of type- individuals of the first copy should be large enough (compared to );(ii)second, for the second sample copy , the expected fitness of the genotypes containing allele at should be large enough (compared to ).
As for the haploid case, it is necessary that both processes should not be absorbed at in the previous step. Taking first the expectation with respect to the independent driving noises in (2.25), we get Putting taking the expectation with respect to and using our independence and exchangeability hypotheses, we get corresponding to an unnormalized version of (2.22): Defining the normalized conditional probabilities we obtain the normalized nonlinear Markovian diploid dynamics (2.22): Note that this construction makes sense if (a single individual and its 2 alleles of possible types). The need for two copies of was governed by the quadratic character of the interaction appearing in the numerator of (2.22).
2.2.4. Increase of Mean Fitness
Again, the mean fitness as a Lyapunov function, increases as time passes by. We indeed have because, defining , we have Its partial rate of increase due to frequency shifts only is It satisfies where is the allelic variance in relative fitness:
2.2.5. An Alternative Representation of the Allelic Dynamics
There is an alternative vectorial representation of the dynamics (2.22). Define the symmetric positive-definite matrix with quadratic entries in the frequencies: Introduce the quantity which is half the logarithm of the mean fitness Then, (2.22) may be recast as the gradient-like dynamics: with as a result of . Note The dynamics (2.36) is of pure gradient-type with respect to the Svirezhev-Shashahani distance metric ; see [5, 6] For this metric, the distance between and of is From (2.33) and (2.34), this quantity, which is the length of satisfying , is also the square-root of half the allelic variance (the standard deviation) in relative fitness.
2.3. Equilibria (Diploid Case)
The mean fitness increase phenomenon occurs till the evolutionary dynamics reaches an equilibrium state. We wish to briefly discuss the questions relative to equilibria in the diploid case.
2.3.1. Preliminaries
In contrast with the haploid case, in the diploid situation, the dynamics (2.22) can have more complex equilibrium points, satisfying , and To avoid linear manifolds of equilibria, we first assume that all principal minors of are nonsingular and also that the fitnesses of all homozygotes are positive. In this case, from the Bézout theorem, the number of equilibria is finite and less or equal than the number of faces of Note that the extremal endpoints of (-faces) are always monomorphic fixed points of (2.22).
An instructive example fulfilling these preliminary conditions is . There are equilibrium points (the barycenters of the -dimensional faces, ), but only one polymorphic equilibrium which is the barycenter of This point is the one with minimal fitness and it is unstable. The -faces are stable fixed points whereas the barycenters of the faces with are saddle-points. The simplex could be partitioned into pieces each of which being the attraction basins of the stable -face states: in contrast with the haploid case, the type of the survivor is not necessarily the one of the fittest; it will depend on the initial condition.
Similar conclusions can be drawn if instead of we start with where satisfies (meaning , for all ). In this case again, there is only one unstable polymorphic equilibrium which is easily seen to be
Due to its evolutionary interest, we would like now to discuss the opportunity for a polymorphic state to be stable. Under the above assumptions on , a unique stable internal (polymorphic) equilibrium state can exist, necessary and sufficient conditions being that there is a unique for which and has exactly one strictly positive dominant eigenvalue and at least one strictly negative eigenvalue or else the sequence of principal minors of alternates in sign (see Kingman [7]). If this is the case, the equilibrium polymorphic state is It is stable in the sense that it is a local maximum of the mean fitness , with . Since , the linearization of at has no eigenvalue of modulus and so is hyperbolic and/or isolated (see [8]). A stable isolated polymorphic state is asymptotically Lyapunov stable.
Under these additional assumptions therefore on , starting from any initial condition in the interior of all trajectories will be attracted by which is an isolated global maximum of
When there is no such unique globally stable polymorphic equilibrium, all trajectories will still converge but perhaps to a local equilibrium state where some alleles get extinct. Which allele and how many alleles are concerned seems to be an unsolved problem in its full generality.
2.3.2. Special Classes of Fitness Matrices Leading to a Polymorphic State
We now draw the attention on a particular class of fitness matrices that lead to a polymorphic state, either unstable or stable. We start with the unstable case, extending the above special diagonal case leading to a unique unstable polymorphic state.
 The Unstable Case
Let  be a symmetric irreducible strictly substochastic matrix satisfying  the positive mass-defect vector of  is  Let  Define the symmetric strictly potential matrix: , with  defining a strictly row-diagonally dominant Stieltjes matrix with the properties [3]:  for  and  for all . Then
											
										The vector  is called the equilibrium potential of  We have . For this class of  therefore,  admits a positive solution .
Conversely, given a nonsingular matrix  satisfying  for some , the matrix  defines a substochastic matrix if and only  satisfies  for  and . Then,  is row-diagonally dominant and  is a potential matrix.
Strictly ultrametric (sUm) matrices are special classes of positive-definite and symmetric strictly potential matrices [2]. An sUm matrix is symmetric with nonnegative entries, satisfying (i) , for all and (ii) , for all (If in condition is substituted for , then is simply an ultrametric matrix and this new condition is implied by ). If is an sUm matrix, the fitness dynamics will admit an unstable polymorphic equilibrium state, as a result of being positive-definite.
Remark 2.1. Suppose that is substochastic and primitive. Then is an ultrametric matrix If is the Hadamard reciprocal of with entries it satisfies that Therefore is an ultrametric distance associated to the ultrametric potential . Tree matrices are ultrametric matrices that are not sUm.
 The Stable Case
To produce a stable equilibrium state from the sUm matrix construction, let  define a symmetric strictly potential matrix as before. Then, there is a  for which  With , define
											
										With , we have  and we can choose  so that . We have  and  for all  satisfying  showing that  is now a stable polymorphic state for . If  is an sUm matrix, then clearly  satisfies the “anti-sUm” property expressing a fitness domination of the heterozygotes  over the homozygotes:
											
										Nonnegative symmetric negative-definite anti-sUm fitness matrices  will therefore display a stable polymorphic equilibrium state . Note that  is now the maximal value of the mean fitness.
Example 2.2. When , with , , and , let define the fitness matrix with selection parameter and dominance This is sUm if and only if and The equilibrium state is and it is unstable. This is anti-sUm if and only if and The equilibrium state is the same but it is now stable.
Note that a singular multiplicative fitness matrix of the form cannot be a strictly potential matrix because its determinant is zero.
2.3.3. A General Construction to Produce sUm and Anti-sUm Matrices
Consider the problem consisting in splitting binarily and recursively the set till complete reduction to singletons (leaves) which are left behind in the process. For instance, consider the refinement sequence with of :
Starting from the left, there are blocks of symbols (the total number of nodes in the splitting binary tree with leaves). To each encountered block, numbered from to , starting from the left, attach a vector of size with th entry if symbol is in the block string otherwise. For instance, from the above sequence, , and . To each such , attach a number which is if (the leaves) and if (the internal nodes, including the root). Then (see [3]), for any choice of respecting these constraints is an sUm matrix and any sUm matrix can be represented in this way. Because for the corresponding to the leaves the diagonal terms of are necessarily
Since for each set , there are splitting tree sequences where satisfies , there are many ways to generate an sUm matrix
Clearly, for each splitting procedure, with may be written as where the matrices take values in
Now, with any matrix of the form is an anti-sUm matrix. Assuming and may be written under the form: where It has at most independent parameters, namely, the , and If the are known, then is a one-parameter family of fitness matrices. (Fitness matrices of the form were considered in [9] in the context of the estimation of problem).
Examples 2.3. Assume also that , for all  with  being a selection parameter Then, defining the -valued matrix
											
										an anti-sUm matrix of the form  will admit a stable polymorphic equilibrium. Clearly,  itself is anti-sUm. Because of this, there is a  such that  Thus  showing that, with  We thus have  and so . Furthermore, the equilibrium mean fitness for such models is 
For the following simple sequence example with  (four alleles, say A,C,T,G),  we find 
											
										which itself clearly is a symmetric anti-sUm matrix, together with . For this example,  and the equilibrium mean fitness is 
Note that taking  just for the indices  that were initially chosen to satisfy  would also lead to anti-sUm matrices  and  In this case, the sum (2.46) defining  should then be restricted to the indices  from  to  for which . Proceeding in this extreme way for the above simple example, we get the borderline anti-sUm shapes: 
											
										Here,  is simply the barycenter of the 4-simplex.
Remark 2.4. The ultrametric conditions (2.41) should be compared with the so-called “triangle inequality” conditions (leading to stable polymorphism) pointed out in [10], which read It is clear that the class of anti-sUm matrices is a particular subclass of the Lewontin one.
3. Stochastic Evolutionary Dynamics
We now switch to the random point of view of multiallelic evolutionary dynamics driven by selection. There are two models of interest: the Wright-Fisher and the Moran models. We start with Wright-Fisher.
3.1. The Wright-Fisher Model
The Wright-Fisher model is a discrete space-time model which takes into account another important driving source of evolution, namely, the genetic drift whose nature is exclusively random.
3.1.1. The Model and Its First Properties
Consider an allelic population with constant size In the haploid (diploid) case, is (twice) the number of real individuals Let and be two vectors of integers quantifying the size of the allelic populations at two consecutive generations and . We will let Suppose that the stochastic evolutionary dynamics is now given by a Markov chain whose one-step transition matrix from states to is given by the multinomial Wright-Fisher model: Therefore, given to form the next generation, each allele chooses its type independently of the others with probability where is given either by (2.4) in the haploid case or by (2.22) in the diploid case. In the diploid case, the mechanism is assumed to present a unique polymorphic state, either stable or unstable.
The state-space dimension of the Markov chain governed by (3.1) is (the number of compositions of integer into nonnegative parts which is also the number of ways to assign indistinguishable balls into distinguishable boxes). To view as a standard transition matrix of some Markov chain, we need first to order the states and in (3.1). Starting from the bottom right corner of states should be arranged in decreasing order when listing the entries of moving up and left along the lines and columns, respectively; or equivalently, starting from the top left corner of states should be arranged in increasing order when moving down and right.
For example, we can order the states in decreasing order from to as follows. Let be a base- code of the state that will serve as a ranking function The largest state for which is maximal (equal to ) is . Given a state define the subsequent state in decreasing order as Then and to pass from state to the next state in this decreasing sequence, there is a unique with entries in satisfying and such that This way to order the states is consistent with the reverse lexicographic order. For instance, if , there are states ordered in decreasing order as follows:2
The way the digits are propagated from left to right is clear and the consecutive can easily be obtained. Proceeding in this way to order states, is a well-defined conventional object (matrix).
From (3.1), the marginal transition matrix from to is binomial bin with: Given , the th component of the updated state is random with suggesting that, in the large population limit, the deterministic evolutionary dynamics should be recovered. Indeed, from (3.1), the Laplace-Stieltjes transform of the joint law of reads From the strong law of large numbers therefore, if , then, given , which is (2.22): when the population under study is very large, random fluctuations as modelled by (3.1) can be ignored so that the gene frequencies evolve deterministically.
3.1.2. Fixation Probabilities
Let be the -null vector except for its th entry which is The extremal pure states are all absorbing for this Markov chain because and, from (3.1), any additional fixed point which could have on the boundary-faces of which are not points does not give rise to an absorbing state for . Under our assumptions on , the chain is not recurrent, rather it is transient. Depending on the initial condition, say , the chain will necessarily end up in one of the extremal states , with some fixation probability, say , which can be computed as follows. Let be the harmonic function of the Wright-Fisher Markov chain which is the smallest solution to the boundary problem: We also have where ( almost surely) is the random hitting time of for and the s are normalized so as . Thus are the searched probabilities to end up in state starting from state .
From (3.7), is known if . The remaining unknown restriction, say , of to the nonextremal states is easily seen to be In (3.9), is obtained from after erasing the lines and columns corresponding to all the extremal states and is the -column of where the entries corresponding to the extremal states have been deleted When dealing with the Wright-Fisher model, is a positive matrix and also , therefore for all and this for each : starting from any state which is not an extremal state, there is a positive probability to hit any of the extremal states Fixation of the state means extinction of the remaining monomorphic states. It would therefore be of interest to understand the structure of the set for each which is the stochastic version of the attraction basin of especially when is the label of the extremal state with largest fitness . If indeed, the probability to end up in is larger than the probability to end up in any other extremal state.
Unfortunately, the development of the inverse of in terms of its adjugate matrix in (3.9) shows that these fixation probabilities have a very complex determinantal alternating structure and the question of identifying is very complex.
With being the solution (3.9) to the Dirichlet problem with boundary conditions (3.7), the equilibrium measure of the chain therefore is which depends on . Necessarily, one allele will fixate and there is no polymorphic equilibrium state even when dealing with diploid populations. Which allele and with what probability will depend on the initial condition. Thanks to fluctuations, the picture therefore looks very different from the one pertaining to the deterministic theory.
Of importance also is the time it takes to get extinct. It relies on similar techniques. For instance, the expected overall fixation time solves the boundary problem: where . The restriction of to the nonextremal states therefore is
3.1.3. Conditioning on Nonfixation
There are four places where questions relative to conditioning on fixation are relevant in this context. (Similar conditioning problems were considered in [11] in the context of the 2-alleles Wright-Fisher diffusion).
Consider the full fixation vector . Remove from the states for which only, keeping the one, for which The size of this vector, say , is . Let be the corresponding -reduced transition matrix obtained from after erasing the lines and columns corresponding to all the monomorphic states except . The Markov chain conditioned to exit in the extremal state only admits the stochastic transition matrix: It is obtained from after a diagonal Doob transform based on . The chain governed by admits a unique absorbing state which is . The entries of are and for this new conditioned Markov chain, transitions to states for which are favored.
Let us now consider again the fully reduced transition matrix obtained from after erasing the lines and columns corresponding to all the monomorphic states. The matrix is substochastic and irreducible, with . The law of the process corresponding to conditioned on avoiding the monomorphic states before evolves as follows: with being the time needed for first hitting one of the extremal states for , let . Then, with , is the nonlinear Master Equation governing its evolution [12]. The reduced state-space dimension of this Markov chain is and where is the left Perron probability eigenvector of associated to the dominant Perron eigenvalue , namely, . ( is called a Yaglom limit (see [13]) or a quasistationary distribution). If the process is started using this limiting quasistationary distribution, it remains in the same state over time and the fixation time is geometrically distributed with success probability .
One can define another stochastic process which admits the stochastic transition matrix: again defined on the reduced state-space For each we have and the conditioning on nonfixation occurs at each transition time. This process is an ergodic Markov chain with invariant probability measure solving It has the following closed-form determinantal expression (see [14, Section 6] and [15, page 1559]): where is the submatrix resulting from the deletion of row and column of . The question as to whether the process governed by is reversible or not arises. Defining the transition matrix of the time-reversed process by it does not hold that and so detailed balance does not hold. Indeed, is similar to the transition matrix of an ergodic Wright-Fisher model and Wright-Fisher chains are not reversible.
If we condition on nonfixation in the remote future (see [16] for additional details), we get a Markov chain whose stochastic transition matrix is Here is the positive right Perron eigenvector of associated to the Perron eigenvalue satisfying This vector can be chosen so that where is again the left Perron probability eigenvector of associated to (see [17]). With , we have . The process governed by is an ergodic Markov chain whose invariant probability measure is the Schur product of and with th entry
For the last three conditionings, it is difficult to extract some information on the limiting distribution, either or or respectively. This is because it would suppose to solve the eigenvalue problems explicitly which is out of reach, at least theoretically. However, assuming a diploid population with a polymorphic equilibrium state for , we expect that these distributions will present a global (local) maximum (minimum) near if is stable (unstable). These limiting distributions should be more sharply peaked around the extremum if we consider the conditioning compared to because, the latter conditioning being more stringent than the former, it should charge more heavily the sample paths that stay away from the monomorphic states.
Finally, we would like to stress that all these considerations are also relevant in the context of another fundamental stochastic model arising in the context of evolutionary genetics. We will give some elements of how to proceed with this model presenting very different properties.
3.2. The -Alleles Moran Model
We now focus on the Moran model.
3.2.1. The Multiallelic Moran Model
Let In the Moran version of the stochastic evolution, given , the only accessible values of are the neighboring states: where Here is in position and in position corresponding to the transfer of an individual from the box (if nonempty) to the box With being the number of nonempty entries of , there are accessible states from . The Moran stochastic evolutionary dynamics is now given by a Markov chain whose one-step transition matrix from states to is where is given either by (2.4) in the haploid case or by (2.22) in the diploid case.
Summing over , in (3.20), we get the holding probability completing the characterization of the -alleles Moran model. Under our assumptions on , the holding probabilities are equal to only for the extremal states which are therefore the only absorbing states of the Moran chain, just like for the Wright-Fisher model. The drift at state is
Let us compute the Laplace-Stieltjes Transform of in the context of a Moran model. Omitting the argument in we get the factorized form: Putting if , the th marginal reads showing that is of the random walk type. Indeed, we get if or and We have There is only a small correction (of order ) of the updated mean to its current value and fluctuations around the mean are small too (of order ). The evolution process is very slow.
3.2.2. Fixation Probabilities
As a random walk model, the Moran model has a much simpler transition matrix of the Jacobi type. The equilibrium measure of the chain again is where again solves the Dirichlet problem (3.7) but with this new simpler Jacobi . For the Moran model, the explicit expression (3.9) of the fixation probability simplifies a little bit because only for the neighboring states of that is for some .
When (2 alleles), the random walk transition probabilities () are the probabilities that the first component of moves down and up by one unit, respectively. In this case, the Dirichlet problem giving the fixation probabilities solves explicitly. With () being the harmonic function of the 2-alleles chain, we easily get that is the probability that the extremal state is reached given Assuming a model with multiplicative fitnesses: then takes the simple form () showing (see [1, page 109]) that Assuming and , putting , for large , we get [18]
In the general fitness case where, as usual, leading to
This 2-alleles exact solution can be used in the full diploid -alleles Moran case with multiplicative fitnesses. Indeed, from this, with , the fixation probability of can be conjectured to be approximated qualitatively by This can be justified as follows: mark one particular box with size corresponding to the allele with fitness . Then clump the remaining boxes into a single box with size corresponding to a fictitious allele with average fitness We are left with a 2-alleles Moran multiplicative fitness model for which (from the 2-alleles exact solution) the fixation probability of is given by (3.35). This formula constitutes sort of a mean field approximation to the full Dirichlet problem associated to the Moran model.
Assuming , a Kimura-like approximation of (3.35) would lead for large to where is now a point of the continuous -simplex different from the -unit vector .
From (3.35), when the fitness of allele is large (small), compared to the average fitness of the remaining alleles, then the fixation probability of gets close to (resp., to ). Note also that the larger is, the larger the fixation probability is.
As required also, for all As another particular initial configuration case, suppose that we start from the 2-alleles type state: where the is in position (i.e., ) and the entry in position (i.e., ). Although for this choice of the initial state, the fixation of is very likely, there still is a positive probability that allele gets fixed which is seen to be from (3.35): depending only on the relative fitnesses of and (see [19, 20] for a similar expression). As gets large, this probability gets close to if and close to if
Conversely, assuming the 2-alleles type state to be defined by and , which, as required, gets close to as gets large if and close to if
3.2.3. Conditioning on Nonfixation
The conditioning developments discussed for the Wright-Fisher model are also relevant in the Moran model context substituting the of Moran for the of Wright-Fisher. Let us revisit condition .
With being now the reduced substochastic matrix of the full Moran transition matrix defined in (3.20), consider the stochastic process with stochastic transition matrix: defined on the reduced state-space with dimension The process is again an ergodic Markov chain with invariant probability measure solving With being the transition matrix of the time-reversed process it now holds that and so detailed balance holds when dealing with the Moran case (see also [21]). This will be proved if we can exhibit an equilibrium probability measure such that for all neighboring state of .
With any terminal state, suppose that we want to use (3.39) to compute starting from the smallest available state in the system which is . This is possible because may be represented as where can be chosen so that Let us give some details.
(i) Note first, by reversing path, that for two consecutive states , the ratio can be computed. We have where for some of the form and therefore Clearly, from such a structure of the entries of , for each possible transition , the ratio will only depend separately on a ratio involving the terminal and the initial states and (the detailed balance condition holds).
(ii) Second, the sequence of -s in (3.40) is governed by the following path starting from and ending up in the target state : we can use the following sequence of s filling up successively the entries of to the left of the last entry of by using the individuals of the reservoir state By doing so, each intermediate state is separated from the next by some clearly identified , and after evaluating the probability ratio for each consecutive states of this sequence, we are done. Because there exists a probability distribution such that the reversibility identity (3.39) holds, then this Moran process is reversible with (3.40) as its stationary distribution. Using (3.42), this constitutes an exact explicit product-form formula.
4. Concluding Remarks
This paper studies a classical population genetic model describing a one-locus multiallelic population subject to natural selection, random mating and then random genetic drifts as from the Wright-Fisher and Moran models. Although the two-alleles version of this model is fairly well studied and understood, this is not so much the case of the multiallelic one, especially in the discrete-time context which we adopt here. Let us summarize our results emphasizing that the ones which we believe are new.
Considering first the deterministic updating mechanisms driven by selection, we underline that it has the form of a nonlinear Master equation suggesting that it is possible to construct an underlying Markov process governed by this Master equation. We briefly and intuitively supply such a construction.
In the diploid context, we pay attention on a class of fitness matrices that leads to polymorphism. Would the equilibrium polymorphic state be unstable, we suggest that the class of potential matrices constitute a large such admissible class of fitness matrices. It contains the class of strictly ultrametric matrices which therefore deserves some interest. To the best of our knowledge, there is no discussion of such fitness models in the population genetics context. Would the polymorphic state be stable, we derive a related class of fitness matrices leading to a definite-negative mean fitness quadratic form. Some simple examples are supplied and detailed.
The last section is devoted to the stochastic version of these considerations taking into account an additional important driving source of evolution, namely, the random genetic drift. When driven by selection only and in particular in the absence of mutations, the multiallelic Wright-Fisher model is a transient Markov chain whose absorbing states are the monomorphic states. We give an expression for the fixation probabilities for this process. Then, we develop four conditioning problems: conditioning on fixating in a given monomorphic state, conditioning on avoiding the extremal states before the current instant, conditioning on nonfixation at each transition time, and conditioning on avoiding the extremal states in the remote future. Finally we run into similar considerations but for the Moran model. When dealing with the fixation probabilities in this Moran context, we suggest a mean-field approximation of these probabilities which is based on a well-known explicit formula for the 2-alleles case. It concerns the case of multiplicative fitnesses only. Finally, we consider the Moran model conditioned on nonfixation at each transition time. We exploit the reversible character of this process to derive an explicit product formula for its invariant probability measure.