Sparse channel estimation algorithms for OTFS system

Orthogonal time-frequency space (OTFS) modulation, which has recently been proposed in the literature, is one of the promising techniques designed in the 2D Delay-Doppler domain adapted to combat high Doppler fading channels. However, channel estimation in high Doppler scenarios in advanced mobile-communication systems is still a challenging task. In this paper, the problem of channel estimation in the Delay-Doppler domain of the OTFS is focused on. First, a simple adaptation of the generalized orthogonal matching pursuit procedure, which will serve as a baseline method in this work, is proposed. Then, iterative algorithms are derived beneﬁciating from the sparsity of the channel. The unknown channel vector is separated into an unknown sparse support vector corresponding to the delay and Doppler taps, and an unknown vector of channel gains. These algorithms involve 𝓁 1 -norm minimization and a two-stage iterative procedure to recover alternatively the channel support and its coefﬁcients. The estimation problem is also addressed from a Bayesian point of view. The sparse representation is reformulated as a speciﬁc marginalization of the maximum a posteriori problem on the support of the channel. To deal with the intractability of this problem, two existing techniques are adapted to this context, namely: The Monte Carlo Markov chain with the Gibbs sampler and variational mean-ﬁeld approximation with the variational Bayesian expectation-maximization procedure. Finally, to assess the performance of the proposed algorithms, their complexity and performance are compared against existing methods. Experimental tests, conducted in high-mobility scenarios and low-latency applications, show that the proposed schemes are slightly more expensive in terms of complexity load but perform signiﬁcantly better in terms of normalized mean square error and bit error rate.


INTRODUCTION
Orthogonal frequency division multiplexing (OFDM) is the most popular modulation technique deployed in 4G longterm evolution and 5G mobile-communication systems [1]. It is known for its robustness and high spectral efficiency in time-invariant frequency selective channels. However, its performance deteriorates when the Doppler effect is high (in high-mobility environments, such as high-speed trains); hence, the need for other modulation techniques to deal with this problem [2]. Orthogonal time frequency space (OTFS) modulation, recently proposed in [3,4], is one of the promising techniques for advanced mobile-communication systems (5G and beyond) thanks to the increased performance that it provides compared to OFDM modulation. With OTFS, each data symbol is multiplexed onto 2D orthogonal basis functions specifically developed to deal with time-varying multipath channels dynamics at high speeds.
To detect data with minimal errors, the channel's impulse response must be known on the receiver side. Several channel estimation (CE) schemes for OTFS systems have been proposed in the literature. Let's briefly review some of them.
In [5], a 2D turbo compressed sensing (CS) algorithm for CE is featured based on the previous work on turbo CS for fast time-varying CE suggested in [6]. The support matrix of the Delay-Doppler channel is modeled by using a Markov random field and the Bernoulli Gaussian distribution. multiple-output (MIMO) OTFS networks. The MIMO-OTFS signal model along the uplink and the reciprocity between the uplink and the downlink are formulated to recover the parameters of the channel including the delay, the angle, the Doppler frequency, and the channel gain for each physical scattering path.
A 3D-structured orthogonal matching pursuit algorithmbased CE technique is proposed in [8], which exploits the 3Dstructured channel sparsity in the Delay-Doppler-angle domain.
In [9], a CE scheme is established using pulses in the Delay-Doppler domain as pilots adapted for MIMO-OTFS. The equivalent MIMO-OTFS channel matrix is obtained using a single MIMO-OTFS frame thanks to the sufficient spacing between pilots and data in the Delay-Doppler domain.
In another work [2], an embedded pilot-aided CE scheme for OTFS is highlighted. In each OTFS frame, pilot, guard, and data symbols are arranged appropriately in the Delay-Doppler domain to avoid interference between the data symbols and the pilot at the receiver side. The used CE is founded on a threshold method. The estimated channel is fed to data detection through the message-passing technique proposed in [10].
The authors in [11] proposed a pilot design and optimization for the OTFS system to improve the accuracy of CE. The classical orthogonal matching pursuit (OMP) technique is adopted for CE.
In [12], a Bayesian learning-aided sparse CE scheme for OTFS modulated systems are featured. The pilots are directly transmitted over the time-frequency domain grid for estimating the Delay-Doppler domain channel state information (CSI). Within the same context in [13], an off-grid CE scheme for OTFS systems is proposed based on the sparse Bayesian learning framework.
Another pilot pattern, namely: a superimposed scheme, has been recently proposed. In this scheme, pilots and data symbols are spread in the Delay-Doppler domain. Several algorithms for CE and data detection using this scheme are available [14][15][16]. In [14], a data-aided CE algorithm for a superimposed pilot and data transmission scheme is suggested to improve spectral efficiency. To accurately estimate the channel and detect the data symbols, the channel is coarsely estimated based on the pilot symbol, followed by an iterative process that detects the data symbols and refines the channel estimate. In [15], pilots and data symbols are superimposed in the Delay-Doppler grid. In [16], an iterative algorithm for CE and data detection is proposed. The CE step is formulated as a sparse recovery problem via the variational Bayesian expectation-maximization (VB-EM) procedure.
In the present paper, several CE algorithms for OTFS systems are proposed and compared against two state-of-the-art methods recently suggested in [2] and [11].
First, this work starts with a simple adaptation of a method that belongs to the greedy family, which is the generalized orthogonal matching pursuit (GOMP) [17,18]. This adapted technique will serve as a baseline method in the developed solutions.
Next, two algorithms involving 1 -minimization that take into account the sparse nature of this problem are proposed.
The main idea is the separation of the unknown channel vector into an unknown sparse support vector, which corresponds to the locations of delay and Doppler taps, and an unknown vector of channel gains. These two 1 -minimization algorithms use an iterative two-stage procedure that consists on optimizing the recovery problem alternatively concerning support and gain vectors. The first method resorts to the penalization of the mean square error with the 1 -norm of the gain vector. The second uses the reweighted 1 -norm, which further improves the recovery performance.
Alternatively, the sparse representation problem is reformulated as a specific marginalization of the maximum a posteriori (MAP) problem on the sparse support-vector. And to deal with the intractability of this problem, two Bayesian-based algorithms are proposed. Whereas, the first relies on the Monte Carlo Markov chain (MCMC) with the Gibbs sampler, the second uses a particular variational mean-field approximation and the VB-EM procedure.
In summary, the main contributions of this work are: 1. The proposal of two algorithms for sparse CE that involve 1 -minimization to optimize the recovery issue about gain and support vectors. 2. The proposal of two Bayesian-based algorithms for sparse CE. 3. The complexity analysis of the suggested algorithms and the evaluation of their performance in terms of NMSE and BER.
Simulations will show that, compared to state-of-the-art methods, most of the proposed algorithms perform significantly better in terms of normalized mean square error (NMSE) and bit error rate (BER) but are slightly more expensive. Experimental tests will also show that the proposed algorithms remain applicable in high-mobility scenarios and low-latency applications.
The rest of the paper is structured as follows. In Section 2, the basic concepts of OTFS modulation are introduced and the CE problem is formulated. In Section 3, the proposed CE algorithms are described and their complexity is evaluated. In Section 4, the two selected state-of-the-art methods are described and the performance of the proposed algorithms are compared against these methods.
Notations: symbols a, a, and A, stand for scalar, vector, and matrix, respectively. (.) H denotes Hermitian transposition. Operators ⊗ and ⊙ denote Kronecker product and Hadamard product, respectively. vec(.) denotes the column vectorization of an M × N matrix into an MN column vector. F n and F H n denote, respectively, the n-point direct and inverse discrete Fourier transform matrices. I M is the M × M identity matrix.

GENERAL CONCEPTS OF OTFS AND PROBLEM FORMULATION
In this section, first, the basic concepts of OTFS modulation are detailed. Then, the input-output relation of the

General concepts of OTFS
Details about the basic concepts of OTFS modulation can be found in [2], [4], [10], [19], [20]. Table 1 summarizes the notations and their physical meaning. The time-frequency grid G TF and the Delay-Doppler grid G DD are expressed as follows: The cross-ambiguity function between p tx (t ) and p rx (t ) is: We say that the bi-orthonormal property condition between p tx and p rx is satisfied if Here, a single-input single-output OTFS system whose block diagram is given in Figure 1 is considered. This diagram is divided into three parts: transmitter, Delay-Doppler channel, receiver.
On the transmitter side, the 2D information symbols (e.g. QAM or M-PSK symbols), which are arranged on the Delay-Doppler grid G DD , are mapped on the time-frequency grid G TF (n = 0 ∶ N − 1 and m = 0 ∶ M − 1) via the inverse symplectic fast Fourier transform as follows: Then, the Heisenberg transform modulator, which is equivalent to an M -point inverse fast Fourier transform with a pulse-shaping p tx (t ), will be applied on X to generate the transmitter time-domain signal s(t ) as follows: The signal r (t ), received through a linear time-variant channel, can be expressed as [4]: where w(t ) ∼  (w(t ); 0, 2 ) is the additive noise. The Delay-Doppler channel is characterized by a reduced number of reflectors, each defined by its delay, Doppler shift and amplitude. The channel is sparse and its response h( , ) can be expressed in the following form: where (.) is the Dirac delta function; P is the number of paths; h i ∈ ℂ is the complex gain. i ∈ [0, max ] and i ∈ [− max , max ] are, respectively, the delay and Doppler shift associated with the i th path. l i and k i designate the delay and Doppler taps for the i th path: On the receiver side, the signal r (t ) first passes through a matched filter bank, which supplies the cross-ambiguity function between p rx (t ) and r (t ). Then, the output Y (t , f ) of the matched filter is sampled, leading to Y [n, m] = Y (t , f )| t =nΔt , f =mΔ f . This procedure is referred to as the Wigner transform.
Next, the relationship between time-frequency output samples Y [n, m] and input samples X [n, m] is: where Then, applying the symplectic fast Fourier transform yields The connection between y[k, l ] and the transmitted symbols can be expressed as a 2D circular convolution [10] as follows: where b k ′ ,l ′ ∈ {0, 1} is the path indicator and the term v[k, l ] ∼  (v[k, l ]; 0, 2 ) is an additive circular white noise with variance 2 . k,l is a known phase shift caused by imperfect bi-orthogonality of the rectangular waveform [19], given by

Problem formulation
In this work, a non-blind estimation approach for CE is considered, that is, to estimate the channel accurately, pilots in the Delay-Doppler map and an efficient pilot pattern are needed. Benefiting from the fact that the channel is sparse in the Delay-Doppler domain, CE can be seen as a sparse recovery problem. The arrangement of pilots, guards, and data symbols proposed in [11] is used, where the proportion of guard symbols is reduced compared to the pattern of pilots proposed in [2]. In addition, the distribution of pilots over several Delay-Doppler positions in the grid helps limit the peak-to-average power ratio. On the receiver side, a simple, yet effective, interference cancellation scheme is adopted in [11] to address pilot contamination in the pilot pattern. Figure 2 illustrates this arrangement in the Delay-Doppler domain grid for OTFS frame transmission.
For each OTFS frame transmission, there are L p = (2N p + 1)M p pilots, L g = (2(k + N p ) + 1)(l + M p ) − L p guard symbols and L d = MN − L p − L g data symbols in the Delay-Doppler grid G DD . Their locations are given as follows: where [k p , l p ] denotes the reference position of the pilot symbols such that 0 ≤ k p ≤ N − 1 and 0 ≤ l p ≤ M − 1.
Here, x d [k, l ] represent the data symbols located at the grid location [k, l ]. Guard symbols ensure that received symbols for CE do not interfere with data symbols. We note that in this pattern there are no guard symbols on one side of the pilot. This is due to the channel causality in the delay dimension. Therefore, the data symbols on the right side will not corrupt the pilots. Unlike Doppler taps which can be negative or positive, it is necessary to place the guard intervals only on the left side of the pilots since delay taps cannot be negative.
In the Delay-Doppler domain, due to the transmit symbol arrangement in (14), the received pilot symbols can be expressed in the same way as (12), where k ∈ [k p − N p , k p + N p ] and l ∈ [l p , l p + M p − 1]. This can be written in a matrix form as: Let L = (2k + 1)(l + 1). Then, y p ∈ ℂ L p is the received vector for pilots, which is expressed as [y p ] ((k+N p )M p +l −l p ) = y[k, l ], and h ∈ ℂ L is the channel vector, which is expressed as It is worth noting that the vector h contains only P non-zero elements, where P is the number of paths. The CE in this context amounts to estimating the channel support (8). Finally, CE quantifies the sparse vector h from (15). In the next section, the proposed algorithms to solve this problem will be presented.

PROPOSED ALGORITHMS
In this section, various algorithms for sparse CE in OTFS systems are presented. A straightforward way to solve problem (15) with the pilot pattern adopted in this work is to use a greedy approach. Thus, the first algorithm is a mere adaptation of the GOMP method [17,18]. This adapted algorithm will be considered here as a baseline method for sparse CE. The remaining algorithms are divided into two categories. The first one includes two iterative algorithms involving 1 -minimization. The second one includes two algorithms based on Bayesian approaches. This section concludes with the complexity analysis of all the suggested algorithms.

Adapted GOMP for sparse CE
GOMP is the generalization of the OMP [21,22] in the sense that multiple indices are identified per iteration. Therefore, GOMP ends the process of recovery with a much smaller number of iterations than the OMP. It is seen in [17] that GOMP has a high-recovery performance (compared to 1 -minimization techniques) with fast processing speed and competitive computational complexity. In the present work, GOMP is adapted to solve approximately the following problem: where ‖h‖ 0 denotes the 0 -pseudo-norm (i.e. the number of non-zero entries of h). GOMP is divided into four steps, namely: Identification, Augmentation, Estimation, and Residual Update. These steps are detailed in [17]. GOMP is one of the greedy algorithms which are the least complex CS-theory detectors, but they suffer from error propagation. A non-zero false estimate at a given iteration can cause the non-zero estimate of all elements of the vector to be incorrect. Moreover, GOMP ensures convergence towards a local optimum, but there is no guarantee of convergence towards a global optimum. This is a known weakness of greedy algorithms.

Iterative two-stage (I2S) algorithm
To get around the drawbacks of GOMP, a criterion and an algorithm that exploits the sparse structure of the channel h are defined. To this end, one can write Solving (17) is computationally very demanding. To ease this problem, the usual approach is the relaxation of the constraints. Here, the constraints involving b are relaxed by using the 1norm with lower and upper bounds. Then, (17) is relaxed as follows: where is introduced to force the sparsity of b.
A two-stage algorithm is proposed to solve (18). This algorithm estimates the support b given g in the first stage. Then, it estimates the channel gain vector g given b in the second stage. The estimate of g is obtained via a minimum mean square error estimator which is given bŷ where Âb = AD̂b, D̂b is an (L × L) matrix given by D̂b = diag(b) and = 2 ∕ 2 g , with 2 g is the power of the P channel coefficients.
To solve (18), the vector g LS of channel gains using the leastsquares solution for (15) is considered. Then, the initial vector g (0) is defined as being the L vector containing the P largest values of g LS and zeros elsewhere: g (0) = max P (g LS ). The proposed iterative algorithm will be referred to as iterative two-stage (I2S) and is detailed in Algorithm 1.
At the output of Algorithm 1, the entries of the support vectorb belong to [0, 1]. To reconstruct h, a binary support vector is needed. There are two options to estimate it fromb. The first, without any prior, is to apply onb a fixed threshold of = 0.5. The second is to use an optimal hard threshold [23]. ALGORITHM 1 Iterative two-stage (I2S) algorithm Input: measurements y p ∈ ℂ L p , sensing matrix A ∈ ℂ L p ×L ,

Reweighted iterative two-stage (RI2S) algorithm
To improve sparse signal recovery, an interesting solution has been proposed in [24], named "reweighted 1 -minimization". This solution can be seen as an intermediate between the 0minimization and the 1 -minimization because it approximates the former (which is not convex) while preserving its convexity at each iteration, thanks to a criterion based on the 1 -norm. Therefore, based on this approach, a CE algorithm for OTFS modulation, named Reweighted iterative two-stage (RI2S) algorithm is proposed. The difference between RI2S and I2S lies in the first stage, where the estimate of the channel support b (given g) is provided by applying reweighted 1 -minimization. This is detailed in Algorithm 2, where W = diag(w 1 , w 2 , … , w L ). We note that the initial weights are set to w (1) Here, the weights w (i+1) k are updated as follows: w . Large/small weights could be used to discour-age/encourage non-zero entries in the recovered signal. Parameter > 0 is introduced to avoid the norm divergence due to high weight values caused by extremely low entries of b (provides stability) and to ensure that a zero-valued component in b (i ) k does not strictly prohibit a non-zero estimate at the next step. A possible adaptive choice of is given in [24] as = max{|b (i−1) | (P+1) , 10 −3 }, where |b (i−1) | (P+1) is the (P + 1) th largest elements of |b (i−1) |.

Bayesian approaches
The unstructured sparse representation problem can be reformulated as a particular marginalized MAP problem on the support b of the sparse vector h. To deal with the intractability of direct optimization of the distributions of interest, two possible approaches are considered. The first is based on MCMC simulation and involves the Gibbs sampler. The second is based on a particular variational mean-field approximation and VB-EM procedure [25]. Before the description of the two proposed Bayesian-based CE algorithms, it is needed to define the probabilistic prior models used to derive these techniques.

Probabilistic model
The observation model (15) can be also expressed as follows: where A k is the k th column of A and v = [v 1 , … , v L ] T is the noise vector which is a white circular Gaussian, with p(v k ) = 1 2 e −|v k | 2 ∕2 2 . Therefore, p(y p |g, b) =  (A b g b , 2 I L ), where A b ∈ ℂ L p ×P and g b ∈ ℂ P are made up of A and g considering indices k where b k ≠ 0.
The sparse nature of h lies on the fact that most of its entries are close to 0 and a few of them are not negligible. To account for this prior information, the entries of h can be described as a mixture of two zero-mean Gaussian distributions, with respective variances 2 0 and 2 1 , such . Therefore, g obeys the following probabilistic model: Note that the variances 2 g k of Gaussian distributions realizations are independent of the support b. To detect the location of spikes, note that the variables b k associated with the corresponding entries of h are assumed to be independent Bernoulli random variables, such that b k = 1 if a spike is present at h k and b k = 0 otherwise. Thus, h k |b k ∼  (h k ; 0, Unstructured sparsity can be modeled by a standard choice based on a product of the Bernoulli distributions, as follows: Alternatively, instead of seeking to approximate the means of the posterior distributions via MCMC simulation, recent approaches aim to iteratively calculate exactly a variational approximation of the target posterior distribution.

3.4.2
Channel estimation in a Bayesian framework At this point, MAP estimation of the sparse channel parameters, which corresponds to the optimal Bayesian estimator [26] is addressed. In this case, the estimation problem of support and channel gains can take the following form: First, the channel support vector b is determined. To achieve this, the probability of a false decision on b is minimized. The decision that leads to this minimization is given as follows: The maximization of p(b|y p ) is too costly. This is because solving (24) requires the evaluation of the function log(p(b|y p ) for all sequences of b ∈ {0, 1} L (2 L evaluations). Individual decisions for the entries of the support b can be made from a marginalized MAP, leading to: In the same way, the evaluation of p(b k |y p ) is intractable due to the costly marginalization of p(b|y p ) over the b l 's, for l ≠ k. To circumvent this difficulty, MCMC with Gibbs sampler and mean-field variational approximations compute a tractable surrogate q(b k ) of q(b k |y p ) [25], [27]. In this case, (25) will be approximated by the following easier problem: Once the support b is estimated, channel coefficients g can be estimated asĝ = arg max g log(p(g|b, y p )): whereĝ̂b denotes entries of g restricted to the estimated support, Âb is the corresponding columns of A and = diag( 2 ∕ 2 g 1 , 2 ∕ 2 g 2 , … , 2 ∕ 2 g L ). If 2 ⋘ 2 g k ∀k, then, this solution reduces to the least-square solution.

MCMC algorithm
The idea is to generate realizations of a given process and use them to compute an estimator of the parameters. Thus, to decide b, the marginal posterior probability p( |y p ) of = (g, b) is simulated using a Gibbs algorithm [28]. The Gibbs sampler simulates realizations of samples k according to the a posteriori marginals p( k |y p , −k ). Denoting by g −k the vector g where entry g k has been removed and defining b −k in the same way, each loop of the Gibbs algorithm leads to the iterative simulation of variables g k ∼ p(g k |g −k , b, y p ) and b k ∼ p(b k |b −k , g, y p ). Implementation of the Gibbs sampler is summarized in Appendix A.1. The Gibbs sampler is given by the following distributions: where (b (i ) , g (i ) ) i=1∶K denote the vectors supplied by operating the Gibbs sampler K times. After K 0 iterations, which represent a learning period, the samples are (roughly) sampled according to the target distribution. Then, comparing averaged simulations for b k to a threshold = 0.5, minimizes the Bayes risk for the corresponding decisionb k ∈ {0, 1}: Then, forĝ, the corresponding maximum likelihood estimator (27) can be chosen.
Details of the implementation of the proposed MCMC algorithm for CE are summarized in Algorithm 3.

3.4.4
SoBaP algorithm The methodology adopted here is to compute an approximation q(b k ) of the posterior probability p(b k |y p ) named the mean-field approximation. It is detailed in [25]. The mean-field ALGORITHM 3 MCMC algorithm Input: measurements y p ∈ ℂ L p , sensing matrix A ∈ ℂ L p ×L , Bernoulli parameter , , end for end while , compute the ML estimator of g using (27), approximation [29] of p( |y p ) is the surrogate distribution q ⋆ ( ) satisfying where = (b, g) and p( |y p ) is its posterior distribution. The problem (30) can be solved by successive minimizations of the Kullback-Leibler divergence [30] with respect to factors q( i ). Then, the VB-EM algorithm [31][32][33] used in this work is given in [25]. It will almost surely converge towards a saddle point or a local/global maximum of the problem (30) under mild conditions [25]. The relation between this procedure and the known expectation-maximization algorithm [34,35] results from the addition of the constraint q( i ) = ( i −̂i ) on some q( i ) ′ s.
The mean-field approximation offers a good framework for approximating the marginals p( i |y p ). Indeed, where the last equality stems from the constraint of (30).

ALGORITHM 4 SoBaP algorithm
Input: measurements y p ∈ ℂ L p , sensing matrix A ∈ ℂ L p ×L ,  A(b ⊙ m), estimate g conditional to q using (27), The soft Bayesian pursuit (SoBaP) algorithm used in this section considers the particular case where the mean-field approximation q (g, b) of p(g, b|y p ) simply writes q(g, b) = ∏ k q(g k , b k ). Together with the model (21), the corresponding VB-EM update is given in Appendix A.2. From (31), an approximation of p(b k |y p ) therefore follows simply from the relations By using this approximation, (25) can be solved easily by a simple threshold operation, that is,b k = 1 if q(b k = 1) > and b k = 0 otherwise, with = 0.5 as for the MCMC algorithm. A crucial parameter that can influence the performance of SoBaP and that deserves to be estimated adaptively (at each iteration) is the noise variance 2 . A good estimator of 2 is crucial for the convergence of the algorithm. Its estimation requires its introduction as a new variable in , that is, = [b, g, 2 ]. Then, q(g, b, 2 ) = q( 2 ) ∏ k q(g k , b k ). Letting q( 2 ) = ( 2 − 2 ), the estimation of 2 is given in Appendix A.3. The steps of SoBaP are detailed in Algorithm 4.

Complexity analysis of the proposed algorithms
In the case of GOMP, the complexity of the k th iteration in terms of multiplications is approximatively 2L p L + (4K 2 + 2K )kL p [17]. Thus, the overall complexity is dominated by the identification step. For N g iterations, it is  (N g L 2 ).
In the case of I2S and RI2S, solving the convex optimization problem considered in the first stage is relatively fast. The solution for this linear programming problem is obtained via the Matlab CVX toolbox [36,37], a package for solving convex problems based on the interior-point method. The complexity is dominated by (L 3 ). Whereas, the second stage, which involves the calculation of a minimum mean square error, is evaluated around (L 3 ). Therefore, the overall complexity of these two algorithms is around (N t L 3 ) for N t iterations.
In the case of MCMC, the complexity requires (L 2 ) operations for each iteration. Thus, the overall complexity of this algorithm is upper-bounded by (N m L 2 ) for N m iterations.
In the case of SoBaP, the most expensive operation is the update (equation (41) in [25]) which is (L 2 ) at each iteration. Thus, the overall complexity of SoBaP is around (N s L 2 ) for N s iterations.

RESULTS AND DISCUSSION
In this section, the proposed algorithms are tested and compared with two state-of-the-art methods from [2] and [11]. The description of these methods is given in Section 4.2. The comparative tests are presented in Section 4.3 in terms of complexity, NMSE, and BER in real scenarios.

Simulation setup
All the simulations are performed on a regular desktop PC using MATLAB 2018a. Information bits are mapped to QPSK before being transmitted through the OTFS transmitter.
For the channel delay model, the Extended Vehicular A model [38] of the 3GPP, the standardization body for 5G cellular communications is adopted. Note that each delay tap has a different single Doppler shift in the form i = max cos( i ), where i ∼  [0, ] and max denotes the maximum Doppler shift determined by the user's equipment speed. The maximum delay tap is set to l = 10 and the maximum Doppler tap to k = 4, which corresponds to a high-speed scenario with a maximum user's equipment speed v = 120 km∕h. The channel coherence time T c = 1∕ max = 2.25 ms is longer than the OTFS symbol duration T s = T ∕(M Δ f ) = 0.13 s, which means that the channel variation is slow in this case.
To achieve good CE, the size of pilot signals should cover the maximum delay and Doppler spread, that is, M p ≥ l and 2N p + 1 ≥ k . Thus, to balance the estimation accuracy and spectral efficiency, M p and N p are set to 16 and 4, respectively. Pilots are chosen to optimize the average mutual incoherence property of A which is expressed as [11] where a i denotes the i th column of A. Note that the coherence reflects the correlation between the columns of A; a higher correlation will degrade the performance. Therefore, pilots will be chosen so as to minimize avg . Four types of pilot sequences are tested: constant amplitude zero auto-correlation sequences (CAZAC), Gaussian sequences, Hadamard sequences, and particle swarm optimization (PSO) sequences [11]. Table 2 shows that the best value of avg is the one corresponding to the PSO sequences, which  Table 3.
Performance will be calculated in terms of BER and NMSE. The NMSE is a normalized mean square error with values in [0, 1], defined by

Description of the selected state-of-the-art methods
The comparison of the developed techniques is done against the two state-of-the-art methods proposed in [2] and [11]. The first one is well known and widely used. The second one is a recent on-grid channel estimation method that uses a pilots scheme and guard intervals in the Delay-Doppler domain.
In [2], a simple CE method based on thresholding is proposed to solve (15) with respect to h using a pilot pattern with only one pilot symbol x p transmitted per frame. It operates as follows: for 1 ≤ k ≤ L p , entries k such that |y p (k)| ≥ (real positive detection threshold), determine the support vector b of h where we set (above the threshold) b(k) = 1 and h(k) = y p (k)∕x p . Otherwise, b(k) = 0 and h(k) = 0. This method relies on the fact that the existence of a path implies that the received symbol is the weighted pilot signal with additive white Gaussian noise. Otherwise, it is only noise.
This method (Threshold method) is simple and has a low complexity. However, miss-detection and false-alarm probabilities can be altered by varying the threshold on path detection. To guarantee good performance, the SNR p (pilot signal-to-noise ratio) must be high (the minimum value of SNR p used in [2] is 30 dB). If we assume that SNR p = SNR d (the signal-to-noise ratio of data symbols), then this method may not detect some paths. In particular, for SNR p = SNR d < 20 dB, this method generally detects n < P paths. An optimal threshold can be used to counter this problem [23]. At this point, as this approach will serve as a benchmark, it is slightly modified as follows. It is assumed that P is known and that the estimation of channel coefficients are given as h(k) = y p (k)∕x p , where k's are the indices of the P values with largest amplitudes in y p . This modification does not penalize at all the method. In fact, it will enable the method to reach its best performance so as to provide a good testing ground for the proposed algorithms.
As second method, the OMP procedure proposed in [11] is used. It uses a pilot pattern, where the proportion of guard symbols is reduced. To achieve accurate CE, based on the MIP criterion, an OTFS pilot sequence optimization problem is formulated and a PSO-based procedure is developed to solve it. The classical OMP algorithm is adopted as the CE algorithm.

Comparison with the state-of-the-art methods
Here, the proposed CE algorithms are compared with the two selected state-of-the-art methods in terms of complexity, NMSE, and BER.

Complexity
To refine the complexity analysis given in Section 3.5, the order of the number of iterations N g , N t , N m , and N s must be known. It is shown in [17] that N g ≤ min(P, L∕N ), where N is a small constant and P is the number of channel taps. In practice, N g and P have the same order. Therefore, the overall complexity of GOMP is n g (L 2 ).
The parameters N t , N m , and N s are determined experimentally. The average number of iterations needed for convergence as a function of the dimension L for each of the proposed algorithms is plotted. Figure 3 shows the average number of iterations allowing I2S, RI2S, MCMC and SoBaP to converge, as a function of L for a constant maximum delay tap (l = 10). Note that L = (l + 1)(2k + 1), therefore, varying k in the interval [0, 16] amounts to varying L in [11,363]. Figure 3 shows that the average number of iterations for I2S and RI2S remains constant (about 10) independently of the dimension L. For MCMC and SoBaP, this average number grows according L. However, in the case of SoBaP, this number variates slowly. The dotted lines in Figure 3 correspond to linear approximations of the MCMC and SoBaP curves. These approximations remain valid for the values of k used in the present work. In summary, the computational cost of all the proposed algorithms is (L 3 ); except for the baseline method GOMP which is (L 2 ).
Another metric to qualify the complexity of these algorithms is the runtime. Figure 4 shows the runtime in seconds of the proposed algorithms as well as that of the two state-of-the-art methods as a function of k . The runtime of the baseline GOMP is smaller and is comparable to that of the state-of-the-art methods in [2] and [11]

4.3.2
Normalized mean square error (NMSE) The variation of NMSE as a function of SNR is now investigated. We have set SNR = SNR d = SNR p . For a fair comparison with the one-pilot scheme, the total power of each OTFS frame is the same for all schemes. Figure 5 illustrates that GOMP and the methods in [2] and [11] show similar performance. All the other algorithms perform better, SoBaP being the best overall.  Bit error rate (BER) The variation of BER as a function of SNR is now investigated.
Here, the message-passing procedure [10] is used to detect the data symbols by solving a set of MN linear equations after the removal of pilots contribution. Figure 6 shows the variation of BER versus SNR. Note that state-of-the-art methods in [2] and [11] and GOMP perform similarly. However, the other algorithms perform better. Here again, it is observed that SoBaP is the best overall and achieves a BER close to that of the oracle. BER under low-latency communications New generations of advanced mobile communications primarily require low-latency communications [39], that is, small N . Figure 7 shows the variation of BER versus SNR in the case of a low-latency application. Here, the number of time slots and the number of subcarriers used are N = 16 and M = 128, which corresponds to a frame duration of 1.1 ms. It is worth noting that there is a downgrade of the performance of all methods (which is expected in low-latency applications). However, the downgrade for state-of-the-art methods and GOMP is more pronounced than that of other algorithms. Here again, SoBaP is still the most efficient algorithm.

Discussion
Several different algorithms for sparse CE in OTFS systems (GOMP, I2S, RI2S, MCMC and SoBaP) are suggested and compared with the state-of-the-art methods in [2] and [11].
Comparison have been made in terms of complexity, NMSE and BER.
Simulations have shown that GOMP and the methods in [2] and [11] have similar lowest complexity. SoBaP costs slightly more but is still less expensive than the other algorithms.
Regarding the performance in terms of NMSE and BER, simulations have shown that GOMP performs similarly to methods in [2] and [11]. All the other algorithms perform better. Those based on Bayesian approaches (MCMC and SoBaP) dominate those based on 1 -minimization (I2S and RI2S). In particular, SoBaP is the best algorithm. It exceeds the two stateof-the-art methods proposed in [2] and [11] by about 8 dB at BER = 10 −3 and by about 6 dB at NMSE = −15 dB.
Consequently, it appears that SoBaP offers a good compromise between performance and complexity.
It is worth noting that the proposed algorithms remain interesting even in high-speed scenarios. This is due to the fact that the channel coherence time is higher than the duration of an OTFS symbol for high speeds. Therefore, the channel variations remain slow in Delay-Doppler domain and the interference between adjacent subcarriers has similar properties. For example, with the simulation parameters used in this work and for a relative speed of 500 km/h, the channel coherence time is equal to 0.54 ms. This value is still greater than the duration of an OTFS symbol T s = 0.13 s.
In future work, it will be worthwhile to test the performance of these algorithms in OTFS systems using Turbo coding [40][41][42]. Moreover, it will be interesting to adapt these algorithms (in particular SoBaP) to the case of superimposed schemes [14][15][16].

CONCLUSION
In this paper, the design of CE algorithms in the Delay-Doppler domain for OTFS systems has been addressed. First, iterative algorithms involving unknown sparse support vector, which corresponds to the delay and Doppler taps, and an unknown vector of channel values have been developed. The two proposed algorithms use an iterative two-stage procedure consisting in optimizing the support and channel gains alternatively. The first resorts to the 1 -minimization of the support vector. The second uses a reweighted 1 -minimization to improve the recovery performance. The estimation problem is also addressed from a Bayesian point of view. The structured sparse representation is reformulated as a particular marginalized MAP problem on the support vector. In order to deal with the intractability of this problem, two existing techniques are adapted to this context. The first is based on the MCMC with the Gibbs sampler and the second on a specific variational mean-field approximation, as well as the VB-EM procedure. The proposed schemes have been compared to a recognized CE method and to a recent on-grid channel estimation method. Experimental results have shown that the proposed schemes are computationally slightly more expensive but perform significantly better in terms of NMSE and BER and remain applicable in high-mobility scenarios and low-latency applications.