# 8PSK Demodulator for new generation DVB-S2

T.Botticchio<sup>1</sup>, P.Burzigotti<sup>1</sup>, R.Degaudenzi<sup>2</sup>, M.Luise<sup>3</sup>, A.Martinez<sup>2</sup>, F.Richichi<sup>1</sup> and P.Tabacco<sup>4</sup>.

<sup>1</sup> Space Engineering S.p.A., via dei Berio 91, 00155 Roma, Italy

Phone. +39 06 225951, Fax. +39 06 2280739, botticchio@space.it, burzigotti@space.it, richichi@space.it

<sup>2</sup> TOS-ETC, ESA/ESTEC, Keplerlaan 1, Postbus 299, 2200 AG Noordwijk (The Netherlands) Phone. +31 715654227, Fax. +31 715654596 rdegaude@xrsun0.estec.esa.nl, Phone. +31 715654943, Fax. +31 715654596 Alfonso.Martinez@esa.int

<sup>3</sup> Dept. Information Engineering, University of Pisa, Via Diotisalvi 2, 56122 PISA - Italy, Phone: +39 050 568662 fax: +39 050 568522, marco.luise@iet.unipi.it

#### **Abstract**

This article describes the architecture and performance of a robust 8PSK demodulator operating close to the Shannon limit, with complexity targeted at symbol rates from 11 Mbaud to 27.5 Mbaud. Because its residual carrier and clock jitter are very low, the demodulator is suited for use with Pragmatic Turbo Trellis Codes. Possible reuse of the structure in a multi-modulation receiver has also been investigated, in line with future satellite broadcasting and contribution services.

### 1. General Scheme: Architecture

There is a growing demand in the communications world for high-throughput services over reliable links. Higher data rates call for higher-order modulation to preserve bandwidth, and also for new structural solutions that improve performance, particularly in terms of phase stability. Very low phase jitter is a basic requirement for the correct operation of codecs such as Turbo codes.

The 8PSK demodulator studied here addresses this need, as a compromise between higher throughput and good performance within a limited bandwidth.

Throughout the paper, specific algorithms are presented for symbol timing, carrier frequency and carrier phase recovery, operating at very low signal-to-noise ratios compatible with the operation of a Turbo decoder. It is important to highlight that all the implemented algorithms are completely non-data-aided.

Figure 1 shows the steps needed to obtain a recovered 8PSK sample, ready to be processed by the decoder.



Figure 1. Demodulator general scheme.

Detailed overall diagrams of the down-conversion, rate adaptation, internal AGC and recovery schemes are shown in Figure 10 and Figure 11.

<sup>4</sup> DSP SYSTEMS, via dell'Orsa Maggiore 21, 00144 Roma, Italy, mc4759@mclink.it

This section reviews the relevant features of the project, as finally defined in accordance with market requirements. In particular, we identified two applications in which a new modem specification and implementation with improved performance could be adopted:

- Downstream channel of Small Office Home Office (SOHO) Internet service providers operating via satellite
- TV broadcasting based on high-quality digital standards (DVB-S and DVB-DSNG)

As the following table reports, the critical aspects of the project lie in the relatively short time allocated for synchronization, with an operating E<sub>S</sub>/N<sub>0</sub> of a few dB, near the Shannon limit.

**Table 1 Demodulator main specifications** 

| Parameter                                                     | Value                           |
|---------------------------------------------------------------|---------------------------------|
| **Input signal characteristics**                              |                                 |
| Transmission type                                             | Single carrier, continuous mode |
| Minimum E<sub>S</sub>/N<sub>0</sub>                           | 8 dB                            |
| Symbol rate range                                             | [11.0-27.5] Msym/s              |
| **IF front end and demodulator**                              |                                 |
| IF carrier frequency                                          | 140 MHz                         |
| Modulation format                                             | 8PSK Filtered                   |
| Signal shaping                                                | Square-root raised cosine       |
| Roll-off factor                                               | 0.2, 0.25, 0.35                 |
| **Carrier acquisition**                                       |                                 |
| Maximum carrier acquisition time (SNR = 7 dB<sup>1</sup>)     | ≤ 500 ms                        |
| Mean time to lose lock                                        | >24 hrs                         |
| Carrier acquisition min. S/N at the input of the AD converter | 7 dB                            |
| **Symbol timing acquisition**                                 |                                 |
| Pull-in range                                                 | 0.1/T MHz                       |
| Tracking range                                                | 0.01/T MHz                      |
| Maximum acquisition time (SNR = 7 dB)                         | ≤ 500 ms                        |
| Mean time to lose lock                                        | >24 hrs                         |
| Symbol timing acquisition min. E<sub>S</sub>/N<sub>0</sub>    | 7.0 dB                          |
| Timing Missed Acq. Probability                                | 1E-5                            |
| **Total Acquisition Time**                                    |                                 |
| At E<sub>S</sub>/N<sub>0</sub> = 8 dB (QEF)                   | ≤ 500 ms                        |
| Total loss, synchronization subsystem                         | ≤ 0.5 dB                        |

### 2. Down-conversion and down-sampling

The first step, after quantization of the input signal by a 12-bit ADC, is the down-conversion to baseband from a low IF carrier, with the sampling rate equal to four times the center frequency of the input spectrum.

The multiplication required by the down-conversion block is thus replaced by a sign change and by discarding alternate in-phase and quadrature samples. A CIC filter is then used to down-sample the down-converted signal.
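The multiplier-free mixing described above can be illustrated with a minimal Python sketch. It assumes a real input sampled at exactly four times its center frequency, so that the local oscillator exp(-jπn/2) only takes the values +1, -j, -1, +j. The function name `quarter_rate_downconvert` is hypothetical and not taken from the design.

```python
import cmath
import math

def quarter_rate_downconvert(samples):
    """Mix real samples by exp(-j*pi*n/2) using only sign changes and zeros."""
    i_branch, q_branch = [], []
    for n, x in enumerate(samples):
        step = n % 4
        if step == 0:        # oscillator value +1
            i_branch.append(x)
            q_branch.append(0.0)
        elif step == 1:      # oscillator value -j
            i_branch.append(0.0)
            q_branch.append(-x)
        elif step == 2:      # oscillator value -1
            i_branch.append(-x)
            q_branch.append(0.0)
        else:                # oscillator value +j
            i_branch.append(0.0)
            q_branch.append(x)
    return i_branch, q_branch

# Reference computed with an explicit complex multiplication.
xs = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, -2.2]
i_b, q_b = quarter_rate_downconvert(xs)
reference = [x * cmath.exp(-1j * math.pi * n / 2) for n, x in enumerate(xs)]
```

Each branch is zero on every other sample, which is why half of the in-phase and quadrature samples can simply be dropped.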

The choice of such a simple filter is driven by the need for fast signal processing and a simple programmable block. As the input symbol rate can lie in the range [11 - 27.5] Msym/s, the down-sampling factor, chosen to obtain an output sampling rate slightly greater than 2 samples per symbol, is an integer in the range $R_0 \in [2-6]$.

<sup>1</sup> This is the minimum value at which the modem is expected to lock within the specified maximum acquisition time. At this value, Quasi Error Free BER performance cannot be guaranteed.

A CIC filter is made up of N (fixed to 3) serially concatenated filters, each having a rectangular impulse response: $S_i = \sum_{k=i-R_0+1}^{i} I_k$, where $R_0$ is the downsampling factor.

$$H(z) = \left(\frac{1-z^{-R_0 M_{CIC}}}{1-z^{-1}}\right)^{N_{CIC}} \iff H(f) = e^{-j\pi f T_S(R_0 M_{CIC} - 1)N_{CIC}} \left[\frac{\sin(\pi f T_S R_0 M_{CIC})}{\sin(\pi f T_S)}\right]^{N_{CIC}}$$
(Eq. 1)



Figure 2. CIC frequency response, input sampling frequency f<sub>s</sub>. M<sub>CIC</sub>=1, N<sub>CIC</sub>=3, R<sub>0</sub>=6.

The frequency response shape (Figure 2) clearly shows the magnitude distortion that will be compensated during matched filtering by an appropriate filter coefficient design.
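As an illustration of the CIC structure, the following Python sketch implements a decimating CIC in its usual integrator/comb form with integer arithmetic. The parameter names and the zero initial state are assumptions made for the example. With three stages and $R_0 = 6$, a constant input settles to the DC gain $(R_0 M_{CIC})^{N_{CIC}} = 216$.

```python
def cic_decimate(samples, r0, n_stages=3, m_delay=1):
    """Decimating CIC: n_stages integrators, decimation by r0, n_stages combs."""
    integrators = [0] * n_stages
    comb_history = [[0] * m_delay for _ in range(n_stages)]
    out = []
    for k, x in enumerate(samples):
        acc = x
        for s in range(n_stages):          # integrators run at the input rate
            integrators[s] += acc
            acc = integrators[s]
        if (k + 1) % r0 == 0:              # keep one sample out of r0
            y = acc
            for s in range(n_stages):      # combs run at the output rate
                delayed = comb_history[s].pop(0)
                comb_history[s].append(y)
                y = y - delayed
            out.append(y)
    return out

step_response = cic_decimate([1] * 60, 6)
```

The first two outputs belong to the start-up transient of the 16-sample equivalent impulse response. The large DC gain is one reason a normalization is needed somewhere downstream in a fixed-point design.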

### 3. Timing Recovery

Since the receiver and transmitter local oscillators are completely independent, the symbol rate and the sampling rate are incommensurate.

An NCO is employed to mark the valid samples from the CIC section; it also provides the fractional time delay needed to recover the exact sampling instant (zero-ISI sample). The timing-NCO register, which represents this value, is passed to a third-order interpolator designed with the Farrow decomposition. Only a few MSBs of the NCO need to be considered: simulations indicate that a short word length is sufficient.

#### 3.1 Farrow

Two distinct blocks make up the interpolator:

- Farrow coefficients calculation block: the input value is the timing-NCO accumulator, scaled to obtain only the MSBs.
- Four-tap filter: the interpolator proper, designed as an FIR. The coefficients $C_i$ are provided by the previous block.



Figure 3. Third-order interpolator parameters.

The interpolator receives  $T_S = R_0 T_{samp}$ -spaced samples (x-sequence) and outputs the y-sequence:

$$y(t_n) = \sum_{i=-2}^{1} C_i(\mu_n) x[(l_n - i)T_S]$$
 (Eq.2)

where:

basepoint index

$$l_n = \operatorname{int}\left(\frac{t_n}{T_S}\right)$$
 (Eq.3)

fractional interval

$$\mu_n = \operatorname{frac}\left(\frac{t_n}{T_S}\right)$$
 (Eq.4)

Although the input rate is slightly higher than 2 samples per symbol, the interpolator output (and hence the computation) is active exactly 2 times per symbol. Some aliasing is caused by this small downsampling; nevertheless, simulations have shown that placing the matched filter after this block does not alter the performance significantly.
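The two-block interpolator of Eq. 2 can be sketched in Python as follows. The text does not list the coefficient polynomials, so the sketch assumes the classical cubic Lagrange set, which fits the index range $i = -2, \dots, 1$. The function names are hypothetical.

```python
def farrow_coefficients(mu):
    """Block 1: cubic Lagrange coefficients C_i(mu) for i = -2, -1, 0, 1."""
    mu2 = mu * mu
    mu3 = mu2 * mu
    return {
        -2: mu3 / 6 - mu / 6,
        -1: -mu3 / 2 + mu2 / 2 + mu,
        0: mu3 / 2 - mu2 - mu / 2 + 1,
        1: -mu3 / 6 + mu2 / 2 - mu / 3,
    }

def interpolate(x, l_n, mu):
    """Block 2: four-tap FIR of Eq. 2 around the basepoint index l_n."""
    c = farrow_coefficients(mu)
    return sum(c[i] * x[l_n - i] for i in (-2, -1, 0, 1))

def cubic(t):
    """Test signal: any cubic polynomial is reproduced exactly."""
    return 0.5 * t ** 3 - 2.0 * t ** 2 + t - 3.0

x_seq = [cubic(k) for k in range(8)]
y_interp = interpolate(x_seq, 3, 0.37)   # value at t = 3.37 sample periods
```

The coefficients sum to 1 for every $\mu$, so a constant input passes through unchanged. In hardware, $\mu$ would be the truncated NCO word rather than a floating-point number.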

#### 3.2 NCO - timing

The NCO is a simple accumulator, 22 bits long plus the carry out, updated at the interpolator input rate.

The ratio between the symbol rate and the down-sampled rate is then used to calculate the constant parameter $K_{NCO}$, the NCO increment word. The finite representation of the NCO, and hence of $K_{NCO}$, produces a ramp in the accumulated $E_g$ (the error signal from the TED): this reflects the ability of the Gardner algorithm to track small systematic errors in symbol rate and LO frequency during acquisition and tracking.
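A minimal Python sketch of such an accumulator is given below. Several choices are assumptions made for the example: the increment is scaled for 2 output samples per symbol, the carry out is used as the valid-sample strobe, and the top `mu_bits` of the register after the carry are taken as the fractional-delay word. The 60 MHz input rate is an arbitrary example value, not a figure from the design.

```python
NCO_BITS = 22
NCO_MODULO = 1 << NCO_BITS

def nco_increment(symbol_rate, input_rate, samples_per_symbol=2):
    """K_NCO as a 22-bit word: wanted output rate over input rate."""
    return round(samples_per_symbol * symbol_rate / input_rate * NCO_MODULO)

def run_nco(k_nco, n_clocks, mu_bits=6):
    """Return (clock index, MSB word) for every carry out of the accumulator."""
    acc = 0
    strobes = []
    for clock in range(n_clocks):
        acc += k_nco
        if acc >= NCO_MODULO:              # carry out marks a valid sample
            acc -= NCO_MODULO
            strobes.append((clock, acc >> (NCO_BITS - mu_bits)))
    return strobes

k_word = nco_increment(27.5e6, 60e6)       # hypothetical 60 MHz input rate
strobes = run_nco(k_word, 60000)
```

Because $K_{NCO}$ is rounded to 22 bits, the generated rate differs slightly from the nominal one; this is the small systematic error that the timing loop then tracks.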

#### 3.3 Gardner algorithm implementation

The well-known Gardner algorithm, although conceived for QPSK modulation, is also well suited to the 8PSK case studied here. With a complex modulation, both the real and the imaginary branches are needed to make the algorithm independent of the input signal phase and of a possible residual low-frequency carrier.
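The detector can be sketched in a few lines of Python, assuming 2 samples per symbol so that one mid-symbol sample lies between two consecutive strobes. The sign convention (current strobe minus previous strobe) is one common choice and may differ from the one used in the actual loop.

```python
import cmath

def gardner_error(prev_strobe, midpoint, cur_strobe):
    """Gardner timing error from two symbol strobes and the sample between them."""
    diff = cur_strobe - prev_strobe
    # Sum of the real-branch and imaginary-branch products.
    return midpoint.real * diff.real + midpoint.imag * diff.imag

# A common rotation of all three samples leaves the error unchanged.
prev_s, mid_s, cur_s = complex(1, 1), complex(0.2, -0.1), complex(-1, 1)
rot = cmath.exp(1j * 0.7)
e_plain = gardner_error(prev_s, mid_s, cur_s)
e_rotated = gardner_error(prev_s * rot, mid_s * rot, cur_s * rot)
```

Summing both branches is what makes the error insensitive to a carrier phase rotation, so timing can be recovered before the carrier is locked.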

Figure 3(b) shows the time evolution of the delay estimate for different simulations with a loop bandwidth $B_{\tau}=10^{-4}$. The results indicate a lock time of less than 1 ms at the worst-case rate of 11 Msym/s.



Figure 3. Gardner S-curve; accumulated E<sub>g</sub> for a number of simulations.

### 4. Frequency Recovery

#### 4.1 Phase offset and residual frequency removal

After timing recovery has taken place, the signal is still not perfectly synchronized because of the residual carrier frequency. Before the Matched Filter (MF), a frequency error compensation is needed to avoid a mismatch between the spectrum of the MF input signal and the MF frequency response.

This function is performed by a frequency rotator implemented as a complex multiplication. The complex components are provided by a LUT addressed by the sum of a coarse frequency correction, supplied by the frequency NCO, and a fine correction, supplied by the second-order phase recovery block described later in this paper.
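A minimal Python sketch of a LUT-based rotator follows. The 10-bit table size, the integer phase words and the name `derotate` are assumptions made for the example; the LUT address is the sum of the accumulated coarse frequency word and a fine phase word, modulo the table length.

```python
import cmath
import math

LUT_BITS = 10                               # hypothetical table size
LUT_SIZE = 1 << LUT_BITS
LUT = [cmath.exp(2j * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]

def derotate(samples, freq_word, fine_phase_words):
    """Remove a phase ramp; words are in units of 2*pi / LUT_SIZE."""
    phase_acc = 0
    out = []
    for x, fine in zip(samples, fine_phase_words):
        address = (phase_acc + fine) % LUT_SIZE   # coarse + fine correction
        out.append(x * LUT[address].conjugate())  # one complex multiplication
        phase_acc = (phase_acc + freq_word) % LUT_SIZE
    return out

# Input with 5 LUT steps of phase advance per sample and a 3-step offset.
rx = [cmath.exp(2j * math.pi * (5 * n + 3) / LUT_SIZE) for n in range(300)]
corrected = derotate(rx, 5, [3] * 300)
```

In this example the correction matches the input ramp exactly, so every output sample falls back onto the point 1 + 0j.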

#### 4.2 Modulation Removal

Whatever non-data-aided frequency estimation algorithm is chosen, the first step in carrier recovery is modulation removal. The signal is passed through a non-linearity that multiplies the phase by M and raises the signal magnitude to the power of 2, following the Viterbi & Viterbi (V&V) algorithm.

A fundamental phase ambiguity problem arises: with an MPSK modulation the maximum resolvable frequency is $\frac{1}{2MT}$, which depends on the sampling period (T) and on the M-th power non-linearity, which maps each $2\pi/M$ sector onto the complete $2\pi$ circle.

Note that coordinate transformations from Cartesian to polar and *vice versa* are both needed for the non-linear extractor and for the V&V algorithm.
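The non-linearity and the two coordinate transformations can be sketched in Python as follows. This is a floating-point illustration only; the function name is hypothetical, and a hardware version would typically use CORDIC or LUT-based conversions.

```python
import cmath
import math

def remove_modulation(sample, m_order=8):
    """V&V-style non-linearity: phase times M, magnitude squared."""
    magnitude, phase = cmath.polar(sample)              # Cartesian -> polar
    return cmath.rect(magnitude ** 2, m_order * phase)  # polar -> Cartesian

# All eight 8PSK points, each rotated by the same 0.1 rad offset.
offset = 0.1
points = [cmath.rect(1.0, k * math.pi / 4 + offset) for k in range(8)]
stripped = [remove_modulation(p) for p in points]
target = cmath.rect(1.0, 8 * offset)
```

All eight constellation points collapse onto a single point whose angle is M times the common offset, which is why the data no longer disturbs the estimate and why offsets that differ by $2\pi/M$ become indistinguishable.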

#### 4.3 MR&B algorithm

Frequency estimation is the most critical problem in the design of the receiver architecture because of its heavy effect on the subsequent phase synchronizer.

The accuracy required in this modem design and the relatively few symbols available directed our attention to feedforward schemes and, in particular, to a Modified Rife & Boorstyn (MR&B) algorithm. Simulations indicate that feedforward frequency estimation is affected by large errors (outliers) when the signal-to-noise ratio drops below a certain threshold. Fortunately, a favorable feature of the R&B method is that the threshold can be made as low as desired with a sufficiently long observation window.

Dividing the observation window into N adjacent segments, each containing  $L = L_v / N$  samples, the modified R&B equation becomes:

$$P_n(\tilde{\nu}) = \left| \frac{1}{L} \sum_{m=0}^{L-1} y(m+nL) e^{-j8\pi m \tilde{\nu}T} \right|^2 \qquad n = 0, 1, ..., N-1$$
 (Eq.5)

$$Q(\tilde{\nu}) = \sum_{n=0}^{N-1} P_n(\tilde{\nu})$$
 (Eq.6)



Figure 4. Frequency RMSEE (MR&B algorithm).

In a first release, the search was split into two steps. The first step, called *coarse search*, consists of looking for the index k, say $k_M$, which maximizes $Q(k)$. In the second step, called *fine search*, the local maximum is located by interpolation (Figure 4). To further simplify the estimation algorithm we also investigated, and finally chose, the option of not performing the fine search and simply taking the coarse estimate as the final value.
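The coarse search over Eq. 5 and Eq. 6 can be sketched in Python as below. Two assumptions are made for the example: the exponent uses a plain $2\pi$ factor, so `nu` is the normalized frequency of the modulation-free sequence itself (Eq. 5 may follow a different scaling), and the candidate grid is passed in explicitly. A direct sum is used instead of an FFT to keep the sketch short.

```python
import cmath
import math

def mrb_coarse_search(y, n_segments, candidates):
    """Return the candidate frequency that maximizes the averaged periodogram Q."""
    seg_len = len(y) // n_segments
    best_nu, best_q = None, -1.0
    for nu in candidates:
        q = 0.0
        for n in range(n_segments):            # P_n over each segment
            acc = sum(y[m + n * seg_len] * cmath.exp(-2j * math.pi * m * nu)
                      for m in range(seg_len))
            q += abs(acc / seg_len) ** 2       # Q is the sum of the P_n
        if q > best_q:
            best_nu, best_q = nu, q
    return best_nu

# Noise-free tone at 0.05 cycles/sample, 4 segments of 64 samples each.
tone = [cmath.exp(2j * math.pi * 0.05 * m) for m in range(256)]
grid = [k / 64 for k in range(-32, 32)]
nu_hat = mrb_coarse_search(tone, 4, grid)
```

Without the fine search, the estimate can only be as accurate as the grid spacing, here 1/64 cycles per sample; the residual error is left to the second-order phase loop.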

#### 4.4 Frequency and Phase recovery NCO

The input to the NCO is the frequency offset estimated by the previous block, once it has been calculated. The value held in the NCO is the current phase error due to the frequency mismatch. To allow a more accurate phase correction, a fine phase error signal from the PED is added.

### 5. Internal AGC

The sensing point of the fully digital internal AGC is placed immediately at the output of the Matched Filter (after downsampling), where the influence of noise is lowest. In the fixed-point version, the absolute value of the complex filter output is calculated *via* a typical radar approximation formula, avoiding any multiplication and requiring only 2 adders and shifts.

$$A_{n+1} = A_n + \gamma_A (|\xi(n)| - 1)$$
 (Eq.7)

where $\gamma_A$ is a suitable step size. The steady-state value of the factor $A_n$ is such that the average amplitude of the PLL input $\xi(n)$ is 1, as it should be in the absence of noise.
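The recursion of Eq. 7 can be sketched in Python under two assumptions that the text does not confirm. First, the control factor $A_n$ is applied as a divisor to the matched-filter output, which makes Eq. 7 as printed a negative-feedback loop; with a multiplicative gain, the sign of the correction would flip. Second, the magnitude approximation is taken as max + 3/8 min, one common multiplier-free variant that needs two adders and shifts.

```python
def approx_magnitude(z):
    """|z| ~ max + min/4 + min/8 (shift-and-add only; assumed variant)."""
    big = max(abs(z.real), abs(z.imag))
    small = min(abs(z.real), abs(z.imag))
    return big + small / 4 + small / 8

def run_agc(samples, gamma_a=0.05, a0=1.0):
    """Iterate Eq. 7 with xi(n) = x(n) / A_n and return the final factor."""
    a = a0
    for x in samples:
        xi = x / a                           # assumed: A_n acts as a divisor
        a = a + gamma_a * (approx_magnitude(xi) - 1.0)
    return a

final_factor = run_agc([complex(2.0, 0.0)] * 2000)
```

For a constant input of amplitude 2 the factor settles at 2, so the normalized output has unit amplitude. The approximation error (for instance 5.125 instead of 5 for the sample 3 + 4j) slightly biases the level reached on a real constellation.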



Figure 5. AGC-amplitude control factor.

### 6. Matched Filter (MF)

The coefficients of the FIR MF depend on the compensation of the CIC filter and hence on the input rate. In any case, as for the non-compensated CIC, the filter length is fixed at 33 symmetric taps, with only 17 multiplications (per branch) needed because of the filter symmetry.
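The saving from the tap symmetry can be illustrated with a short Python sketch of the folded structure: samples that share a coefficient are added first and multiplied once. The tap values below are arbitrary symmetric numbers chosen for the example, not the actual MF coefficients.

```python
def symmetric_fir_output(taps, window):
    """One output of an odd-length symmetric FIR using (len + 1) // 2 multiplies."""
    n = len(taps)
    half = n // 2
    acc = taps[half] * window[half]          # centre tap: 1 multiplication
    for k in range(half):                    # 16 pre-additions and multiplies
        acc += taps[k] * (window[k] + window[n - 1 - k])
    return acc

# Arbitrary symmetric taps and a complex data window of 33 samples.
taps = [1.0 / (1 + abs(k - 16)) for k in range(33)]
window = [complex(k, (k * k) % 7) for k in range(33)]
folded = symmetric_fir_output(taps, window)
direct = sum(t * w for t, w in zip(taps, window))
```

For 33 taps this gives 16 folded pairs plus the centre tap, that is 17 multiplications per output, matching the count quoted above.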

Thanks to the preceding downsampling blocks, the input rate is just 2 samples per symbol: with the filter length fixed in number of symbols, the computational complexity is thus reduced to its minimum.

The filtering is carried out with successive approximations and normalizations in order to avoid a wide internal data-path dynamic range. After the multiplication by the MF coefficients a first normalization is performed, and the FIR accumulation is realized with a tree structure that truncates partial results.

### 7. Phase Recovery

Carrier phase estimation and correction are performed via a traditional 2<sup>nd</sup>-order PLL with "blind" phase error detection based on hard decisions (digital Costas loop).

The phase error signal is obtained by comparing (correlating) the received signal (optimum sample) with the ideal constellation reference. The output is then processed with parameters that depend on the variable loop bandwidth, which changes according to the proposed algorithm.



Figure 6. Digital 2<sup>nd</sup>-order Costas Loop

 $\xi(n)$  is the received signal after frequency offset correction,  $\hat{\theta}(n)$  is the  $n^{th}$  estimate of the carrier phase  $\theta$ , PED stands for "Phase Error Detector", e(n) is the loop error signal and  $\tilde{c}_n$  is the hard-detected (Coded) 8PSK symbol.

$$\hat{\theta}(n+1) = \hat{\theta}(n) + \mu(n)$$
 (Eq.8)

$$\mu(n+1) = \mu(n) - \gamma(1+\rho)e(n+1) + \gamma e(n)$$
 (Eq.9)

$$e(n) = \Im\{w(n)\tilde{c}_n^*\}$$
 (Eq.10)

A second-order Costas loop is employed to recover the inevitable residual frequency shift not corrected in the previous section by the FFT computation.
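The loop of Eq. 8 to Eq. 10 can be sketched in Python as follows. The text does not spell out the rotation convention, so the sketch assumes $w(n) = \xi(n)e^{j\hat{\theta}(n)}$; under that assumption the signs of Eq. 9 as printed give negative feedback and $\hat{\theta}$ settles at minus the channel phase. With the opposite rotation, the signs of the two error terms would flip. The values of $\gamma$ and $\rho$ are arbitrary example values, far larger than a tracking bandwidth of $10^{-4}$ would imply, and the test signal is noise-free.

```python
import cmath
import math

def hard_decision_8psk(w):
    """Nearest unit-amplitude 8PSK point."""
    k = round(cmath.phase(w) / (math.pi / 4)) % 8
    return cmath.rect(1.0, k * math.pi / 4)

def costas_loop(received, gamma=0.1, rho=0.05):
    """Decision-directed 2nd-order loop; returns the error sequence e(n)."""
    theta_hat, mu, e_prev = 0.0, 0.0, 0.0
    errors = []
    for xi in received:
        w = xi * cmath.exp(1j * theta_hat)       # assumed rotation convention
        c = hard_decision_8psk(w)
        e = (w * c.conjugate()).imag             # Eq. 10
        mu = mu - gamma * (1 + rho) * e + gamma * e_prev   # Eq. 9
        theta_hat = theta_hat + mu               # Eq. 8
        e_prev = e
        errors.append(e)
    return errors

# 8PSK symbols with a 0.2 rad phase offset and a 0.001 rad/symbol drift.
rx = [cmath.rect(1.0, ((3 * n) % 8) * math.pi / 4 + 0.2 + 0.001 * n)
      for n in range(2000)]
loop_errors = costas_loop(rx)
```

The term weighted by $\rho$ acts as the integrator of the loop filter, which is what drives the error to zero even in the presence of the constant phase drift.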

Our dimensioning of the $2^{nd}$-order loop starts from the specification on the rms phase error. This value is set considering both the performance degradation of a turbo decoder (due to phase fluctuations) and the loop MTLL. Analytical and simulation results show that the steady-state rms phase jitter must be on the order of $1.5^{\circ}$ for 8PSK modulation.

Once this value has been established, the loop bandwidth can be derived: at the operating $E_S/N_0$, a value of $B_\theta=10^{-4}$ appears appropriate.



Figure 7. Time evolution of Phase Recovery loop-bandwidth.

The time-dependent bandwidth allows an initial fast phase/frequency recovery. The starting  $B_L T_{in}$  is then (slowly) decreased toward the final tracking value to meet the restrictive modem specifications.



Figure 8. Costas Loop Acquisition Transient (linearly variable bandwidth)

### 8. 8PSK Demodulator Fixed Point Performances

The previous sections describe the main modem features without going into detail on the hardware specifications. It is now important to underline the good results achieved with both floating-point and fixed-point simulations.

Figure 9 shows the final results in terms of SER, obtained with the error counting methodology.



Figure 9. SER results, fixed point vs. floating point.

**Table 2. Fixed point performance (relative to ideal 8PSK)**

| Eb/N0 (dB) | SER       | Implementation loss (dB) |
|------------|-----------|--------------------------|
| 4.5        | 2.0225E-1 | 1.2E-1                   |
| 5.5        | 1.5221E-1 | 1.15E-1                  |
| 6.5        | 1.0854E-1 | 1.25E-1                  |
| 7.5        | 7.2165E-2 | 1.35E-1                  |



Figure 10. Timing recovery scheme



### 9. Conclusions

A very accurate synchronizer for the 8PSK constellation has been designed in bit-true form and is ready for VHDL and HW implementation using a fast prototyping system. Its implementation loss is around 0.1 dB with respect to theoretical 8PSK performance. The HW implementation is ongoing in the frame of the ESA Advanced High Rate Digital Modem contract.

A possible multi-modulation redesign with low implementation impact is an interesting objective under study.

### 10. List of Acronyms

- ADC: Analog-to-Digital Converter
- AGC: Automatic Gain Control
- CIC: Cascaded Integrator-Comb
- DVB-DSNG: Digital Video Broadcasting - Digital Satellite News Gathering
- DVB-S: Digital Video Broadcasting - Satellite
- LUT: Look-Up Table
- NCO: Numerically Controlled Oscillator
- PED: Phase Error Detector
- PLL: Phase Locked Loop
- QEF: Quasi Error Free
- SOHO: Small Office Home Office
