Algorithmes Parallèles pour Systèmes CO-OFDM
Composition du jury :
Lilian BOSSUET
Maître de Conférences HDR, Télécom Saint-Etienne
Université Jean Monnet / Examinateur
I wish to express my profound gratitude to my thesis director, Prof. Olivier Sentieys, for
guiding me throughout this work. I am grateful to him for his expert advice, for the time he
devoted to thesis discussions, and for being available for questions and clarifications at all
times. Our meetings helped with many aspects of the work, and his suggestions and comments
improved the quality of the thesis report.
I also wish to thank my co-director, Mr. Laurent Bramerie, for his guidance on topics
related to optical communication systems. His evaluation of ideas from an optical systems
point of view helped validate the work. I am very thankful for our many discussions on
optical experiments.
I am thankful to my colleague Rémi Pallas for bringing up the real-time FPGA development
system and then integrating my architecture implementation on it. It was a very hard job,
and working with him over all three years was a very good experience. I also thank
Arnaud Carer for his help in setting up the real-time FPGA platform and for discussions
regarding implementation.
Thanks are also due to the faculty members, senior researchers and the rest of my colleagues
at ENSSAT, with whom I shared ideas and had discussions. I am especially thankful to
Mme. Nathalie Caradec for her French classes. I would also like to acknowledge the
assistance of the administrative staff of ENSSAT, thanks to whom I had a pleasant work
environment. Further, I wish to recollect with fondness the memorable association I developed
with my friends Hai, Jérémy, Karthik, Nhan, Rémi, Rengarajan, Stéphane, Vaibhav, Vinh
and Vivek.
This thesis was made possible by funding from the 100GFLEX project and by the
facilities extended by IRISA/INRIA, including travel assistance to attend conferences, for
which I remain grateful.
I wish to express my deep sense of gratitude to my parents for their encouragement in all
phases of my academic and professional career, which in the first instance enabled me to take
up doctoral studies. Their sharing my goal of acquiring a doctorate only added to my
inspiration to complete the doctoral program successfully. I am also grateful to the rest
of my family members and friends, whose constant support and words of encouragement
enabled me to focus on my work.
Last but not least, I thank the members of the jury for agreeing to make a critical
assessment of the dissertation and for suggesting improvements that enhanced
its quality.
Pramod UDUPA
19th June 2014
Lannion, France
Résumé
Very-high-rate optical communication systems are built on state-of-the-art techniques
for detection, modulation and dispersion compensation, such as coherent detection,
orthogonal multi-carrier modulation (OFDM) and electronic dispersion compensation (EDC).
The return of coherent detection to optical communication systems was made possible in
particular by progress in digital circuits in advanced technologies. Coherent detection
offers better signal detection sensitivity than direct detection methods. It allows
dual-polarization transmission and preserves the phase information of the optical signal,
transferring it to the electrical domain. OFDM modulation provides significant flexibility
and efficient use of the allocated bandwidth. Because the phase information is available
in the digital domain, low-cost DSP processors can be used for dispersion compensation in
the digital domain, which makes the solution flexible and reconfigurable. However,
introducing a CO-OFDM (Coherent-Optical OFDM) system in place of an IM-DD (Intensity
Modulation-Direct Detection) system significantly increases the system cost, with a larger
number of optical components and a larger amount of electronic resources required for
signal reception. At present, this makes the solution justifiable only for long-haul
transmission, even when the amount of resources is set against that of a single-carrier
system with coherent detection and four-state modulation (DP-CO-QPSK). The choice of
algorithm and the optimization of the fixed-point computation precision of the architecture
can significantly reduce the resources needed to realize CO-OFDM systems.
In this thesis, low-complexity algorithms and efficient parallel architectures are explored
for CO-OFDM systems. First, low-complexity algorithms for synchronization and frequency
offset estimation in the presence of a dispersive channel are studied. A new low-complexity
timing synchronization algorithm that can withstand a large amount of dispersive delay is
proposed and compared with earlier proposals. Next, the problem of realizing a low-cost
parallel architecture is studied, and a generic, scalable parallel architecture that can
implement any type of auto-correlation algorithm is proposed. This architecture is then
extended to handle several samples from the analog-to-digital converter (ADC) in parallel
and to deliver an output that keeps up with the ADC rate. The scalability of the
architecture to a higher number of parallel outputs and to different types of
auto-correlation algorithms is explored.
bit error rate (BER) significantly. The proposed algorithms are validated, on the one hand,
through off-line experiments using an arbitrary waveform generator (AWG) at the transmitter
and a digital storage oscilloscope (DSO) at the output of the coherent detection at the
receiver and, on the other hand, with a real-time transceiver based on FPGA platforms and
digital converters. The BER is used to demonstrate the validity of the integrated system
and to report its performance.
Abstract
In this thesis, low-complexity algorithms and architectures for CO-OFDM systems are
explored. First, low-complexity algorithms for estimating timing and carrier frequency
offset (CFO) in a dispersive channel are studied. A novel low-complexity timing
synchronization algorithm, which can withstand a large amount of dispersive delay, is
proposed and compared with previous proposals. Then, the problem of realizing a
low-complexity parallel architecture is studied. A generalized, scalable parallel
architecture, which can be used to realize any auto-correlation algorithm, is proposed. It
is then extended to handle multiple parallel samples from the ADC and to provide outputs
that match the input ADC rate. The scalability of the architecture to a higher number of
parallel outputs and to different kinds of auto-correlation algorithms is explored.
Acknowledgements
Résumé
Abstract
Contents
List of Figures
0 Extended Summary
0.1 Coherent-detection OFDM optical communication system
0.2 Context of the work
0.3 Low-complexity timing synchronization algorithm for OFDM systems
0.4 Low-complexity hierarchical timing synchronization for CO-OFDM systems
0.5 Parallel architecture for auto-correlation
0.5.1 Partial parallel architecture (PSBP)
0.5.2 Full parallel architecture (FSBP)
0.6 Parallel architecture for CO-OFDM systems
0.6.1 Transmitter
0.6.2 Receiver
0.7 Experiments
0.8 Conclusion
1 Introduction
1.1 Context of the Work
1.2 Contributions
1.3 Organization of the Thesis
Publications
Bibliography
List of Figures
2.1 Fiber loss coefficient vs. wavelength for a typical low-loss optical fiber (SSMF) and fiber without the water absorption peak (Allwave). [Reproduced from Essiambre et al. [2]]
2.2 Tolerance of various phase-amplitude constellations to ASE. Reproduced from [3].
2.3 Single band of a single/dual polarization CO-OFDM system
2.4 Digital OFDM Transmitter, S/P - Serial-to-Parallel, P/S - Parallel-to-Serial
2.5 Single Polarization RF-to-Optical Up Converter. IX - Real Part of X-Polarization, QX - Imaginary Part of X-Polarization, DAC - Digital-to-Analog Converter, LPF - Low Pass Filter, RFD - RF Driver, MZM - Mach-Zehnder Modulator, ECL - External Cavity LASER, VOA - Variable Optical Amplifier.
3.1 Plot of Coarse (a) and Fine (b) Timing Metric Functions
3.2 MSE of Timing Estimation versus SNR in ISI channel
3.3 MSE of CFO Estimation versus SNR in ISI channel
3.4 MSE of Timing Estimation vs. OSNR in SSMF channel with CFO = 0.75
3.5 MSE of Timing Estimation vs. OSNR in SSMF channel with CFO = 4.75
3.6 MSE of CFO Estimation vs. OSNR in SSMF channel for CFO = 0.75
3.7 Parallel Architecture proposed by Kaneda et al. for Schmidl-Cox Algorithm
3.8 Parallel Architecture proposed by Chen et al. for cross-correlation operation
3.9 Proposed R = 4-Parallel PSBP Architecture for P_sc calculation in case of SCA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1 indicates iterative computation mode.
3.10 Proposed R = 4-Parallel PSBP Architecture for R_sc calculation in case of SCA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1 indicates iterative computation mode.
3.11 Proposed PSBP Architecture for calculation of P_mb in case of MBA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1 indicates iterative computation mode.
3.12 Proposed R = 4-Parallel PSBP Architecture for R_mb calculation in case of MBA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1 indicates iterative computation mode.
3.13 Multiplier requirement as a function of R-parallel output for PSBP and Kaneda’s architecture, M = 32
3.14 Adder requirement as a function of R-parallel output for PSBP and Kaneda’s architecture, M = 32
3.15 R = 4-Parallel Initial Point Auto-Correlation Computation Block for SCA
3.16 R = 4-Parallel Initial Point Energy Computation Block for SCA
4.1 HLS Block Diagram of CatapultC synthesis flow and Matlab Integration
4.2 OFDM frame format for single polarization (PolX) CO-OFDM system
4.3 OFDM frame format for dual polarization (PolX, PolY) CO-OFDM system
4.4 IFFT/FFT Architecture of 4-Parallel radix-2^2 for N = 256, when input is given in even and odd index order
4.5 IFFT/FFT Architecture of 4-Parallel radix-2^2 for N = 256, when input is given in normal order
4.6 Plot of Mean of RMSE output of IFFT as function of W_i and W_t
4.7 Proposed CO-OFDM Receiver Architecture Block Diagram
4.8 Data organization in the Synchronization Memory
4.9 Parallel Architecture for IFO Estimation
4.10 Channel Estimation and Equalization Architecture which supports both LS and NLMS equalizers
4.11 CPE Estimation Block
4.12 BER vs. OSNR plot for floating-point and various fixed-point configurations in Homodyne setup
4.13 BER vs. OSNR plot for floating-point and various fixed-point configurations in Heterodyne setup
4.14 Pie Chart of area Occupation of all blocks of R = 4-Parallel CO-OFDM Receiver (Fixed-point config0)
5.1 OFDM frame format for single polarization (PolX) CO-OFDM system
5.2 Configuration of Electrical B2B Experiment. Green blocks indicate analogue blocks.
5.3 OFDM Signal Spectrum
5.4 Estimated Values of η_SCO
5.5 Gradient of estimated value of η_SCO
5.6 BER vs SNR for Electrical B2B experiment (Theoretical and Experimental)
5.7 Configuration of Electrical B2B Experiment with RF Driver. Green blocks indicate analogue blocks.
5.8 BER vs SNR for Electrical B2B experiment with RF driver (Theoretical, Experimental with and without RF driver)
5.9 Configuration of Homodyne Coherent Detection. DSP processing is done offline in Matlab. Green blocks indicate analogue blocks. Light Blue blocks indicate Optical components.
5.10 BER vs SNR for single-band Optical Back-to-Back Experiment
List of Tables
4.22 Area vs. Bitwidth for Integer CFO Estimation block
4.23 Area vs. Bitwidth for De-interleaver block
4.24 Area vs. Bitwidth for Channel Estimation & Equalization
4.25 Area vs. Bitwidth for CPE Estimation & Compensation
4.26 Fixed-Point Allocation and Area for all blocks of the R = 4-Parallel Receiver
CD Chromatic Dispersion
CE Channel Estimation
CFO Carrier Frequency Offset
CMZM Complex Mach-Zehnder Modulator
CO Coherent Optical
CO-DP-QPSK Coherent Optical Dual Polarization QPSK
CO-OFDM Coherent Optical-OFDM
CoD Coherent Detection
CP Cyclic Prefix
CPE Common Phase Error
RF Radio Frequency
RGI CO-OFDM Reduced Guard Interval CO-OFDM
RMSE Root Mean Square Error
ROADM Reconfigurable Optical Add Drop Multiplexer
RS Reed-Solomon
RZ Return-to-Zero
RZ-DQPSK Return-to-Zero Differential Quadrature Phase Shift Keying
ZF Zero Forcing
Chapter 0
Extended Summary
Figure 1: Typical optical network architecture. CN - Core Node, EN - Edge Node,
AN - Access Node
To transmit data over a single-mode optical fiber (SMF), intensity modulation-direct
detection (IM-DD) techniques are used to reach rates of 10 Gb/s over long-haul networks. To
keep up with growing bandwidth demand, the goal is to support 100 Gb/s links [6]. This
cannot be achieved efficiently with the IM-DD scheme because of phenomena such as chromatic
dispersion (CD) and polarization mode dispersion (PMD), which appear at such speeds. To
reach these rates, coherent detection (CoD) has been introduced into optical communication
systems, made possible by progress in electronic integrated circuits. Coherent detection
[7][8][9] offers advantages through its better detection sensitivity, a higher symbol rate,
the use of dual polarization and, more importantly, the preservation of phase and amplitude
information between the optical and electronic domains. This opens up the possibility of
using powerful digital signal processing algorithms for electronic dispersion compensation
(EDC) at low cost and in a flexible way thanks to DSPs. OFDM (orthogonal frequency division
multiplexing) modulation has been proposed for use jointly with CoD to reach 100 Gb/s rates
with better flexibility. OFDM is robust to CD thanks to the cyclic prefix (CP), and training
symbols (TS) reduce the complexity of the equalizer. In addition, OFDM offers a clear
advantage in flexibility, allowing per-subcarrier power allocation (bit-power loading) and
the placement of pilot symbols in the subcarriers according to channel conditions.
Multiband coherent optical OFDM (MB-CO-OFDM) has therefore been proposed, based on recent
DAC and ADC converter technologies and on its feasible implementation in ASIC or FPGA
integrated circuits.
low-complexity algorithms (e.g. synchronization) before proposing efficient parallel
architectures for their hardware implementation. The following sections detail the results
obtained. First, a low-complexity architecture and algorithm are proposed for timing
synchronization of OFDM frames and symbols. Second, the architecture of a complete CO-OFDM
transmitter is detailed. Finally, the architectures and algorithms are validated in
off-line and then real-time experiments.
Part A is constructed by taking the IFFT of the modified Chu sequence [12] of size
N_s = N/8. Then B is built from A by time reversal and conjugation. The sign pattern
[1 1 1 −1] is designed to ensure a sharp transition for the coarse estimation algorithm.
The proposed algorithm has three steps.
\[ P_{init}[n] = \sum_{k=0}^{L-2} u[k] \sum_{m=0}^{M-1} r^{*}[n + kM + m] \cdot r[n + (k+1)M + m] \tag{4a} \]
\[ R_{init}[n] = \sum_{k=0}^{L-1} \sum_{m=0}^{M-1} \left| r[n + kM + m] \right|^{2} \tag{4b} \]
Figure 2.a plots TM_init[n] for a signal-to-noise ratio (SNR) of 10 dB in a
frequency-selective wireless channel. The fine estimation algorithm corrects a
small offset to find the correct starting point.
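As a rough illustration of Eqs. (4a) and (4b), the sketch below evaluates the two sums at one index n. It is only a sketch under stated assumptions: the preamble is taken to be L = 4 signed copies of a random block A, the weights are assumed to satisfy u[k] = p[k]·p[k+1] for the sign pattern p = [1 1 1 −1], and the function name `coarse_sums` is hypothetical. The metric TM_init itself is not defined in this excerpt, so it is not computed here.

```python
import numpy as np

def coarse_sums(r, n, M, u):
    """Return (P_init[n], R_init[n]) as written in Eqs. (4a) and (4b)."""
    L = len(u) + 1                      # u holds L-1 weights, one per adjacent block pair
    P = 0j
    for k in range(L - 1):
        a = r[n + k * M: n + (k + 1) * M]          # block k
        b = r[n + (k + 1) * M: n + (k + 2) * M]    # block k+1
        P += u[k] * np.sum(np.conj(a) * b)         # weighted delayed correlation
    R = np.sum(np.abs(r[n: n + L * M]) ** 2)       # energy over all L blocks
    return P, R

# Hypothetical preamble: four signed copies of a random block A (assumption)
rng = np.random.default_rng(0)
M = 16
A = rng.standard_normal(M) + 1j * rng.standard_normal(M)
p = np.array([1, 1, 1, -1])
u = p[:-1] * p[1:]                      # assumed relation u[k] = p[k] * p[k+1]
preamble = np.concatenate([s * A for s in p])
pad = np.zeros(8, dtype=complex)
r = np.concatenate([pad, preamble, pad])
P, R = coarse_sums(r, 8, M, u)          # evaluate at the true preamble start
```

At the true start of this noise-free preamble, each of the L − 1 block pairs contributes the energy of A to P, while R collects the energy of all L blocks, so |P| is (L − 1)/L of R.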
Figure 2: Plots of the coarse (a) and fine (b) timing metric functions. Panel (a): Coarse Time Estimation Metric; panel (b): Fine Time Estimation Metric.
avec N N
−1 −1
�
4 �
2
\[ Q[n] = \frac{TM_{fine}[n]}{\max\left(TM_{fine}[n]\right)} \tag{8} \]
Figure 2.b plots Q[n] for an SNR of 10 dB. The figure shows the peaks corresponding
to the multipath gains. The time index of the maximum value of Q[n],
\[ \hat{\eta}_{fine} = \arg\max_{n} \left( Q[n] \right) \tag{9} \]
is used as the starting point for the windowed-sum method.
• Threshold-based summation: the values of Q[n] are limited by a threshold β.
\[ Q[n] = \begin{cases} Q[n], & Q[n] > \beta, \\ 0, & \text{otherwise,} \end{cases} \tag{10} \]
β is the threshold that separates the signal from the noise component in Q[n]. This
threshold is determined using the probability distribution of the noise component in
Q[n]. The steps are as follows:
3. A constant false alarm rate (CFAR) of "α" is used to compute the threshold. The
equation is derived from the integral of the probability distribution function of
the noise over the interval [β, ∞].
\[ \beta = e^{\left( \sqrt{2} \cdot \sigma \cdot \mathrm{erf}^{-1}(1 - 2\alpha) + \mu \right)} \tag{12} \]
The CFAR is used for all SNR values. A windowed sum is computed after removing the
noise values below the computed threshold β.
\[ E_{p}(n) = \sum_{k=0}^{S_{w}-1} Q(\hat{\eta}_{fine} - n + k) \tag{13} \]
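The thresholding and windowed-sum steps of Eqs. (10), (12) and (13) can be sketched as follows. This is a minimal sketch, not the thesis implementation: the form of Eq. (12) suggests that the noise in Q[n] is modelled as log-normal with parameters µ and σ, which is treated here as an assumption because the earlier steps are not part of this excerpt. The function names and the sample values of Q, µ, σ and α are hypothetical. The identity √2·erf⁻¹(1 − 2α) = Φ⁻¹(1 − α) lets the standard library compute the threshold.

```python
import math
from statistics import NormalDist

def cfar_threshold(mu, sigma, alpha):
    """Eq. (12): beta = exp(sqrt(2)*sigma*erfinv(1 - 2*alpha) + mu)."""
    # sqrt(2) * erfinv(1 - 2*alpha) equals the standard normal quantile at 1 - alpha
    return math.exp(mu + sigma * NormalDist().inv_cdf(1.0 - alpha))

def apply_threshold(Q, beta):
    """Eq. (10): keep values above beta, zero out the rest."""
    return [q if q > beta else 0.0 for q in Q]

def windowed_sum(Q, eta_fine, n, Sw):
    """Eq. (13): E_p(n) = sum over k of Q[eta_fine - n + k], k = 0..Sw-1."""
    return sum(Q[eta_fine - n + k] for k in range(Sw))

# Hypothetical normalized metric with its maximum at index 3
Q = [0.02, 0.05, 0.6, 1.0, 0.4, 0.03]
beta = cfar_threshold(mu=-3.0, sigma=0.5, alpha=0.01)
Q_thr = apply_threshold(Q, beta)
eta_fine = max(range(len(Q)), key=lambda i: Q[i])    # Eq. (9)
E = windowed_sum(Q_thr, eta_fine, n=1, Sw=3)         # window starting one sample early
```

With these illustrative values the three small entries fall below β and are zeroed, so the window sum only collects the values around the main peak.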
Finally,
\[ \hat{\eta}_{final} = \hat{\eta}_{init} - \hat{\eta}_{first} \tag{15} \]
Simulation results
Figure 3 shows the mean square error (MSE) of the timing estimate in a channel with
inter-symbol interference (ISI) for different synchronization methods. The methods based
on delayed correlation (Schmidl, Minn, Shi) have a larger MSE than the methods using
conjugate-symmetric correlation (Park, Choi). The proposed method outperforms Park and is
comparable to Choi, but with a much lower computational complexity. The number of
operations on real numbers as a function of N is given in Table 1 for the different
algorithms. A reduction of about 80% in computational complexity is obtained with the
proposed method (for Nsym = 1126, Ncyp = 102) compared with those of Choi, Park and Zhou,
while the MSE performance remains very close to that of Choi.
[Figure 3: MSE of start index estimation (symbols²) versus SNR (dB)]
Table 1: Number of real operations needed to compute one point of the timing metric
Algorithm                                  Multiplications   Additions    Divisions
Schmidl-Cox                                15                13           1
Minn (L = 4)                               31                29           1
Shi                                        59                61           1
Park                                       (2N + 11)         (2N + 7)     1
Choi                                       (2N + 7)          (2N + 3)     1
Zhou                                       (2N + 22)         (2N + 16)    1
Proposed algorithm (coarse step, L = 4)    31                29           1
Proposed algorithm (fine step)             (2N + 3)          (2N − 1)     1
For SMF optical channels, the dispersion values are not too high and remain stable compared
with the multipath effect of wireless channels, which can exhibit very large delays. The
windowed-summation step can therefore be dropped, and the algorithm reduces to its first two
steps. This modified algorithm is used for SMF optical channels. For the performance
comparisons, only the auto-correlation-based algorithms (Schmidl-Cox, Minn-Bhargava,
Shi-Serpedin) are reported. The cross-correlation algorithms (Choi, Park) are too complex in
an optical context and do not produce an output on every cycle, as this context requires.
The steps to compute the symbol starting point in a CO-OFDM system,
η_final = η_init − η_fine, are therefore:
Simulation results
Figure 4 plots the MSE of the timing estimate in an SMF channel with a CFO of 4.75
subcarrier spacings. The proposed algorithm shows a slight degradation at low OSNR
because of the low complexity of its P_fine computation. It does, however, give
significant improvements at higher OSNR. Recall that in all cases the computational
complexity is much lower than the state of the art, which is a major advantage for
very-high-rate optical communications.
Figure 4: MSE of start index estimation (symbols²) versus OSNR (dB) for SMF channels and CFO = 4.75. Curves: Schmidl, Minn, Shi, Proposed.
where Eq. 20 is the non-iterative form and Eq. 21 is the iterative form, P_mb is the
auto-correlation function, M_mb is the size of the repeated part (A), and M_mb = N/4. If
the equations are rewritten for a parallelism level R = 4 with a block size M_mb, we obtain
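\[
\begin{aligned}
P_{mb}[n] &= \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^{*}[n + m + kM_{mb}] \cdot r[n + m + (k+1)M_{mb}] \\
P_{mb}[n + M_{mb}] &= \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^{*}[n + m + (k+1)M_{mb}] \cdot r[n + m + (k+2)M_{mb}] \\
P_{mb}[n + 2M_{mb}] &= \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^{*}[n + m + (k+2)M_{mb}] \cdot r[n + m + (k+3)M_{mb}] \\
P_{mb}[n + 3M_{mb}] &= \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^{*}[n + m + (k+3)M_{mb}] \cdot r[n + m + (k+4)M_{mb}]
\end{aligned}
\tag{22}
\]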
• R initial points are computed in non-iterative mode, which takes M_mb cycles for the
MBA algorithm,
After R·M_mb outputs have been computed, the same process is repeated for the next R·M_mb
outputs. The architecture takes (2M_mb − 1) cycles to compute M_mb auto-correlation points
per block. The architecture is called "partially parallel" because it does not produce an
output on every cycle and has a latency equal to the initialization phase. However, it
uses fewer resources than the proposal that follows.
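The two computation modes mentioned above (a non-iterative initial point followed by iterative updates) can be illustrated with a single sliding auto-correlation of window M, in the style of the Schmidl-Cox metric. Eqs. 20 and 21 are not reproduced in this excerpt, so the recursion below is the standard sliding-window update, used here as an assumption; it is not a model of the PSBP hardware, and the function names are hypothetical.

```python
import numpy as np

def autocorr_direct(r, n, M):
    """Non-iterative form: P[n] = sum_{m=0}^{M-1} conj(r[n+m]) * r[n+m+M]."""
    return np.sum(np.conj(r[n:n + M]) * r[n + M:n + 2 * M])

def autocorr_iterative(r, M, count):
    """Iterative form: one initial point, then add the newest product and drop the oldest."""
    out = np.empty(count, dtype=complex)
    out[0] = autocorr_direct(r, 0, M)            # initial point, non-iterative mode
    for n in range(count - 1):
        out[n + 1] = (out[n]
                      + np.conj(r[n + M]) * r[n + 2 * M]   # product entering the window
                      - np.conj(r[n]) * r[n + M])          # product leaving the window
    return out

rng = np.random.default_rng(0)
r = rng.standard_normal(64) + 1j * rng.standard_normal(64)
M, count = 8, 20
iterative = autocorr_iterative(r, M, count)
direct = np.array([autocorr_direct(r, n, M) for n in range(count)])
max_diff = np.max(np.abs(iterative - direct))
```

Each iterative update costs two complex multiplications regardless of M, whereas every direct evaluation costs M of them. That difference is why only the initial points are computed in non-iterative mode.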
Algorithm    Real Multipliers    Real Adders
P_mb         4(R + 3)            2(3R + 3)

Algorithm    Real Multipliers    Real Adders
P_sc         4R                  2R(M_sc/2 + 1)
parallel architecture with R = 4 blocks for MBA is shown in Figure 6. Table 3 gives the
architectural complexity for computing P_mb. Compared again with that of Kaneda [14], area
reductions of 17 to 72% are obtained, depending on the symbol size.
Figure 6: Proposed FSBP parallel architecture for computing P_mb with MBA and R = 4
0.6.1 Transmitter
At the transmitter, the IFFT is the main block in terms of complexity. The choice of FFT
radix is therefore crucial and directly affects the complexity of the architecture. For
N = 256, the complexity in millions of operations per second (MOPS) is computed for
radix-2, radix-4, radix-2^2 and split-radix FFTs and reported in Table 4 for a rate of
7.3 Gbit/s. The total number of operations needed to support a total rate
D_{b,total} ≥ 100 Gb/s is then reported in the last column of Table 4. These results show
that a saving of 800 GOPS can be obtained with the radix-4/2^2 algorithms compared with
radix-2, while a further 200 GOPS can be saved with split-radix. We selected radix-2^2
because its architectural complexity is lower.
0.6.2 Receiver
The algorithms used for timing synchronization, CFO estimation, the FFT, channel
estimation, equalization, and phase error estimation and compensation are described in
this section. The algorithmic complexity of computing N outputs for a 117 Gb/s system is
also presented. Specific optimizations of the data format are applied to reduce
complexity.
• Integer CFO estimation - A cross-correlation with a known symbol sequence is performed
for CFO estimation. Let z[n] denote a known sequence of length N_ifo = N/4. Since CFO
estimation is done in the frequency domain, the known sequence can be built from QPSK
symbols of value (±1 ± 1j). The cross-correlation operation becomes
where n is the search index, n ∈ [−Ws, .., −2, 0, 2, .., Ws], and Ws is the maximum
index. Here Ws = 20 is chosen as the maximum value. The value of N_ifo is set to 32. The
algorithmic complexity is reported in Table 6. Thanks to the QPSK constellation (±1 ± j),
complex multiplications can be eliminated entirely, which reduces complexity. A saving of
39.8 MOPS is thus obtained.
• CPE estimation and compensation - A CPE estimate based on pilot symbols [15] is used to
estimate the phase noise of the LASER. The algorithmic complexity is reported in Table 8.
The final tally shows a complexity saving of more than 800 GOPS compared with an
unoptimized implementation.
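To make the integer CFO point concrete, the sketch below shows why a correlation against QPSK symbols (±1 ± j) needs no multipliers: multiplying by the conjugate of such a symbol reduces to sign selection and additions. The correlation equation itself is not reproduced in this excerpt, so the metric used here (the magnitude of the sum of Y[start + n + k]·z*[k]) is an assumption, as are the function names and the test signal.

```python
import numpy as np

def corr_qpsk_no_mult(y, sr, si):
    """Sum of y[k] * conj(z[k]) for z[k] = sr[k] + 1j*si[k], with sr, si in {+1, -1}."""
    acc_re, acc_im = 0.0, 0.0
    for a, b, s_r, s_i in zip(y.real, y.imag, sr, si):
        # (a + jb)(s_r - j*s_i) = (a*s_r + b*s_i) + j(b*s_r - a*s_i): sign flips and adds only
        acc_re += (a if s_r > 0 else -a) + (b if s_i > 0 else -b)
        acc_im += (b if s_r > 0 else -b) - (a if s_i > 0 else -a)
    return complex(acc_re, acc_im)

def integer_cfo_search(Y, sr, si, start, Ws):
    """Search n in [-Ws, .., -2, 0, 2, .., Ws] for the largest correlation magnitude."""
    best_n, best_val = None, -1.0
    for n in range(-Ws, Ws + 1, 2):
        seg = Y[start + n: start + n + len(sr)]
        val = abs(corr_qpsk_no_mult(seg, sr, si))
        if val > best_val:
            best_n, best_val = n, val
    return best_n

# Hypothetical noise-free test: the known sequence sits 6 bins after the nominal position
rng = np.random.default_rng(1)
N_ifo = 32
sr = rng.choice([-1, 1], size=N_ifo)
si = rng.choice([-1, 1], size=N_ifo)
z = sr + 1j * si
Y = np.zeros(128, dtype=complex)
true_offset, start = 6, 50
Y[start + true_offset: start + true_offset + N_ifo] = z
estimate = integer_cfo_search(Y, sr, si, start, Ws=20)
```

In this noise-free case the aligned position sums 32 terms of magnitude 2, while any misaligned position overlaps at most 30 symbols, so the peak is unambiguous.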
0.7 Experiments
Finally, real-time and off-line experiments were carried out in this thesis to validate
the OFDM system parameters in an optical communications context. First, off-line
scenarios were run using an AWG signal generator as the transmitter and a fast digital
oscilloscope (DSO) as the receiver. Figure 7 shows the heterodyne configuration, in which
separate LASER sources are used at the transmitter and the receiver.
[Figure: BER versus OSNR (dB)]
0.8 Conclusion
Very-high-rate optical communication systems are built on state-of-the-art techniques
for detection, modulation and dispersion compensation, such as coherent detection,
orthogonal multi-carrier modulation (OFDM) and electronic dispersion compensation (EDC).
The return of coherent detection to optical communication systems was made possible in
particular by progress in digital circuits in advanced technologies. Coherent detection
offers better signal detection sensitivity than direct detection methods. It allows
dual-polarization transmission and preserves the phase information of the optical signal,
transferring it to the

based on FPGA platforms and digital converters. The BER is used to demonstrate the
validity of the integrated system and to report its performance.
Chapter 1
Introduction
Figure 1.1: Cisco Visual Networking Index (VNI) prediction of Internet traffic growth by
application type (updated May 2013). The plot shows monthly traffic versus year for
categories including Consumer Video, Consumer Web and Business Video. The ordinate is in
exabytes (EB). Total traffic in 2017 is predicted to be three times larger than in 2012 [1].
The Internet is built upon many communication standards, which use different types
of physical media to carry data bits around the world. One of the most important
physical media, forming the backbone of the network, is the optical communication
system. Optical communication systems presently carry data over very long distances
(submarine networks, long-haul networks) and medium distances (metro networks,
access networks). With the introduction of Fiber-to-the-Home (FTTH), optical fiber
communication systems also serve end users directly. Submarine networks are undersea
networks whose links span more than 2000 km. The terrestrial optical communication
network can be divided into three major types. Figure 1.2 shows these three types of
network, classified by distance. Core Network (CN)
Figure 1.2: Typical Optical Network Architecture, CN - Core Node, EN - Edge Node,
AN - Access Node
for transmission is called C-band. 1 Tb/s is the maximum capacity in C-band using 10 Gb/s
DWDM Long-haul transmission network, which uses Non Return-to-Zero (NRZ)-On-Off
Keying (OOK) modulation for transmission.
To realize still higher transmission speeds, the data rate on individual channels has
to be increased from 10 Gb/s to 100 Gb/s [6]. Simply increasing the symbol rate of
10 Gb/s OOK transmission is not a viable solution because of dispersion effects in the
optical fiber. Chromatic Dispersion (CD) causes Inter Symbol Interference (ISI) at very
high symbol rates and hence severely impacts single-carrier transmission. At the receiver,
the complexity of the time-domain equalizer increases significantly with the symbol rate.
Polarization Mode Dispersion (PMD) effects are also more severe at very high data rates.
PMD is compensated using bulky rotators, which is not a flexible approach. Hence,
compensating both of these effects is challenging, and the solutions are not
cost-effective. The Intensity Modulation-Direct Detection (IM-DD) NRZ-OOK system therefore
cannot be scaled to 100 Gb/s data rates per channel.
To realize higher speeds, Coherent Detection (CoD) has been reintroduced into optical
communication systems. Direct Detection (DD) was previously preferred over CoD because of
its lower complexity and cost. CoD has come back into prominence due to advancements
in VLSI circuits. CoD [7][8][9] offers the additional advantages of higher detection
sensitivity, higher symbol rates and the use of dual polarization; more importantly, the
amplitude and phase information is conserved when crossing from the optical to the
electrical domain. This opens up the possibility of Electronic Dispersion Compensation (EDC)
using Digital Signal Processing (DSP) algorithms, which are low cost, powerful and
reprogrammable. This has led to the development of Coherent Optical Dual Polarization QPSK
(CO-DP-QPSK) systems which can work at 100 Gb/s. These systems use dual polarization and
two bits per symbol to deliver four times the bit rate per symbol, which allows the DSP
to operate at a four times lower frequency. Since it is a single-carrier scheme, it requires
a Finite Impulse Response (FIR) filter for equalization. Also, the CO-DP-QPSK adopted in the
100 Gb/s standard uses blind Channel Estimation (CE), which increases estimation complexity [16].
With Quadrature Phase Shift Keying (QPSK), the use of a Digital-to-Analog
Converter (DAC) can currently be avoided [16]. However, if a higher-order modulation format
is adopted in the future, a DAC will have to be used, which will increase transmitter and
equalizer complexity [16].
At the same time, Coherent Optical-OFDM (CO-OFDM) [17][18] has been proposed as
a possible candidate for transmission at data rates of 100 Gb/s, 400 Gb/s and beyond.
CO-OFDM, as the name indicates, combines coherent detection (CoD) with the
multi-carrier modulation of Orthogonal Frequency Division Multiplexing (OFDM) [19] to
counter the impairments of the optical channel. OFDM is inherently immune to CD due to
the presence of the Cyclic Prefix (CP), and with the use of Training Symbols (TS) the
equalizer complexity can be reduced significantly to a one-tap equalizer. OFDM also offers
flexibility in the allocation of power per sub-carrier (bit-power loading) and in the
placement of pilot sub-carriers based on channel conditions.
Presently, Ethernet is used at a line rate of 10 Gb/s. Because Ethernet is so widely
deployed, future increases in line rate will keep the Ethernet standard and change only the
technology needed to support higher line rates. The next upgrade step is 100 Gb/s Ethernet.
The jump from 10 Gb/s Ethernet to 100 Gb/s is necessary because router-to-router trunk
connectivity has already reached 100 Gb/s [20], and because a single 100 Gb/s line
costs less per Gb/s than 10 lines of 10 Gb/s. This makes
100 Gigabit Ethernet (100GbE) a very important milestone for supporting present-day
demands. For the present goal of 100GbE and the future goal of 1 Tb/s
Ethernet (1TbE), the solutions adopted should have the following properties, which
make them future proof.
• They should be compatible with the present optical infrastructure, which comprises
single mode fiber with a varying range of CD, Dispersion Compensation Fiber (DCF)
and other types of fiber.
• They should scale easily to higher speeds and support reconfigurable
networks which manage bandwidth at a higher software level.
estimation and compensation of non-idealities are done in the digital domain, which makes
the algorithms complex; they have to be adapted from the wireless OFDM domain to the
optical system. These algorithms now have to operate at much higher speeds than
wireless OFDM implementations and have to be easily scalable as well.
Because maximum FPGA clock frequencies are much lower than DAC/ADC sampling frequencies,
every DSP block has to be parallelized or simply replicated to match the input data
rate. Simply replicating the blocks consumes a huge amount of area and is not feasible
in the long term. In this thesis, the goal is a fully scalable parallel CO-OFDM
system that can support very high data rates. At the transmitter, the Inverse Fast Fourier
Transform (IFFT) is the major component and needs to be parallelized efficiently.
The choice of radix and number of parallel outputs is explored to
obtain the most area-efficient design. The CO-OFDM transmitter is a feed-forward system,
while the CO-OFDM receiver contains a feedback loop. The receiver also consists of time,
frequency and phase offset estimation and compensation blocks, which need to be parallelized
efficiently along with the Fast Fourier Transform (FFT), the major block of the receiver.
Fixed-point analysis of the complete OFDM chain is done, which helps reduce
the area. All the analysis is done for a single polarization CO-OFDM system. It can
be extended to a dual polarization CO-OFDM system by including a Multiple Input
Multiple Output (MIMO) block which separates the two polarization components and
feeds them into the corresponding chains.
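The gap between converter sampling rate and FPGA clock frequency can be turned into a required parallelism factor with a one-line calculation. The sketch below is only illustrative: the function name `parallel_factor` and the numerical rates in the usage lines are hypothetical values chosen for the example, not figures from this work.

```python
def parallel_factor(sample_rate_hz: int, fpga_clock_hz: int) -> int:
    """Smallest number of samples a DSP block must accept per clock cycle
    so that its throughput keeps up with the data converter."""
    if sample_rate_hz <= 0 or fpga_clock_hz <= 0:
        raise ValueError("rates must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-sample_rate_hz // fpga_clock_hz)


# Hypothetical figures, for illustration only.
print(parallel_factor(5_000_000_000, 312_500_000))   # 5 GSa/s at 312.5 MHz
print(parallel_factor(10_000_000_000, 300_000_000))  # 10 GSa/s at 300 MHz
```

Every block in the chain then has to sustain this many samples per cycle, which is why replication alone quickly becomes too expensive in area.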
1.2 Contributions
The contributions of this work are given as follows.
3. A complete parallel architecture for all the blocks of the CO-OFDM transceiver is
proposed. The scalability towards 100 Gb/s is detailed. The scalability of individual
algorithms is explored in detail. A complete fixed-point analysis of the proposed
parallel CO-OFDM system is done. Area reduction due to the analysis is reported.
Figure 1.3 shows the possible power saving opportunities at different stages of the VLSI
design flow. Resource savings follow a similar trend. As can be observed, a low-complexity
algorithm and architecture saves resources at the block level (multipliers, adders),
which further down the flow results in significant savings compared to those obtained at the
Register Transfer (RT) or logic level. This approach is taken in this thesis to optimize
resource consumption in the search for parallel CO-OFDM transceiver algorithms/architectures,
which are expressed using the high-level synthesis (HLS) tool CatapultC [21].
Figure 1.3: Power Savings Possible at each stage in Top down VLSI Design Flow
on the bandwidth and precision of present-day DACs/ADCs. A literature survey of offline
and online CO-OFDM experiments is detailed. The complexity (number of operations) of the
algorithms used in both transmitter and receiver is calculated. This gives the
state-of-the-art complexity of CO-OFDM systems and provides motivation for reducing complexity
from the top, which means starting from low-complexity algorithms and moving to scalable
parallel architectures and fixed-point exploration to reduce the resources required.
In Chapter 3, low-complexity algorithms for coarse time synchronization in a dispersive
channel are explored. A novel hierarchical low-complexity synchronization algorithm
is proposed which provides low Mean Square Error (MSE) performance similar to
cross-correlation algorithms. The complexity of the algorithm is compared with previously
proposed algorithms. A novel parallel scalable architecture is proposed for coarse time
synchronization, which provides high throughput. The proposed real-time architecture is
suited to CO-OFDM systems that receive multiple input samples per cycle and need to match the
input rate. Complexity analysis of the proposed architecture is performed and compared
with previously proposed real-time architectures for CO-OFDM systems.
In Chapter 4, the design of an MB-CO-OFDM system for a given target rate is shown, considering
the dispersion parameters of the optical channel. An end-to-end parallel transceiver
architecture is proposed which can generate and process multiple samples per cycle and easily
scale to more parallel inputs/outputs. The parallel architecture of each block is detailed,
and the resource savings at the architectural level due to the use of efficient
architectures are shown. The architecture exploration is done using CatapultC [21], a High Level
Synthesis (HLS) tool which accepts input in C and outputs Verilog/VHDL. Fixed-point
analysis of the signal processing chain is done, which helps reduce the area needed to achieve
a particular bit error rate (BER) at a particular Optical Signal-to-Noise
Ratio (OSNR). The resources consumed on a Xilinx FPGA are reported with a break-up
per block, and compared with previous architectures in
terms of scalability and performance.
In Chapter 5, CO-OFDM experiments performed using an Arbitrary Waveform Generator (AWG)
as transmitter and a Digital Storage Oscilloscope (DSO) as receiver are
explained. The transition from the electrical back-to-back (B2B) experiment to the optical B2B
arrangement with optical fiber is explored. BER curves are given for each
configuration as a function of SNR. Performance characterization is then done with
different LASERs used at the transmitter and the receiver. Matlab is used for generating and
decoding data. Experimental BER curves are compared with theoretical BER curves for
QPSK to validate the setup. The performance of the proposed real-time synchronization
algorithm and architecture is then explored on the real-time FPGA platform
developed as part of the 100GFLEX FUI project. The proposed architecture is integrated into
this system and its performance is analysed with synchronous and asynchronous sampling
configurations.
Chapter 6 concludes the thesis by outlining the major contributions done with respect
to reduction of computational complexity. Proposals to reach speeds higher than 100 Gb/s
2.1 Introduction
An introduction to the different blocks of the CO-OFDM transceiver is presented in this
chapter. A CO-OFDM system combines coherent detection, orthogonal multi-carrier (OFDM)
modulation and electronic dispersion compensation to reach data rates higher than those
possible with IM-DD systems and to extract more capacity from the optical
fiber channel. Since the context is optical, the characteristics of single-mode optical fiber
are detailed in Section 2.2. The major linear and non-linear phenomena in the optical channel
which impair high-speed transmission are explained. Dispersion values of the different types
of single-mode optical fiber used in core, metro and access networks are given. Section
2.3 gives the major differences between wireless OFDM and CO-OFDM systems, which helps
in understanding the unique challenges posed by the optical fiber and the analogue/optical
front-ends of the system.
Section 2.4 explains a complete end-to-end CO-OFDM system, giving details separately
about the digital, analogue/RF and optical blocks in the subsections that follow.
A CO-OFDM system is expensive compared to IM-DD systems in terms of the
optical/analogue/digital components required for its realization. Section 2.5 calculates the
resource increase for a CO-OFDM system in the optical/analogue/digital domains, with a detailed
analysis of the DSP algorithms used and their complexities. A survey of offline and
real-time CO-OFDM experiments is done and the algorithmic/architectural complexity
of those systems is calculated. Section 2.6 lists the observations drawn from this survey and
Section 2.7 concludes the chapter.
28 CO-OFDM System
a single mode to propagate and is better at retaining the fidelity of the light pulse over
longer distances. It has lower attenuation and much higher bandwidth than multi-mode optical
fibers (MMF). When a light pulse travels in SMF, it undergoes pulse broadening and
attenuation along the fiber. Present-day SMF has an attenuation of 0.2 dB/km,
so optical amplifiers (EDFA) are required only at intervals of 50 km.
The EDFA is the most deployed optical amplifier because its amplification window coincides
with the bands of lowest attenuation (C-band and L-band) in SMF. The transmission
windows used in SMF are listed in Table 2.1. The phenomena which degrade the
signal as it travels through SMF can be grouped into linear and non-linear phenomena.
These impairments are described in the following subsections.
Band Name    Wavelengths (in nm)
O-Band       1260 - 1360
E-Band       1360 - 1460
S-Band       1460 - 1530
C-Band       1530 - 1565
L-Band       1565 - 1625
U-Band       1625 - 1675
• Fiber Attenuation: Signal travelling the optical fiber experiences constant attenuation
(αdB) as a function of length (LF), given by
$$\alpha_{dB} = \frac{10}{L_F} \log_{10}\left(\frac{P_0}{P}\right) \qquad (2.1)$$
where P0 is the injected power, P is the received power, and LF is the length of
optical fiber. The attenuation can be classified into intrinsic and extrinsic losses.
The intrinsic loss mechanisms are:
3. Silica Absorption Loss - Pure silica causes absorption loss in two regions above
2000 nm.
The extrinsic loss mechanisms are due to bending loss and connection between two
fiber pieces. Figure 2.1 [reproduced from [2]] shows the variation of Fiber attenuation
as a function of wavelength. It shows a region of low attenuation in the C-Band and
L-Band.
Figure 2.1: Fiber loss coefficient vs. different wavelengths for a typical low-loss optical
fiber (SSMF) and fiber without the water absorption peak (Allwave). [Reproduced from
Essiambre et al.[2]]
• Polarization Mode Dispersion (PMD): The State of Polarization (SoP) of the electric
field changes as the signal traverses the optical fiber. The changes in SoP are
random because of fluctuating birefringence. Geometric birefringence and anisotropic
stress are the major sources of variation of birefringence. Variation in birefringence
means variation of refractive index, which leads to variation in the propagation
constant (β). PMD is statistical in nature and is given by the following equation:
$$D_p = \frac{\left\langle (\Delta T)^2 \right\rangle^{0.5}}{\sqrt{L_F}} \qquad (2.3)$$
where Dp is the PMD (in ps/√km), ⟨(∆T)²⟩ is the mean square Differential Group Delay (DGD)
value, the DGD being a Maxwellian distributed random variable, and LF is the length of
optical fiber. IM-DD systems do not use dual polarization. For systems using dual
polarization, however, PMD changes the channel coefficients, and the equalizer coefficients
have to be updated regularly to accommodate this.
Different SMF types are used depending on the transmission distance. In undersea
(submarine) networks, the distance involved is more than 2000 km. Terrestrial communication
networks are divided into core, metro and access networks. The core network covers distances
from a few hundred to thousands of kilometres, connecting cities or countries. The metro
network connects the core and access networks, covering several tens of kilometres. The access
network provides connectivity to the end users. Typical fibers used in these types of
networks, with their fiber attenuation, CD and PMD values, are given in Table 2.2.
Table 2.2: Specifications of commercially available single mode fibers (Corning Fibers)
Fiber Name   ITU-T Naming   CD @1550 nm (ps/nm-km)   PMD (ps/√km)   αdB @1550 nm (dB/km)   Network Usage
PSCF         G.654          20.2                     ≤ 0.05         0.158                  Submarine
SSMF         G.652.D        18                       0.1            0.21                   Backbone
LEAF         G.655          4.4                      ≤ 0.04         0.19                   Metro
SMF-28       G.652          18                       ≤ 0.04         0.18                   Access
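As a rough illustration of how the values in Table 2.2 translate into link-level impairments, the sketch below rearranges Eq. (2.1) and Eq. (2.3) for a given fiber length. It is a minimal example under stated assumptions: CD is assumed to accumulate linearly with length, the "≤" PMD entries are taken at their upper bound, and the names `FIBERS` and `link_budget` are introduced here only for illustration.

```python
import math

# (CD in ps/(nm.km), PMD in ps/sqrt(km), attenuation in dB/km), from Table 2.2.
# Entries given as "<=" in the table are taken at their upper bound (assumption).
FIBERS = {
    "PSCF": (20.2, 0.05, 0.158),
    "SSMF": (18.0, 0.1, 0.21),
    "LEAF": (4.4, 0.04, 0.19),
    "SMF-28": (18.0, 0.04, 0.18),
}


def link_budget(fiber: str, length_km: float) -> dict:
    """Total loss, accumulated CD and RMS DGD for one fiber span."""
    cd, pmd, alpha_db = FIBERS[fiber]
    return {
        # Eq. (2.1) rearranged: 10*log10(P0/P) = alpha_dB * L_F
        "loss_db": alpha_db * length_km,
        # Assumed linear accumulation of chromatic dispersion with length
        "cd_ps_per_nm": cd * length_km,
        # Eq. (2.3) rearranged: sqrt(<dT^2>) = D_p * sqrt(L_F)
        "rms_dgd_ps": pmd * math.sqrt(length_km),
    }


if __name__ == "__main__":
    for name in FIBERS:
        print(name, link_budget(name, 100.0))
```

The square-root dependence of the DGD on length, against the linear growth of loss and CD, is one way to see why CD usually dominates the equalizer memory on long links.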
Non-linear impairments are directly proportional to the transmission length (LF) and inversely
proportional to the cross-sectional area of the optical fiber, so non-linear interaction is
stronger for longer fiber lengths and smaller cross-sectional areas. Since non-linear
impairments are caused by high signal power, the non-linear effects are reduced for an
attenuated signal.
• Wireless channel can have deep spectral nulls in the bandwidth depending on external
environment of operation, resulting in frequency selective fading of the signal. The
CO-OFDM system uses optical fiber channel which has no spectral nulls in the region
of operation.
• The wireless channel varies much faster than the optical channel, whose time
constants of variation are of the order of ms. The optical channel is an engineered channel
with variations in channel parameters caused by temperature, fiber bending, etc.
• Wireless OFDM systems convert the signal from RF to baseband using an RF-to-analog
down converter, while a CO-OFDM system converts from optical to RF and
then from RF to baseband using a LASER as local oscillator. The linewidth
of the LASER results in integer carrier frequency offset (CFO) and rapid phase
variations. Rapid phase variations limit the OFDM symbol length which can be used
with digital common phase error (CPE) estimation techniques.
• Due to large bandwidth involved for CO-OFDM systems, the data converters (DAC
and ADC) become the bottleneck of the system since effective number of bits (ENOB)
available at such high bandwidth is also constrained. This imposes resolution con-
straints on data transmission at DAC and on reception at ADC.
• Data rates of Wireless OFDM systems are in the range of Mb/s, while CO-OFDM
systems must support data rates of the order of Gb/s. This difference in data rate
makes it necessary for each block to support multiple parallel input/output.
• Due to the absence of spectral nulls and channel variations, channel estimation al-
gorithm can be simplified and update rate of the coefficients can be reduced.
• Due to the high data rates involved, highly parallel and scalable architectures are required
for all the blocks in the transceiver processing chain. It is also necessary to avoid
long feedback loops with large delay in the critical path of computation.
coherent detection supports dual polarization in the optical fiber, thus essentially dou-
bling the data rate of a single polarization CO-OFDM system. A brief introduction of
Coherent Detection and OFDM modulation is given in subsections below. Next, all the
blocks (digital, analogue and optical) of the single polarization and dual polarization CO-
OFDM transmitter and receiver are explained.
CoD was researched heavily in the 1980s in the quest for improved receiver
sensitivity, that is, for detecting the low signal powers caused by fiber loss. The invention
of the EDFA resulted in low-cost optical amplifiers that compensate for fiber loss. Due to
their low cost, IM-DD systems gained importance and the CoD scheme was neglected. But in
optical communication systems operating at data rates in excess of 20 Gb/s, CD and PMD
effects became
very computationally complex to compensate for in the DD scheme. CoD started gaining
importance because it gives access to the optical electrical field. While the DD scheme
only detects the incoming intensity of the optical signal, the CoD scheme detects both
amplitude and phase of the optical signal. This enables the use of Electronic Dispersion
Compensation (EDC), which uses DSP techniques for estimation and compensation of the linear
dispersion effects of the channel. With the use of DSP, the cost of the system can be brought
down and its flexibility increases significantly. The CoD scheme enables higher-order
mapping schemes like QPSK and 16-QAM, which increase the number of bits per symbol. It also
enables dual polarization schemes, which double the data rate per band and require MIMO
processing at the receiver to separate the two polarizations. So, with the optical community
widely adopting DSP-based solutions for very high data rate systems, the CoD scheme has seen a
revival recently.
• Resistant to FSF - By division of bandwidth into narrow band flat fading channels,
it is more resistant to FSF effects of the channel. Frequency nulls can be avoided or
bit-power loading can be employed.
• One-tap Equalization - Due to addition of CP, linear convolution with the channel
is converted to circular convolution and hence a single-tap equalizer per sub-carrier
is sufficient.
• Sensitive to timing offset - Loss of timing synchronization causes ISI and ICI. With-
out timing synchronization, other offsets cannot be efficiently estimated and compen-
sated. Symbol synchronization and frame synchronization are essentially the same
in case of OFDM.
• Sensitive to Frequency and Phase Offsets - CFO at the receiver causes loss of or-
thogonality and causes symbol rotation. Phase offset [24] [25] causes rotation of con-
stellation.
• Mapper - It maps bits to symbols. Typical mapping schemes range from BPSK and
QPSK to 16-QAM and 64-QAM. It is followed by a serial-to-parallel converter block before
the IFFT.
• IFFT - It modulates complex data from frequency domain to time domain. It is the
most complex block in the transmitter chain.
$$x[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X[k]\, e^{-j2\pi kn/N} \qquad (2.4)$$
where x[n] is the time-domain signal, X[k] is the frequency domain signal and N is
the size of IFFT.
• Add CP - It copies the last part of the OFDM symbol to its front. This avoids ISI
due to the multipath channel when the length of the CP is greater than the maximum dispersion
delay of the channel. It provides immunity against the CD of the optical fiber.
• Scale, Clip - The output of the IFFT is scaled and clipped to fit the input voltage range
of the DAC. The clipping value must be chosen to minimize clipping distortion as well
as quantization noise.
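The interplay between the mapper, the IFFT, the cyclic prefix and one-tap equalization described above can be checked numerically. The following is a minimal noise-free sketch, not the implementation used in this work: the IFFT size, CP length and 3-tap channel are hypothetical values, the channel is assumed known at the receiver, and NumPy's FFT sign convention is used, which may differ from the sign of the exponent printed in Eq. (2.4); the `norm="ortho"` option provides the 1/√N scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
N, CP = 64, 8                        # hypothetical IFFT size and CP length
h = np.array([1.0, 0.5j, -0.2])      # hypothetical channel, shorter than the CP

# Mapper: two bits per QPSK symbol (bit 0 -> +1, bit 1 -> -1 on each axis)
bits = rng.integers(0, 2, size=(N, 2))
X = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

# IFFT with 1/sqrt(N) scaling, then Add CP (copy the tail to the front)
x = np.fft.ifft(X, norm="ortho")
tx = np.concatenate([x[-CP:], x])

# Dispersive channel modelled as a linear convolution
rx = np.convolve(tx, h)

# Remove CP and FFT: the kept N samples see a circular convolution
y = rx[CP:CP + N]
Y = np.fft.fft(y, norm="ortho")

# One-tap equalizer per sub-carrier (channel assumed known here)
H = np.fft.fft(h, N)
X_hat = Y / H

# QPSK demapping by sign of the real and imaginary parts
bits_hat = np.stack([X_hat.real < 0, X_hat.imag < 0], axis=1).astype(int)

print("max equalization error:", np.max(np.abs(X_hat - X)))
print("bit errors:", int(np.sum(bits_hat != bits)))
```

Shortening `CP` below the channel memory (here two samples) breaks the circular-convolution property, and the single-tap division no longer recovers the symbols exactly.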
[Figure: transmitter block diagram. Block labels: Digital OFDM Transmitter (I/p Bits; IX and QX outputs), DAC, LPF, RFD, MZM, π/2, VOA, ECL, Polarizer, PBS, PBC, Optical O/p.]
Figure 2.7: Resolution (bits) vs. Sampling Rate (GSa/s) for fastest DAC available (data points labelled TI, Maxim-ic, Fujitsu, Tektronix and Micram). GSa/s - Giga Samples/second.
• DAC - The DAC converts the digital output of the IFFT to an analogue output. The bandwidth
and resolution of present-day DACs lag behind the requirements of a 100 Gb/s single-band
CO-OFDM system. A survey of the fastest DACs available in the market is shown
in Figure 2.7. The fastest DAC available has a sampling rate of around 34 GSamples/s
with a resolution of 6 bits. To relax the constraints on DACs/ADCs, multi-band
CO-OFDM has been proposed to achieve a total data rate of 100 Gb/s in the case of
100 Gb Ethernet.
• Low Pass Filter (LPF) - It filters the output signal with a cut-off frequency near the
Nyquist frequency of the DAC.
• RF driver - It amplifies the electrical signal after low-pass filtering; its output
modulates the optical carrier in the MZ modulator.
• MZM - The optical carrier supplied by the External Cavity LASER (ECL) module is
modulated by the I/Q electrical signal.
• Variable Optical Amplifier (VOA) - The real and imaginary signals are combined
and amplified by the optical amplifier. The output signal is fed to the Polarizer in the case
of a dual-polarization system.
Figure 2.8: Optical-to-RF Down Converter. BPF - Band Pass Filter, ECL - External
Cavity LASER, LO - Local Oscillator, PBS - Polarization Beam Splitter, ADC -
Analog-to-Digital Converter, IX - Real Part of X-Polarization, QX - Imaginary Part of
X-Polarization, IY - Real Part of Y-Polarization, QY - Imaginary Part of Y-Polarization.
Figure 2.8 shows the front end of the optical receiver for receiving either a single or dual
polarized optical signal. It shows a direct down-conversion architecture, where conversion
from optical to analogue is direct, without any intermediate RF frequency. The Band Pass
Filter (BPF) selects the band for processing. The filtered signal is down converted using
a LASER tuned to the center frequency of the band. The amplitude and phase information
of the optical signal is detected by the balanced photodiode circuit, then
sampled by the ADC and converted to the digital domain. The bandwidth of the ADC is the
limiting factor, so oversampling by a large factor is not possible. Generally,
for CO-OFDM systems, an oversampling factor of 1.2 is used. A survey of the
fastest available ADCs in the market is shown in Figure 2.9. The fastest available ADC
has a sampling rate of 56 GSa/s with a resolution of 8 bits.
• Remove CP - After CFO compensation, cyclic prefix (CP) is removed and N samples
are fed into FFT block.
Figure 2.9: Resolution vs. Sampling Rate (GSa/s) for fastest ADC available. GSa/s - Giga Samples/second.
[Figure: receiver DSP chain. Inputs IX and QX, followed by the blocks TFSYNC, FCOMP, FFT, ICFO, CEE, CPEC and DMAP, then O/p Bits.]
• FFT - It converts input time domain samples to frequency domain output samples.
It is the most complex block in the receiver chain.
$$X[k] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} x[n]\, e^{j2\pi kn/N} \qquad (2.5)$$
• Common Phase Error Estimation - The phase error in the OFDM symbol caused by the
rapid phase variations of the LASER is estimated using pilot symbols dedicated in every
OFDM symbol. Compensation is done by multiplication with a complex exponential.
Since the data converters (DAC, ADC) form the bottleneck with respect to both sampling
frequency and resolution, multi-band CO-OFDM (MB-CO-OFDM) is proposed to
reduce the pressure on the data converters. MB-CO-OFDM also helps in realizing
architectures on an FPGA, since the maximum frequency attained on an FPGA is an order
of magnitude lower than that of the DAC/ADC. MB-CO-OFDM divides the total optical
bandwidth into smaller electrical bandwidths which can be handled by the DAC/ADC, and the
target rate of 100 Gb/s is attained by using multiple bands working in parallel. In
this thesis, all the designs are for a single-band single-polarization CO-OFDM block.
Dual polarization is indicated where applicable. The total target data rate of 100 Gb/s
is then achieved by using multiple dual-polarization bands.
Table 2.3: Cost of Optical Transceiver for CO-OFDM, CO-QPSK and IM-DD Systems
For finding the increase in complexity in the digital part, a full complexity analysis is
done. The analysis is done at two levels. First, the algorithmic complexity of algorithms
used in the transmitter and receiver is calculated. Algorithmic complexity gives the total
number of real multiplications and additions required to compute a single output
sample. For example, the total number of complex multiplications and additions required
for one output of the IFFT is expressed in terms of the IFFT size (N). Then, the architectural
complexity of the algorithms is calculated for a throughput of one output every clock cycle.
A throughput of one output per clock cycle is necessary to support high data rates and to
avoid large buffer memory. Architectural complexity is the number of real multipliers
and adders required to realize the algorithm.
Radix          Real Multiplications    Real Additions
Radix-2        2N · log2 N             3N · log2 N
Radix-4        (3/2)N · log2 N         (5/4)N · log2 N
Radix-2^2      (3/2)N · log2 N         (5/4)N · log2 N
Split-Radix    (4/3)N · log2 N         (8/3)N · log2 N
A survey of previously reported real-time transmitters with offline receivers is done. The
objective is to calculate the architectural complexity of the transmitter, which essentially
comes down to the complexity of the IFFT. Table 2.6 lists real-time CO-OFDM experiments
on standard single mode fiber (SSMF) which have achieved gigabit-per-second rates using
real-time implementation on an FPGA.
The computational complexity of the proposed real-time solutions is given in Table
2.7. The proposal by Inan et al. [31] uses a radix-2 IFFT for a large parallel factor of 64,
which is inefficient considering that a higher radix can be used at such a high number of
parallel outputs. The proposal by Schmogrow et al. [29] does not use multipliers, but a huge
number of adders and LUTs. Since all multiplier combinations are stored in memory, it is
limited to a small IFFT size (N) of 64. This approach is not scalable to higher speeds.
CO-OFDM receiver architecture on FPGA, Kaneda et al. [14] and Chen et al. [32]. All ar-
chitectural complexity comparisons are done with these two papers wherever it is relevant.
Algorithm                      Real Multiplications    Real Additions
Schmidl-Cox [33]               8                       8
Minn-Bhargava (L = 4) [34]     12                      12
Shi [35]                       54                      57
Park [36]                      2(N + 2)                2(N + 1)
Choi [37]                      2N                      2(N − 1)
Zhou [38]                      2N                      2(N − 1)
Author               FFT Size (N)   Training Sym. Size (M)   Algorithm Used   Real Multipliers   Real Adders
2-PARALLEL INPUT/OUTPUT
Kaneda et al.        128            32                       Auto-Corr.       8                  68
Chen et al.          128            128                      Cross-Corr.      0                  508
4-PARALLEL INPUT/OUTPUT
Kaneda et al.        128            32                       Auto-Corr.       16                 136
Chen et al.          128            128                      Cross-Corr.      0                  1016
8-PARALLEL INPUT/OUTPUT
Kaneda et al.        128            32                       Auto-Corr.       32                 272
Chen et al. [32]     128            128                      Cross-Corr.      0                  2032
16-PARALLEL INPUT/OUTPUT
Kaneda et al. [14]   128            32                       Auto-Corr.       64                 544
where $\hat{\epsilon}_{frac}$ is the fractional CFO, $\hat{\epsilon}_{int}$ is the integer CFO, L is the number of repeating
parts of the training symbol, and N is the size of the IFFT/FFT. Fractional CFO estimation is done
with the help of the training symbol used for coarse synchronization, and is given by
$$\hat{\epsilon}_{frac} = \frac{2}{\pi} \cdot P[\hat{\eta}_{start}] \qquad (2.8)$$
where $\hat{\epsilon}_{frac}$ is the fractional CFO estimate, $\hat{\eta}_{start}$ is the estimate of the start of the OFDM symbol, and
$P[\hat{\eta}_{start}]$ is the auto-correlation function at the index $\hat{\eta}_{start}$. $\hat{\epsilon}_{frac}$ gives the CFO in terms
of the sub-carrier spacing (B/N), where B is the bandwidth of the OFDM signal and N is the size of
the IFFT/FFT. The CFO estimation range is ±1 for the Schmidl-Cox algorithm and ±2 for the
Minn-Bhargava algorithm (L = 4). The algorithmic complexity of CFO estimation is an arc
tangent calculation, and the architectural complexity is a LUT implementation of the arc tangent
function. No real-time proposals for fractional/integer CFO estimation have been reported in the
literature.
where xc[n] is the CFO compensated signal, x[n] is the input signal, $\hat{\epsilon}_{int}$ is the estimated
integer CFO, n ∈ [0, N − 1] is the time index, and N is the size of the IFFT/FFT. For the N samples
of a single OFDM symbol, the algorithmic complexity is 4N real multiplications and 2N real
additions. The architectural complexity for R-parallel output is 5R real multipliers and 3R
real adders, where R is the number of parallel outputs.
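Fractional CFO estimation from a repetitive training symbol, followed by CFO compensation, can be illustrated with a short noise-free simulation. The sketch below assumes the standard Schmidl-Cox form (a training symbol with L = 2 identical halves, and the estimate taken from the angle of the auto-correlation between the halves) and a plain complex-exponential derotation; these are assumptions made for the example and not necessarily the exact expressions used in this work. The FFT size and the CFO value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128            # hypothetical FFT size
L = 2              # number of identical parts in the training symbol
eps_true = 0.3     # hypothetical CFO, in units of the sub-carrier spacing

# Training symbol made of two identical halves (time domain)
half = (rng.choice([-1, 1], N // 2) + 1j * rng.choice([-1, 1], N // 2)) / np.sqrt(2)
s = np.concatenate([half, half])

# Received training symbol with CFO (noise-free for clarity)
n = np.arange(N)
r = s * np.exp(2j * np.pi * eps_true * n / N)

# Auto-correlation between the two halves, evaluated at the correct start index
P = np.sum(np.conj(r[: N // 2]) * r[N // 2:])

# The angle of P equals 2*pi*eps/L, so the unambiguous range is +/- L/2
eps_hat = L / (2 * np.pi) * np.angle(P)

# Compensation: one complex multiplication per sample
r_c = r * np.exp(-2j * np.pi * eps_hat * n / N)

print("estimated CFO:", eps_hat)
print("residual error after compensation:", np.max(np.abs(r_c - s)))
```

Each compensated sample costs one complex multiplication, i.e. 4 real multiplications and 2 real additions, which is consistent with the 4N and 2N figures quoted above. With L = 2 the estimator range is ±1 sub-carrier spacing and with L = 4 it is ±2, matching the ranges given for the Schmidl-Cox and Minn-Bhargava algorithms.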
2.5.5 FFT
The FFT converts received data from the time domain to the frequency domain. It is the most
computationally intensive block in the receiver signal processing chain. Its algorithmic
complexity is shown in Table 2.4 and its architectural complexity in Table 2.5. Kaneda
et al. used the built-in Altera FFT for N = 128, which uses 24 real multipliers; in
total, 384 multipliers were used for decoding 16-parallel inputs.
• Method 1 - The method uses either two symbols [33] or one symbol [39]. In case of
two symbols/one symbol, the even sub-carriers in the symbols are related by a fixed
phase factor. The timing metric for detection of integer CFO is
$$B_1(g) = \frac{\left| \sum_{k \in X} x^*_{1,k+2g}\, v^*_k\, x_{2,k+2g} \right|^2}{2 \left( \sum_{k \in X} |x_{2,k}|^2 \right)^2} \qquad (2.10)$$
where integer g spans the range of possible frequency offsets, X = −W, .., 2, ..., W is
the set of indices for even frequency components, W is the number of even frequencies
with the PN sequence. The index corresponding to the maximum value of B1(g) gives
the integer CFO. Algorithmic/Architectural Complexity is given in Table 2.10.
where y is the known sequence, x is the received sequence, integer g spans the range of
possible frequency offsets, X = −W, .., 2, ..., W is the set of indices for even frequency
components. The index corresponding to maximum value of B2 (g) gives the integer
CFO estimate. Algorithmic/Architectural Complexity is given in Table 2.10.
where Rkm is the k th sub-carrier of mth received OFDM symbol, Hk is the channel
response for k th sub-carrier, ckm is the k th sub-carrier of mth transmitted OFDM
symbol, Nkm is the additive noise. Using the training symbol, channel frequency
response can be estimated according to LS criterion using optimization criterion [40],
2. Dual Polarization - In dual polarization system, the received signal can be written
as
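$$\hat{H}_k^{xx} = \frac{R_{km,x}}{c_{km,x}}, \qquad \hat{H}_k^{xy} = \frac{R_{k(m+1),x}}{c_{k(m+1),y}} \qquad (2.19)$$
$$\hat{H}_k^{yx} = \frac{R_{km,y}}{c_{km,x}}, \qquad \hat{H}_k^{yy} = \frac{R_{k(m+1),y}}{c_{k(m+1),y}} \qquad (2.20)$$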
where Ckm is the equalized signal and Ĥk⁻¹ is the inverse of the 2x2 matrix Ĥk. Since
Ĥk is a unitary matrix, the inverse can be calculated using a Hermitian
transpose. Algorithmic/Architectural complexity is given in Table 2.11.
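The dual-polarization estimates of Eq. (2.19) and Eq. (2.20) can be illustrated with a noise-free example. The sketch assumes that the training is time-multiplexed, with symbol m carrying pilots on the X polarization only and symbol m+1 on the Y polarization only, which is what the index pattern of the equations suggests; the number of sub-carriers and the unitary test channel are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 8    # hypothetical number of sub-carriers

# Hypothetical unitary 2x2 channel per sub-carrier (polarization rotation + phase)
theta = rng.uniform(0, 2 * np.pi, K)
phi = rng.uniform(0, 2 * np.pi, K)
H = np.empty((K, 2, 2), dtype=complex)
H[:, 0, 0] = np.cos(theta) * np.exp(1j * phi)
H[:, 0, 1] = -np.sin(theta)
H[:, 1, 0] = np.sin(theta)
H[:, 1, 1] = np.cos(theta) * np.exp(-1j * phi)


def through_channel(sym):
    """Apply the per-sub-carrier 2x2 channel to a (K, 2) array of [x, y] symbols."""
    return np.einsum("kij,kj->ki", H, sym)


# Assumed training pattern: symbol m on X only, symbol m+1 on Y only
c_x = rng.choice([-1, 1], K).astype(complex)
c_y = rng.choice([-1, 1], K).astype(complex)
R_m = through_channel(np.stack([c_x, np.zeros(K)], axis=1))
R_m1 = through_channel(np.stack([np.zeros(K), c_y], axis=1))

# Eq. (2.19) and Eq. (2.20): element-wise LS estimates
H_hat = np.empty_like(H)
H_hat[:, 0, 0] = R_m[:, 0] / c_x     # H^xx
H_hat[:, 0, 1] = R_m1[:, 0] / c_y    # H^xy
H_hat[:, 1, 0] = R_m[:, 1] / c_x     # H^yx
H_hat[:, 1, 1] = R_m1[:, 1] / c_y    # H^yy

# Equalize a data symbol; for a unitary matrix the inverse is the Hermitian transpose
data = (rng.choice([-1, 1], (K, 2)) + 1j * rng.choice([-1, 1], (K, 2))) / np.sqrt(2)
R_data = through_channel(data)
H_inv = np.conj(np.transpose(H_hat, (0, 2, 1)))
C_hat = np.einsum("kij,kj->ki", H_inv, R_data)

print("channel estimation error:", np.max(np.abs(H_hat - H)))
print("equalization error:", np.max(np.abs(C_hat - data)))
```

If the estimated matrix is not close to unitary, for example because of noise or polarization dependent loss, a general 2x2 inverse such as `np.linalg.inv` would be the safer choice in this sketch.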
After initial estimation using training symbols by LS estimation, NLMS method can be
used to track the channel. The equations used for single polarization are
where ek is the error between received symbol (Rk ) and ideal constellation symbol (Rkideal ),
k1 is the coefficient for updating energy, |R|2k,old is the old value of energy, step is the co-
efficient for updating equalizer coefficients, Ĥk,old is the old value of equalizer coefficient.
Algorithmic Complexity for NLMS Estimation is given in Table 2.11. Equalization in-
volves complex multiplications with channel estimated and its algorithmic/architectural
complexity is given in Table 2.11. Kaneda et al. simplified the channel estimation signif-
icantly by using look-up table implementation. No multipliers were required for channel
estimation.
    e = \frac{1}{N_p} \sum_{m=0}^{N_p-1} r^*[m] \cdot c[m]    (2.25)
    \phi_{err} = \angle e    (2.26)
where e is the complex error vector, Np is the number of pilots in one OFDM symbol, r[m]
is the received signal, c[m] is the reference pilot symbol. The CPE compensation is done
by multiplying the input signal by e^{jφerr}. Algorithmic/Architectural complexity is given in
Table 2.12. Kaneda et al. simplified CPE estimation by using a LUT implementation, which
avoided multipliers.
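A minimal sketch of the pilot-based CPE estimate of Eqs. 2.25 and 2.26 follows. The function names and the pilot values are hypothetical. The compensation factor is taken as exp(j·φerr), which is the rotation that undoes the phase measured by Eq. 2.25; this is an assumption about the intended factor.

```python
import cmath

def estimate_cpe(received_pilots, reference_pilots):
    """Return phi_err = angle of the averaged r*[m] * c[m] products (Eqs. 2.25-2.26)."""
    n_p = len(reference_pilots)
    e = sum(r.conjugate() * c
            for r, c in zip(received_pilots, reference_pilots)) / n_p
    return cmath.phase(e)

def compensate_cpe(symbols, phi_err):
    """Rotate every symbol by exp(j * phi_err)."""
    rot = cmath.exp(1j * phi_err)
    return [s * rot for s in symbols]

# Hypothetical QPSK pilots rotated by a common phase of 0.2 rad.
reference = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
common_phase = 0.2
received = [c * cmath.exp(1j * common_phase) for c in reference]

phi_err = estimate_cpe(received, reference)
corrected = compensate_cpe(received, phi_err)
```

Because the product r*[m]·c[m] carries the negative of the common phase, φerr comes out as -0.2 rad here and the rotation restores the reference pilots.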
2.5.9 Demapper
Demapper - It maps incoming complex symbols to symbols of a constellation. In general, this involves
comparing the input with the reference symbols of the constellation and calculating distances. In the case
of QPSK de-mapping, it reduces to checking the signs of the real and imaginary parts and mapping
to the corresponding QPSK symbol.
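The sign-based QPSK shortcut can be compared with a generic minimum-distance demapper in a short sketch. It assumes an unnormalized constellation of the points ±1 ± j; the function names are hypothetical.

```python
QPSK = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]  # assumed unnormalized constellation

def qpsk_demap_sign(symbol):
    """Pick the QPSK point from the signs of the real and imaginary parts."""
    i = 1 if symbol.real >= 0 else -1
    q = 1 if symbol.imag >= 0 else -1
    return complex(i, q)

def demap_min_distance(symbol, constellation=QPSK):
    """Generic demapper: nearest constellation point by Euclidean distance."""
    return min(constellation, key=lambda p: abs(symbol - p))

noisy = [0.8 + 1.3j, -0.2 + 0.4j, -1.1 - 0.7j, 0.3 - 2.0j]
by_sign = [qpsk_demap_sign(s) for s in noisy]
by_distance = [demap_min_distance(s) for s in noisy]
```

Both demappers return the same decisions for these inputs, while the sign check needs no multiplications or distance computations.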
2.6 Observations
From the algorithmic and architectural complexity calculations, it can be seen that the IFFT/
FFT are the most resource-hungry blocks. With the adoption of multi-band CO-OFDM to
reach the total target data rate of 100 Gb/s, the resource savings obtained for one single-polarization
single band are multiplied by the total number of sub-bands used. Hence, effort targeted
towards resource optimization through low-complexity algorithms/architectures goes a long way
in reducing the digital computational complexity of CO-OFDM. The survey shows
that no low-complexity parallel architecture has been proposed for time synchronization,
and that there is no proposal for efficient integer CFO estimation. Channel estimation
is one more block which occupies significant area and hence needs to be optimized. Hence,
this thesis directs its efforts towards low-complexity scalable algorithms/architectures for a
single-band single-polarization CO-OFDM system. The only block shared in the transition
from single-polarization to dual-polarization is channel estimation. Other than that,
all the blocks are replicated in both polarizations.
2.7 Conclusions
The state-of-the-art survey of real-time CO-OFDM systems shows that transmitters supporting
large rates have been built. However, even there, optimization of the IFFT architecture and
scalability have not been explored. In the case of the real-time CO-OFDM receiver, only two major
publications have appeared which explore the complexity of the system. End-to-end
parallel architectures have not been explored, and scalability is also an interesting option.
Chapter 3 explores timing synchronization from a low-complexity algorithmic and
architecture standpoint and Chapter 4 explores end-to-end parallel architectures for the
complete CO-OFDM receiver. Chapter 5 details the experiments performed using the proposed
low-complexity algorithms and characterizes their performance.
Chapter 3
3.1 Introduction
OFDM systems are sensitive to timing, carrier frequency offset (CFO)[41][42] and phase
offset [43]. Loss of timing synchronization causes inter-carrier interference (ICI) and inter-
symbol interference (ISI). It also leads to reduced accuracy in carrier frequency offset
estimation and causes sub-carrier dependent phase rotation after FFT. Uncompensated
CFO also causes rotation of sub-carriers proportional to frequency offset. Thus, loss of
timing and frequency synchronization reduces the advantages provided by multi-carrier
OFDM over single-carrier systems.
In Section 3.2, timing synchronization algorithms proposed for Wireless
OFDM systems are surveyed in order to identify possible improvements in
timing estimation. This led to a novel proposal of a hierarchical low-complexity synchronizer
for Wireless OFDM systems, which is given in Section 3.3. Performance of the proposal
is evaluated in Section 3.4 in an ISI channel. In Section 3.5, the proposal is adapted
to the optical channel with modifications to reduce complexity. The performance of the adapted
proposal is evaluated using a single mode optical fiber channel in Section 3.6. In Sections
3.9 and 3.10, the proposed streaming parallel architectures for the synchronization algorithm
are explained. Architectural complexity of the proposed architectures is calculated to show
the scalability of the architecture and compared with previous proposals. Section 3.12
concludes the chapter.
OFDM Synchronization 51
timing metric point calculation. This leads to low throughput in output timing metric
computation and hence can cause delay in synchronization. There is a need for a low
complexity synchronizer which can quickly synchronize with incoming signal in a highly
dispersive channel. Low complexity leads to lower resource count and low power which is
required both for Wireless and Optical systems.
    x[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X[k] \cdot e^{j 2 \pi n k / N}    (3.1)
where N is the number of sub-carriers and X[k] the complex information carrying symbol
in frequency domain. The sampled signal at the receiver can be written as
where η is the integer timing offset, ε the CFO and φ the phase offset. w[n] is the additive
white Gaussian noise (AWGN) and s[n] in multipath channel is given by
    s[n] = \sum_{m=0}^{L_h-1} h[m] \cdot x[n - \tau_m]    (3.3)
where h is the sampled channel response (complex channel coefficients) at the receiver. Lh
is the number of channel paths and τm is the path delay corresponding to the mth channel
path. The channel is assumed static for the duration of the OFDM symbol.
To achieve synchronization in an ISI channel at low complexity, a new synchronization
method is proposed based on a new training symbol. The training symbol has
low PAPR and can support both delay correlation and conjugate symmetry correlation
operations. The training symbol is generated using a CAZAC sequence, which has very low
PAPR and possesses impulse-like auto-correlation and constant cross-correlation
properties.
where 0 ≤ k < Ns, gcd(r, Ns) = 1 and ⌊a⌋ denotes the integer part of a. Here r = 1 is used.
The alphabet size is Ns for the modified Chu sequence compared to 2Ns for the Chu sequence.
The training symbol proposed [11] is
where Pinit is the auto-correlation function, Rinit is the energy calculation function, TMinit
is the timing metric function and L is the number of repeating parts (L = 4) in the
proposed training symbol. The term L/(L−1) is used to normalize the metric to a maximum value of 1
at the correct starting point. The expressions for Pinit and Rinit are
    P_{init}[n] = \sum_{k=0}^{L-2} u[k] \sum_{m=0}^{M-1} r^*[n + kM + m] \cdot r[n + (k+1)M + m]    (3.7a)
    R_{init}[n] = \sum_{k=0}^{L-1} \sum_{m=0}^{M-1} \left| r[n + kM + m] \right|^2    (3.7b)
where u[k] = p[k] · p[k + 1], p[k] contains the sign pattern [1 1 1 −1], k = 0, 1, ..., (L − 1)
and M = N/L. The time index corresponding to the maximum value gives the initial
estimate.
    \hat{\eta}_{init} = \arg\max_{n} \, TM_{init}[n]    (3.8)
Figure 3.1a shows the plot of T Minit [n] for a Signal-to-Noise Ratio (SNR) of 10 dB in a
frequency selective channel. The maximum peak is not exactly at zero index which is the
actual start of the OFDM symbol, but slightly shifted to the right due to multipath effect.
The fine estimation algorithm consists of correcting this unknown shift and finding the
correct starting point. The fine estimation algorithm does not assume a dominant first
path and can work with non-dominant first path in multipath channel.
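The coarse step can be sketched in a few lines. The sketch evaluates Pinit and Rinit as in Eqs. 3.7a and 3.7b and picks the maximum as in Eq. 3.8. The exact timing metric expression is assumed here to be (L/(L−1) · |Pinit| / Rinit)², which matches the stated normalization to a maximum of 1; the training symbol is modelled as four sign-weighted copies of a random part, which is a simplification, and the channel is ideal and noiseless. All function names are hypothetical.

```python
import random

def coarse_metric(r, n, M, L=4, pattern=(1, 1, 1, -1)):
    """Timing metric at index n; the squared, normalized form is an assumption."""
    u = [pattern[k] * pattern[k + 1] for k in range(L - 1)]
    p = sum(u[k] * sum(r[n + k * M + m].conjugate() * r[n + (k + 1) * M + m]
                       for m in range(M))
            for k in range(L - 1))                      # Eq. 3.7a
    energy = sum(abs(r[n + k * M + m]) ** 2
                 for k in range(L) for m in range(M))   # Eq. 3.7b
    if energy == 0.0:
        return 0.0
    return (L / (L - 1) * abs(p) / energy) ** 2

def coarse_estimate(r, M, L=4):
    """Index of the maximum metric over all complete windows (Eq. 3.8)."""
    last = len(r) - L * M
    return max(range(last + 1), key=lambda n: coarse_metric(r, n, M, L))

rng = random.Random(1)

def rand_c():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

M, L = 16, 4
part = [rand_c() for _ in range(M)]
training = [s * x for s in (1, 1, 1, -1) for x in part]   # simplified [A A A -A]
start = 23
r = [rand_c() for _ in range(start)] + training + [rand_c() for _ in range(40)]

est = coarse_estimate(r, M, L)
peak = coarse_metric(r, est, M, L)
```

With an ideal channel the peak sits exactly at the symbol start; the rightward shift discussed above only appears once multipath is added.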
Figure 3.1: Plot of Coarse (a) and Fine (b) Timing Metric Functions
The fine estimation algorithm uses the conjugate symmetry present in the training
symbol to estimate the correct starting point. The fine time estimation algorithm starts
from the point η̂center = η̂init + N/2. Since the length of the A or B part is greater than the
maximum delay spread of the multipath signal, all four parts of the training symbol are
exposed to a similar multipath channel environment. A search distance of [−Ncyp, Ncyp] is
covered from η̂center for finding all paths of the multipath channel. The fine timing estimation
metric is given by
    TM_{fine}[n] = |P_{fine}[n]|^2    (3.9)
with
    P_{fine}[n] = \sum_{k=0}^{N/4-1} r[n-k-1] \cdot r[n+k] \; - \sum_{k=N/4}^{N/2-1} r[n-k-1] \cdot r[n+k]    (3.10)
where TMfine is the fine timing estimation metric and Pfine is the conjugate symmetric
correlation operation. The negative sign for k ∈ [N/4, N/2 − 1] is because of the sign pattern
[1 1 1 −1], n ∈ [−Ncyp, Ncyp]. This range for n was chosen since the initial estimate
does not produce peaks outside the maximum length of the multipath channel. The timing
metric produces peaks which are proportional to individual squared channel path gains.
The expansion of Pfine[n] in terms of channel coefficients is given by Equation 3.11.
    P_{fine}[n] = \sum_{k=0}^{M-1} r[n-k-1] \cdot r[n+k]    (3.11)
      = \sum_{k=0}^{M-1} \left[ \sum_{m=0}^{L_h-1} h_m x[n-k-1-\tau_m] + w[n-k-1] \right] \left[ \sum_{m'=0}^{L_h-1} h_{m'} x[n+k-\tau_{m'}] + w[n+k] \right]
      = \sum_{k=0}^{M-1} \sum_{\substack{m=0, \\ m=m'}}^{L_h-1} h_m^2 \, |x[n-k-1-\tau_m]|^2 + \sum_{k=0}^{M-1} \sum_{m=0}^{L_h-1} \sum_{\substack{m'=0, \\ m' \neq m}}^{L_h-1} h_m h_{m'} \, x[n-k-1-\tau_m] \cdot x[n+k-\tau_{m'}] + W[n]
    P_{fine}[n] \approx \sum_{k=0}^{M-1} \sum_{\substack{m=0, \\ m=m'}}^{L_h-1} h_m^2 \, |x[n-k-1-\tau_m]|^2    (3.12)
The term W[n] refers to the correlation terms produced due to noise and signal components.
The first term in Equation 3.11 corresponds to peaks produced in the timing
metric. Q[n] is calculated by normalizing all values of TMfine[n] by the maximum value
of TMfine[n]:
    Q[n] = \frac{TM_{fine}[n]}{\max(TM_{fine}[n])}    (3.13)
Figure 3.1b shows the plot of Q[n] for an SNR of 10 dB. The figure shows peaks corresponding
to multipath gains. The threshold shown in Figure 3.1b helps in selecting the signal-only
components which are to be used for the windowed summation method to find the first
arrival path. The time index of the maximum value of Q[n]
is used as the starting point for the windowed summation method, which is similar to the one
used in Choi's method. The values of Q[n] are thresholded by a value β.
    Q[n] = \begin{cases} Q[n], & Q[n] > \beta, \\ 0, & \text{otherwise}, \end{cases}    (3.15)
β is the threshold which separates signal and noise components in Q[n]. This threshold is
determined using the probability distribution of the noise component in Q[n]. The steps
are as follows:
• The sequence Q[n] is passed through Lloyd-Max [13] quantization algorithm using
three levels of quantization.
• The lowest quantization level and its cluster are considered as noise here. It is observed
that this cluster follows a lognormal distribution. Mean (µn) and variance (σn²)
of the noise cluster are calculated first. The corresponding µ and σ for the lognormal
distribution are
    \mu = \log\left( \frac{\mu_n^2}{\sqrt{\sigma_n^2 + \mu_n^2}} \right)    (3.16a)
    \sigma = \sqrt{ \log\left( \frac{\sigma_n^2}{\mu_n^2} + 1 \right) }    (3.16b)
• A constant false alarm rate (CFAR) of α is used for calculation of the threshold.
The equation for the threshold is derived by integrating the probability density
function (pdf) of the noise distribution over the limits [β, ∞].
    \beta = e^{\sqrt{2} \cdot \sigma \cdot \mathrm{erf}^{-1}(1 - 2\alpha) + \mu}    (3.17)
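The threshold computation of Eqs. 3.16a, 3.16b and 3.17 can be sketched as follows. The Lloyd-Max clustering step is not shown; the noise-cluster mean and variance are passed in as plain numbers (hypothetical values in the example). Since the Python standard library has no inverse error function, the sketch uses the identity √2 · erf⁻¹(1 − 2α) = Φ⁻¹(1 − α), where Φ⁻¹ is the standard normal quantile function provided by `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def lognormal_params(mean_n, var_n):
    """Map the noise-cluster mean and variance to lognormal mu and sigma (Eqs. 3.16a-b)."""
    mu = math.log(mean_n ** 2 / math.sqrt(var_n + mean_n ** 2))
    sigma = math.sqrt(math.log(var_n / mean_n ** 2 + 1.0))
    return mu, sigma

def cfar_threshold(mean_n, var_n, alpha):
    """Threshold beta whose lognormal tail probability equals alpha (Eq. 3.17)."""
    mu, sigma = lognormal_params(mean_n, var_n)
    # sqrt(2) * erfinv(1 - 2*alpha) is the standard normal quantile at 1 - alpha
    z = NormalDist().inv_cdf(1.0 - alpha)
    return math.exp(sigma * z + mu)

# Hypothetical noise-cluster statistics and the false alarm rate used in the simulations.
mean_n, var_n, alpha = 0.05, 0.0004, 0.01
mu, sigma = lognormal_params(mean_n, var_n)
beta = cfar_threshold(mean_n, var_n, alpha)

# Consistency check: probability that a lognormal noise sample exceeds beta.
tail = 1.0 - NormalDist().cdf((math.log(beta) - mu) / sigma)
```

The tail probability computed at β reproduces α, which is the defining property of the constant false alarm rate threshold.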
A constant false alarm rate is used across all SNR values. A windowed summation is
performed after discarding the noise values using the threshold (β) calculated.
    E_p(n) = \sum_{k=0}^{S_w-1} Q(\hat{\eta}_{fine} - n + k)    (3.18)
where Sw is the length of summation window and Jm is the search window for signal
component. Then the first arrival path is given by
Finally,
    \hat{\eta}_{final} = \hat{\eta}_{init} - \hat{\eta}_{first}    (3.20)
This value indicates the final estimate of the starting index of the OFDM symbol.
    \hat{\epsilon}_{frac} = \frac{2}{\pi} \angle P[\eta_{final}]    (3.21)
where P[ηfinal] is the autocorrelation among the four parts of the training symbol. The
calculation of P[ηfinal] is done using (3.7a), the difference being a sign pattern of [1 1 1 1]. It
gives fractional and integer CFO estimation. The integer CFO estimation range is equal
to ± L/2 sub-carrier spacing. The CFO estimation range of ε̂frac for the proposed TS is ± 2
sub-carrier spacing.
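A short sketch shows why the factor 2/π gives a ± 2 sub-carrier range for L = 4. It assumes a noiseless signal, a CFO ε expressed in sub-carrier spacings and applied as a phase ramp exp(j2πεn/N), and a training symbol modelled as four sign-weighted copies of a random part of length N/4. The autocorrelation uses all-ones weights, and the estimate is the angle of P scaled by L/(2π). Function names are hypothetical.

```python
import cmath
import math
import random

def estimate_cfo(r, n, N, L=4):
    """CFO estimate in sub-carrier spacings from the angle of the part-to-part autocorrelation."""
    M = N // L
    p = sum(r[n + k * M + m].conjugate() * r[n + (k + 1) * M + m]
            for k in range(L - 1) for m in range(M))
    return (L / (2 * math.pi)) * cmath.phase(p)   # equals (2/pi) * angle for L = 4

def apply_cfo(x, eps, N):
    """Apply a normalized carrier frequency offset eps as a phase ramp."""
    return [s * cmath.exp(2j * math.pi * eps * n / N) for n, s in enumerate(x)]

rng = random.Random(7)
N, L = 64, 4
part = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N // L)]
training = [s * x for s in (1, 1, 1, -1) for x in part]   # simplified training symbol

eps_hat_a = estimate_cfo(apply_cfo(training, 0.75, N), 0, N, L)
eps_hat_b = estimate_cfo(apply_cfo(training, -1.5, N), 0, N, L)
```

Adjacent parts are N/4 samples apart, so a CFO of ε rotates their product by πε/2; the angle stays within (−π, π] as long as |ε| < 2. In this model the third product enters with a negative sign, which lowers the magnitude of P but leaves its angle unchanged.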
Parameters                                    Value
IFFT/FFT Size (N)                             1024
Number of sub-carriers                        1024
Length of Cyclic Prefix (Ncyp)                102
OFDM Symbol Length (Nsym)                     1126
Window Size Sw (samples)                      40
Distance Jm (samples)                         36
Constant False alarm rate (α)                 0.01
Number of simulation runs                     10^5
Number of channel taps                        16
Channel Tap Spacing (samples)                 4
Ratio between first tap to last tap (in dB)   20
Carrier Frequency Offset (CFO) (ε)            0.75
path, which reduces the MSE compared to Park’s method. Proposed method uses conju-
gate symmetry and windowed summation similar to Choi’s to get low MSE in estimation
which is better than Park and comparable to Choi at significantly lower computational
complexity as shown in Figure 3.2.
[Figure: MSE of start index estimation (symbols^2) vs. SNR (dB)]
    CRB(\hat{\epsilon}) = \frac{1}{2\pi^2} \cdot \frac{3 \, (SNR)^{-1}}{N (1 - 1/N^2)}    (3.22)
where N is the size of FFT. Schmidl’s CFO estimator comes closest to lower bound at all
SNR values. Minn’s CFO estimator uses algorithm of Morelli [45], which is computationally
more complex compared to Schmidl’s. Shi’s CFO estimation algorithm hits a floor after
SNR of 15 dB. Proposed CFO estimator is similar to Schmidl’s and gives estimates very
close to Schmidl’s at medium to high SNR values because of more accurate estimation of
the starting index compared to Schmidl’s algorithm which does not suffer from interference
from any negative sign in training symbol. The CFO estimator has a range of ± 2 sub-
carrier spacing compared to Schmidl’s which has a range of ± 1 sub-carrier spacing.
[Figure: MSE of CFO Estimate vs. SNR (dB); curves: CR Bound, Schmidl, Minn, Shi, Proposed]
Table 3.2: Number of Real Operations for calculation of a single timing metric point

Algorithm                      Real Multiplication   Real Addition   Division
Schmidl-Cox                    15                    13              1
Minn (L = 4)                   31                    29              1
Shi                            59                    61              1
Park                           (2N + 11)             (2N + 7)        1
Choi                           (2N + 7)              (2N + 3)        1
Zhou                           (2N + 22)             (2N + 16)       1
Proposed coarse step (L = 4)   31                    29              1
Proposed fine step             (2N + 3)              (2N − 1)        1
much complexity. In the case of Schmidl, the MSE is very high although it is computationally
efficient. So, in terms of computational complexity, the proposed algorithm is significantly
better than Choi, Park and Zhou and, in terms of MSE, significantly better than Schmidl's,
Minn's and Shi's methods. The proposed method is especially useful for MIMO systems, where
each antenna calculates timing synchronization and hence needs to be computationally
and resource efficient. Thus, the proposed algorithm provides a very good trade-off between
computational complexity and MSE of timing and CFO estimation in a frequency
selective channel.
In the case of the SMF optical channel, the dispersion is not high and is more stable compared
to the multi-path effect of the wireless channel, which can have a large delay. Hence, the windowed
summation step in the proposed synchronizer can be eliminated and only a 2-step
procedure is necessary. This modified algorithm is used for synchronization in the SMF optical
channel. For comparison purposes, only auto-correlation algorithms (Schmidl-Cox, Minn-Bhargava,
Shi-Serpedin) are compared. Cross-correlation algorithms (Choi, Park) are not
compared since they require a large amount of resources and do not provide an output every
cycle. Also, cross-correlation algorithms do not provide CFO estimation. The hierarchical
synchronization steps used to calculate the starting point of the OFDM symbol, ηfinal = ηinit − ηfine,
are:
Figure 3.4: MSE of Timing Estimation vs. OSNR in SSMF channel with CFO = 0.75
(Curves: Schmidl, Minn, Shi, Proposed. Axes: MSE of start index estimation (symbols^2) vs. OSNR (dB).)
Figure 3.5: MSE of Timing Estimation vs. OSNR in SSMF channel with CFO = 4.75
(Curves: Schmidl, Minn, Shi, Proposed. Horizontal axis: OSNR (dB).)
Figure 3.6: MSE of CFO Estimation vs. OSNR in SSMF channel for CFO = 0.75
(Curves: Schmidl, Minn, Shi, Proposed. Axes: MSE of CFO Estimation vs. OSNR (dB).)
the CFO compensation block. The block operates continuously to detect training symbols
repeatedly sent at the beginning of each frame and tracks CFO variations. The total
processing rate of the timing synchronization block has to match the input serial rate to
avoid needing a large memory for storing incoming data. In the present literature, there have been
only two proposals for a real-time parallel processing timing synchronization architecture.
The details of the previous proposals are given below. Both proposals are based on using
sample-level parallelism to provide multiple outputs to match the input rate.
    P^{kan}_{sc}[n] = \sum_{k=0}^{M_{sc}/R} \sum_{m=0}^{R(k+1)-1} r^*[n+m] \cdot r[n+m+M_{sc}]    (3.28)
where Msc is the length of the repeating part of the training symbol (A), R is the number
of parallel inputs and outputs, r is the input data stream, and P^{kan}_{sc} is the auto-correlation
function. Figure 3.7 shows the parallel architecture resulting from Eq. 3.28. For the
case of R = 16 and Msc = 32, it requires 64 real multipliers and 544 real adders.
Hence, it is not an efficient parallel realization of the algorithm. The ratio Fs/Fclk
was 16, but only one pipeline was realized and the training symbol was duplicated
16 times. This resulted in reduced spectrum efficiency and less accurate estimation
of the starting point. The fractional CFO estimation was significantly reduced due to
the duplication of the training symbol.
Figure 3.7: Parallel Architecture proposed by Kaneda et al. for Schmidl-Cox Algorithm
• Chen et al. [32] proposed an 8-parallel architecture which uses a cross-correlation
operation and a training symbol of the form [A A A −A]. The length of A was Mmb = 32.
Although complex multipliers were avoided, the number of adders required was large
and, unlike auto-correlation, it does not produce an output every cycle. Cross-correlation
with a known training symbol in the presence of a large CFO results in shifted peaks,
which reduces the accuracy of start point estimation. The R = 8-parallel architecture
proposed by Chen et al. required 2032 adders and does not provide fractional
CFO estimation.
Architectural complexity of the two architectures is shown in Table 2.9 for different parallel
inputs/outputs. Previously proposed sample-level parallel architectures consume too
many resources and do not scale efficiently. Due to the complexity of the architecture, they
do not provide fractional CFO estimation. The two proposals indicate that further efficiency
improvement of symbol synchronization in parallel processing is required [14]. In the
next sections, block-parallel architectures are proposed for acceleration of auto-correlation
Figure 3.8: Parallel Architecture proposed by Chen et al. for cross-correlation operation
    P_{sc}[n] = \sum_{m=0}^{M_{sc}-1} r^*[n+m] \cdot r[n+m+M_{sc}]    (3.29)
    P_{sc}[n] = P_{sc}[n-1] + r^*[n+M_{sc}] \cdot r[n+2M_{sc}] - r^*[n] \cdot r[n+M_{sc}]    (3.30)
where Eq. 3.29 is the non-iterative equation and Eq. 3.30 the iterative equation. Psc is the
auto-correlation function and Msc = N/2 is the size of the repeating symbol (A). It can be observed
that the non-iterative correlation is time-consuming and does not produce outputs every cycle,
while the iterative equation can produce outputs every clock cycle but depends on the availability
of the auto-correlation value of the previous sample. Since the non-iterative equation only depends on
inputs, it can be used to calculate an auto-correlation value which is then fed into the iterative
equation computation. Block-level parallelism [49] uses this idea and applies it to multiple
parallel blocks working in this fashion. The block size is important to increase sharing of
resources. Observation of the auto-correlation operation indicates its dependency on samples
delayed by Msc samples. This dependency can be used to decide the size of the block to
ensure maximum resource sharing among multiple parallel blocks. If the non-iterative and
iterative equations are written for R = 4-parallel computation with starting points
Msc samples apart, they are given by
    P_{sc}[n] = \sum_{m=0}^{M_{sc}-1} r^*[n+m] \cdot r[n+m+M_{sc}]    (3.31)
    P_{sc}[n+M_{sc}] = \sum_{m=0}^{M_{sc}-1} r^*[n+m+M_{sc}] \cdot r[n+m+2M_{sc}]    (3.32)
    P_{sc}[n+2M_{sc}] = \sum_{m=0}^{M_{sc}-1} r^*[n+m+2M_{sc}] \cdot r[n+m+3M_{sc}]    (3.33)
    P_{sc}[n+3M_{sc}] = \sum_{m=0}^{M_{sc}-1} r^*[n+m+3M_{sc}] \cdot r[n+m+4M_{sc}]    (3.34)
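The block-level idea can be sketched in software as follows: each of R blocks seeds its first output with the non-iterative sum and then produces the remaining Msc − 1 outputs by sliding-window updates. The sketch writes the update as P[n+1] computed from P[n], adding the product that enters the window and removing the one that leaves it, which is the index convention that matches the direct sum term by term in this code. It is a behavioural model under these assumptions, not the hardware architecture itself; function names are hypothetical.

```python
import random

def p_direct(r, n, M):
    """Non-iterative auto-correlation: sum of M conjugate products at lag M."""
    return sum(r[n + m].conjugate() * r[n + m + M] for m in range(M))

def p_step(p_n, r, n, M):
    """Given P[n], return P[n + 1]: add the entering product, drop the leaving one."""
    return p_n + r[n + M].conjugate() * r[n + 2 * M] - r[n].conjugate() * r[n + M]

def block_parallel_autocorr(r, n0, M, R):
    """Compute R*M consecutive outputs; block b covers n0 + b*M ... n0 + (b+1)*M - 1."""
    out = [0j] * (R * M)
    for b in range(R):                 # blocks are independent and could run in parallel
        n = n0 + b * M
        p = p_direct(r, n, M)          # non-iterative seed for this block
        out[b * M] = p
        for i in range(1, M):          # iterative updates within the block
            p = p_step(p, r, n + i - 1, M)
            out[b * M + i] = p
    return out

rng = random.Random(3)
M, R = 8, 4
r = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range((R + 2) * M)]

parallel = block_parallel_autocorr(r, 0, M, R)
reference = [p_direct(r, n, M) for n in range(R * M)]
max_err = max(abs(a - b) for a, b in zip(parallel, reference))
```

Because consecutive block seeds are Msc samples apart, the products needed by one block overlap with those of its neighbour, which is the property the text exploits for resource sharing.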
    P_{mb}[n] = \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^*[n+m+kM_{mb}] \cdot r[n+m+(k+1)M_{mb}]    (3.39)
where Eq. 3.39 is the non-iterative equation and Eq. 3.40 is the iterative equation, Pmb
is the auto-correlation function, Mmb is the size of the repeating part (A), Mmb = N/4. If
the non-iterative and iterative equations are written for R = 4-parallel computation using
a block size of Mmb, they are given by
    P_{mb}[n] = \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^*[n+m+kM_{mb}] \cdot r[n+m+(k+1)M_{mb}]    (3.41)
    P_{mb}[n+M_{mb}] = \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^*[n+m+(k+1)M_{mb}] \cdot r[n+m+(k+2)M_{mb}]    (3.42)
    P_{mb}[n+2M_{mb}] = \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^*[n+m+(k+2)M_{mb}] \cdot r[n+m+(k+3)M_{mb}]    (3.43)
    P_{mb}[n+3M_{mb}] = \sum_{k=0}^{L-2} p[k] \cdot p[k+1] \sum_{m=0}^{M_{mb}-1} r^*[n+m+(k+3)M_{mb}] \cdot r[n+m+(k+4)M_{mb}]    (3.44)
It can be observed that, due to the selection of block size Mmb, which is the distance of
auto-correlation, significant sharing of resources can be obtained. Based on the sharing of
resources for the non-iterative and iterative equations, two architectures are proposed which
can support block-parallel computation.
After the computation of R · Msc outputs, the same process is repeated for the calculation
of the next set of R · Msc outputs. The architecture consumes (2Msc − 1) cycles for computing
Msc auto-correlation outputs per block. The architecture is called partial streaming
because it does not produce an output every cycle and has a delay when computing the initial
auto-correlation output using the non-iterative equation. The outputs are not fully streaming,
but the architecture uses a minimum set of resources for computation. The following subsections depict
the architecture for SCA and MBA.
    R_{sc}[n] = \sum_{m=0}^{M_{sc}-1} |r[n+m+M_{sc}]|^2    (3.49)
Figure 3.10 shows the R = 4-parallel PSBP architecture for energy calculation in the case
of SCA. Architectural complexity of the proposed PSBP architecture as a function of R
parallel inputs/outputs is shown in Table 3.4. The resource requirement scales only as a function
of the R-parallel input/output and is independent of Msc. It is compared with Kaneda's
proposal for SCA. It can be seen that the proposed PSBP architecture requires four more
multipliers than Kaneda's proposal, but this difference is fixed and independent
of R. The number of adders required by Kaneda's architecture is large due to its dependence
on the size of the training symbol (Msc). For Msc = 32, the adder savings of the proposed architecture
for an R = 16-parallel architecture are around 82%. Similar savings can be obtained for the
architecture of Rsc compared to Kaneda's method of sample-level parallelization.

Figure 3.9: Proposed R = 4-Parallel PSBP Architecture for Psc calculation in case
of SCA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1
indicates iterative computation mode.
Figure 3.10: Proposed R = 4-Parallel PSBP Architecture for Rsc calculation in case
of SCA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1
indicates iterative computation mode.

Proposed PSBP architecture (SCA):
Algorithm   Real Multipliers   Real Adders
Psc         4(R + 1)           2(3R + 1)
Rsc         2(R + 1)           3R + 1

Kaneda's architecture (SCA):
Algorithm   Real Multipliers   Real Adders
Psc         4R                 2R(Msc/2 + 1)

    R_{mb}[n] = \sum_{k=1}^{L-2} \sum_{m=0}^{M_{mb}-1} |r[n+m+kM_{mb}]|^2    (3.51)
Rmb [n] = Rmb [n − 1] + |r[n + 4Mmb ]|2 − |r[n + Mmb ]|2 (3.52)
Figure 3.12 shows the R = 4-parallel PSBP architecture for energy calculation in the
case of MBA. Table 3.5 shows the architectural complexity as a function of R-parallel
input/output for the Minn-Bhargava algorithm. It is again compared with Kaneda's proposal
for Schmidl-Cox. The proposed architecture requires twelve more real multipliers than
Kaneda's proposal, and this difference is fixed and independent of R. This is because the Minn-Bhargava
auto-correlation (Pmb) inherently has a higher number of computations than the Schmidl-Cox
auto-correlation (Psc). However, there are savings in the area required compared to Kaneda's
proposal. For R = 16-parallel output, the adder savings of the proposed architecture are around
81%. Similar savings can be obtained for the architecture of Rmb compared to Kaneda's
method of sample-level parallelization.
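The quoted multiplier overheads and adder savings can be checked directly from the resource expressions for the auto-correlation blocks: 4R multipliers and 2R(Msc/2 + 1) adders for Kaneda's architecture, and 4(R + c) multipliers and 2(3R + c) adders for PSBP, with c = 1 for Schmidl-Cox and c = 3 for Minn-Bhargava. The function names below are hypothetical, and Msc is assumed to be even.

```python
def kaneda_counts(R, M):
    """Real multipliers/adders of Kaneda's sample-parallel auto-correlation (M assumed even)."""
    return {"mult": 4 * R, "add": 2 * R * (M // 2 + 1)}

def psbp_counts(R, algorithm="SCA"):
    """Real multipliers/adders of the PSBP auto-correlation block."""
    c = 1 if algorithm == "SCA" else 3   # 1 for Schmidl-Cox, 3 for Minn-Bhargava
    return {"mult": 4 * (R + c), "add": 2 * (3 * R + c)}

def adder_savings(R, M, algorithm="SCA"):
    """Fraction of adders saved by PSBP relative to Kaneda's architecture."""
    k = kaneda_counts(R, M)["add"]
    p = psbp_counts(R, algorithm)["add"]
    return (k - p) / k

R, M = 16, 32
kaneda = kaneda_counts(R, M)
savings_sca = adder_savings(R, M, "SCA")
savings_mba = adder_savings(R, M, "MBA")
extra_mult_sca = psbp_counts(R, "SCA")["mult"] - kaneda["mult"]
extra_mult_mba = psbp_counts(R, "MBA")["mult"] - kaneda["mult"]
```

For R = 16 and Msc = 32 this reproduces the 64 multipliers and 544 adders of Kaneda's architecture, the fixed overheads of four and twelve multipliers, and adder savings of about 82% and 81%.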
Figure 3.11: Proposed PSBP Architecture for calculation of Pmb in case of MBA.
iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1 indicates
iterative computation mode.
Figure 3.12: Proposed R = 4-Parallel PSBP Architecture for Rmb calculation in case
of MBA. iter_flag = 0 indicates non-iterative computation mode, while iter_flag = 1
indicates iterative computation mode.

Proposed PSBP architecture (MBA):
Algorithm   Real Multipliers   Real Adders
Pmb         4(R + 3)           2(3R + 3)
Rmb         2(R + 3)           (3R + 3)

Kaneda's architecture (SCA):
Algorithm   Real Multipliers   Real Adders
Psc         4R                 2R(Msc/2 + 1)
Figure 3.13: Multiplier requirement as a function of R-parallel output for PSBP and
Kaneda's architecture, M = 32
(Curves: Kaneda's Arch. (Schmidl-Cox), PSBP Arch. (Schmidl-Cox). Axes: Number of Real Multipliers vs. R-Parallel Output.)
Figure 3.14: Adder requirement as a function of R-parallel output for PSBP and
Kaneda's architecture, M = 32
(Curves: Kaneda's Arch. (Schmidl-Cox), PSBP Arch. (Schmidl-Cox), PSBP Arch. (Minn-Bhargava). Axes: Number of Real Adders vs. R-Parallel Output.)
iterative computation at regular intervals. In this case, the PSBP architecture is used in it-
erative mode only (iter_flag=1). This makes the FSBP architecture produce outputs every
cycle and therefore operate in real-time. Table 3.6 calculates the architectural complexity
of Psc for FSBP architecture.
Figure 3.15: R = 4-Parallel Initial Point Auto-Correlation Computation Block for SCA
Figure 3.16: R = 4-Parallel Initial Point Energy Computation Block for SCA
Algorithm                Real Multipliers   Real Adders
Psc (Initial Point)      4R                 4R
Psc (Iterative Point)    4(R + 1)           2(3R + 1)
Psc (Total)              8R + 4             10R + 2
Rsc (Initial Point)      2R                 4R
Rsc (Iterative Point)    2(R + 1)           (3R + 1)
Rsc (Total)              4R + 2             7R + 1
Figure 3.17: R = 4-Parallel Initial Point Auto-Correlation Computation Block for MBA
Figure 3.18: R = 4-Parallel Initial Point Energy Computation Block for MBA
Algorithm                Real Multipliers   Real Adders
Pmb (Initial Point)      4R                 4R
Pmb (Iterative Point)    4(R + 3)           2(3R + 3)
Pmb (Total)              8R + 12            10R + 6
Rmb (Initial Point)      2R                 4R
Rmb (Iterative Point)    2(R + 3)           (3R + 3)
Rmb (Total)              4R + 6             7R + 3
technology node (Table 3.8) [50] is calculated. Area for proposed FSBP architecture and
Kaneda’s architecture for SCA is calculated in Table 3.9. Area savings from 21 to 74%
are observed. Area for proposed FSBP architecture for MBA with Kaneda’s architecture
is calculated in Table 3.10. Area savings from 17 to 72% are observed.
Table 3.8: Area estimates for 2-Stage Pipelined Adders and Multipliers for 90nm technology node
Figure 3.19: Multiplier requirement as a function of R-parallel output for FSBP and
Kaneda's architecture, M = 32
(Curves: Kaneda's Arch. (Schmidl-Cox), FSBP Arch. (Schmidl-Cox). Axes: Number of Real Multipliers vs. R-Parallel Output.)
Figure 3.20: Adder requirement as a function of R-parallel output for FSBP and
Kaneda's architecture, M = 32
(Axes: Number of Real Adders vs. R-Parallel Output.)
Table 3.9: Area calculation of FSBP (Schmidl-Cox) and Kaneda’s architecture at 90nm
technology node for 5-bit multiplier, 10-bit adder for R = 16-Parallel input/output for
Schmidl-Cox Algorithm
Parallelism
R
Factor
2 1
4 3
8 5
16 9
3.12 Conclusions
In this chapter, a low-complexity time synchronization algorithm is proposed which can
work in a highly dispersive channel. With a complexity reduction of ≈ 80%, MSE
performance comparable to a cross-correlation-only estimator is observed. This algorithm
is adapted to the optical channel and its performance is compared with other
auto-correlation algorithms. Next, two types of block-parallel architectures were proposed
for synchronization algorithms. The PSBP architecture provided partial streaming output and
required 82% (Schmidl-Cox)/81% (Minn-Bhargava) fewer adder resources compared to
Kaneda's proposal. The FSBP architecture supports full streaming output, and area gains of
21-72% (Schmidl-Cox) and 17-72% (Minn-Bhargava) were observed. Then, the conjugate symmetric
correlation operation is accelerated on the proposed PSBP/FSBP architecture for
MBA to improve the timing estimation. The proposed architecture is scalable and can be
generalized for use with any auto-correlation based algorithm.
Chapter 4
4.1 Introduction
To reach a 100 Gb/s total data rate, the multi-band CO-OFDM (MB-CO-OFDM) approach is
adopted to reduce the pressure on signal converters (DAC/ADC), which presently form
the bottleneck in the signal processing chain. Using the 50 GHz bandwidth allocated by the
International Telecommunication Union (ITU) standard, MB-CO-OFDM divides this total
bandwidth into multiple non-overlapping sub-bands. A target data rate of 100 Gb/s requires
the total line rate to be around 117 Gb/s to accommodate transmission overheads.
In this scenario, each single band is required to support data
rates of the order of Gb/s. Because a single polarization must also support data rates of the order
of Gb/s, the choice of algorithms and their efficient realization play a large part
in reaching the goal. OFDM frame structure choices, such as the position and number of training
symbols, decide the kind of estimation algorithms used. For example, training symbol based
synchronization significantly reduces the complexity of detection at the receiver compared
to blind synchronization. The number of training symbols used can be traded off against the
complexity of the channel estimation algorithm used at the receiver, for example whether to use LMS
or time-frequency domain averaging techniques to keep the complexity down. Efficient
realization of algorithms can be broken into two parts, namely adopting efficient parallel
architectures and optimizing fixed-point precision without incurring too much penalty
on the BER value.
A high level synthesis (HLS) approach has been used for realization of the CO-OFDM
architecture, which is described in Section 4.2. This approach is first of its kind in case
of CO-OFDM systems. Section 4.3 describes the frame structure, algorithms used for
the transmitter and receiver blocks of single-polarization single-band CO-OFDM system.
82 Parallel Architecture
Section 4.4 shows parallel architecture of the transmitter and the associated fixed-point
analysis. Section 4.6 explains the parallel receiver architecture starting from frame syn-
chronization to demapper block. Section 4.7 explains the fixed-point analysis of the receiver
architecture. Section 4.8 concludes the results showing the gains due to parallel transceiver
architecture and fixed-point optimizations.
Figure 4.1: HLS Block Diagram of CatapultC synthesis flow and Matlab Integration
Modelsim using C/C++ test bench. This helps in verifying the generated RTL code and
CatapultC also generates scripts for downstream synthesis tools.
The generated RTL code is imported into ASIC/FPGA tool flows. An RTL testbench
is written to verify the functionality of this code after the synthesis step. The
major advantages of using a CatapultC based HLS flow are
• Fixed-point exploration using C code in Matlab, which is also used for RTL generation.
The same fixed-point libraries are used for both simulation and synthesis.
data symbols. Data symbols contain data and pilot symbols. Pilot symbols are used for
phase offset estimation. The frame structures used for single- and dual-polarization CO-OFDM
systems are shown in Figures 4.2 and 4.3 respectively.
Figure 4.2: OFDM frame format for single polarization (PolX ) CO-OFDM system
Figure 4.3: OFDM frame format for dual polarization (PolX ,PolY ) CO-OFDM system
TS1 (training symbol) is used for timing synchronization and fractional CFO estimation.
TS2 is used for integer CFO estimation and channel estimation. TS2 is repeated
twice [51] to improve the accuracy of channel estimation. A QPSK mapping scheme is used
for data and pilot symbols. Subsection 4.3.1 discusses the selection of the IFFT/FFT and
cyclic prefix (CP) sizes for a single-band single-polarization CO-OFDM system and
calculates the data rate achieved with this setup. Subsections 4.3.2 and 4.3.3 describe
the algorithms adopted for the transmitter and receiver in a single-polarization single-band
CO-OFDM system.
$$ D_b = p \cdot \log_2 M \cdot B_w \tag{4.1} $$
where Db is the data rate in a single band, p is the number of polarizations used (p = 1
for single polarization and p = 2 for dual polarization), M is given by the mapping scheme
used (M = 4 for QPSK mapping) and Bw is the bandwidth of the OFDM system. The length of cyclic prefix
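$$ \tau_{max} = \tau_{max}^{CD} + \tau_{max}^{PMD} \tag{4.2} $$
$$ \tau_{max}^{CD} = \eta_{CD} \cdot L_f \cdot c \cdot B_w / f_0^2 \tag{4.3} $$
$$ \tau_{max}^{PMD} = 3.5 \cdot \eta_{PMD} \cdot \sqrt{L_f} \tag{4.4} $$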
where Lf is the fiber length in km, ηCD is the chromatic dispersion coefficient, c is the speed
of light in m/s, Bw is the bandwidth of the system in Hz, f0 is the LASER frequency
in Hz and ηPMD is the polarization mode dispersion coefficient. The length of the cyclic
prefix (Lcyp) has to be greater than the maximum dispersion delay (τmax). The number of
samples in cyclic prefix is given by
where FDAC is the sampling frequency of the DAC. Ncyp must be sufficiently small compared
to the length of the IFFT/FFT (N). A priori, the loss in spectral efficiency (εcyp) is fixed:
$$ N = N_{cyp} \cdot \frac{1 - \epsilon_{cyp}}{\epsilon_{cyp}} \tag{4.6} $$
Parameter                                     Value
Carrier Frequency (f0)                        193.1 THz
Sampling Frequency (FsDAC)                    5 GHz
Bandwidth (Bw)                                5 GHz
Spectral Efficiency loss assumed (εcyp)       20 %
Fiber Length (Lf)                             1000 km
Mapping scheme used (M)                       QPSK
Maximum Delay (τmax)                          0.69 ns
Cyclic Prefix size (Ncyp)                     8
IFFT/FFT size (N)                             256
Spectral Efficiency loss achieved             3.125 %
OFDM symbol duration (Tsym)                   52.8 ns
Sub-carrier spacing (FsDAC / N)               19.531 MHz
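The entries of the table above can be cross-checked numerically. The following is a minimal Python sketch of Equations 4.2 to 4.4 and of the derived table rows. The dispersion coefficients `ETA_CD` (17 ps/(nm·km)) and `ETA_PMD` (0.05 ps/√km) are assumed, typical SSMF values that are not stated in this section, and all function and variable names are illustrative.

```python
import math

C = 299_792_458.0          # speed of light (m/s)
F0 = 193.1e12              # carrier frequency (Hz), from the table
BW = 5e9                   # bandwidth (Hz), from the table
FS_DAC = 5e9               # DAC sampling frequency (Hz), from the table
L_F_KM = 1000.0            # fiber length (km), from the table
N_FFT, N_CYP = 256, 8      # IFFT/FFT and cyclic prefix sizes, from the table

ETA_CD = 17.0              # ASSUMED chromatic dispersion coefficient, ps/(nm*km)
ETA_PMD = 0.05             # ASSUMED PMD coefficient, ps/sqrt(km)


def max_dispersion_delay_ps():
    """Return (tau_cd, tau_pmd, tau_max) in picoseconds, following Eqs. 4.2-4.4."""
    # c * Bw / f0^2 is the signal bandwidth expressed as a wavelength span (m), converted to nm.
    delta_lambda_nm = C * BW / F0**2 * 1e9
    tau_cd = ETA_CD * L_F_KM * delta_lambda_nm
    tau_pmd = 3.5 * ETA_PMD * math.sqrt(L_F_KM)
    return tau_cd, tau_pmd, tau_cd + tau_pmd


tau_cd, tau_pmd, tau_max = max_dispersion_delay_ps()
cp_duration_ns = N_CYP / FS_DAC * 1e9                   # duration of the cyclic prefix
symbol_duration_ns = (N_FFT + N_CYP) / FS_DAC * 1e9     # OFDM symbol duration
subcarrier_spacing_mhz = FS_DAC / N_FFT / 1e6           # sub-carrier spacing
cp_loss_percent = 100.0 * N_CYP / N_FFT                 # spectral efficiency loss achieved

print(f"tau_max = {tau_max / 1000:.2f} ns, CP duration = {cp_duration_ns:.2f} ns")
print(f"T_sym = {symbol_duration_ns:.1f} ns, spacing = {subcarrier_spacing_mhz:.3f} MHz")
print(f"CP loss = {cp_loss_percent:.3f} %")
```

With these assumed coefficients the cyclic prefix of 8 samples lasts longer than the computed maximum delay, which is the condition stated in the text.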
data rate achieved by the single-band OFDM system considering the parameters calculated
and by considering loss of spectral efficiency due to forward error correction (FEC) (εfec),
loss of spectral efficiency due to training symbols (εtr), loss of efficiency due to use of null
sub-carriers (εnull),
Considering εfec = 0.0627, εtr = 0.1, εnull = 0.1, we get Db = 7.362 Gb/s for single
polarization. For all further calculations, the values Ncyp = 8 and N = 256 are used for
designing both the optical experiments and the hardware implementation. To attain a 100 Gb/s
data rate, eight sub-bands with dual polarization are used. The total data rate achievable is
where Db,total is the total data rate in a 50 GHz channel and Nsb is the total number of sub-bands
used. A guard band of 1 GHz is used for separating the sub-bands.
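The data rates quoted above can be reproduced with a short calculation. The sketch below is an assumption, not the thesis formula: it starts from Equation 4.1 and multiplies by a cyclic prefix efficiency N / (N + Ncyp) and by one (1 − ε) factor per overhead. This particular form was chosen because it reproduces the quoted 7.362 Gb/s. The function name `single_band_rate` and its parameters are hypothetical.

```python
import math


def single_band_rate(p, m, bw_hz, n_fft, n_cyp, eps_fec, eps_tr, eps_null):
    """Net data rate of one sub-band in bit/s (assumed overhead model)."""
    raw = p * math.log2(m) * bw_hz                 # Equation 4.1
    cp_efficiency = n_fft / (n_fft + n_cyp)        # ASSUMED form of the cyclic prefix overhead
    return raw * cp_efficiency * (1 - eps_fec) * (1 - eps_tr) * (1 - eps_null)


# Single polarization, single band, values from the text.
db_single = single_band_rate(1, 4, 5e9, 256, 8, 0.0627, 0.1, 0.1)

# Eight sub-bands with dual polarization, as used to exceed 100 Gb/s.
n_sb = 8
db_total = n_sb * single_band_rate(2, 4, 5e9, 256, 8, 0.0627, 0.1, 0.1)

print(f"single band, single polarization: {db_single / 1e9:.3f} Gb/s")
print(f"eight bands, dual polarization:   {db_total / 1e9:.1f} Gb/s")
```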
Table 4.2: Algorithmic Complexity for calculation of N outputs for IFFT size of 256
• Coarse Time Synchronization - The proposed algorithm is used because of its superior
performance over other auto-correlation algorithms. The algorithmic
complexity for calculation of N outputs is shown in Table 4.3, for one sub-band and
for 117 Gb/s output. The fractional CFO is estimated using the auto-correlation value at
the index corresponding to the start point.
where Pmb is the auto-correlation function, Mmb is the length of repeating training
symbol used [A A A − A].
Table 4.3: Algorithmic Complexity (auto-correlation function only) for Proposed Syn-
chronization Algorithm
where n is the search index, n ∈ [−Ws , .., −2, 0, 2, .., Ws ], and Ws is the maximum
search index. Here Ws = 20 is chosen as the maximum search window value. The
value of Nifo is chosen to be 32. The algorithmic complexity is shown in Table
4.4. Because the QPSK constellation (±1 ± 1j) is used, complex multiplications can be
completely avoided and the complexity reduced. This optimization saves 39.8 MOPS.
• Channel Estimation & Equalization - The least squares (LS) and normalized least
mean squares (NLMS) algorithms have been used for channel estimation.
The algorithmic complexity is given in Table 4.5. Here, the complexity of the LS method
can be reduced because multiplication by a symbol [±1 ± 1j] avoids complex
multiplications. This optimization works for both single-polarization and
dual-polarization transmission. With this optimization, savings of 29.2 GOPS are obtained
for LS channel estimation and 21.9 GOPS for NLMS channel estimation in the
single-polarization, single-band case.
• CPE Estimation & Compensation - Pilot based CPE estimation [15] is done for
estimation of LASER phase noise. Optimization can be done by using [±1 ± 1j]
symbol and thus complex multiplications can be avoided. Algorithmic Complexity
of CPE compensation is given in Table 4.6.
From the algorithms chosen, it can be seen that savings of 800 GOPS are obtained by
choosing the radix-2^2 IFFT/FFT over radix-2. For integer CFO estimation, the
lower-complexity cross-correlation method was adopted and optimized to save 39.8 MOPS. For
LS/NLMS channel estimation, savings of 29.2/21.9 GOPS were obtained due to optimization.
• Feed-forward
• Feedback
Since feedback based architectures do not provide parallel outputs every clock cycle, only
feed-forward based architectures are considered. The same limitation applies to the SDC
feed-forward architecture. Only MDC feed-forward architectures can provide parallel outputs
every clock cycle and can be parallelized to provide a higher number of parallel outputs.
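The IFFT equation of radix-2^2 is given by
$$ x(n_1 + 2n_2 + 4n_3) = \sum_{k_3=0}^{\frac{N}{4}-1} \left[ H(n_1, n_2, k_3) \cdot W_N^{k_3(n_1+2n_2)} \right] W_{N/4}^{n_3 k_3} \tag{4.13} $$
$$ H(n_1, n_2, k_3) = \left[ X(k_3) + (-1)^{n_1} X\left(k_3 + \frac{N}{2}\right) \right] + (-j)^{n_1+2n_2} \left[ X\left(k_3 + \frac{N}{4}\right) + (-1)^{n_1} X\left(k_3 + \frac{3N}{4}\right) \right] \tag{4.14} $$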
where x[n] is the IFFT output, X[k] is the input, N is the size of the IFFT and W is the twiddle
factor. There are two kinds of architecture for the radix-2^2 MDC IFFT, depending on the
order of the input. A novel architecture, shown in Figure 4.4, is proposed for input
supplied with even and odd indices [55] separated. The proposed architecture has
more uniform routing, but uses one extra complex multiplier compared to the
previously proposed architecture shown in Figure 4.5 [5]. In Figure 4.5, the inputs are
applied in normal order and the routing is more complicated.
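The radix-2^2 decomposition can be checked numerically against a library transform. The sketch below evaluates the butterfly term H and the two twiddle stages directly (it is a reference model, not the MDC hardware architecture). The section does not restate the sign convention of W_N, so the sketch takes it as a parameter: `sign=-1` uses W_N = e^{-j2π/N} with the factor (−j), and `sign=+1` uses the conjugate kernel with (+j), which gives the unnormalised inverse transform. The function name `radix22_transform` is illustrative.

```python
import numpy as np


def radix22_transform(X, sign=-1):
    """Evaluate the radix-2^2 decomposition for the kernel exp(sign * j*2*pi*n*k/N)."""
    X = np.asarray(X, dtype=complex)
    N = X.size
    assert N % 4 == 0, "N must be a multiple of 4"
    Q = N // 4
    k3 = np.arange(Q)
    W_N = np.exp(sign * 2j * np.pi / N)      # twiddle factor of the full-size transform
    W_Q = np.exp(sign * 2j * np.pi / Q)      # twiddle factor of the N/4-point transform
    rot = sign * 1j                          # trivial rotation: -j for sign=-1, +j for sign=+1
    x = np.zeros(N, dtype=complex)
    for n1 in (0, 1):
        for n2 in (0, 1):
            # Butterfly term H(n1, n2, k3): only additions and trivial rotations.
            H = (X[k3] + (-1) ** n1 * X[k3 + N // 2]) + rot ** (n1 + 2 * n2) * (
                X[k3 + Q] + (-1) ** n1 * X[k3 + 3 * Q]
            )
            # Non-trivial twiddle multiplication W_N^(k3 * (n1 + 2*n2)).
            G = H * W_N ** (k3 * (n1 + 2 * n2))
            # Remaining N/4-point transform over k3.
            for n3 in range(Q):
                x[n1 + 2 * n2 + 4 * n3] = np.sum(G * W_Q ** (n3 * k3))
    return x


rng = np.random.default_rng(1)
X_test = rng.standard_normal(256) + 1j * rng.standard_normal(256)
forward_ok = np.allclose(radix22_transform(X_test, sign=-1), np.fft.fft(X_test))
inverse_ok = np.allclose(radix22_transform(X_test, sign=+1), X_test.size * np.fft.ifft(X_test))
print(forward_ok, inverse_ok)
```

The check makes explicit that only the W_N stage needs general complex multipliers, which is the property exploited by radix-2^2 architectures.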
Figure 4.4: IFFT/FFT Architecture of 4-Parallel radix-2^2 for N = 256, when input is
given in even and odd index order
Figure 4.5: IFFT/FFT Architecture of 4-Parallel radix-2^2 for N = 256, when input is
given in normal order
Since the architecture of Figure 4.5 uses one less complex multiplier than the proposed
architecture (Figure 4.4), the architecture with normal input order is chosen. Table 4.7
compares the architectural complexity of radix-2^2 with radix-2/4/8/16 for different numbers
of parallel outputs. It shows the scalability of radix-2^2 compared to
radix-2/4/8/16. The amount of resources required by radix-2^2 is closest to the minimum
for every number of parallel outputs. In comparison, as the number of parallel outputs
increases, a lower-radix IFFT consumes more resources.
Table 4.7: Architectural Complexity (normal input order) for full streaming outputs for
N = 256, with input and output in natural order. Resource count is generated by using
SPIRAL tool [4] for radix-2/4/8/16 and using [5] for radix-2^2
range is fixed by maximum and minimum values of 64-QAM output. Consider the IFFT
equation implemented by the transmitter:
$$ x[n] = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X[k] \cdot e^{j 2\pi n k / N} \tag{4.15} $$
From a fixed-point design perspective, the main observation is that the IFFT block
provides the input to the DAC, whose precision is limited to 6-8 bits. Hence, the output
precision of the IFFT is limited by the input precision of the DAC. This is the opposite of
the receiver, where the FFT comes after the ADC block. The fixed-point computation precision
of the IFFT and the FFT therefore needs to be different, and this asymmetry can be used for
resource optimization in the IFFT. A computation precision close to the DAC precision is
sufficient for the IFFT, since any extra precision used in the calculation is discarded at
the DAC. IFFT area optimization is done based on this observation.
Table 4.8 shows the variation of Root Mean Square Error (RMSE) with resolution of
input/output bits (Wi ) and twiddle factor inputs (Wt ). RMSE for fixed-point output is
evaluated using
$$ RMSE = \sqrt{ \sum_{n=0}^{N-1} |S_n - T_n|^2 / N } \tag{4.16} $$
where Sn is the actual fixed-point output of the IFFT and Tn is the double-precision
floating-point output used as reference. Figure 4.6 shows a semilog plot of the mean of RMSE
as a function of Wi for different values of Wt. It can be observed that Wt ≥ 7 is required
to ensure a low mean RMSE. The value of Wi chosen depends on the input resolution of the DAC.
Next, the parallel architecture of the transmitter is generated with the CatapultC HLS tool
for different values of Wt ≥ 7 and Wi ≥ 6. The tool allows hardware exploration in terms of
pipelining, loop unrolling, etc. to achieve a high-throughput architecture. The area gain
obtained by using a lower fixed-point precision to reach a particular RMSE is explored.
Table 4.9 shows the resource usage in LUTs as a function of Wi and Wt. From Table 4.9, it can
be seen that the resource consumption of the IFFT is a strong function of the input precision
(Wi) and a weak function of the twiddle factor precision (Wt). The increase in area from Wi
= 6 to 10 bits is 57%, 62% and 66% for Wt = 8, 9 and 10 bits respectively. Fixed-point
optimization with respect to Wi therefore offers large savings in resource usage.
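The kind of RMSE exploration described above can be sketched in a few lines. The model below is a simplification and an assumption: it quantises only the IFFT input, the twiddle factors and the IFFT output (not the intermediate stages of the radix-2^2 pipeline), uses a plain matrix IFFT, and scales the QPSK input by 1/8 for headroom. The helper names `quantize`, `ifft_fixed_point` and `rmse` are illustrative and do not correspond to the ac_fixed models used in the thesis.

```python
import numpy as np


def quantize(values, bits):
    """Round real and imaginary parts to a signed `bits`-bit fixed-point grid on [-1, 1)."""
    scale = 2.0 ** (bits - 1)
    values = np.asarray(values, dtype=complex)
    real = np.clip(np.round(values.real * scale), -scale, scale - 1) / scale
    imag = np.clip(np.round(values.imag * scale), -scale, scale - 1) / scale
    return real + 1j * imag


def ifft_fixed_point(X, w_i, w_t):
    """IFFT of Equation 4.15 with W_i-bit input/output and W_t-bit twiddle factors."""
    N = X.size
    n = np.arange(N)
    twiddles = quantize(np.exp(2j * np.pi * np.outer(n, n) / N), w_t)
    return quantize(twiddles @ quantize(X, w_i) / np.sqrt(N), w_i)


def rmse(S, T):
    """Equation 4.16: RMSE between fixed-point output S and reference output T."""
    return float(np.sqrt(np.sum(np.abs(S - T) ** 2) / S.size))


rng = np.random.default_rng(0)
N = 256
# QPSK symbols scaled by 1/8 so the IFFT output stays inside [-1, 1) (assumed headroom).
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / 8
T = np.fft.ifft(X) * np.sqrt(N)      # double-precision reference with 1/sqrt(N) scaling

results = {
    (w_i, w_t): rmse(ifft_fixed_point(X, w_i, w_t), T)
    for w_i in (6, 8, 10)
    for w_t in (4, 7, 10)
}
for (w_i, w_t), value in sorted(results.items()):
    print(f"Wi = {w_i:2d}, Wt = {w_t:2d}: RMSE = {value:.2e}")
```

Averaging `rmse` over many random input vectors would give the mean and standard deviation reported per (Wi, Wt) pair.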
Table 4.8: Mean (µ) and Standard Deviation (σ) of RMSE for variation of Bitwidths of
inputs/outputs Wi and Twiddle Factor Wt
[Plot: Mean of RMSE of IFFT Output (log scale, 10^-4 to 10^-1) vs. Bitwidth of Input/Output Wi (4 to 10); one curve each for Wt = 4, 5, 6, 7, 8, 9, 10]
Table 4.9: Area Occupied for variation of Bitwidths of inputs/outputs Wi and Twiddle
Factor Wt
[Block diagram labels: ADC O/p, FSBP Time Synch., Time Frequency Sync. Memory, Integer CFO Estimation]
Algorithm        Real Multipliers    Real Adders    Total Memory Locations
Pmb (Total)      44                  46             4096
as synchronization memory.
3. CFO Compensation - This block receives data from the FFT memory after removal of the
cyclic prefix. It receives the fractional CFO estimate from the synchronization block and
the integer CFO estimate from the integer CFO estimation block. It compensates the CFO
by using R = 4 parallel multipliers. The architectural complexity is 16 real multipliers
and 16 real adders.
4. FFT - It receives CFO compensated data and outputs data in the frequency domain. It
uses the radix-2^2 R = 4-parallel architecture. The architectural complexity is given in
Table 4.7.
5. Integer CFO Estimation - Figure 4.9 shows the parallel architecture for integer CFO
estimation using Equation 4.11. The look-up table implemented for multiplying by the
complex conjugate of the reference input is given in Table 4.11. With this table, only
two adders are required to multiply by the conjugate of the reference input symbol.
The IFO estimation block uses a memory of size 4N, which stores the input until the output
is computed. The delay incurred is equal to the time taken to receive 3N samples. IFO
compensation is done at this point by reading from a starting address equal to the
estimated integer CFO. The architectural complexity is given in Table 4.12.
Table 4.11: Look-up table implemented for complex multiplication of conjugate of reference
symbol with input r = a + jb.
Reference QPSK Value    Real Output    Imaginary Output
1 + j                   a + b          −a + b
1 − j                   a − b          a + b
−1 + j                  −a + b         −a − b
−1 − j                  −a − b         a − b
6. De-interleaver - It removes the unused sub-carriers in the OFDM symbol and provides
the data and pilot sub-carriers to the next block. It also calculates the energy of the
non-zero sub-carrier samples. The architectural complexity is 8 real multipliers and 512
memory locations.
7. Channel Estimation & Compensation - Figure 4.10 shows the architecture of the
channel estimation and equalization block for single sample input. It receives input
sample and energy calculated from de-interleaver block. It also gets reference symbol
for LS channel estimation or old NLMS channel estimate from memory. The updated
value is written back to memory. Channel equalization is done using Hk,old and
updated Hk value calculated is written to memory for use in next iteration. The
multiplexer selects input to be given to LS channel estimator or NLMS estimator. The
LUT block is used to calculate the inverse of input energy. Architectural Complexity
for R = 4-Parallel Channel Estimator and Equalizer is given in Table 4.13.
Algorithm                          Real Multipliers    Real Adders    Total Memory Locations
Channel Estimator and Equalizer    36                  40             512
Figure 4.10: Channel Estimation and Equalization Architecture which supports both
LS and NLMS equalizers
8. Common Phase Estimation & Compensation - Figure 4.11 shows the architecture of
CPE estimation. Compensation consists of complex multiplication by using the phase
error estimated. Architectural Complexity for CPE Estimation and Compensation
is given in Table 4.14.
Algorithm                          Real Multipliers    Real Adders
CPE Estimator and Compensator      16                  22
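The look-up table of Table 4.11, used by the integer CFO estimation block, replaces a complex multiplication by two additions. A minimal Python sketch is given below. The dictionary `CONJ_MULT_LUT` and the function `multiply_by_conj_reference` are illustrative names, and the check against a full complex multiplication is added only to confirm the table entries.

```python
# Table 4.11: (real, imaginary) outputs of r * conj(ref) for r = a + jb,
# keyed by the (real, imaginary) signs of the QPSK reference symbol.
CONJ_MULT_LUT = {
    (1, 1): lambda a, b: (a + b, -a + b),
    (1, -1): lambda a, b: (a - b, a + b),
    (-1, 1): lambda a, b: (-a + b, -a - b),
    (-1, -1): lambda a, b: (-a - b, a - b),
}


def multiply_by_conj_reference(r, ref):
    """Multiply r by the conjugate of a QPSK reference (+-1 +-1j) using additions only."""
    key = (int(ref.real), int(ref.imag))
    real, imag = CONJ_MULT_LUT[key](r.real, r.imag)
    return complex(real, imag)


# Compare every table row with the full complex multiplication.
r = complex(0.3, -0.7)
references = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
lut_matches = [
    abs(multiply_by_conj_reference(r, ref) - r * ref.conjugate()) < 1e-12
    for ref in references
]
print(lut_matches)
```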
in identifying the blocks whose precision affects the BER more significantly than
others. This helps in aggressive optimization of such blocks. After selection of the
fixed-point bitwidths of all the blocks, the area vs. bitwidth variation is calculated for
each block. These tables help to explore optimizations that can lead to large area savings in
large blocks like the FFT, the channel estimator, etc., at the cost of a certain loss in BER.
Finally, the area occupied by the individual blocks after fixed-point optimization is shown
in a pie chart.
[Plot: BER (log scale, 10^-5 to 10^0) vs. OSNR (dB, 2 to 12); curves: Floating-point config., Fixed-point config0, config1, config2, config3, config4]
Figure 4.12: BER vs. OSNR plot for floating-point and various fixed-point configurations
in Homodyne setup
[Plot: BER (log scale, 10^-5 to 10^0) vs. OSNR (5 to 12); curves: Floating-point Config., Fixed-point Config5, Config6, Config7]
Figure 4.13: BER vs. OSNR plot for floating-point and various fixed-point configurations
in Heterodyne setup
Table 4.16: BER vs. OSNR for floating-point and various fixed-point configurations in
Homodyne setup
                                BER at OSNR (in dB)
Configuration                   5.3         6.8         7.9         9.16        10.1        11.09
Floating-point configuration    3.5x10^-2   8.0x10^-3   1.6x10^-3   4.7x10^-4   1.9x10^-4   9.1x10^-5
Fixed-point config0             5.9x10^-2   1.4x10^-2   2.6x10^-3   7.8x10^-4   3.6x10^-4   1.9x10^-4
Fixed-point config1             9.4x10^-2   2.1x10^-2   5.6x10^-3   1.3x10^-3   4.9x10^-4   3.2x10^-4
Fixed-point config2             1.2x10^-1   3.1x10^-2   9.6x10^-3   1.9x10^-3   8.3x10^-4   5.2x10^-4
Fixed-point config3             1.7x10^-1   5.1x10^-2   1.8x10^-2   4.9x10^-3   1.7x10^-3   1.1x10^-3
Fixed-point config4             2.2x10^-1   6.6x10^-2   2.4x10^-2   6.9x10^-3   2.7x10^-3   1.7x10^-3
1, which means that the receiver reads and writes data every clock cycle. The generated
Verilog/VHDL code is synthesized using the Xilinx ISE tool, targeting a Virtex-7
development board. The blocks were designed to work at a frequency of 200 MHz. Each
block of the receiver was synthesized individually, and the resources used at different
input/output precision values are given. Tables 4.19 to 4.25 give the area for
the blocks in the receiver, from Time Synchronization to CPE Estimation and
Compensation.
The precision values selected for the blocks using "Fixed-point config0" are given in
Table 4.26 and plotted in Figure 4.14. If "Fixed-point config1" had been chosen instead,
a saving of 13.3% in LUTs would have been obtained for a small degradation in BER.
Table 4.18: BER vs. OSNR for floating-point and various fixed-point configurations in
Heterodyne setup
                                BER at OSNR (in dB)
Configuration                   6.5         7.4         8.6         9.8         10.9        11.85
Floating-point configuration    6.5x10^-3   3.5x10^-3   1.0x10^-3   2.9x10^-4   1.6x10^-4   6.4x10^-5
Fixed-point config5             9.5x10^-3   5.0x10^-3   1.3x10^-3   3.8x10^-4   2.3x10^-4   9.4x10^-5
Fixed-point config6             1.3x10^-2   9.0x10^-3   1.9x10^-3   5.3x10^-4   3.5x10^-4   1.5x10^-4
Fixed-point config7             1.7x10^-2   1.4x10^-2   2.9x10^-3   7.3x10^-4   4.8x10^-4   2.3x10^-4
Table 4.19: Area Occupied vs. Bitwidth for Time/Frequency Synchronization block
Time/Frequency Synchronization (area in Xilinx FPGA)
Bitwidth Wi    LUTs     DSP Multipliers
6              36210    36
8              44873    36
10             51374    36
CFO Compensation (area in Xilinx FPGA)
Bitwidth Wi    LUTs     DSP Multipliers
6              1807     15
8              1985     15
10             2437     15
FFT (area in Xilinx FPGA)
Bitwidth Wi    LUTs     DSP Multipliers
8              73529    36
10             85653    36
12             97979    36
Table 4.22: Area vs. Bitwidth for Integer CFO Estimation block
De-Interleaver (area in Xilinx FPGA)
Bitwidth Wi    LUTs     DSP Multipliers
8              9622     8
10             11543    8
12             13678    8
Table 4.24: Area vs. Bitwidth for Channel Estimation & Equalization
Table 4.25: Area vs. Bitwidth for CPE Estimation & Compensation
4.8 Conclusions
In this chapter, an end-to-end fully streaming parallel architecture for a CO-OFDM system
was proposed. For the transmitter, the low-complexity radix-2^2 IFFT algorithm was used, for
which a scalable parallel architecture was utilized. Since the IFFT is placed before the
DAC and is hence limited by its precision, the architecture was designed to use a precision
close to the DAC precision. This saves area compared to the FFT at the receiver,
which is not precision limited in this way since it receives data from the ADC. The frame
structure and algorithms were chosen to reduce long feedback loops. The proposed architecture has
Table 4.26: Fixed-Point Allocation and Area for all blocks of the R = 4-Parallel Receiver
[Pie chart: area share per receiver block. Legend: Time-Frequency Synchronization, CFO Compensation, FFT, Integer CFO Compensation, De-Interleaver, Channel Estimation & Equalization, CPE Estimation & Compensation + Demapper. Slice labels: 1%, 19%, 46%, 1.28%, 23%, 4%, 6%]
Figure 4.14: Pie Chart of area Occupation of all blocks of R = 4-Parallel CO-OFDM
Receiver (Fixed-point config0)
only one feedback loop, whose output is required once every 80 OFDM symbols. At the
receiver, a scalable parallel architecture for time synchronization was used. Low-complexity
parallel blocks for integer CFO estimation, channel estimation and CPE estimation were
proposed. The implementation was done for an R = 4-parallel CO-OFDM transceiver on a
Xilinx FPGA using CatapultC. Fixed-point exploration was done using ac_fixed models,
and the whole parallel CO-OFDM receiver fits in a single Virtex-7 development board. The
area vs. BER trade-off was also explored.
Chapter 5
Experimental Validation of
CO-OFDM System
5.1 Introduction
In this chapter, the offline and real-time experiments conducted to validate the OFDM frame
format and the adopted transceiver algorithms are explained. The first half describes optical
experiments that use an arbitrary waveform generator (AWG) as transmitter, a digital storage
oscilloscope (DSO) for sampling the received signal and Matlab for generating and decoding
data. The OFDM frame format and transceiver algorithms used are those of Section
4.3. The algorithm used to select the best sampling point when oversampling at
the receiver is described in Section 5.2. The experiments are done step by step,
starting from the electrical back-to-back (B2B) experiment, which is detailed in Section 5.3.
In Section 5.4, the effect of adding an RF driver to the setup is explored. Section 5.5
describes the optical B2B configuration using the same LASER for both transmitter and receiver.
Performance is characterized by plotting the BER as a function of the optical signal-to-noise
ratio (OSNR). In Section 5.6, the performance of the system is explored with separate
LASER sources for transmitter and receiver, which resembles real-world data transfer
using optical communication systems.
The second half details the experiments conducted on the real-time FPGA platform.
Section 5.7 provides details about the transmitter and receiver FPGA prototype platform,
the OFDM frame structure and the algorithms used. It primarily validates the real-time
full-streaming block-parallel (FSBP) architecture proposed for timing synchronization. The
timing synchronization block provides input samples to the FFT in a continuous manner. The
FPGA transceiver prototype platform was developed as part of the FUI 100GFLEX project.
Section 5.8 gives the results achieved with the proposed architecture in the presence of
synchronous and asynchronous sampling at the receiver. Section 5.10 concludes the chapter.
Figure 5.1: OFDM frame format for single polarization (PolX ) CO-OFDM system
• Channel Estimation - LS for initial estimation and NLMS Equalizer for tracking.
• Phase Tracking - Common Phase Error (CPE) estimation using pilots in data sym-
bols.
In the optical experiments conducted, the signal was oversampled at the receiver, so
the best sampling point needs to be found before decoding. To find the optimal sampling
point in the oversampled signal, a fine timing estimation algorithm was used. The fine timing
estimation algorithm calculates the residual timing offset in the OFDM signal after the FFT
operation. The output of this operation has integer and fractional parts. The integer part
gives the residual integer timing offset estimate, while the fractional part gives the
sampling clock offset (SCO) estimate.
$$ \eta_{fine} = \eta_i + \eta_{SCO} \tag{5.1} $$
where ηfine is the total fine timing offset estimate, ηi is the integer timing offset estimate
and ηSCO (the fractional part) is the SCO estimate. For the estimation of the fine timing
offset, the algorithm proposed by Lee et al. [56] is used. It uses the phase information
present in the sub-carriers to derive estimates of the timing offset. Consider the received
signal after the FFT, affected by the residual timing offset after coarse timing synchronization
where Yj [k] and Xj [k] are the k-th sub-carriers of the j-th OFDM symbol, ηi is the integer
timing offset, ηSCO is the SCO and Nj [k] is the noise at the sub-carrier. Data symbols are
mapped using the QPSK scheme. The effect of timing offset in the frequency domain is the rotation
$$ Y_{p4}[j] = \frac{1}{B} \sum_{k=-B/2}^{B/2-1} Y^4[j+k], \quad |j| < \frac{N}{2} \tag{5.3} $$
$$ G = \frac{1}{K} \sum_{k=0}^{K-1} Y_{p4}[k] \cdot Y_{p4}^{*}[k+1] \tag{5.4} $$
where G is the mean of complex phase rotation, it contains both integer and fractional
timing offset contribution, K is the total number of group of sub-carriers. The fine symbol
timing estimation is obtained by taking angle of G:
$$ \eta = \frac{N}{2\pi B} \angle G \tag{5.5} $$
where η denotes the total fine timing offset, η = ηi + ηSCO. In an oversampled signal, this
calculation is done on all the sample streams obtained by downsampling. For example, with
oversampling by a factor of 10, ηSCO is calculated for all 10 sample streams and the
sample stream with the lowest ηSCO is selected for further demodulation. However, calculating
ηSCO for all streams for every OFDM symbol in each frame is computationally too expensive.
To reduce the computational complexity, the average SCO value is
calculated using 20 OFDM data symbols for a single downsampled stream. Then, the next
downsampled stream is chosen and its SCO is calculated over the next 20 OFDM symbols. This
is repeated until all the downsampled streams are covered. The stream corresponding to the
minimum of the average SCO values is selected for further data decoding. This method reduces
the computational complexity and, assuming the ADC sampling clock remains stable over many
OFDM frames, the calculation only needs to be repeated after many OFDM frames
(1000 OFDM frames).
Figure 5.2: Configuration of Electrical B2B Experiment. Green blocks indicate analogue
blocks.
The spectrum of the OFDM signal after the AWG, taken using a spectrum analyzer, is given in
Figure 5.3. Because the high-frequency sub-carriers (sub-carriers near N/2) are switched off,
aliasing effects are reduced. The best sampling stream is selected using the estimated
values of ηSCO (shown in Figure 5.4) and its gradient (Figure 5.5). The minimum of the
absolute value of the product of ηSCO and the gradient of ηSCO is used as the metric for
deciding the best sampling stream among the possible choices.
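The stream selection metric described above can be sketched as follows. The sketch assumes that the gradient is a finite difference of ηSCO across the stream index (computed here with `numpy.gradient`), since the text does not define it further. The function name `select_best_stream` and the example values are hypothetical and are not measurements from the experiment.

```python
import numpy as np


def select_best_stream(eta_sco):
    """Return the index minimising |eta_SCO * gradient(eta_SCO)| and the metric itself."""
    eta = np.asarray(eta_sco, dtype=float)
    gradient = np.gradient(eta)          # ASSUMED: finite difference across the stream index
    metric = np.abs(eta * gradient)
    return int(np.argmin(metric)), metric


# Hypothetical averaged SCO estimates for 10 downsampled streams (oversampling by 10).
eta_example = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.25, 0.2, 0.1]
best_stream, metric = select_best_stream(eta_example)
print("best stream:", best_stream)
```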
Theoretical BER for QPSK modulation is given by
$$ BER = 0.5 \, \mathrm{erfc}\left( \sqrt{\frac{E_s}{2 N_0}} \right) \tag{5.6} $$
$$ \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2} \, dt \tag{5.7} $$
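The theoretical curve of Equations 5.6 and 5.7 is straightforward to evaluate with the complementary error function of the Python standard library. The sketch below is illustrative; the function name `qpsk_ber` and the chosen Es/N0 grid are assumptions.

```python
import math


def qpsk_ber(es_n0_db):
    """Theoretical QPSK bit error rate (Equation 5.6) for Es/N0 given in dB."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)            # convert dB to a linear ratio
    return 0.5 * math.erfc(math.sqrt(es_n0 / 2.0))


# Evaluate the curve over the 0 to 12 dB range used for the BER plots.
theory_curve = [(snr_db, qpsk_ber(snr_db)) for snr_db in range(0, 13)]
for snr_db, ber in theory_curve:
    print(f"Es/N0 = {snr_db:2d} dB -> BER = {ber:.3e}")
```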
[Plot (Figure 5.4): Estimated Value of ηSCO (−0.3 to 0.3) vs. Sample Streams (0 to 9)]
[Plot (Figure 5.5): Gradient of ηSCO (0 to 0.25) vs. Sample Streams (0 to 9)]
Figure 5.6 shows the BER as a function of Es /N0. Es /N0 is varied by adding Gaussian noise at
the receiver. The BER is calculated by averaging over 1000 frames in a single acquisition. Each
frame consists of 77 OFDM data symbols. It can be seen that the experimental BER curve
follows the theoretical curve very closely, which validates the electrical B2B configuration.
[Plot: BER (log scale, 10^-5 to 10^0) vs. SNR Es/N0 (dB, 0 to 12); curves: Theory, Experiment]
Figure 5.6: BER vs SNR for Electrical B2B experiment (Theoretical and Experimental)
Figure 5.7: Configuration of Electrical B2B Experiment with RF Driver. Green blocks
indicate analogue blocks.
curves. From Figure 5.8, it can be observed that the BER curve after the RF driver is very
close to the BER curve of the electrical B2B experiment. This shows that the addition of the
RF driver does not introduce non-linearities in the transmission chain.
[Plot: BER (log scale, 10^-5 to 10^0) vs. SNR Es/N0 (dB, 0 to 12); curves: Theory, Without RF Driver, With RF Driver]
Figure 5.8: BER vs SNR for Electrical B2B experiment with RF driver (Theoretical,
Experimental with and without RF driver)
Figure 5.9: Configuration of Homodyne Coherent Detection. DSP processing is done of-
fline in Matlab. Green blocks indicate analogue blocks. Light Blue blocks indicate Optical
components.
• Opto-Electronic Receiver - The optical attenuator is used to vary the optical signal-
to-noise ratio (OSNR). The attenuated signal is then amplified for transmission and
measurement. 10% of the amplified signal is given to the optical signal analyzer (OSA)
for measuring the OSNR of the modulated signal and 90% of the signal is used for
transmission. The optical band pass filter (BPF) selects the bandwidth around the
carrier frequency and the filtered output is passed through an optical attenuator. The
bandwidth of the optical BPF is 3 nm. The optical attenuator is used to control
the maximum optical power level at the input of the coherent detector (CoD). The maximum
signal level at the input of the coherent detector is fixed at -5 dBm. The attenuated output
is connected to a polarization controller, which allows the output level
of the CoD corresponding to X-polarization or Y-polarization to be maximized. The LO power
level is around 11 dBm and the signal after coherent detection is given to the digital
storage oscilloscope (DSO). The sampled digital data are then transferred to a computer
running Matlab for offline processing.
The bias of the MZM needs to be adjusted so that it operates in the linear region. The bias
is set using the first training symbol (TS1), which has the property that it is a
PSK signal in both the time domain and the frequency domain. To adjust the bias of the MZM,
TS1 is selected and zoomed in on at the DSO. The bias of the MZ modulator is then adjusted to
obtain a circular constellation for TS1. Faithful reproduction of the circular constellation
at the DSO indicates that the MZ modulator is operating in the linear region. This is done
every day before the start of the experiment, since the bias drifts with temperature. During
the experiment, the OSNR value is varied from 2 to 13 dB using the optical attenuator
and the corresponding BER obtained by the Matlab program is tabulated. Figure 5.10 shows the
BER obtained for different values of OSNR for the optical B2B experiment.
Figure 5.11 shows the homodyne configuration with the addition of a standard single
mode optical fiber (SSMF) of length 50 km. Again, the OSNR value is varied and the BER values
are noted. Figure 5.10 shows the BER as a function of OSNR with the introduction of SSMF.
The introduction of SSMF does not change the performance of the system, which remains the
same as in the optical B2B configuration, because the cyclic prefix (CP) provides tolerance
to chromatic dispersion.
[Plot: BER (log scale, 10^-6 to 10^0) vs. OSNR (dB, 2 to 14); curves: Without Optical Fiber, With Optical Fiber]
Figure 5.11: Configuration of Homodyne Coherent Detection with SMF of 50 km. DSP
processing is done offline in Matlab. Green blocks indicate analogue blocks. Light Blue
blocks indicate Optical components.
degradation and the size of frame which can be used in case of heterodyne configuration
was found. This value of frame size of around 45 OFDM symbols is used in the real-time
FPGA prototype platform. The DSP algorithms are realized in hardware on the real-time
FPGA platform and characterization of performance of these algorithms is done in the
next Section.
Figure 5.12: Configuration of Heterodyne Coherent Detection with standard single mode
fiber (SSMF) of 50 km. DSP processing is done offline in Matlab. Green blocks indicate
analogue blocks. Light Blue blocks indicate Optical components.
[Plot: BER (log scale, 10^-6 to 10^0) vs. OSNR (dB, 2 to 14)]
Figure 5.13: BER vs SNR for single-band CO-OFDM system for Heterodyne Detection
Figure 5.14: Real-Time FPGA Transmitter Block Diagram. PLL - Phase Locked Loop,
SFP+ - Enhanced Small Form-factor Pluggable, SMF Cable - Single Mode Fiber Cable,
I/F - Interface, CDR I/F - Clock Data Recovery Interface, DAC I/F - Digital-to-Analog
Converter Interface.
synchronization and fractional CFO estimation. Training symbol TS2 is used for integer
CFO estimation and least squares channel estimation. Channel and phase tracking are done
using the least mean squares (LMS) algorithm and the common phase error (CPE) algorithm
respectively. Each frame contains 47 data symbols and thus 49 symbols in total.
Table 5.2 shows the parameters used for TS1 , TS2 and DS1,2,..,47 .
Parameter                                    Value
Sampling Frequency (F_DAC, F_ADC)            720 MHz
FPGA Clock Frequency (F_clk)                 180 MHz
Parallel Factor Used (R)                     4
Cyclic Prefix size (N_cyp)                   8
IFFT/FFT size (N)                            256
Symbol size (N_sym)                          264
Training Symbols in each frame               2
Data Symbols in each frame                   47
OFDM Symbols in each frame                   49
Mapping scheme used (M)                      QPSK
Sub-carrier spacing (F_DAC/N)                2.8125 MHz
OFDM symbol duration (T_sym)                 366.6 ns
Training Symbol TS1                          Minn-Bhargava Type
TS1 Pattern used                             [1 1 -1 1]
Number of Pilots per OFDM Symbol (N_p)       8
Position of Pilots                           [31 63 95 127 159 191 223 255]
DAC Voltage range                            1 Vp-p
ADC Voltage range                            0.5 Vp-p
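The derived entries of the parameter table follow directly from the sampling frequency, the FFT size and the cyclic prefix size. The short sketch below recomputes them as a consistency check. The frame duration is not listed in the table and is computed here only for illustration; the function and key names are hypothetical.

```python
# Values copied from the parameter table of the real-time platform.
F_S_HZ = 720e6            # DAC/ADC sampling frequency
F_CLK_HZ = 180e6          # FPGA clock frequency
N_FFT = 256               # IFFT/FFT size
N_CP = 8                  # cyclic prefix size
SYMBOLS_PER_FRAME = 49    # 2 training symbols + 47 data symbols


def derived_parameters():
    """Recompute the derived table entries from the basic ones."""
    n_sym = N_FFT + N_CP  # samples per OFDM symbol including the prefix
    return {
        "parallel_factor": F_S_HZ / F_CLK_HZ,
        "symbol_size": n_sym,
        "subcarrier_spacing_hz": F_S_HZ / N_FFT,
        "symbol_duration_s": n_sym / F_S_HZ,
        # Not in the table: duration of one complete 49-symbol frame.
        "frame_duration_s": SYMBOLS_PER_FRAME * n_sym / F_S_HZ,
    }


if __name__ == "__main__":
    for name, value in derived_parameters().items():
        print(f"{name}: {value}")
```

The symbol duration evaluates to 264/720 MHz, about 366.7 ns, which the table lists truncated as 366.6 ns.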
interface which connects Altera FPGA board and Xilinx FPGA board using single mode
fiber (SMF) cable.
The Xilinx Virtex-7 development board (VC707) uses a clock data recovery circuit to
recover the clock from the received data. The extracted 180 MHz clock is fed
into a PLL (SILABS SI5338) to generate the 50 MHz reference clock for a second PLL
(HITTITE HMC833LP6GE). This second PLL generates 1440 MHz, twice the sampling
frequency, and feeds it to a clock divider circuit (HMC394LP4 programmable divider).
The Xilinx FPGA board provides four samples per cycle at 180 MHz to the DAC, which
samples at 720 MHz. The DAC (FMC204) has an output range of 1 Vp-p and a resolution
of 10 bits; the output range used by the platform is 0.4 Vp-p. The real and imaginary
outputs are then fed into two 3 dB attenuators to reduce the peak-to-peak voltage, so
that it fits the input range of the ADC (FMC126), which is 0.25 Vp-p. The attenuator
outputs are fed into the ADC, which makes the setup equivalent to an
electrical back-to-back (B2B) experiment.
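The clock relationships in the transmitter chain can be checked with a few lines of exact arithmetic. The sketch below is only a consistency check under one assumption: the programmable divider is taken to divide by 2, which the text implies (1440 MHz is "twice the sampling frequency") but does not state explicitly. All names are hypothetical.

```python
from fractions import Fraction

RECOVERED_CLK_MHZ = Fraction(180)   # clock recovered by the CDR circuit
PLL_REF_MHZ = Fraction(50)          # reference generated by the first PLL
PLL_OUT_MHZ = Fraction(1440)        # output of the second PLL
ASSUMED_DIVIDER = 2                 # assumption: divider ratio is not stated
SAMPLES_PER_FPGA_CYCLE = 4          # parallel samples delivered by the FPGA


def clock_plan():
    """Return the main rates and ratios of the transmitter clock chain."""
    return {
        "dac_clk_mhz": PLL_OUT_MHZ / ASSUMED_DIVIDER,
        "fpga_sample_rate_mhz": RECOVERED_CLK_MHZ * SAMPLES_PER_FPGA_CYCLE,
        "ref_ratio": PLL_REF_MHZ / RECOVERED_CLK_MHZ,   # first PLL: 180 -> 50
        "pll_ratio": PLL_OUT_MHZ / PLL_REF_MHZ,         # second PLL: 50 -> 1440
    }


def rates_match(plan):
    """The DAC must consume samples exactly as fast as the FPGA produces them."""
    return plan["dac_clk_mhz"] == plan["fpga_sample_rate_mhz"]


if __name__ == "__main__":
    plan = clock_plan()
    print(plan, rates_match(plan))
```

Both PLL ratios are non-integer (5/18 and 144/5), so exact fractions are used instead of floating point.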
Receiver Platform - Figure 5.16 shows the receiver platform, in which a switch selects
between synchronized and asynchronous sampling. In synchronized sampling (switch
value "0"), the sampling clock for the ADC is provided by the PLL (HITTITE
HMC833LP6GE) of the transmitter chain. It supplies a 1440 MHz clock to the ADC clock
input, which has an internal divide-by-2 circuit. In asynchronous sampling mode
(switch value "1"), the sampling clock for the ADC is generated by another PLL
(HITTITE HMC833LP6GE).
Figure 5.16: Real-Time FPGA Receiver Block Diagram. ADC - Analog-to-Digital Con-
verter, I/F - Interface, SFP+ I/F - Enhanced Small Form-factor Pluggable Interface, SMF
Cable - Single Mode Fiber Cable, CDR I/F - Clock Data Recovery Interface, PLL - Phase
Locked Loop.
The incoming real and imaginary data are sampled at 720 MHz and passed to the Xilinx
FPGA development board (VC707). The clock for the Xilinx FPGA is the 180 MHz clock
output of the ADC. The ADC has a resolution of 10 bits and delivers four parallel
samples per cycle to the Xilinx FPGA, because the FPGA clock frequency is one-fourth
of the ADC sampling frequency. The Xilinx FPGA then transfers the four parallel
real and imaginary samples to the Altera FPGA board over the SFP+ interface and SMF
cable. On the Altera FPGA board, the clock data recovery (CDR) circuit recovers the
180 MHz clock, which is used as the clock for the OFDM receiver. The OFDM receiver
consists of the following blocks, in order: time synchronization block, fractional CFO
estimation block, CFO compensation block, FFT block, integer CFO estimation block,
channel estimation and compensation block, CPE estimation block and demapper block.
An Altera SignalTap block is connected to sample the outputs at various points of the
receiver to verify the correct operation of each block.
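As an illustration of the CPE estimation block near the end of this chain, the sketch below shows a common pilot-aided estimator: the received pilots are multiplied by the conjugates of the known pilots, the products are summed, and the angle of the sum is taken as the common phase. This is a minimal sketch and not necessarily the exact form implemented on the platform. The pilot positions are taken from Table 5.2; the pilot values (1+0j), the test data and all function names are assumptions, and the input symbol is assumed to be already equalized.

```python
import cmath

# Pilot sub-carrier indices copied from Table 5.2.
PILOT_POSITIONS = [31, 63, 95, 127, 159, 191, 223, 255]


def estimate_cpe(rx_symbol, known_pilots):
    """Return the common phase (radians) shared by the pilots of one symbol.

    rx_symbol    : list of 256 equalized frequency-domain samples
    known_pilots : transmitted pilot values, one per entry of PILOT_POSITIONS
    """
    acc = 0j
    for pos, ref in zip(PILOT_POSITIONS, known_pilots):
        # Each product carries the rotation of one pilot; the sum averages them.
        acc += rx_symbol[pos] * ref.conjugate()
    return cmath.phase(acc)


def compensate_cpe(rx_symbol, phase):
    """De-rotate every sub-carrier by the estimated common phase."""
    derotation = cmath.exp(-1j * phase)
    return [x * derotation for x in rx_symbol]


def demo():
    # Hypothetical test symbol: pilots are 1+0j, data sub-carriers are 1+1j.
    pilots = [1 + 0j] * len(PILOT_POSITIONS)
    tx = [1 + 1j] * 256
    for pos, pilot in zip(PILOT_POSITIONS, pilots):
        tx[pos] = pilot
    true_phase = 0.3  # radians, arbitrary test value
    rx = [x * cmath.exp(1j * true_phase) for x in tx]
    est = estimate_cpe(rx, pilots)
    fixed = compensate_cpe(rx, est)
    worst_error = max(abs(a - b) for a, b in zip(fixed, tx))
    return est, worst_error
```

Calling `demo()` returns the estimated phase and the largest residual error after compensation. In this noise-free case the estimate matches the applied rotation.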
The time synchronization block uses the 4-parallel block-parallel full-streaming
architecture proposed in this thesis. It is slightly modified to handle the different sign
pattern used in the 100GFLEX TS1. After detecting the starting point of the frame, it
passes 4-parallel samples to the CFO compensation block along with the fractional CFO
estimate. The output of the CFO compensation block is given to the FFT. Here, four
separate FFTs are present, which work in a round-robin fashion to process the input
samples. Each FFT uses the radix-2² algorithm and the single delay feedback (SDF)
architecture and produces a single output per cycle. The output of the FFT is then given
to the integer CFO estimation block, which uses cross-correlation with the known TS2
portion to estimate the integer CFO. The TS2 symbol is then passed through a least
squares equalizer to produce an initial channel estimate. During data symbol
compensation, an LMS equalizer updates the channel coefficients. Finally, the CPE
is estimated using the pilots embedded in the OFDM symbol and compensated. Here, the
architecture is not end-to-end parallel like the one proposed in Chapter 4. A snapshot of
the fully connected real-time transceiver platform is shown in Figure 5.17.
Figure 5.17: Snapshot of Real-Time FPGA Transceiver Platform. The topmost rack
shows the power supply for the configuration, the second rack is the EKINOPS Altera
FPGA Digital Transceiver, the third rack shows the Xilinx Virtex-7 FPGA interfaced
to the DAC board, and the bottommost rack shows the other Xilinx Virtex-7 FPGA
interfaced to the ADC. The yellow cables are single mode fiber (SMF) cables connected
through the SFP+ interface.
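The round-robin use of four single-output FFTs described above can be checked with simple cycle counting. The sketch below assumes that each SDF FFT also accepts one input sample per cycle, and that a buffer (not described in the text) serializes the 4-parallel samples of one symbol into the FFT assigned to it. The function names are hypothetical.

```python
N_FFT = 256            # FFT size (Table 5.2)
N_CP = 8               # cyclic prefix size (Table 5.2)
PARALLEL_SAMPLES = 4   # samples arriving per FPGA clock cycle
NUM_FFTS = 4           # FFT instances working in round-robin


def fft_for_symbol(symbol_index):
    """Round-robin assignment: consecutive symbols go to consecutive FFTs."""
    return symbol_index % NUM_FFTS


def cycles_between_symbols_per_fft():
    """Clock cycles between two symbols handed to the same FFT."""
    cycles_per_symbol = (N_FFT + N_CP) // PARALLEL_SAMPLES  # 264 / 4 = 66
    return NUM_FFTS * cycles_per_symbol


def keeps_up():
    """A one-sample-per-cycle FFT needs N_FFT cycles to take in one symbol."""
    return cycles_between_symbols_per_fft() >= N_FFT


if __name__ == "__main__":
    print([fft_for_symbol(k) for k in range(6)])
    print(cycles_between_symbols_per_fft(), keeps_up())
```

Under these assumptions each FFT receives a new symbol every 264 cycles and needs 256 cycles to read it in, so four instances are just enough to sustain the 4-parallel input rate.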
In the electrical B2B experiment, where no CFO is present, the only unknown that must
be estimated to demodulate the OFDM signal is the start of the frame. The objective is to
validate the time synchronization algorithm. The proposed block-parallel architecture
for the Minn-Bhargava algorithm was used to estimate the starting point. The Verilog
HDL generated using CatapultC was integrated into the setup and the performance of the
system was examined. To make the behaviour easier to observe, all data symbols were
coded identically, so that the correct starting point could be validated simply by
observing the SignalTap outputs after the synchronization block. Figure 5.18 shows the
output captured after the timing synchronization block. The wider zero pulses correspond
to TS1, which is not passed through the FFT, and the narrow zero pulses correspond to
the removed cyclic prefix. The gap between two wide zero pulses is the time between two
full frames. The width of the zero pulses remains constant, which indicates correct
synchronization estimation. Figure 5.19 shows a zoomed-in version, where the values of
the four parallel inputs to the FFT are indicated. The values in the four lanes have to be
identical, since all data symbols have the same values. It can again be observed that the
four data symbols have the same value and that the symbol starts at the correct point.
Thus, the proposed synchronization algorithm is validated by the real-time FPGA
platform experiment. The data captured through SignalTap was then analyzed to check
whether synchronization performance remains the same over a large number of frames.
Similar results were observed with the asynchronous sampling setup. Hence, the real-time
platform setup and the synchronization algorithm were validated by observing the
outputs after the synchronization step. The BER of the system was calculated using the
captured data. The data was captured repeatedly, with five OFDM frames per
acquisition. The BER was averaged over multiple acquisitions and found to be near zero.
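The frame-start estimation validated in this experiment can be illustrated in software. The sketch below evaluates a sliding-window timing metric of the Minn type [34] for the TS1 sign pattern [1 1 -1 1] and picks the position of its maximum. It is a plain sequential reference under stated assumptions, not the block-parallel hardware architecture: the training symbol is built from random QPSK-like time samples as a stand-in for the real TS1, the channel is noise-free, and all function names are hypothetical.

```python
import random

PATTERN = [1, 1, -1, 1]  # TS1 sign pattern from Table 5.2


def make_ts1(part):
    """Build the training symbol as signed repetitions of one base part."""
    return [sign * x for sign in PATTERN for x in part]


def timing_metric(r, d, m):
    """Minn-type metric for the window of len(PATTERN)*m samples starting at d."""
    parts = len(PATTERN)
    corr = 0j
    for k in range(parts - 1):
        b = PATTERN[k] * PATTERN[k + 1]  # expected sign between adjacent parts
        for i in range(m):
            corr += b * r[d + k * m + i].conjugate() * r[d + (k + 1) * m + i]
    energy = sum(x.real ** 2 + x.imag ** 2 for x in r[d:d + parts * m])
    if energy == 0:
        return 0.0
    return (parts / (parts - 1) * abs(corr) / energy) ** 2


def find_frame_start(r, m):
    """Return the window position with the largest timing metric."""
    last = len(r) - len(PATTERN) * m
    return max(range(last + 1), key=lambda d: timing_metric(r, d, m))


def random_samples(rng, n):
    """Stand-in samples with components of +/-1 (assumption, not real OFDM)."""
    return [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n)]


def demo():
    rng = random.Random(1)
    m = 64                         # N/4 for N = 256
    ts1 = make_ts1(random_samples(rng, m))
    cyclic_prefix = ts1[-8:]       # 8-sample prefix as in Table 5.2
    true_start = 100
    r = (random_samples(rng, true_start - 8) + cyclic_prefix + ts1
         + random_samples(rng, 200))
    estimate = find_frame_start(r, m)
    return estimate, true_start, timing_metric(r, estimate, m)
```

`demo()` returns the estimated start, the true start and the peak metric value. Without noise the metric reaches its maximum of 1 when the window is aligned with TS1, and the sign change in the pattern makes the metric fall off on either side of that position.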
• Homodyne Optical B2B experiment - The AWG and DSO in the offline configuration
(Figure 5.9) will be replaced by the FPGA transmitter (Figure 5.14) and the FPGA
receiver (Figure 5.16), respectively. BER is calculated using data acquired from
SignalTap at different values of OSNR.
• Heterodyne Optical B2B experiment - Again, the AWG and DSO in the offline
heterodyne configuration (Figure 5.12) will be replaced by the FPGA transmitter and
receiver. BER is calculated at various values of OSNR.
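The BER computation used in these experiments amounts to counting bit errors over several acquisitions. The sketch below shows one way to do this for QPSK under an assumed bit mapping (bit 0 for a non-negative component, bit 1 for a negative one); the mapping and the function names are hypothetical and not taken from the platform.

```python
def qpsk_demap(symbol):
    """Hard decision: one bit from the real part, one from the imaginary part."""
    return [0 if symbol.real >= 0 else 1, 0 if symbol.imag >= 0 else 1]


def count_bit_errors(tx_bits, rx_bits):
    """Number of positions where transmitted and received bits differ."""
    if len(tx_bits) != len(rx_bits):
        raise ValueError("bit sequences must have the same length")
    return sum(1 for t, r in zip(tx_bits, rx_bits) if t != r)


def average_ber(acquisitions):
    """BER over several acquisitions.

    acquisitions : iterable of (tx_bits, rx_symbols) pairs, one per capture.
    Errors and bits are accumulated over all captures before dividing, so
    that longer captures carry proportionally more weight.
    """
    errors = 0
    bits = 0
    for tx_bits, rx_symbols in acquisitions:
        rx_bits = [b for s in rx_symbols for b in qpsk_demap(s)]
        errors += count_bit_errors(tx_bits, rx_bits)
        bits += len(tx_bits)
    return errors / bits if bits else 0.0
```

For example, `average_ber([([0, 0, 1, 1], [1 + 1j, -1 - 1j]), ([0, 1, 0, 1], [1 - 1j, -1 - 1j])])` counts one error in eight bits.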
5.10 Conclusions
In this chapter, optical experiments were carried out to validate the frame structure and
the algorithms adopted for demodulating the CO-OFDM signal. The experiments with real
optical equipment confirm the validity of the frame structure and the algorithms used. The
frame size obtained in the heterodyne configuration was used in the real-time FPGA
prototype platform to validate the hardware implementation of the timing
synchronization algorithm. Synchronous and asynchronous sampling yielded similar
results. The experiments also showed the architecture's suitability for real-time
processing of optical OFDM signals. Further experiments to be performed with the
real-time FPGA platform were outlined, as were experiments using a dual-polarization
CO-OFDM system to estimate and compensate for polarization effects.
Chapter 6
6.1 Overview
In this thesis, low-complexity algorithms and parallel architectures were explored for the
efficient realization of the digital signal processing (DSP) blocks of a CO-OFDM
transceiver. To achieve a total data rate of 100 Gb/s within the bandwidth of present-day
data converters (DAC and ADC), multi-band CO-OFDM (MB-CO-OFDM) is adopted.
MB-CO-OFDM divides the total bandwidth of 50 GHz into smaller sub-bands, which
significantly reduces the bandwidth requirement of the DAC/ADC. The complete
MB-CO-OFDM architecture therefore consists of identical transceiver chains which
transmit/decode the data in both polarizations of every sub-band. The main idea is that,
since identical DSP architectures are used in each polarization of every sub-band, the
gains obtained from resource optimization are multiplied. Hence, low-complexity
algorithms and parallel architectures were explored for a single-polarization, single-band
CO-OFDM transceiver. The only block which changes between single-polarization and
dual-polarization operation is the channel estimation block in the receiver; the rest of
the DSP blocks are replicated. In addition, realizing the architecture on FPGA platforms
makes a parallel architecture necessary, since an FPGA can be clocked at a few hundred
MHz at most, while the DAC/ADC interfaced to it operates in the GHz range. Hence,
scalable parallel architectures are required for every DSP block in the CO-OFDM signal
processing chain to avoid costly replication to match the input sampling rate. With this
set of requirements, the major contributions of this thesis are listed below.
Conclusions 127
• The proposed block-parallel architecture is then modified to support the proposed
hierarchical synchronization algorithm, and the parallelization obtained for the
auto-correlation and cross-correlation algorithms is reported.
• The algorithms and the frame structure adopted are validated by experiments
performed in the optical laboratory. Offline experiments using a Matlab
transmitter/receiver are performed to validate the algorithm performance in homodyne
and heterodyne coherent detection configurations. From the heterodyne configuration,
a frame size suitable for use with our optical setup was found. This frame size was
used in the development of the real-time FPGA platform. The timing synchronization
algorithm was validated using a real-time experiment in an electrical
back-to-back (B2B) configuration.
single-band. Presently, the maximum size of the FFT/IFFT is limited by the laser phase
noise variation to values less than or equal to 256. Digital CPE estimation algorithms
assume that the phase noise is constant across a single OFDM symbol, an assumption
which breaks down for larger OFDM symbols. The RF-pilot phase noise estimation
scheme [57] has been proposed to overcome this, but the method is computationally very
complex. A phase noise estimation scheme [58] which can use both the RF-pilot scheme
and the CPE method to handle large OFDM symbol sizes while remaining
computationally efficient would enable very high rates. The next option arises when
higher-resolution DAC/ADC converters become available: it will then be possible to
support higher-order constellations on the sub-carriers, namely 16-QAM and 64-QAM.
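The dependence of the constant-phase assumption on the FFT size can be made concrete with the standard Wiener phase noise model, in which the phase drift accumulated over a time T has variance 2π·Δν·T for a combined linewidth Δν. The sketch below evaluates the RMS drift over one OFDM symbol for several FFT sizes. The linewidth of 100 kHz is a hypothetical value chosen only for illustration, and the 720 MHz sampling rate and 8-sample prefix are borrowed from the real-time platform parameters; none of the resulting numbers are measurements.

```python
import math


def phase_drift_std_rad(n_fft, n_cp, fs_hz, linewidth_hz):
    """RMS phase drift (radians) accumulated over one OFDM symbol.

    Uses the Wiener phase noise model: variance = 2 * pi * linewidth * T,
    where T is the symbol duration including the cyclic prefix.
    """
    t_sym = (n_fft + n_cp) / fs_hz
    return math.sqrt(2.0 * math.pi * linewidth_hz * t_sym)


if __name__ == "__main__":
    FS_HZ = 720e6          # sampling rate of the real-time platform
    LINEWIDTH_HZ = 100e3   # hypothetical combined laser linewidth
    for n_fft in (256, 1024, 4096):
        drift = phase_drift_std_rad(n_fft, 8, FS_HZ, LINEWIDTH_HZ)
        print(f"N = {n_fft}: RMS drift = {drift:.3f} rad")
```

Since the drift grows with the square root of the symbol duration, quadrupling the FFT size roughly doubles the RMS phase change within one symbol, which is why a single CPE value per symbol becomes a poorer approximation as N increases.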
Publications
Journal Publications
1. P. Udupa, O. Sentieys and L. Bramerie, “A Scalable Parallel Architecture for Coarse
Time Synchronization for Coherent Optical-OFDM Systems,” submitted to IEEE
Transaction Briefs on Very Large Scale Integration (VLSI) Systems, 2014.
Bibliography
[3] K. Roberts, D. Beckett, D. Boertjes, J. Berthold, and C. Laperle, “100G and Beyond
with Digital Coherent Signal Processing,” IEEE Communications Magazine, vol. 48,
no. 7, pp. 62–69, July 2010.
[5] M. Garrido, J. Grajal, M. A. Sanchez, and O. Gustafsson, “Pipelined Radix-2^k
Feedforward FFT Architectures,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 21, no. 1, pp. 23–32, January 2013.
[6] J. D’Ambrosia, “100 Gigabit Ethernet and Beyond,” IEEE Communications Maga-
zine, vol. 48, no. 3, pp. S6–S13, March 2010.
[7] E. Ip, A. Lau, D. Barros, and J. Kahn, “Coherent Detection in Optical Fiber Systems,”
Optics Express, vol. 16, no. 2, pp. 753–791, 2008.
[8] S. Jansen, “Optical OFDM, a hype or is it for real?” in 34th European Conference on
Optical Communication (ECOC 2008), September 2008, p. 1.
[9] M. Taylor, “Coherent Detection for Fiber Optic Communications using Real Time
Digital Signal Processing,” in Conference on Optical Fiber Communication and the
National Fiber Optic Engineers Conference (OFC/NFOEC 2007), 2007, pp. 1–3.
[10] Y. Liu and P. Fan, “Modified Chu sequences with smaller alphabet size,” Electronics
Letters, vol. 40, no. 10, pp. 598–599, May 2004.
[11] P. Udupa, O. Sentieys and P. Scalart, “A Novel Hierarchical Low Complexity
Synchronization Method for OFDM Systems,” in IEEE 77th Vehicular Technology
Conference, June 2013, pp. 1–5.
[14] N. Kaneda, Q. Yang, X. Liu, S. Chandrasekhar, W. Shieh, and Y.-K. Chen, “Real-
Time 2.5 GS/s Coherent Optical Receiver for 53.3-Gb/s Sub-Banded OFDM,” Journal
of Lightwave Technology, vol. 28, no. 4, pp. 494–501, February 2010.
[15] X. Yi, W. Shieh, and Y. Tang, “Phase Estimation for Coherent Optical OFDM,” IEEE
Photonics Technology Letters, vol. 19, no. 12, pp. 919–921, June 2007.
[16] S. Jansen, B. Spinnler, I. Morita, S. Randel, and H. Tanaka, “100GbE: QPSK versus
OFDM,” Optical Fiber Technology, vol. 15, no. 5-6, pp. 407–413, 2009.
[18] W. Shieh, X. Yi, Y. Ma, and Q. Yang, “Coherent Optical OFDM: Has Its Time
Come? [Invited],” Journal of Optical Networking, vol. 7, no. 3, pp. 234–255, 2008.
[19] J. Bingham, “Multicarrier Modulation for Data Transmission: An Idea Whose Time
Has Come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, May 1990.
[20] S. Melle, J. Jaeger, D. Perkins, and V. Vusirikala, “Market Drivers and
Implementation Options for 100-GBE Transport over the WAN,” IEEE Communications
Magazine, vol. 45, no. 11, pp. 18–24, 2007.
[22] W. Shieh, H. Bao, and Y. Tang, “Coherent Optical OFDM: Theory and Design,”
Optics Express, vol. 16, no. 2, pp. 841–859, 2008.
[23] B.-J. Choi, E.-L. Kuan, and L. Hanzo, “Crest-Factor Study of MC-CDMA and
OFDM,” in IEEE VTS 50th Vehicular Technology Conference (Fall), vol. 1, September
1999, pp. 233–237.
[24] P. Liu and Y. Bar-Ness, “Closed-Form Expressions for BER Performance in OFDM
Systems with Phase Noise,” in IEEE International Conference on Communications,
vol. 12, June 2006, pp. 5366–5370.
[25] L. Tomba, “On the Effect of Wiener Phase Noise in OFDM Systems,” IEEE Trans-
actions on Communications, vol. 46, no. 5, pp. 580–583, May 1998.
[27] M. A. Sanchez, M. Garrido, M. Lopez-Vallejo, and J. Grajal, “Implementing
FFT-based Digital Channelized Receivers on FPGA Platforms,” IEEE Transactions
on Aerospace and Electronic Systems, vol. 44, no. 4, pp. 1567–1585, 2008.
[28] J. A. Johnston, “Parallel Pipeline Fast Fourier Transformer,” IEE Proceedings for
Communications, Radar and Signal Processing, vol. 130, no. 6, pp. 564–572, 1983.
[31] B. Inan, S. Adhikari, O. Karakaya, P. Kainzmaier, M. Mocker, H. von Kirchbauer,
N. Hanik, and S. L. Jansen, “Realization of a real-time 93.8-Gb/s
polarization-multiplexed OFDM transmitter with 1024-point IFFT,” in 37th European
Conference and Exhibition on Optical Communication (ECOC), 2011, pp. 1–3.
[33] T. Schmidl and D. Cox, “Robust Frequency and Timing Synchronization for OFDM,”
IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613–1621, December
1997.
[34] H. Minn, V. Bhargava, and K. Letaief, “A Robust Timing and Frequency Synchroniza-
tion for OFDM Systems,” IEEE Transactions on Wireless Communications, vol. 2,
no. 4, pp. 822–839, July 2003.
[35] K. Shi and E. Serpedin, “Coarse Frame and Carrier Synchronization of OFDM Sys-
tems: A New Metric and Comparison,” IEEE Transactions on Wireless Communica-
tions, vol. 3, no. 4, pp. 1271–1284, July 2004.
[36] B. Park, H. Cheon, C. Kang, and D. Hong, “A Novel Timing Estimation Method
for OFDM Systems,” IEEE Communications Letters, vol. 7, no. 5, pp. 239–241, May
2003.
[37] S. D. Choi, J. M. Choi, and J. H. Lee, “An Initial Timing Offset Estimation Method
for OFDM Systems in Rayleigh Fading Channel,” in IEEE 64th Vehicular Technology
Conference, September 2006, pp. 1–5.
[38] E. Zhou, X. Hou, Z. Zhang, and H. Kayama, “A Preamble Structure and Synchroniza-
tion Method Based on Central-Symmetric Sequence for OFDM,” in IEEE Vehicular
Technology Conference, 2008, pp. 1478–1482.
[39] Y. H. Kim, Y.-K. Hahm, H. J. Jung, and I. Song, “An Efficient Frequency Offset
Estimator for Timing and Frequency Synchronization in OFDM Systems,” in IEEE
Pacific Rim Conference on Communications, Computers and Signal Processing, 1999,
pp. 580–583.
[40] T.-D. Chiueh, P.-Y. Tsai, and I.-W. Lai, Baseband Receiver Design for
Wireless MIMO-OFDM Communications. John Wiley and Sons Singapore Pte.
Ltd., 2007, ch. 7, pp. 167–208.
[41] M. Morelli, C.-C. Kuo, and M.-O. Pun, “Synchronization Techniques for Orthogonal
Frequency Division Multiple Access (OFDMA): A Tutorial Review,” Proceedings of
the IEEE, vol. 95, no. 7, pp. 1394–1427, July 2007.
[43] T. Pollet, M. Van Bladel, and M. Moeneclaey, “BER Sensitivity of OFDM Systems
to Carrier Frequency Offset and Wiener Phase Noise,” IEEE Transactions on
Communications, vol. 43, no. 2/3/4, pp. 191–193, 1995.
[44] J. van de Beek, M. Sandell, and P. Borjesson, “ML Estimation of Time and Frequency
Offset in OFDM Systems,” IEEE Transactions on Signal Processing, vol. 45, no. 7, pp.
1800–1805, July 1997.
[45] M. Morelli and U. Mengali, “An Improved Frequency Offset Estimator for OFDM
Applications,” in Communication Theory Mini-Conference, June 1999, pp. 106–109.
[48] P. Serena, M. Bertolini and A. Vannucci. (2009) Optilux Toolbox. [Online]. Available:
http://www.optilux.sourceforge.net
[49] P. Udupa, O. Sentieys and P. Scalart, “A Block-Parallel Architecture for Initial and
Fine Synchronization in OFDM Systems,” in IEEE International Conference on
Communications (ICC), 2013, pp. 4761–4765.
[51] S. L. Jansen, I. Morita, N. Takeda, and H. Tanaka, “20-Gb/s OFDM Transmission
over 4,160-km SSMF Enabled by RF-Pilot Tone Phase Noise Compensation,” in Optical
Fiber Communication Conference and Exposition and The National Fiber Optic
Engineers Conference. Optical Society of America, 2007.
[52] W. Shieh, Q. Yang, and Y. Ma, “107 Gb/s Coherent Optical OFDM Transmission over
1000-km SSMF Fiber using Orthogonal Band Multiplexing,” Optics Express, vol. 16,
no. 9, pp. 6378–6386, 2008.
[54] S. He and M. Torkelson, “A New Approach to Pipeline FFT Processor,” in
The 10th International Parallel Processing Symposium, 1996, pp. 766–770.
[55] P. Udupa, O. Sentieys and L. Bramerie, “Design and Implementation of DSP
algorithms for 100Gbps Optical OFDM System,” in XXIV Colloque GRETSI, September
2013.
[56] D. Lee and K. Cheun, “A New Symbol Timing Recovery Algorithm for
OFDM Systems,” IEEE Transactions on Consumer Electronics, vol. 43, no. 3, pp.
767–775, August 1997.
[58] S. Hussin, K. Puntsri and R. Noe, “Improvement of RF-Pilot Phase Noise
Compensation for Coherent Optical OFDM Systems via CPE Equalizer,” 2013.
Summary: Coherent Optical OFDM (CO-OFDM) has been proposed as a viable
candidate for 100 Gigabit Ethernet (100 GbE) nodes. CO-OFDM relies on digital
signal processing (DSP) algorithms to estimate and compensate for all the
non-idealities of the channel and of the opto-electronic front-end systems. In this
thesis, low-complexity algorithms and scalable parallel architectures are explored
for the computationally complex blocks of the CO-OFDM transceiver. A
low-complexity time synchronization algorithm is proposed which gives better
performance than auto-correlation algorithms on the optical channel. A scalable
parallel architecture is proposed for the algorithm; it can support several parallel
samples and reduces resource usage by about 70% compared with the previous
proposal. An end-to-end parallel CO-OFDM transceiver architecture is proposed
which integrates a parallel radix-2² IFFT/FFT block, considerably reducing the
computational complexity compared with a radix-2 architecture, and a channel
estimation block which uses data-representation optimizations to remove
multipliers, resulting in area gains of 24%. Finally, the algorithms and
architectures were validated by offline Matlab experiments and real-time FPGA
platform experiments, respectively.