Interference Management for Tomorrow's Wireless Networks - Newcom# Summer School Sophia-Antipolis, 31st May 2013

# Interference mitigation in HetNet systems

#### From theory to practice

#### **Oriol Font-Bach**

(Researcher)



Centre Tecnològic de Telecomunicacions de Catalunya

# Outline

- 1. Introduction
- 2. Motivation
- 3. Digital design tendencies
- 4. Choosing a digital design flow for processing demanding systems
- 5. Development flow and prototyping platform
- 6. Interference management in HetNets
- 7. RTL design
- 8. Validation and results using the GEDOMIS® testbed



# 1. Introduction




# **Opportunistic spectrum reuse**

- Evolution of wireless communication systems needs to address many issues
  - Congested RF spectrum → <u>opportunistic reuse</u>: i.e., objective of CR
    - Problem: in-band interference → secondary communication degrades QoS perceived by primary users
    - Interference management solutions are required!
  - Combined with high performance and demanding operating conditions → advanced PHY-layer schemes (e.g., MIMO-OFDM(A), closed-loop, wide bandwidth).



## Spectrum-reuse example: femtocells



## **Interference management**

- Inter-Cell Interference Coordination (ICIC) schemes for HetNet (e.g., Macro/Femto)
  - Interference avoidance
    - E.g., spectrum sensing → allocate unused bands
  - Interference mitigation
    - E.g., interference-detection & adaptation of opportunistic transmission
      - required to enable frequency reuse → same band is used by different users among adjacent (heterogeneous) cells



# 2. Motivation




# What motivates this tutorial?

• Two main factors:

#### 1) Practical need for real-time implementations

- a. PHY-layer of a BWA system (e.g., LTE) featuring an interference management scheme
- b. Utilization of a heterogeneous prototyping platform (e.g., FPGA-based, using COTS RF + channel emulation)

#### 2) Need to employ innovative digital design techniques to fulfill 1)

- a. Efficient utilization of baseband processor capacity
- b. Address implementation challenges posed by real-time DSP, wide bandwidth & complex baseband algorithms



# Why real-time PHY prototyping? (I)

I can rapidly model my algorithm/system with floating point code in a computer-based simulation and simulate as many scenarios as I wish... If I'm struggling with extremely long simulation times, I can always make system-wide simplifications

Why do I have to bother about implementation?

No worries for hardware specifications, implementation cost, undesired operating or signal conditions ...


# Why real-time PHY prototyping? (II)

- **Objective** = realistic validation of a (high-performance) Macro/Femto interference-management scheme
- What affects the performance of the PHY-layer scheme under analysis?
  - Low-level HW-specifications, limitations & impairments introduced to the signal
  - Realistic signal propagation conditions (including mobile channels, noise and interference)
  - Capacity of the target processing solution



# **Validation options**

- Each has different objectives and capacities:
  - 1. High-level modelling & computer simulation
  - 2. Off-line prototyping
  - 3. <u>Real-time prototyping</u>



# **HLPL-based PHY-layer modelling**

- Natural starting point of DSP research (e.g., MATLAB)
  - Pros: flexible, low cost (time and money), rapid evaluation of innovative techniques
    - <u>Cons:</u> limited capacity to...
      - deal with computationally intensive data processing (e.g., wide bandwidth, MIMO)
      - model or account for realistic signal conditions and hardware-introduced impairments (e.g., mobile channel)
      - reproduce dynamic behavior at run-time (e.g., closed-loop)

Common assumptions and simplifications: idealized channel conditions, perfect synchronization, perfect CSI at Tx, ignores implementation cost, unlimited numerical precision... → IMPACTS PERFORMANCE ASSESSMENT
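To make the "unlimited numerical precision" assumption concrete, the following minimal sketch quantizes floating-point samples to a signed fixed-point grid and measures the resulting signal-to-quantization-noise ratio. The word lengths, the uniform test signal and the helper names `quantize` and `sqnr_db` are illustrative assumptions, not values taken from the presented design.

```python
import math
import random


def quantize(value, frac_bits=15, total_bits=16):
    """Round to a signed fixed-point grid and saturate (Q1.15 by default, an assumed format)."""
    scale = 1 << frac_bits
    max_int = (1 << (total_bits - 1)) - 1
    min_int = -(1 << (total_bits - 1))
    as_int = max(min_int, min(max_int, round(value * scale)))
    return as_int / scale


def sqnr_db(samples, frac_bits, total_bits):
    """Signal-to-quantization-noise ratio of the quantized samples, in dB."""
    signal_power = sum(s * s for s in samples)
    noise_power = sum((s - quantize(s, frac_bits, total_bits)) ** 2 for s in samples)
    return 10 * math.log10(signal_power / noise_power)


random.seed(0)
samples = [random.uniform(-0.9, 0.9) for _ in range(10_000)]  # hypothetical test signal
for bits in (8, 12, 16):
    print(bits, "bits ->", round(sqnr_db(samples, bits - 1, bits), 1), "dB")
```

A floating-point HLPL model never sees this noise floor or the saturation at the edges of the range, which is one reason its performance figures can be optimistic.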





# **Off-line PHY-layer prototyping (II)**

- Combination of HLPL-based PHY-layer with COTS RF + over-the-air/channel emulation
  - <u>Pros:</u> keeps flexibility and low-development cost of software-based PHY modelling, considers realistic signal conditions (including HW-introduced impairments)
  - <u>Cons</u>: equipment cost (& stability of setup), still features limited capacity to...
    - deal with computationally intensive data processing
    - reproduce dynamic behavior at run-time

An improved step towards realistic validation, but it is still common for modelled PHY to feature assumptions and simplifications: perfect CSI at Tx, ignore implementation cost... → IMPACTS PERFORMANCE ASSESSMENT



# **Real-time PHY-layer prototyping**

- COTS RF + over-the-air/channel emulation + real-time DSP implementation (e.g., FPGA)
  - <u>Pros:</u> enables bit-intensive (adaptive) DSP, allows realistic validation by considering: close to real-life operating and signal conditions & HW limitations and implementation cost.
  - <u>Cons</u>: development bounded by...
    - long cycle → design, implementation and verification require a lot of effort (time!)
    - elevated hardware cost
    - HW-specifications (finite resources & dynamic range)

Limits range of scenarios and PHY-layer schemes that can be considered (e.g., number of users, number of antennas...) → proof-of-concept



# Why is innovative digital design required? (I)

Ok, let's consider the real-time implementation of the proposed PHY-layer schemes...

What makes their development so demanding?



# Why is innovative digital design required? (II)

- Design complexity increases because of:
  - Bandwidth
    - 4G BWA → up to 100 MHz!
  - Number of antennas
  - Real-time operation
    - Requiring parallelism + large storage capacity → complicated control plane
  - Run-time adaptivity
    - Feedback generation, transmission, reception and reconfiguration of the PHY-layer
  - Realistic signal impairments
    - DFE, channel estimation...
  - Depending on the application, due to the required intelligent utilization of the provided FPGA resources



# Why is innovative digital design required? (III)

**Conclusion:** the design complexity motivates the inclusion of critical novelties, which are not directly related to the proposed DSP algorithms, but to their actual implementation in a dedicated processing architecture

... plus modern FPGAs are offering unprecedented processing capacity

But nowadays there are plenty of vendor-provided tools to convert my HLPL-based model to a fully working FPGA implementation, right?



# 3. Digital design tendencies




# **General overview**

- Focusing on FPGA-based developments
- Main HDL design approaches
  - Automated HDL generation
    - HLPL-to-RTL
    - Schematic-entry to HDL
  - Custom HDL with 3rd-party IPs
  - Full-custom HDL (i.e., gate-level design)



# HLPL-to-RTL (I)

Growing (EDA industry) interest in higher level design methodologies

System level tools/design methodologies are being explored.

- Motivation #1 → getting to a broader audience
  - No requirement for HDL or digital design skills
- Motivation #2 → IP reuse
  - Marketing & commercial tool for FPGA manufacturers
- Motivation #3 → need for High-Level Design
  - Higher level of abstraction → ever-increasing design complexity
  - Reduce design efforts
  - Fast development time
  - Technology independence → no need to consider low-level architecture of target FPGA device (?)
  - Ease of HW/SW partitioning



# HLPL-to-RTL (II)

- Multitude of solutions today
  - <u>C-based:</u> SystemC, Simulink Coder, Synphony C Compiler, Catapult HLS, Xilinx Vivado...
  - Matlab-based: Mathworks HDL Coder, AccelChip DSP, System Generator for DSP...
  - Java-based: Forge, JHDL
  - Python-to-HDL: MyHDL



# Schematic-entry to HDL (I)

#### Case study - Matlab-Simulink + System Generator for DSP

- Model-based design entry
  - Drag n' drop processing blocks + interconnect them
- Provides SW design utilities & precompiled mathematical functions
  - Many signal processing or specialized toolboxes included
- Includes optimized RTL IP libraries -> System Generator for DSP
  - Xilinx offers a limited subset of the Core Generator IP cores
- Computer based simulation + automatic HDL generation
- Allows combination with other HDL coding approaches:
  - <u>HLPL-to-RTL</u>: user can include custom Matlab (M-code blocks)/C code
  - <u>Custom HDL</u>: user can instantiate it using the "black box primitive"
- Offers a hardware-software co-simulation environment -> e.g., HIL



# **Schematic-entry to HDL (II)**

Matlab-Simulink + System Generator for DSP development flow



# Automated HDL (I)

#### That is a great step forward!

### However... extracting all the concurrency from a sequential HLPL description is not an easy problem




# Automated HDL (II)

#### The downsides...

- HLS tools are inevitably less efficient (than custom RTL design)
  - Problematic for complex designs requiring an elevated amount of FPGA resources → cannot meet the required timing, area, or performance
    - Also... limited access to low-level implementation options of EDA tools
  - <u>E.g.</u>, in C-to-RTL efficiency might be increased by introducing specialized FPGA-constructs (to force the utilization of specific embedded resources) → increases design time & complexity
- Coding limitations → HLPLs may...
  - Permit only a certain subset of known commands
  - Require a specific source-code syntax
  - Impose/require certain code optimizations/restrictions
  - Constrain the maximum achievable performance
- Requirements for parallelism → high performance computing
  - Makes it tougher to code with HLPLs



# Special focus on the Vivado IDE (I)

- Xilinx promotes HW/SW co-design
  - Vivado is centred around high-level design
    - IP re-use + HLS
  - Zynq devices
    - Co-processing architecture
      - FPGA + dual-core ARM processor
    - Flexibility + performance
    - Wider range of end applications & customers





# Special focus on the Vivado IDE (II)

- <u>Pros:</u> Flexibility complements the traditional parallelism offered by programmable logic
- <u>Cons</u>: HW/SW co-design and use of HLS is not trivial, although specialized SW tools and IP cores are being made available
  - E.g., Vivado HLS → specialized C-code including "FPGA-pragmas" and requiring several refinement iterations → development cycle time and design complexity comparable to that of custom HDL code generation

Implementation results for an 8x8 MIMO sphere decoder (note that Xilinx bought the AutoPilot HLS tool from UCLA and incorporated it into Vivado):

| Metric                | RTL expert | AutoPilot expert | AutoPilot expert |
|-----------------------|------------|------------------|------------------|
| Dev. time (man-weeks) | 4.5        | 3                | 5                |
| LUTs                  | 5,082      | 6,344            | 3,862            |
| Registers             | 5,699      | 5,692            | 4,931            |
| DSP48s                | 30         | 46               | 30               |
| 18K BRAMs             | 19         | 19               | 19               |

J. Noguera, S. Neuendorffer, S. V. Haastregt, J. Barba, K. Vissers, and C. Dick, "Implementation of Sphere Decoder for MIMO-OFDM on FPGAs Using High-level Synthesis Tools," *Analog Integrated Circuits and Signal Processing*, vol. 69, no. 3, pp. 119–129, Sep. 2011.


# **Custom HDL coding (I)**

- Custom HDL is hard to deliver and very costly in time but it will always be necessary...
  - Lack of pre-verified IP cores
  - Dense designs
  - Whenever an optimum HDL implementation is the goal

... even if it is only utilized on small portions of the design



# **Custom HDL coding (II)**

- Provides the means to control every important aspect of the design
  - Low-level definition of a dedicated RTL architecture → optimized for performance, minimized resource utilization...
  - Benefits from utilization of 3rd-party IP cores
    - Optimized for target FPGA device: e.g., Xilinx Core Generator (FFT, FIR...)
- Efficient design requires in-depth knowledge of target FPGA architecture & associated EDA tools



# **Custom HDL coding (III)**

#### **Example - Three steps to boost performance**

#### 1. Utilize embedded (dedicated) resources

- DSP slice, block RAM, ISERDES, OSERDES, EMAC, and MGT
- Dedicated hardware block timing is correct by construction
- Not dependent on programmable routing
- Offers as much as 3x the performance of soft implementations

#### 2. Write code for performance

- Use pipeline stages—more bandwidth
- Use synchronous reset—better system control
- Use Finite State Machine (FSM) optimizations
- Use inferable resources (e.g. MUX, Shift Register LUT (SRL), BRAMs, Cascade DSP)
- Think about the levels of logic required for the logic you are building
- Be aware of the inferred circuits & the expected combinatorial complexity

#### 3. Drive your synthesis and Place & Route tools

- Try different synthesis optimization techniques
- Add critical timing constraints in synthesis
- Preserve hierarchy
- Apply full and correct constraints
- Use High effort




# **Example of the impact of constraints** in EDA tools

**Example Reed-Solomon design** 




# Full-custom HDL (I)

- HDL design = top-down methodology
- Code is translated (in various phases) to a low-level description of the circuit
  - Very abstract design description yields poor results
  - Detailed description drives the decisions of the translation process




# Full-custom HDL (II)

- Gate-level design → force utilization of the instantiated primitives (avoid automatic inference)
- Fully-optimized design → ASIC prototyping
  - Area, performance, consumption...
- Requires full knowledge of low-level architecture of the target FPGA

(Figure: Virtex-7 DSP48E1 slice)



# 4. Choosing a digital design flow for processing demanding systems




# Different system-design cases require different design solutions!

- 1. Investigate thoroughly your system design requirements
- 2. <u>Select the most appropriate development flow</u>
  - a) HDL coding approach
  - b) IDE solution/target technology
  - c) Validation strategy

Example: combination of custom and automated HDL coding approaches



# How to select an appropriate design methodology?

- Parameters to consider:
  - Target use → experimental prototype, product...
  - Scope of application defines fundamental specifications → BWA, power-line communications, space, medical...
  - Cost! → budget for HW, SW, PMs...
  - Design objectives → performance, low-power, area... trade-off?
  - Operation mode defines design constraints & HW complexity → real-time, off-line?
  - Technical skills of the team + available HW/SW → mature processing technology, pre-verified IPs...



# Use case: designing a processing demanding PHY-layer scheme (I)

#### Analysis of the presented use case

| Target use                               | Experimental prototype using COTS HW                                                                                           |
|------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| Application: general scope               | Macro/femto interference-mitigation scheme for BWA systems                                                                     |
| Application: low-level scope             | High performance adaptive DSP, baseline standard compliance (3GPP LTE Rel. 9)                                                  |
| Cost                                     | HW/SW budget defined, PM defined                                                                                               |
| Design objectives                        | Performance, Portability, Extendibility                                                                                        |
| Operation mode                           | Real-time                                                                                                                      |
| Existing programming & design expertise  | VHDL, Matlab, C, Java                                                                                                          |
| Existing SW tools and HW equipment       | Prototyping boards, pre-verified IPs, licenses for SW design packages                                                          |
| Capacity for real-life system validation | Signal generation/acquisition HW, system-<br>wide testing & debugging using various<br>equipment, board-level code integration |

## Use case: designing a processing demanding PHY-layer scheme (II)

• Given the previous analysis and the described motivation, the selected solution is a...

#### CUSTOM HDL CODING APPROACH RELYING ON 3rd-party IP CORES

(more details on the full development flow and target prototyping platform follow)




## 5. Development flow and prototyping platform




#### **Proposed incremental development flow**



- 1. Idealized HLPL model
  - algorithm selection
  - implementation cost
- 2. Off-line Tx prototyping
  - hardware-validation of Tx
  - experimental captures
- 3. **HLPL-model refinement**
  - realistic signal
  - **RTL-awareness**
- 4. RTL design (custom HDL)
  - co-simulation (IP config.)
  - test DFE in HW → back to 3
- 5. **FPGA** implementation
  - platform integration
- 6. **On-lab** validation
  - debugging → ChipScope + equipment
  - real-time data captures
- 7. Performance assessment
  - post-processing → metrics

#### **Development challenges**

- Heterogeneous prototyping platform
  - Characterization → early identification of performance bottlenecks

  - Hardware-originated signal impairments
- Channel and mobility effects
- FPGA-design partitioning
- Design and implementation software tools → recall previous example!



#### The GEDOMIS® testbed (I)






### The GEDOMIS® testbed (II)

- Signal conversion and baseband processing
  - Lyrtech ADP

| VHS-ADC | VHS-DAC | SMQUAD-4 | DRC |
|---------|---------|----------|-----|
| **ADCs:** AD6645 (8x), sampling rate 105 MSPS, 14-bit resolution | **DACs:** DAC5687 (4x), sampling rate 480 MSPS, 14-bit resolution | **FPGA devices:** 2 Xilinx Virtex-4 XC4VLX160 | **FPGA device:** Xilinx Virtex-4 XC4VSX35 |
| **Control & pre-processing:** Virtex-4 XC4VLX160 FPGA, 128-MB SDRAM | **Control & pre-processing:** Virtex-4 XC4VLX160 FPGA, 128-MB SDRAM | **DSP microprocessors:** 4 TMS320C6416 DSPs<br/>**SDRAM memories:** 128 MB per DSP/FPGA | Onboard flash PROM |
| **Off-board I/O:** RapidCHANNEL TX & RX, 1 GBps, full-duplex | **Off-board I/O:** RapidCHANNEL TX & RX, 1 GBps, full-duplex | **Off-board I/O:** RapidCHANNEL TX & RX, 1 GBps, full-duplex | **Off-board I/O:** RapidCHANNEL TX & RX, 1 GBps, full-duplex |
|  |  | **On-board inter-FPGA bus:** LYRIO 1-GBps (1 RX, 1 TX) | **On-board inter-FPGA bus:** LYRIO 1-GBps (1 RX, 1 TX) |



#### The GEDOMIS® testbed (III)

- <u>RF section</u>
  - Upconversion → Agilent E4438C ESG VSG
    - Also off-line prototyping (arbitrary waveform generator)
  - Downconversion → MCS RF 3000T (4 channels)

| Equipment                          | Main specifications |
|------------------------------------|---------------------|
| Agilent E4438C ESG VSG             | 250 kHz to 6 GHz<br/>80 MHz bandwidth<br/>+17 dBm output power<br/><-134 dBc phase noise at 20 kHz offset<br/>±1 ppm internal reference accuracy |
| MCS Echotek Series RF 3000T Tuners | 20 MHz to 3 GHz<br/>65 MHz bandwidth<br/>Manual gain control 85 dB<br/><-115 dBc phase noise at 10 kHz offset<br/>±0.5 ppm internal reference accuracy |



#### The GEDOMIS® testbed (IV)

- Provision of realistic signal conditions
  - EB Propsim C8 Channel Emulator
    - Real-time standard/custom channels, up to 4x4 MIMO
  - AI (extremely flat) AWGN generators
    - E.g., BER vs SNR testing

| Equipment                      | Main specifications |
|--------------------------------|---------------------|
| EB Propsim C8 Channel Emulator | 350 MHz to 6 GHz<br/>70 MHz bandwidth<br/>Up to 48 fading paths per channel<br/>Propagation delay up to 6.4 ms<br/>Mobile speed up to 40,000 km/h |
| AI NS-3 RF Noise Source        | 5 MHz to 2.15 GHz<br/>30 dB range with 0.1 dB steps<br/>-90 dBm/Hz maximum output power<br/>±2.0 dB flatness over full operating range |
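When running a BER vs SNR test of the kind mentioned above, a theoretical curve is a useful sanity check for the measured points. The sketch below evaluates the textbook bit error rate of Gray-coded QPSK over an ideal AWGN channel, $\mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{E_b/N_0})$. It assumes uncoded transmission and perfect synchronization, so it only bounds what a real setup can achieve; the function name `qpsk_ber_awgn` is illustrative.

```python
import math


def qpsk_ber_awgn(ebn0_db):
    """Theoretical BER of Gray-coded QPSK over AWGN for a given Eb/N0 in dB."""
    ebn0_linear = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0_linear))


# Reference curve over an illustrative Eb/N0 range
for ebn0_db in range(0, 11, 2):
    print(f"Eb/N0 = {ebn0_db:2d} dB -> BER = {qpsk_ber_awgn(ebn0_db):.3e}")
```

For QPSK, which carries two bits per symbol, $E_s/N_0$ is about 3 dB above $E_b/N_0$, which matters when the noise generator is calibrated per symbol rather than per bit.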

#### The GEDOMIS® testbed (V)

#### Other specialized equipment

Clock generation → Holzworth microwave sources

| Equipment               | Main specifications |
|-------------------------|---------------------|
| HSC1001A RF synthesizer | 8 MHz to 1 GHz<br>0.001 Hz resolution<br>-110 to +15 dBm output power range<br><-131 dBc phase noise at 10 kHz offset<br>Internal reference 100 MHz<br>±1 ppb internal reference accuracy |
| HSM1001A RF synthesizer | 250 kHz to 1 GHz<br>0.001 Hz resolution<br>-70 to +10 dBm output power range<br><-133 dBc phase noise at 10 kHz offset<br>Internal reference 100 MHz<br>±1 ppb internal reference accuracy<br>External reference input 10/100 MHz |



| Equipment                                | Main specifications |
|------------------------------------------|---------------------|
| R&S FSQ Signal Analyzer                  | 20 Hz to 26.5 GHz<br>120 MHz bandwidth<br>-173 dBm displayed average noise level<br>235 MSa I/Q memory<br><-133 dBc phase noise at 10 kHz offset |
| Agilent DSO80804B Infiniium Oscilloscope | 4 analog channels<br>10 GHz bandwidth<br>40 GSa/s<br>Noise floor 294 µV (5 mV/div) |

# Signal impairments resulting from the utilization of GEDOMIS®

- High-end RF equipment:
  - Negligible I/Q phase/gain imbalances
  - CFO can be accurately generated
- <u>High precision clock synthesis equipment:</u>
  - The following can be safely ignored: inaccuracy of the Tx/Rx sampling clocks with respect to the exact sampling frequency, and LO drifts/instability
  - LO coupling at the RF transmitter still needs to be accounted for → it is converted to an in-band sinusoid
- Extremely flat AWGN generator:
  - Precise control of noise level
- The chassis of the ADP introduces a DC signal:
  - Out-of-band signal which needs to be filtered in the digital domain
- Channel emulator:
  - Allows the reproduction of standard and custom channels (e.g., mobility conditions, interference)
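As an illustration of removing the DC level introduced by the ADP chassis in the digital domain, the following sketch applies a first-order DC-blocking filter, $y[n] = x[n] - x[n-1] + \alpha\,y[n-1]$. This is a generic textbook structure chosen for illustration; the slides do not state which filter the actual design uses, and the pole position `alpha` is an assumed value.

```python
def dc_blocker(samples, alpha=0.995):
    """First-order DC-blocking IIR filter: y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = x - prev_in + alpha * prev_out
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered


# A constant (pure DC) input decays towards zero at the output
constant_input = [1.0] * 5000
print(abs(dc_blocker(constant_input)[-1]) < 1e-6)
```

Moving `alpha` closer to 1 narrows the notch around DC at the cost of a longer settling time, a trade-off that matters when the filter must settle before the first useful samples arrive.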



## 6. Interference management in HetNets




#### **Scenario definition**

(Figure: scenario with a macro BS, a macro UE, a femto BS and a femto UE. Labelled signals: macro DL signal, macro UL signal, and opportunistic femto DL signal using the same frequency band and the same DL signal BW as the macro DL; a dedicated feedback link is also shown.)

#### **System specifications**

| Parameter                              | Value                             |
|----------------------------------------|-----------------------------------|
| Wireless telecommunication standard    | 3GPP LTE (Rel. 9)                 |
| Antenna scheme                         | SISO 1x1                          |
| Channel bandwidth (MHz)                | 20                                |
| Cyclic prefix (samples)                | 512 (1/4 of the symbol)           |
| Modulation type                        | QPSK                              |
| Duplex mode                            | FDD                               |
| Active subcarriers per OFDM symbol     | 1200                              |
| Null subcarriers per OFDM symbol       | 848                               |
| FFT size                               | 2048                              |
| OFDM symbols per frame: total / active | 120 / 83                          |
| Closed-loop transmission scheme        | Interference-aware PRB allocation |
| ADC sampling frequency (MHz)           | 61.44                             |
| Baseband sampling frequency (MHz)      | 30.72                             |
| RF band (GHz)                          | 2.6                               |
| IF (MHz)                               | 46.08                             |
| Tested channel model                   | ITU Ped. B (up to 3 km/h)         |
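The IF and sampling frequencies in the table are related in a convenient way, which the short sketch below works out. A 46.08 MHz IF sampled at 61.44 MHz lies in the second Nyquist zone and aliases to one quarter of the ADC rate, and the baseband rate is exactly half the ADC rate. The helper `aliased_frequency` is illustrative; how the receiver actually exploits this relationship is not stated in the slides.

```python
def aliased_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency, within [0, fs/2], of a real tone after sampling at fs."""
    folded = f_signal_hz % f_sample_hz
    return min(folded, f_sample_hz - folded)


F_ADC_HZ = 61_440_000   # ADC sampling frequency (from the table)
F_IF_HZ = 46_080_000    # IF (from the table)
F_BB_HZ = 30_720_000    # baseband sampling frequency (from the table)

alias_hz = aliased_frequency(F_IF_HZ, F_ADC_HZ)
print("aliased IF (MHz):", alias_hz / 1e6)
print("aliased IF / ADC rate:", alias_hz / F_ADC_HZ)
print("decimation factor to baseband:", F_ADC_HZ // F_BB_HZ)
```

An IF at a quarter of the sampling rate is a common choice because the digital mixer then only needs the sequence 1, j, -1, -j (or its conjugate), so no multipliers are required. Because the IF sits in an even Nyquist zone, the sampled spectrum is also mirrored, which a receiver has to take into account.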



### **3GPP LTE (Rel. 9; FDD)**



- **OFDM** symbols are organized in Physical Resource Blocks (PRBs)
  - Number of PRBs depends on BW → 20 MHz = 100 PRBs
- Slot = 6 OFDM symbols
- Subframe = 2 slots
- Frame = 10 subframes
- RSs found in one of every 3 **OFDM** symbols
  - 4 predefined values → $\pm \frac{1}{\sqrt{2}} \pm j\frac{1}{\sqrt{2}}$
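The frame-structure figures can be cross-checked against the system specifications with a few lines of arithmetic. The sketch below assumes the standard LTE values of 12 subcarriers per PRB and 15 kHz subcarrier spacing, which are not stated on the slides; every other constant is taken from the slides themselves.

```python
SUBCARRIERS_PER_PRB = 12         # standard LTE value (assumption, not on the slides)
SUBCARRIER_SPACING_HZ = 15_000   # standard LTE value (assumption, not on the slides)

N_PRB = 100                # 20 MHz channel
FFT_SIZE = 2048
CP_SAMPLES = 512           # 1/4 of the symbol
SYMBOLS_PER_SLOT = 6
SLOTS_PER_SUBFRAME = 2
SUBFRAMES_PER_FRAME = 10

active_subcarriers = N_PRB * SUBCARRIERS_PER_PRB
null_subcarriers = FFT_SIZE - active_subcarriers
baseband_rate_hz = FFT_SIZE * SUBCARRIER_SPACING_HZ
symbols_per_frame = SYMBOLS_PER_SLOT * SLOTS_PER_SUBFRAME * SUBFRAMES_PER_FRAME
samples_per_frame = symbols_per_frame * (FFT_SIZE + CP_SAMPLES)
frame_duration_ms = 1000 * samples_per_frame / baseband_rate_hz

print("active / null subcarriers:", active_subcarriers, "/", null_subcarriers)
print("baseband sampling rate (MHz):", baseband_rate_hz / 1e6)
print("OFDM symbols per frame:", symbols_per_frame)
print("frame duration (ms):", frame_duration_ms)
```

Under these assumptions the results reproduce the 1200 active and 848 null subcarriers, the 30.72 MHz baseband rate, the 120 OFDM symbols per frame and the 10 ms radio frame quoted elsewhere in the slides.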



#### **Considered SIR values**

- 3GPP suburban deployment of LTE femtocells → pathloss modelling of 3 DL signals:
  - (1) macro BS → macro UE
  - (2) femto BS → femto UE
  - (3) femto BS → macro UE
- SIR
  - ratio between (1) and (3)
  - range from 12 to 20 dB

Simulation assumptions and parameters for FDD HeNB RF requirements, 3GPP TSG RAN WG4 R4-092042.
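Written out, and using $P_{(1)}$ and $P_{(3)}$ as assumed symbols for the powers received at the macro UE from the macro BS and from the femto BS respectively, the SIR considered here is:

$$\mathrm{SIR}_{\mathrm{dB}} = 10\log_{10}\frac{P_{(1)}}{P_{(3)}} = P_{(1)}^{\mathrm{dBm}} - P_{(3)}^{\mathrm{dBm}}$$

so the tested range of 12 to 20 dB corresponds to a wanted macro DL signal that is roughly 16 to 100 times stronger than the femto interference at the macro UE.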



#### **Interference-management algorithm**

- Distributed ICIC algorithm → Victim User Aware Soft Frequency Reuse in macrocell/femtocell HetNets
  - Available BW is divided into N sub-bands
  - Instantaneous channel conditions of the macro UE are exploited to adapt the femto DL transmission
  - <u>Objective</u>: avoid interfering with the primary communication while deactivating the fewest possible sub-bands in the secondary DL communication

M. Shariat, A. u. Quddus, M. Bennis, Z. Bharucha, M. Lalam, M. Maqbool, S. Mayrargue, C. Kosta, A. De Domenico, E. Calvanese-Strinati, R. Mahapatra, C. H. M. de Lima and S. Uygungelen, "Promising Interference and Radio Management Techniques for Indoor Standalone Femtocells", *Deliverable D3.2, ICT 248523 FP7 Broadband Evolved FEMTO Network (BeFEMTO) Project,* Jun. 2012.



## Scaling the scenario to fit the proof-of-concept (I)

- Baseline interference-management algorithm
  - 1 macro BS-UE pair & 1 femto BS-UE pair
  - 20 MHz BW → two 10 MHz bands
    - 4 pre-defined femto PRB allocation cases



## Scaling the scenario to fit the proof-of-concept (II)

- PHY-layer specifications
  - Point-to-point DL communication
  - Emulated UL → real-time intra-FPGA link
  - Fixed frame format
    - 10 ms radio frame divided into two 5-ms frames, separated by quasi-quiet periods (i.e., no data, only RSs)
      - facilitate vital DFE processes → gain adjustment, CFO correction



### System modelling (I)



## System modelling (II)

Received signal model (considering the utilization of GEDOMIS<sup>®</sup>):

$$\begin{split} c(t) = {} & \Re\{x(t) \cdot e^{j2\pi(f_{IF} + \Delta f)t}\} + \Re\{u(t) \cdot e^{j2\pi(f_{IF} + \Delta f_u)t}\} + A \\ & + B \cdot \cos(2\pi(f_{IF} + \Delta f)t + \varphi) + w(t), \end{split}$$

- x(t): useful part of received baseband signal
- u(t): (asynchronous) interference signal
- $f_{IF}$: IF (46.08 MHz); $\Delta f$: CFO of the useful signal; $\Delta f_u$: CFO of the interference signal
- A: DC level introduced by baseband boards
- $B \cdot \cos(2\pi (f_{IF} + \Delta f)t + \varphi)$: unwanted in-band residual carrier → LO coupling
- w(t): Gaussian noise
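The received-signal model can be evaluated term by term as a quick check. The sketch below is illustrative only: the function name, argument names and default values are assumptions, and `x` and `u` stand for the baseband values x(t) and u(t) at the chosen instant.

```python
import cmath
import math
import random

F_IF_HZ = 46.08e6  # IF from the slides


def received_sample(t, x, u, cfo=0.0, cfo_u=0.0, dc=0.0,
                    leak_amp=0.0, leak_phase=0.0, noise_std=0.0):
    """Evaluate c(t) for one time instant (all names are hypothetical)."""
    useful = (x * cmath.exp(2j * math.pi * (F_IF_HZ + cfo) * t)).real
    interference = (u * cmath.exp(2j * math.pi * (F_IF_HZ + cfo_u) * t)).real
    # Residual carrier caused by LO coupling, at the same frequency as the useful signal.
    leakage = leak_amp * math.cos(2 * math.pi * (F_IF_HZ + cfo) * t + leak_phase)
    noise = random.gauss(0.0, noise_std) if noise_std > 0 else 0.0
    return useful + interference + dc + leakage + noise


# One ADC sampling period later (61.44 MHz), with arbitrary example impairments.
print(received_sample(1 / 61.44e6, x=1 + 0j, u=0.1j, cfo=200.0, dc=0.05, leak_amp=0.02))
```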



## Synchronization/interference-detection techniques (I)

- CP-based synchronization → cross-correlation exploiting the self-similarity of the received OFDM symbols due to the CP:
  - Far less complex implementation than a technique based on PSS/SSS
  - Cross-correlation values can be opportunistically reused to detect interference → degradation directly related to SIR
  - Design favouring resource reuse → required for its FPGA implementation!



## Synchronization/interference-detection techniques (II)

 ITU pedestrian B channel → cross-correlation using a 2048+467 sample-window:

$$|r_s[n]|^2 = \frac{\left|\sum_{l=0}^{466} s^*[n+l] \cdot s[n+l+2048]\right|^2}{\left(\sum_{l=0}^{466} |s[n+l]|^2\right) \cdot \left(\sum_{l=0}^{466} |s[n+l+2048]|^2\right)}$$

- Peak of |r<sub>s</sub>[n]|<sup>2</sup> indicates position of CP → location of FFT-window
- Phase of r<sub>s</sub>[n] can be used to estimate the phase shift of the received signal in the presence of CFO
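A minimal software sketch of this metric is given below, assuming a scaled-down symbol (64-sample FFT, 16-sample CP) so that it runs quickly; the slides use 2048 and a 467-sample window. The correlation window is set equal to the CP length here to obtain a single sharp peak. Function and variable names are illustrative.

```python
import cmath
import math
import random


def cp_metric(s, n, fft_size, win):
    """Return (|r_s[n]|^2, un-normalized correlation) for the window starting at n."""
    corr = sum(s[n + l].conjugate() * s[n + l + fft_size] for l in range(win))
    e0 = sum(abs(s[n + l]) ** 2 for l in range(win))
    e1 = sum(abs(s[n + l + fft_size]) ** 2 for l in range(win))
    if e0 == 0.0 or e1 == 0.0:
        return 0.0, corr
    return abs(corr) ** 2 / (e0 * e1), corr


random.seed(0)
FFT, CP, WIN = 64, 16, 16  # assumption: scaled-down sizes for the demo
OFFSET = 40                # position of the CP inside the received stream


def rand_c():
    return complex(random.gauss(0, 1), random.gauss(0, 1))


body = [rand_c() for _ in range(FFT)]
symbol = body[-CP:] + body  # the CP is a copy of the symbol tail
rx = [rand_c() for _ in range(OFFSET)] + symbol + [rand_c() for _ in range(FFT + CP)]

# Apply a CFO of 0.1 subcarrier spacings (phase advances by 2*pi*0.1 every FFT samples).
CFO_NORM = 0.1
rx = [v * cmath.exp(2j * math.pi * CFO_NORM * k / FFT) for k, v in enumerate(rx)]

metrics = [cp_metric(rx, n, FFT, WIN) for n in range(len(rx) - FFT - WIN + 1)]
peak_n = max(range(len(metrics)), key=lambda n: metrics[n][0])
cfo_est = cmath.phase(metrics[peak_n][1]) / (2 * math.pi)

print("peak position:", peak_n)
print("peak value:", metrics[peak_n][0])
print("estimated CFO (subcarrier spacings):", cfo_est)
```

Without noise or interference the peak sits at the start of the CP with a value of 1, and the phase of the correlation at the peak gives the CFO.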



### Synchronization/interference-detection techniques (III)

Ideally (i.e., no noise and no interference) → peak amplitude of |r<sub>s</sub>[n]|<sup>2</sup> = 1, but...



SIR = 12 dB

... the cross-correlation profile is degraded in the presence of noise and interference.

#### **General DFE architecture**






#### **Interference-detection algorithm**

 Algorithm applied to each 5-ms frame → decides which band(s) are interfered

#### Algorithm 1

```
if wholeband_detection == 0 then
    decision = no interference;
else
    if low_10MHz_band_detection == 1 and high_10MHz_band_detection == 0 then
        decision = interference detected in the low 10 MHz band;
    else if low_10MHz_band_detection == 0 and high_10MHz_band_detection == 1 then
        decision = interference detected in the high 10 MHz band;
    else
        decision = interference detected in the entire bandwidth;
    end if
end if
```
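For off-line experimentation, Algorithm 1 can be transcribed directly into software. The sketch below mirrors the branch order of the pseudocode; the enum and function names are illustrative and are not taken from the RTL design.

```python
from enum import Enum


class Decision(Enum):
    NONE = "no interference"
    LOW = "interference detected in the low 10 MHz band"
    HIGH = "interference detected in the high 10 MHz band"
    FULL = "interference detected in the entire bandwidth"


def decide(wholeband_detection, low_band_detection, high_band_detection):
    """Per-frame decision following Algorithm 1 (flags are 0/1 or bool)."""
    if not wholeband_detection:
        return Decision.NONE
    if low_band_detection and not high_band_detection:
        return Decision.LOW
    if high_band_detection and not low_band_detection:
        return Decision.HIGH
    # Both half-band flags set, or neither: treated as full-band interference.
    return Decision.FULL


for flags in [(0, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1), (1, 0, 0)]:
    print(flags, "->", decide(*flags).value)
```

Note that, as in the pseudocode, a whole-band detection with neither half-band flag raised falls through to the entire-bandwidth decision.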



#### How is interference detected?

- Amount of degradation is directly related to the power of the received interference → presence of interference can be detected by defining a threshold (i.e., peak value below threshold = interference)
- Threshold definition aims at fulfilling a KPI:
  - Probability that raw/uncoded BER is below 10<sup>-2</sup> <u>must be above 0.8</u> (conditioned on the fact that interference is detected)



## **Thresholds definition (I)**

- Exhaustive MATLAB simulations
  - <u>Step 1</u>) all-synthetic signals
  - <u>Step 2</u>) data recorded using GEDOMIS
    1. load MATLAB-generated BSs' I/Q vectors to VSGs
    2. configuration of channel emulator
    3. real-time signal reception & data capturing
    4. off-line simulation



### **Thresholds definition (II)**

| Threshold of main branch | Whole-BW interference: prob. of detection | Whole-BW interference: prob. of false alarm | Half-BW interference: prob. of detection | Half-BW interference: prob. of false alarm |
|--------------------------|-------------------------------------------|---------------------------------------------|------------------------------------------|--------------------------------------------|
| 0.88                     | 0.46                                      | 0.04                                        | 0.48                                     | 0.02                                       |
| 0.89                     | 0.53                                      | 0.05                                        | 0.56                                     | 0.03                                       |
| 0.90                     | 0.60                                      | 0.07                                        | 0.63                                     | 0.04                                       |
| 0.91                     | 0.68                                      | 0.09                                        | 0.71                                     | 0.06                                       |
| 0.92                     | 0.77                                      | 0.12                                        | 0.81                                     | 0.09                                       |
| 0.93                     | 0.87                                      | 0.17                                        | 0.89                                     | 0.15                                       |
| 0.94                     | 0.93                                      | 0.26                                        | 0.95                                     | 0.25                                       |

| Threshold of secondary branch | Whole-BW interference: prob. of detection | Whole-BW interference: prob. of false alarm | Half-BW interference: prob. of detection | Half-BW interference: prob. of false alarm |
|-------------------------------|-------------------------------------------|---------------------------------------------|------------------------------------------|--------------------------------------------|
| 0.88                          | 0.37                                      | 0.04                                        | 0.73                                     | 0.11                                       |
| 0.89                          | 0.43                                      | 0.05                                        | 0.78                                     | 0.13                                       |
| 0.90                          | 0.49                                      | 0.06                                        | 0.83                                     | 0.18                                       |
| 0.91                          | 0.56                                      | 0.08                                        | 0.88                                     | 0.24                                       |
| 0.92                          | 0.64                                      | 0.10                                        | 0.92                                     | 0.31                                       |
| 0.93                          | 0.73                                      | 0.14                                        | 0.95                                     | 0.39                                       |
| 0.94                          | 0.82                                      | 0.19                                        | 0.98                                     | 0.49                                       |
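Detection and false-alarm probabilities of this kind can be estimated empirically from labelled correlation peaks, using the rule "peak below threshold = interference". The sketch below assumes two lists of per-frame peak values; the numbers in it are made-up toy values for illustration, not measurements from the testbed.

```python
def detection_rates(peaks_interfered, peaks_clean, threshold):
    """Return (prob. of detection, prob. of false alarm) for one threshold."""
    p_det = sum(p < threshold for p in peaks_interfered) / len(peaks_interfered)
    p_fa = sum(p < threshold for p in peaks_clean) / len(peaks_clean)
    return p_det, p_fa


# Toy peak values (hypothetical, for illustration only).
interfered = [0.80, 0.86, 0.91, 0.95]  # frames known to contain interference
clean = [0.89, 0.93, 0.96, 0.97]       # frames known to be interference-free

for thr in (0.88, 0.90, 0.92, 0.94):
    p_det, p_fa = detection_rates(interfered, clean, thr)
    print(f"threshold={thr:.2f}  P_det={p_det:.2f}  P_fa={p_fa:.2f}")
```

Raising the threshold increases both probabilities, which is the trade-off visible in the tables.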



## 7. RTL design




#### **Extended DFE**

- The focus is set on the interference-aware DFE of the macro UE
  - It is one of the most complex processing blocks in the PHY-layer of the presented system
  - It has a critical impact on the performance of the whole interference-management scheme



#### AGC and DDC blocks (I)



### AGC and DDC blocks (II)

- DDC (using various Xilinx IP cores):
  - (1) frequency translation
  - (2) I/Q components extraction + decimation
    - MATLAB FDAtool
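The two DDC steps can be sketched in software as a complex mixer followed by a low-pass FIR and a decimator. This is a behavioural illustration only, not the Xilinx IP-core configuration: the function name, the 2-tap filter and the tuning frequency are assumptions. The example tunes to 15.36 MHz, which is where a 46.08 MHz IF lands after sampling at 61.44 MHz (61.44 − 46.08 = 15.36), and decimates by 2 to reach 30.72 MHz.

```python
import cmath
import math


def ddc(samples, f_shift_hz, fs_hz, taps, decim):
    """Mix a real IF signal down by f_shift_hz, low-pass filter it, and decimate."""
    mixed = [x * cmath.exp(-2j * math.pi * f_shift_hz * n / fs_hz)
             for n, x in enumerate(samples)]
    filtered = [sum(h * mixed[n - k] for k, h in enumerate(taps))
                for n in range(len(taps) - 1, len(mixed))]
    return filtered[::decim]


FS_ADC_HZ = 61.44e6
F_TUNE_HZ = 15.36e6  # assumption: digital frequency the DDC is tuned to
tone = [math.cos(2 * math.pi * F_TUNE_HZ * n / FS_ADC_HZ) for n in range(64)]

# A 2-tap averager is a placeholder low-pass; it nulls the image at fs/2.
baseband = ddc(tone, F_TUNE_HZ, FS_ADC_HZ, taps=[0.5, 0.5], decim=2)
print(baseband[:4])
```

A real cosine at the tuning frequency comes out as a constant of amplitude 0.5, since the mixer splits it into a DC term and an image that the filter removes.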



#### Hardware-efficient filtering stage (I)

- Xilinx FIR filter IP core
  - Direct link to MATLAB FDAtool → 51 18-bit complex-valued symmetric coefficients
  - … but only accepts real-valued coefficients!
  - A single filter requires a large amount of DSP and regular FPGA slices... we would need 4!
- Design exploits the fact that the <u>coefficients</u> of the required filters are the <u>complex conjugates</u> of each other:

$$h_{\text{low}}[n] = h_i[n] + j \cdot h_q[n], \qquad h_{\text{high}}[n] = h_i[n] - j \cdot h_q[n]$$
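One way to see the saving: for a complex input x = x_i + j·x_q, both filter outputs are combinations of the same four real-coefficient convolutions (h_i∗x_i, h_q∗x_q, h_i∗x_q, h_q∗x_i). The sketch below checks this numerically; it is a software illustration under that assumption and does not reproduce the actual IP-core arrangement.

```python
def fir(h, x):
    """Full convolution of two real sequences (real-coefficient FIR)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for k, xv in enumerate(x):
            y[i + k] += hv * xv
    return y


def dual_band_filter(h_i, h_q, x):
    """Return (y_low, y_high) using only four real-coefficient convolutions."""
    x_i = [v.real for v in x]
    x_q = [v.imag for v in x]
    ii, qq = fir(h_i, x_i), fir(h_q, x_q)
    iq, qi = fir(h_i, x_q), fir(h_q, x_i)
    y_low = [complex(a - b, c + d) for a, b, c, d in zip(ii, qq, iq, qi)]
    y_high = [complex(a + b, c - d) for a, b, c, d in zip(ii, qq, iq, qi)]
    return y_low, y_high


def direct_complex_fir(h, x):
    """Reference: full convolution with complex coefficients."""
    y = [0j] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for k, xv in enumerate(x):
            y[i + k] += hv * xv
    return y


h_i = [0.5, 0.25, 0.5]   # toy coefficients, not the 51-tap design
h_q = [0.1, 0.0, -0.1]
x = [1 + 2j, -1 + 0.5j, 0.25 - 1j, 2 + 0j]

y_low, y_high = dual_band_filter(h_i, h_q, x)
ref_low = direct_complex_fir([complex(a, b) for a, b in zip(h_i, h_q)], x)
ref_high = direct_complex_fir([complex(a, -b) for a, b in zip(h_i, h_q)], x)
print(max(abs(a - b) for a, b in zip(y_low, ref_low)))
print(max(abs(a - b) for a, b in zip(y_high, ref_high)))
```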



#### Hardware-efficient filtering stage (II)

• Resource-sharing pipelined architecture, using two 2-channel



# Joint synchronization/interference-detection (I)

RTL-optimized calculation of the cross-correlation, where $dn[n]$ is the numerator sum and $ds0[n]$, $ds1[n]$ are the two energy sums of the denominator:

$$|r_s[n]|^2 = \frac{|dn[n]|^2}{ds0[n] \cdot ds1[n]}$$

$$dn[n+1] = \begin{cases} dn[n] + s^*[n+467] \cdot s[n+2048+467], & \text{if } n \le 467 \\ dn[n] - s^*[n] \cdot s[n+2048] + s^*[n+467] \cdot s[n+2048+467], & \text{if } n > 467 \end{cases}$$

Only four samples need to be introduced to the already calculated correlation each clock cycle → DSP48-slice savings!
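The steady-state branch of this recursion (drop the oldest product, add the newest) can be checked against a direct evaluation of the sums. The sketch below uses small, assumed sizes (FFT of 32, window of 7) and computes the first window directly instead of modelling the start-up branch; names are illustrative.

```python
import random


def sliding_correlation(s, fft_size, win):
    """Return [(dn, ds0, ds1)] for every n, updating with four samples per step."""
    dn = sum(s[l].conjugate() * s[l + fft_size] for l in range(win))
    ds0 = sum(abs(s[l]) ** 2 for l in range(win))
    ds1 = sum(abs(s[l + fft_size]) ** 2 for l in range(win))
    out = [(dn, ds0, ds1)]
    for n in range(len(s) - fft_size - win):
        old_a, old_b = s[n], s[n + fft_size]              # samples leaving the window
        new_a, new_b = s[n + win], s[n + win + fft_size]  # samples entering the window
        dn += new_a.conjugate() * new_b - old_a.conjugate() * old_b
        ds0 += abs(new_a) ** 2 - abs(old_a) ** 2
        ds1 += abs(new_b) ** 2 - abs(old_b) ** 2
        out.append((dn, ds0, ds1))
    return out


def direct_correlation(s, n, fft_size, win):
    """Reference: recompute all three sums from scratch for window n."""
    dn = sum(s[n + l].conjugate() * s[n + l + fft_size] for l in range(win))
    ds0 = sum(abs(s[n + l]) ** 2 for l in range(win))
    ds1 = sum(abs(s[n + l + fft_size]) ** 2 for l in range(win))
    return dn, ds0, ds1


random.seed(1)
FFT, WIN = 32, 7  # assumption: scaled-down sizes (slides: 2048 and 467)
sig = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]

recursive = sliding_correlation(sig, FFT, WIN)
worst = max(
    max(abs(r[0] - d[0]), abs(r[1] - d[1]), abs(r[2] - d[2]))
    for r, d in ((recursive[n], direct_correlation(sig, n, FFT, WIN))
                 for n in range(len(recursive)))
)
print("windows:", len(recursive), "max deviation:", worst)
```

The per-step cost is independent of the window length, which is what makes the hardware version economical in multipliers.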



# Joint synchronization/interference-detection (II)

- Peak-detection based on triggering threshold
  - Because of RSs, peaks can also be found in the quasi-quiet periods → values of ds0[n]·ds1[n] are used to determine legitimate peaks



## Joint synchronization/interference-detection (III)



#### **Centralized control unit**



# 8. Validation and results using the GEDOMIS® testbed




#### **Multi-FPGA implementation**



| Device  | XC4VLX160 | XC4VSX35 | XC4VLX160 | XC4VLX160 | XC4VLX160 |
|---------|-----------|----------|-----------|-----------|-----------|
| Slices  | 67%       | 44%      | 33%       | 36%       | 27%       |
| DSP48s  | 90%       | 23%      | 94%       | 52%       | 78%       |
| RAMB16s | 78%       | 35%      | 62%       | 78%       | 82%       |

### Setup of GEDOMIS®



#### **Visualization of the cross-correlation**



# Visualization of the BER (I)

- Two transmission modes are defined for the Femto BS
  - According to feedback or ignoring it (i.e., whole 20 MHz band transmission)
  - Transmission mode changes every N seconds
- Real-time calculation of macro UE BER
  - Replication of macro BS' PRBS generator
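The idea of replicating the transmitter's PRBS generator at the receiver can be sketched as follows. The slides do not state which PRBS polynomial or seed is used, so the PRBS7 generator (x^7 + x^6 + 1) and the seed below are assumptions chosen purely for illustration.

```python
from itertools import islice


def prbs7(seed=0x02):
    """PRBS7 bit generator (x^7 + x^6 + 1); polynomial and seed are assumptions."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        yield newbit


def measure_ber(received_bits, seed=0x02):
    """Compare received bits with a local replica of the transmitter's PRBS."""
    reference = prbs7(seed)
    errors = sum(rx != next(reference) for rx in received_bits)
    return errors / len(received_bits)


tx_bits = list(islice(prbs7(), 1000))
rx_bits = list(tx_bits)
for pos in (10, 500, 999):  # inject three bit errors
    rx_bits[pos] ^= 1

print("BER:", measure_ber(rx_bits))
```

Because the receiver regenerates the reference sequence locally, no payload needs to be sent back for comparison; only the generator and its seed must match.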



# Visualization of the BER (II)

Interference in the low 10 MHz band, SIR = 12 dB, low-mobility pedestrian B channel (i.e., 0.2 km/h)



# Visualization of the BER (III)

Interference in the high 10 MHz band, SIR = 14 dB, mobile pedestrian B channel (i.e., 3 km/h)







#### **Development team**

- Signal processing and algorithms
  - Antonio Pascual (UPC), Miquel Payaró (CTTC)
- High-level modelling and simulations
  - Luís Blanco & Jordi Serra (CTTC), Marc Molina (UPC)
- RTL design and VHDL coding
  - Pepe Rubio & Oriol Font (CTTC)
- Laboratory setup and debugging
  - Nikolaos Bartzoudis & David López (CTTC)



# **Questions?**

#### Oriol Font-Bach oriol.font@cttc.cat


