# **40 Gigabit Ethernet IP Solution**

## Product Brief (HTK-40G-ETH-128-FPGA)



The 40Gbps Ethernet IP solution offers a fully integrated, IEEE 802.3-2015-compliant package for NIC (Network Interface Card) and Ethernet switching applications. As shown in the figure below, the 40Gbps Ethernet IP includes:

- 40Gbps MAC core
- 40Gbps (40GBase-R) PCS core
- Technology-dependent transceiver wrapper for Altera and/or Xilinx FPGAs
- Statistics counter block (for RMON and MIB)
- MDIO and I2C cores for optical module status and control



A complete reference design using a synthesizable L2 (MAC-level) packet generator/checker is also included to facilitate quick integration of the Ethernet IP into a user design. A GUI application interacts with the reference design's hardware elements through a UART interface (a PCIe option is also available). For the PCIe interface option, a basic Linux PCIe driver/API is provided for memory-mapped read/write access to the internal registers. See **Appendix A** for details.

The MAC and PCS cores are designed with a 128-bit data path operating at 312.5MHz.
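As a quick check on these numbers: a parallel data path's raw bandwidth is its width times its clock frequency, so the 128-bit path at 312.5MHz carries exactly 40Gbps, as does the 256-bit @ 156.25MHz user-side option. A trivial Python sketch of that arithmetic (the helper name `datapath_bps` is illustrative only):

```python
# Raw data-path bandwidth = bus width (bits) x clock frequency (Hz)
def datapath_bps(width_bits: int, clock_hz: int) -> int:
    """Bandwidth of a parallel data path that moves one full word per clock."""
    return width_bits * clock_hz


core_path = datapath_bps(128, 312_500_000)   # internal MAC/PCS data path
wide_user = datapath_bps(256, 156_250_000)   # optional 256-bit user-side bus
print(core_path, wide_user)                  # 40000000000 40000000000
```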

Because the transceiver wrapper is included with the Ethernet IP solution, the line side connects the four 10.3125Gbps FPGA transceivers directly to the optical module (QSFP+, CFP, etc.).
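The serial lane rate follows from similar arithmetic: 40GBase-R carries 64b/66b-encoded data, so four lanes at 10.3125Gbps give 41.25Gbps on the wire and exactly 40Gbps of MAC data once the 2-bit sync header of each 66-bit block is accounted for. A minimal sketch, using `fractions.Fraction` so the result stays exact:

```python
from fractions import Fraction

LANES = 4                             # physical lanes to the optical module
LANE_RATE_BPS = 10_312_500_000        # serial rate of each FPGA transceiver
CODING = Fraction(64, 66)             # 64b/66b: 64 data bits per 66-bit block

wire_rate = LANES * LANE_RATE_BPS     # aggregate rate on the line side
mac_rate = wire_rate * CODING         # data rate after removing sync headers
print(wire_rate, mac_rate)            # 41250000000 40000000000
```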

The Ethernet IP solution implements two user (application) side interfaces. The register configuration and control port is a 32-bit AXI4-Lite interface. Depending upon the application layer, the user can select a 128-bit @ 312.5MHz or a 256-bit @ 156.25MHz AXI4-Stream bus to interface with the MAC block.
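To illustrate how frame length maps onto the two streaming widths, the sketch below computes how many bus beats a frame occupies and the `tkeep` byte-enable mask of the final beat. It assumes the usual AXI4-Stream convention (bytes packed contiguously from lane 0, every beat full except the one carrying `tlast`); the function name is hypothetical and the core's exact sideband signalling may differ.

```python
def axis_beats(frame_len: int, bus_bits: int) -> tuple:
    """Return (beat_count, last_beat_tkeep) for a frame on an AXI4-Stream bus.

    Assumes bytes are packed contiguously from byte lane 0 and that every
    beat except the last one has all tkeep bits set.
    """
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    bytes_per_beat = bus_bits // 8
    beats = -(-frame_len // bytes_per_beat)           # ceiling division
    tail = frame_len - (beats - 1) * bytes_per_beat   # valid bytes in last beat
    last_tkeep = (1 << tail) - 1                      # low 'tail' bits set
    return beats, last_tkeep


for width in (128, 256):
    for length in (64, 1518):
        beats, keep = axis_beats(length, width)
        print(f"{width}-bit bus, {length}-byte frame: {beats} beats, last tkeep=0x{keep:X}")
```

For example, a 1518-byte frame takes 95 beats on the 128-bit bus but only 48 on the 256-bit bus, in both cases with 14 valid bytes in the last beat.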

The 40Gbps Ethernet IP supports advanced features such as per-priority pause frames (compliant with the IEEE 802.3bd specification) to enable Converged Enhanced Ethernet (CEE) applications, such as data center bridging, that employ IEEE 802.1Qbb Priority Flow Control (PFC) to pause traffic based on priority level.
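For reference, a PFC frame is a MAC Control frame (EtherType 0x8808, destination 01-80-C2-00-00-01) carrying opcode 0x0101, a 16-bit priority-enable vector and eight 16-bit pause times, one per priority, as described in IEEE 802.3 Annex 31D. The sketch below assembles such a frame (without FCS) to show the layout; it illustrates the frame format only, is not the core's generator, and `build_pfc_frame` is a hypothetical name.

```python
import struct

PFC_DEST_MAC = bytes.fromhex("0180C2000001")   # MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101
MIN_FRAME_NO_FCS = 60                          # 64-byte minimum minus 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Assemble a PFC MAC Control frame, zero-padded, without the FCS.

    pause_quanta maps priority (0-7) to a pause time in quanta (0-65535);
    one quantum is 512 bit times. A time of 0 releases that priority (XON).
    Priorities not listed stay disabled in the priority-enable vector.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not (0 <= prio <= 7 and 0 <= quanta <= 0xFFFF):
            raise ValueError("priority must be 0-7 and quanta 0-65535")
        enable_vector |= 1 << prio
        times[prio] = quanta
    frame = (PFC_DEST_MAC + src_mac
             + struct.pack(">HHH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector)
             + struct.pack(">8H", *times))
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")


# XOFF priority 3 for the maximum time and XON priority 5
pfc = build_pfc_frame(bytes.fromhex("001122334455"), {3: 0xFFFF, 5: 0})
print(len(pfc), pfc[16:18].hex())   # 60 0028
```

At 40Gbps one quantum is 512 bits / 40Gbps = 12.8 ns, so 0xFFFF quanta pauses a priority for roughly 0.84 ms.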

#### <u>Features Overview</u>

#### **MAC Core Features**

- Implements the full 802.3ba specification, including preamble/SFD generation, frame padding generation, CRC generation on transmit and CRC checking on receive
- Implements 802.3bd specification with ability to generate and recognize PFC pause frames
- Implements Reconciliation Sublayer functionality: alignment of Start and Terminate control characters, and insertion and detection of error control characters and fault sequences
- Implements a 128-bit XLGMII interface operating at 312.5MHz for 40G EMAC
- Implements Deficit Idle Count (DIC) mechanism to ensure maximum possible throughput at the transmit interface
- Implements padding on the transmit path for frames shorter than 64 bytes
- Implements fully automated XON and XOFF pause frame (802.3 Annex 31B) generation and termination, providing flow control without user application intervention (non-PFC mode only)
- Pause frame generation additionally controllable by user application offering flexible traffic flow control
- Supports VLAN-tagged frames according to IEEE 802.1Q
- Supports any type of Ethernet frame, such as SNAP/LLC, Ethernet II/DIX or IP traffic
- Discards frames with a mismatching destination address on receive (except broadcast and multicast frames)
- Supports a programmable promiscuous mode that omits MAC destination address checking in the receive EMAC
- Optional multicast address filtering with a 64-bit hash filtering table, providing imperfect filtering to reduce the load on higher layers
- CRC-32 generation and checking at high speed using an efficient pipelined CRC calculation algorithm
- Implements logic for optional padding removal on the RX path for NIC applications, or forwarding of unmodified data to the user interface
- Optional discard of runt frames (shorter than 64 bytes) at the core's reconciliation sublayer, or forwarding of runt frames to the user application interface
- Implements logic for optional forwarding of the CRC field to user application interface



- Implements logic for optional forwarding of received pause frames to the user application interface
- Programmable maximum frame length, supporting any standard or proprietary frame length (e.g., 9K-byte jumbo frames)
- Status signals available with each frame on the user interface, providing information such as frame length, VLAN frame type indication and error information
- Implements programmable internal XLGMII Loopback
- Implements statistics indicators for frame traffic as well as errors (alignment, CRC, length) and pause frames
- Implements statistics and event signals providing support for 802.3 basic and mandatory managed objects, as well as the IETF Management Information Base (MIB) package (RFC 2665) and Remote Network Monitoring (RMON) required in SNMP environments
- Implements a streaming user application interface. The application interface is designed for either a 128-bit interface operating @ 312.5MHz or a 256-bit interface operating @ 156.25MHz.
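The transmit padding and CRC-32 features listed above can be modeled in a few lines, for example to build expected frames in a testbench: short frames are zero-padded to 60 bytes and the 4-byte FCS is appended, giving the 64-byte minimum. Ethernet's FCS is the same reflected CRC-32 (generator polynomial 0x04C11DB7) that Python's `zlib.crc32` computes, sent least significant byte first. This is a software reference model under those assumptions, not the core's pipelined CRC logic; the function names are hypothetical.

```python
import struct
import zlib

MIN_FRAME_NO_FCS = 60   # 64-byte minimum frame size minus the 4-byte FCS


def pad_and_append_fcs(frame: bytes) -> bytes:
    """TX model: zero-pad DA..payload to 60 bytes, then append the CRC-32 FCS."""
    padded = frame.ljust(MIN_FRAME_NO_FCS, b"\x00")
    fcs = zlib.crc32(padded) & 0xFFFFFFFF
    return padded + struct.pack("<I", fcs)   # FCS is sent least significant byte first


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """RX model: recompute the CRC over everything except the trailing FCS."""
    if len(frame_with_fcs) < 5:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


# Broadcast DA, example SA, EtherType 0x0800 and a 5-byte payload: 19 bytes
short_frame = bytes.fromhex("FFFFFFFFFFFF" "001122334455" "0800") + b"hello"
tx = pad_and_append_fcs(short_frame)
print(len(tx), fcs_ok(tx))                 # 64 True

corrupted = bytearray(tx)
corrupted[20] ^= 0x01                      # flip one bit in the pad
print(fcs_ok(bytes(corrupted)))            # False
```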
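The Deficit Idle Count (DIC) mechanism listed above can be illustrated with a small model. Each Start character must land on lane 0 of a transfer column, so after every frame the transmitter either deletes a few idles (building up a deficit) or adds extra idles (paying the deficit back), which keeps the long-run gap close to the nominal 12 bytes instead of always rounding up. The 8-lane column, 12-byte nominal gap and deficit bound of 3 are assumptions chosen for illustration (the exact bounds come from the Reconciliation Sublayer clauses of IEEE 802.3 and the core's design), and `next_ipg` is a hypothetical name.

```python
def next_ipg(gap_start_lane: int, dic: int, nominal_ipg: int = 12,
             lanes: int = 8, max_dic: int = 3):
    """Pick the idle count for one inter-packet gap using a Deficit Idle Count.

    gap_start_lane: lane (0..lanes-1) holding the first byte of the gap.
    dic:            idles deleted earlier and not yet paid back (0..max_dic).
    Returns (idles, new_dic); the following Start always lands on lane 0.
    """
    overshoot = (gap_start_lane + nominal_ipg) % lanes   # lanes past a column boundary
    if overshoot == 0:
        return nominal_ipg, dic                          # already aligned
    if dic + overshoot <= max_dic:
        return nominal_ipg - overshoot, dic + overshoot  # shorten gap, grow deficit
    pad = lanes - overshoot
    return nominal_ipg + pad, max(0, dic - pad)          # stretch gap, repay deficit


print(next_ipg(5, 0))   # (11, 1): one idle deleted
print(next_ipg(0, 0))   # (16, 0): deleting 4 would exceed the bound, so 4 inserted
print(next_ipg(6, 1))   # (10, 3): two more idles deleted, deficit now at its bound
```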

#### **PCS Core Features**

- Implements a 40GBase-R PCS core compliant with the IEEE 802.3ba specification
- Implements a 128-bit XLGMII interface operating at 312.5MHz for 40G Ethernet
- Implements 64b/66b encoding/decoding for transmit and receive PCS
- Implements 40G scrambling/descrambling using the 802.3ba-specified polynomial 1 + x<sup>39</sup> + x<sup>58</sup>
- Implements Multi-Lane Distribution (MLD) across 4 Virtual Lanes (VLs) for 40Gbps operation
- Implements periodic insertion of Alignment Marker (AM) on the transmit path and deletion on the receive path
- Implements 66-bit block synchronization and Alignment Marker Lock machines as specified in 802.3ba specifications
- Implements skew compensation logic in order to realign all the virtual lanes and reassemble an aggregate 40G stream (with all 64b/66b blocks in the correct order)
- Implements lane reordering to support reception of any virtual lane on any physical lane

- Implements BIP-8 insertion/checking per Virtual Lane on transmit/receive respectively
- Implements Inter-Packet Gap (IPG) insertion/deletion for alignment marker and clock compensation, while maintaining a minimum IPG of 1 byte
- Implements gearbox logic to convert 66-bit blocks to the 40-bit transceiver interface width for the 40G PCS; the 40-bit words are clocked at the transceiver parallel clock rate (10.3125Gbps / 40 = 257.8125MHz)
- Implements a programmable internal XLGMII loopback that directs traffic received from the core's receive path back to the transmit PCS
- Implements a Bit Error Rate (BER) monitor to detect an excessive error ratio. It also implements the status and statistics required by IEEE 802.3ba, such as block synchronization status, AM lock status, lane deskew and lane reordering status, and per-virtual-lane BIP-8 error counters.
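The 1 + x<sup>39</sup> + x<sup>58</sup> scrambler is self-synchronizing: each transmitted bit is the data bit XORed with the scrambler's own output 39 and 58 bits earlier, and the descrambler applies the same taps to the received bit stream, so it recovers the data after 58 bits without any shared seed. Per IEEE 802.3, only the 64 payload bits of each 66-bit block are scrambled; the 2-bit sync header bypasses the scrambler. The bit-serial Python model below is a sketch for generating or checking test vectors, not a description of the core's parallel implementation.

```python
import random

MASK58 = (1 << 58) - 1   # the scrambler remembers 58 bits of history


def scramble(bits, state=0):
    """Self-synchronizing scrambler, G(x) = 1 + x^39 + x^58 (bit-serial model).

    state holds the last 58 scrambled (output) bits; bit 0 is the most recent.
    Returns (scrambled_bits, new_state) so consecutive blocks can be chained.
    """
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)   # taps 39 and 58 bits back
        state = ((state << 1) | s) & MASK58
        out.append(s)
    return out, state


def descramble(bits, state=0):
    """Inverse: XOR each bit with the *received* bits 39 and 58 positions back."""
    out = []
    for s in bits:
        out.append(s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1))
        state = ((state << 1) | s) & MASK58
    return out, state


random.seed(0)
payload = [random.getrandbits(1) for _ in range(4 * 64)]     # four block payloads
line, _ = scramble(payload, state=random.getrandbits(58))    # TX state unknown to RX
recovered, _ = descramble(line)                              # RX starts from zero
print(recovered[58:] == payload[58:])                        # True: locked after 58 bits
```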

#### Licensing and Maintenance

- <u>NO</u> yearly maintenance fees for upgrades and bug fixes
- Basic core license: a compiled (synthesized netlist) binary for a single vendor (either Xilinx or Altera)
- Option for a vendor- and device-family-agnostic source code (Verilog) license

#### **Contact and Sales Information**

*Phone:* +1-301-528-2244

*Email:* info@mantaro.com



### <u>Resource Utilization</u>

The core utilization summary for the 40G Ethernet solution is given in the following tables. The utilization numbers are best in class compared with other available 40G Ethernet cores. The Ethernet solution has been fully verified on different hardware platforms for both Altera and Xilinx FPGAs, and has also been verified for interoperability with other 40G-capable devices.

#### 40G Ethernet - Resource Usage for <u>Xilinx</u> Devices

| Device                     | User Interface<br>(AXI4-ST) | Priority Flow<br>Control<br>(PFC) | Slice<br>LUTs | Slice<br>Registers | BRAMs             |
|----------------------------|-----------------------------|-----------------------------------|---------------|--------------------|-------------------|
| UltraScale/<br>UltraScale+ | 128-bit                     | No                                | 5,442         | 7,271              | 18K = 3; 36K = 8  |
|                            |                             | Yes                               | 5,656         | 7,859              | 18K = 3; 36K = 8  |
|                            | 256-bit                     | No                                | 6,214         | 9,143              | 18K = 4; 36K = 12 |
|                            |                             | Yes                               | 6,428         | 9,731              | 18K = 4; 36K = 12 |
| 7-Series                   | 128-bit                     | No                                | 5,477         | 7,271              | 18K = 3; 36K = 8  |
|                            |                             | Yes                               | 5,783         | 7,859              | 18K = 3; 36K = 8  |
|                            | 256-bit                     | No                                | 6,249         | 9,143              | 18K = 4; 36K = 12 |
|                            |                             | Yes                               | 6,555         | 9,731              | 18K = 4; 36K = 12 |

Note:

- These utilization numbers include the MAC and PCS register files.
- The register-based RMON statistics block adds 1,948 LUTs and 1,807 registers.

#### 40G Ethernet - Resource Usage for Altera Devices

| Device    | User Interface<br>(AXI4-ST) | Priority Flow<br>Control<br>(PFC) | COMB.<br>ALUTs | Registers | Memory Blocks |
|-----------|-----------------------------|-----------------------------------|----------------|-----------|---------------|
| Arria 10  | 128-bit                     | No                                | 5,032          | 8,371     | M20K = 10     |
|           |                             | Yes                               | 5,143          | 8,966     | M20K = 10     |
|           | 256-bit                     | No                                | 5,151          | 9,970     | M20K = 26     |
|           |                             | Yes                               | 5,357          | 10,584    | M20K = 26     |
| Stratix-V | 128-bit                     | No                                | 4,881          | 8,039     | M20K = 10     |
|           |                             | Yes                               | 4,980          | 8,679     | M20K = 10     |
|           | 256-bit                     | No                                | 4,980          | 9,708     | M20K = 26     |
|           |                             | Yes                               | 5,184          | 10,300    | M20K = 26     |

Note:

- These utilization numbers include the MAC and PCS register files.
- The register-based RMON statistics block adds 2,003 LUTs and 1,808 registers.



### <u>Deliverables</u>

- Compiled synthesizable binaries or encrypted RTL for the MAC and PCS cores
- Source code RTL (Verilog) for I2C, MDIO, RMON and Register-File blocks
- Self-checking behavioral models and test benches for simulation
- Constraint files and synthesis scripts for design compilation
- A complete reference design based on a PCIe or UART host interface, with:
  - Top level wrapper (source files, Verilog) for user specific customizations
  - Source files (Verilog) for the PCIe application layer
  - Binaries for a basic L2 packet generator and checker
  - PCIe driver/API (source files, C) for Linux
  - UART and command interpreter blocks with the optional UART host interface
  - GUI application (Linux only for PCIe, Linux and Windows for UART) for interfacing to the reference design
- Design guide(s) and user manuals
- USA-based technical support by the developers

### Ported/Validated Modules List

1. HiTech Global HTG-K800, HTG-K816 and HTG-830; Xilinx Virtex UltraScale and Kintex UltraScale FPGAs; interface through FMC (HTG-FMC-X2QSFP+) and Z-Ray (HTG-ZR-X2QSFP+) QSFP+ modules. (<u>http://hitechglobal.com/Boards/Kintex-UltraScale.htm</u>) (<u>http://www.hitechglobal.com/Boards/Kintex\_UltraScale\_half-size\_PCIe.htm</u>) (<u>http://hitechglobal.com/boards/Virtex-UltraScale\_FPGA.htm</u>)
2. HiTech Global HTG-K700; Xilinx Kintex-7 FPGA; interface through FMC (HTG-FMC-X2QSFP+) QSFP+ module. (<u>http://www.hitechglobal.com/Boards/Kintex-7 PCIE.htm</u>)
3. Xilinx VC707 (Xilinx Virtex-7 485) and Xilinx KC705 (Xilinx Kintex-7 325) evaluation modules; interface through FMC (HTG-FMC-X2QSFP+) QSFP+ module.
4. HiTech Global HTG-510; Altera Stratix-V FPGA with integrated QSFP+ and SFP+ interfaces. (<u>http://hitechglobal.com/Boards/Stratix-V\_PCIExpress.htm</u>)



## A. Reference Design Details

### A.1 Overview

A 40Gbps reference design is included as part of the IP deliverables to facilitate quick L1 and L2 testing and verification of the 40Gbps Ethernet on the target platform. The ability to run L1 PRBS patterns and configure each transceiver independently can be used for fast module bring-up in the lab, as well as for factory diagnostics.

The UART-based 40Gbps Ethernet reference design (the UART is normally reached through an onboard USB-to-UART converter) can be seamlessly ported to various COTS FPGA networking and evaluation modules (see the Ported/Validated Modules List section for the list of verified modules). A GUI application controls register reads/writes to the FPGA through a UART core with an integrated command interpreter. Both Linux and Windows platforms are supported for the UART-based interface control.

This reference design can also be used in a custom embedded design where the FPGA connects to the host processor via a PCIe interface. For the PCIe control interface, the GUI application is hosted on a Linux platform, as the PCIe driver/API is provided for Linux only.

### A.2 Functional Description

The following figure shows the connectivity and the elements of the 40Gbps Ethernet IP reference design. A Linux host (embedded or standard PC) running a GUI application is used to configure and control the 40G Ethernet. The I2C, MDIO and GPIO interfaces included in the reference design can be used to control any MSA-compliant optical module on the target platform, including QSFP+ (I2C), 300-pin MSA (I2C) and CFP (dual/single 40G, MDIO) modules.



For L1 (physical layer) verification and testing, the GUI application provides an interface to independently control and configure each of the four 10.3125Gbps transceivers used for 40G Ethernet transport. The user can configure the transceivers to run various PRBS patterns and set transceiver parameters such as transmit voltage, transmit pre-emphasis, receive equalization and receive gain.
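The PRBS patterns mentioned above come from short linear-feedback shift registers (LFSRs). The Python sketch below is a generic PRBS generator, shown for PRBS31 (x<sup>31</sup> + x<sup>28</sup> + 1, the ITU-T O.150 pattern that FPGA transceivers commonly provide) and checked on PRBS7 (x<sup>7</sup> + x<sup>6</sup> + 1), whose 127-bit period is short enough to verify. Which patterns the reference design actually exposes depends on the transceiver family; the `prbs` function is illustrative and not part of the deliverables.

```python
from itertools import islice


def prbs(order: int, tap: int, seed=None):
    """Yield the bit stream of a PRBS defined by x^order + x^tap + 1.

    Fibonacci LFSR: the new bit is the XOR of state bits (order-1) and (tap-1),
    shifted in at bit 0. The default seed is all ones; any non-zero seed gives
    the same maximal-length sequence (shifted in time) for a primitive polynomial.
    """
    mask = (1 << order) - 1
    state = mask if seed is None else seed & mask
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    while True:
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        yield new_bit


prbs31 = prbs(31, 28)                      # pattern for link testing
first_word = list(islice(prbs31, 32))      # next 32 bits to transmit

# Sanity check on the short PRBS7: period 127, with 64 ones per period
prbs7 = list(islice(prbs(7, 6), 254))
print(prbs7[:127] == prbs7[127:], sum(prbs7[:127]))   # True 64
```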



For L2 testing, the GUI application uses the 40Gbps packet generator/checker inside the FPGA to generate and check MAC frames at up to full line rate. The packet generator supports a basic rate control mechanism to control the packet/data rate on the interface, and can be configured for fixed-size or pseudo-random-size packet transmission. An incrementing counter is used as the payload of the MAC frames; the checker on the receive side verifies the payload of received MAC frames and reports payload errors.
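The generator/checker behaviour described above (fixed or pseudo-random frame sizes, an incrementing-counter payload, payload verification on receive) can be mirrored in software, for example to cross-check what the hardware reports. This is a hypothetical sketch: the 8-bit counter width, its starting value and the 64 to 1518-byte size range are assumptions, not the reference design's actual payload format.

```python
import random
from typing import Iterator, Optional


def gen_payload(length: int, start: int = 0) -> bytes:
    """Incrementing 8-bit counter payload (assumed format), wrapping 255 -> 0."""
    return bytes((start + i) & 0xFF for i in range(length))


def count_payload_errors(payload: bytes, start: int = 0) -> int:
    """Checker: number of bytes that differ from the expected counter value."""
    return sum(b != ((start + i) & 0xFF) for i, b in enumerate(payload))


def frame_sizes(count: int, fixed: Optional[int] = None,
                lo: int = 64, hi: int = 1518, seed: int = 1) -> Iterator[int]:
    """Yield a fixed frame size, or pseudo-random sizes in [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(count):
        yield fixed if fixed is not None else rng.randint(lo, hi)


HEADER_AND_FCS = 18                     # 14-byte MAC header + 4-byte FCS
errors = 0
for size in frame_sizes(1000):
    sent = gen_payload(size - HEADER_AND_FCS)
    received = bytearray(sent)          # stand-in for a loopback path
    errors += count_payload_errors(received)
print(errors)                           # 0

bad = bytearray(gen_payload(100))
bad[42] ^= 0xFF                         # corrupt one byte
print(count_payload_errors(bad))        # 1
```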

A comprehensive set of transmit and receive counters in the MAC core provides a detailed view of the packet statistics, including various error types.

The following is a snapshot of the GUI application's L2 packet test results screen.

*[Screenshot: Ethernet Debug Application, MAC/PCS Statistics view. Visible panels: Serial Port Setup (Port ID COM8, baud rate 115200), Tcl Script, Core Revisions and Status Log; navigation tabs for Register Read/Write, MDIO/I2C Read/Write, XCVR DRP Read/Write, MAC/PCS Control, RefDesign Control, MAC/PCS Statistics, RMON Statistics, RefDesign Statistics, Bandwidth Calculation and Regression Test. The fields shown all read 0x00000000:]*

| Block | Field                    | Register offset |
|-------|--------------------------|-----------------|
| MAC   | TX Pause Frame Count     | 0x034           |
| MAC   | RX Pause Frame Count     | 0x038           |
| MAC   | Frame Length Error Count | 0x03C           |
| MAC   | CRC Error Count          | not legible     |
| MAC   | Runt Frame Count         | 0x044           |
| MAC   | RX PFC Pause Frame Count | 0x04C           |
| PCS   | Blocksync Status         | 0x108           |
| PCS   | AM Lock Status           | 0x10C           |
| PCS   | Deskew Error             | 0x110           |
| PCS   | VL0 BIP-8 Error Count    | 0x114           |
| PCS   | VL1 BIP-8 Error Count    | 0x118           |
| PCS   | VL2 BIP-8 Error Count    | 0x11C           |
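Counter and status registers like those in the screenshot are 32-bit memory-mapped locations, so a host-side script can collect them in one pass. The sketch below decodes them from any byte buffer; with the PCIe option on Linux that buffer could, for example, be an `mmap` of the device's BAR through its sysfs `resource0` file, or the values could come from the provided driver/API or the UART command interpreter. The offsets are the ones legible in the screenshot; treating them as little-endian byte offsets from a single base address, and the register names used here, are assumptions.

```python
import struct

# Offsets as legible in the GUI screenshot; assumed byte offsets from one base
STAT_REGS = {
    "tx_pause_frames":      0x034,
    "rx_pause_frames":      0x038,
    "frame_length_errors":  0x03C,
    "runt_frames":          0x044,
    "rx_pfc_pause_frames":  0x04C,
    "blocksync_status":     0x108,
    "am_lock_status":       0x10C,
    "deskew_error":         0x110,
    "vl0_bip8_errors":      0x114,
    "vl1_bip8_errors":      0x118,
    "vl2_bip8_errors":      0x11C,
}


def read_reg(window, offset: int) -> int:
    """Read one 32-bit little-endian register from a mapped register window."""
    if offset % 4:
        raise ValueError("register offsets must be 32-bit aligned")
    return struct.unpack_from("<I", window, offset)[0]


def snapshot(window) -> dict:
    """Read every register in STAT_REGS and return {name: value}."""
    return {name: read_reg(window, off) for name, off in STAT_REGS.items()}


# Stand-in for a mapped window (e.g. an mmap of a PCIe BAR)
window = bytearray(0x200)
struct.pack_into("<I", window, STAT_REGS["runt_frames"], 7)
regs = snapshot(window)
print(regs["runt_frames"], regs["blocksync_status"])   # 7 0
```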