Marvell OcteonTx2 RVU Kernel Drivers

Copyright (c) 2020 Marvell International Ltd.

Contents

  • Overview

  • Drivers

  • Basic packet flow

  • Quality of service

  • RVU Representors

Overview

The resource virtualization unit (RVU) on Marvell's OcteonTX2 SoC maps HW resources from the network, crypto and other functional blocks into PCI-compatible physical and virtual functions. Each functional block in turn provides multiple local functions (LFs) for provisioning to PCI devices. RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual functions (VFs). PF0 is called the administrative/admin function (AF) and has the privilege to provision each RVU functional block's LFs to any of the PFs/VFs.
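
RVU PFs and VFs show up as regular PCI devices on the host. As a minimal illustrative sketch (assuming the Cavium/Marvell PCI vendor ID 0x177d; the device IDs visible depend on the silicon and firmware configuration), they can be listed with:

    # lspci -d 177d: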

RVU managed networking functional blocks
  • Network pool or buffer allocator (NPA)

  • Network interface controller (NIX)

  • Network parser CAM (NPC)

  • Schedule/Synchronize/Order unit (SSO)

  • Loopback interface (LBK)

RVU managed non-networking functional blocks
  • Crypto accelerator (CPT)

  • Scheduled timers unit (TIM)

  • Schedule/Synchronize/Order unit (SSO): used for both networking and non-networking use cases.

Resource provisioning examples
  • A PF/VF with NIX-LF & NPA-LF resources works as a pure network device.

  • A PF/VF with CPT-LF resource works as a pure crypto offload device.

RVU functional blocks are highly configurable as per software requirements.

Firmware sets up the following before the kernel boots
  • Enables the required number of RVU PFs based on the number of physical links.

  • The number of VFs per PF is either static or configurable at compile time. Based on this configuration, firmware assigns VFs to each of the PFs.

  • Also assigns MSI-X vectors to each of the PFs and VFs.

  • These settings are not changed after the kernel boots.

Drivers

The Linux kernel has multiple drivers registering to different PFs and VFs of RVU. With respect to networking, there are three flavours of drivers.

Admin Function driver

As mentioned above, RVU PF0 is called the admin function (AF). This driver supports resource provisioning and configuration of the functional blocks and does not handle any I/O. It sets up a few basics, but most of the functionality is achieved via configuration requests from PFs and VFs.

PFs/VFs communicate with the AF via a shared memory region (mailbox). Upon receiving a request, the AF performs resource provisioning and other HW configuration. The AF is always attached to the host kernel, but PFs and their VFs may be used by the host kernel itself, attached to VMs, or attached to userspace applications such as DPDK. So the AF has to handle provisioning/configuration requests sent by any device from any domain.

The AF driver also interacts with the underlying firmware to
  • Manage physical ethernet links, i.e. CGX LMACs.

  • Retrieve information like speed, duplex, autoneg etc.

  • Retrieve PHY EEPROM and stats.

  • Configure FEC, PAM modes.

  • And so on.

On the pure networking side, the AF driver supports the following functionality.
  • Map a physical link to an RVU PF to which a netdev is registered.

  • Attach NIX and NPA block LFs to an RVU PF/VF, which provide buffer pools, RQs and SQs for regular networking functionality.

  • Flow control (pause frames) enable/disable/config.

  • HW PTP timestamping related config.

  • NPC parser profile config, i.e. how to parse the packet and what info to extract.

  • NPC extract profile config, i.e. what to extract from the packet to match data in MCAM entries.

  • Manage NPC MCAM entries; upon request, frame and install the requested packet forwarding rules.

  • Defines receive side scaling (RSS) algorithms.

  • Defines segmentation offload algorithms (e.g. TSO).

  • VLAN stripping, capture and insertion config.

  • SSO and TIM blocks config which provide packet scheduling support.

  • Debugfs support to check current resource provisioning, the current status of NPA pools, NIX RQs, SQs and CQs, various stats etc., which helps in debugging issues (see the example after this list).

  • And many more.
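
As an illustrative sketch of the debugfs support (assuming debugfs is mounted at /sys/kernel/debug and the AF driver was built with debugfs enabled; exact file names can differ between kernel versions), the current resource provisioning could be inspected with:

    # cat /sys/kernel/debug/octeontx2/rsrc_alloc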

Physical Function driver

This RVU PF handles I/O, is mapped to a physical ethernet link, and this driver registers a netdev. It supports SR-IOV. As said above, this driver communicates with the AF via a mailbox. To retrieve information about a physical link, this driver talks to the AF, and the AF gets that info from firmware and responds back, i.e. the PF driver cannot talk to firmware directly.
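
As a sketch of enabling SR-IOV VFs on such a PF through the standard PCI sysfs interface (the PCI address 0002:02:00.0 below is only a placeholder; use the actual address of the RVU PF):

    # echo 4 > /sys/bus/pci/devices/0002:02:00.0/sriov_numvfs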

Supports ethtool for configuring links, RSS, queue count, queue size, flow control and ntuple filters, dumping the PHY EEPROM, configuring FEC, etc.
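
A few illustrative examples using standard ethtool options (whether each option is supported depends on the driver and kernel version; <interface> is the netdev registered by this driver):

  • Show link settings and FEC configuration:

    # ethtool <interface>
    # ethtool --show-fec <interface>

  • Configure queue count, ring sizes and flow control:

    # ethtool -L <interface> rx 8 tx 8
    # ethtool -G <interface> rx 1024 tx 1024
    # ethtool -A <interface> rx on tx on

  • Dump the module EEPROM:

    # ethtool -m <interface>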

Virtual Function driver

There are two types of VFs: VFs that share the physical link with their parent SR-IOV PF, and VFs that work in pairs using internal HW loopback channels (LBK).

Type1:
  • These VFs and their parent PF share a physical link and are used for outside communication.

  • VFs cannot communicate with the AF directly; they send a mbox message to the PF, and the PF forwards it to the AF. After processing, the AF responds back to the PF, and the PF forwards the reply to the VF.

  • From a functionality point of view there is no difference between PF and VF, as the same type of HW resources are attached to both. However, a few settings can be configured only from the PF, as the PF is treated as the owner/admin of the link.

Type2:
  • RVU PF0, i.e. the admin function, creates these VFs and maps them to the loopback block's channels.

  • A set of two VFs (VF0 & VF1, VF2 & VF3, and so on) works as a pair, i.e. packets sent out of VF0 will be received by VF1 and vice versa.

  • These VFs can be used by applications or virtual machines to communicate with each other without sending traffic outside. There is no switch present in the HW, hence the support for loopback VFs (see the usage example below).

  • These VFs communicate directly with the AF (PF0) via mbox.

Except for the I/O channels or links used for packet reception and transmission, there is no other difference between these VF types. The AF driver takes care of the I/O channel mapping, hence the same VF driver works for both types of devices.
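
A minimal usage sketch for a loopback VF pair (the <lbk-vf0>/<lbk-vf1> netdev names below are placeholders for the two VFs of a pair), using standard iproute2 commands to exercise traffic between two network namespaces:

    # ip netns add ns0
    # ip netns add ns1
    # ip link set <lbk-vf0> netns ns0
    # ip link set <lbk-vf1> netns ns1
    # ip netns exec ns0 ip addr add 10.0.0.1/24 dev <lbk-vf0>
    # ip netns exec ns1 ip addr add 10.0.0.2/24 dev <lbk-vf1>
    # ip netns exec ns0 ip link set <lbk-vf0> up
    # ip netns exec ns1 ip link set <lbk-vf1> up
    # ip netns exec ns0 ping 10.0.0.2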

Basic packet flow

Ingress

  1. The CGX LMAC receives a packet.

  2. Forwards the packet to the NIX block.

  3. The packet is then submitted to the NPC block for parsing, followed by an MCAM lookup to get the destination RVU device.

  4. The NIX LF attached to the destination RVU device allocates a buffer from the RQ-mapped buffer pool of the NPA block LF.

  5. The RQ may be selected by RSS or by an MCAM rule configured with an RQ number (see the example after this list).

  6. The packet is DMA'ed and the driver is notified.
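
As an illustrative example of steering a flow to a specific RQ with a standard ethtool ntuple filter (values are placeholders; ntuple filtering must first be enabled on the interface):

    # ethtool -K <interface> ntuple on
    # ethtool -N <interface> flow-type tcp4 dst-port 80 action 2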

Egress

  1. The driver prepares a send descriptor and submits it to the SQ for transmission.

  2. The SQ is already configured (by the AF) to transmit on a specific link/channel.

  3. The SQ descriptor ring is maintained in buffers allocated from the SQ-mapped pool of the NPA block LF.

  4. The NIX block transmits the packet on the designated channel.

  5. NPC MCAM entries can be installed to divert the packet onto a different channel.

Quality of service

Hardware algorithms used in scheduling

The OcteonTx2 silicon and CN10K transmit interface consists of five transmit levels, starting from SMQ/MDQ and going through TL4 to TL1. Each packet traverses the MDQ and TL4 to TL1 levels. Each level contains an array of queues to support scheduling and shaping. The hardware uses the algorithms below depending on the priority of the scheduler queues. Once the user creates tc classes with different priorities, the driver configures the schedulers allocated to the class with the specified priority, along with the rate-limiting configuration.

  1. Strict Priority

    • Once packets are submitted to the MDQ, hardware picks all active MDQs having different priorities using strict priority.

  2. Round Robin

    • Active MDQs having the same priority level are chosen using round robin.

Setup HTB offload

  1. Enable HW TC offload on the interface:

    # ethtool -K <interface> hw-tc-offload on
    
  2. Create htb root:

    # tc qdisc add dev <interface> clsact
    # tc qdisc replace dev <interface> root handle 1: htb offload
    
  3. Create tc classes with different priorities:

    # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1
    
    # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7
    
  4. Create tc classes with the same priority and different quantum:

    # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 2 quantum 409600
    
    # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416
    
    # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768
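
  5. Optionally, verify the resulting class hierarchy with the standard tc show commands (illustrative; the output format depends on the tc version):

    # tc class show dev <interface>
    # tc -s qdisc show dev <interface>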
    

RVU Representors

The RVU representor driver adds support for creating representor devices for the VFs of RVU PFs in the system. Representor devices are created when the user enables switchdev mode, which can be enabled either before or after setting up SRIOV numVFs. All representor devices share a single NIX LF, but each has dedicated Rx/Tx queues. The RVU PF representor driver registers a separate netdev for each Rx/Tx queue pair.

Current HW does not have a built-in switch which can do L2 learning and forward packets between representee and representor. Hence, the packet path between a representee and its representor is achieved by setting up appropriate NPC MCAM filters. Transmit packets matching these filters are looped back through the hardware loopback channel/interface (i.e. instead of being sent out of the MAC interface), where they again match the installed filters and are forwarded. This way the representee => representor and representor => representee packet paths are achieved. These rules are installed when the representors are created and are activated/deactivated based on the representor/representee interface state.

Usage example:

  • Change device to switchdev mode:

    # devlink dev eswitch set pci/0002:1c:00.0 mode switchdev
    
  • List of representor devices on the system:

    # ip link show
    Rpf1vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
        link/ether f6:43:83:ee:26:21 brd ff:ff:ff:ff:ff:ff
    Rpf1vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
        link/ether 12:b2:54:0e:24:54 brd ff:ff:ff:ff:ff:ff
    Rpf1vf2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
        link/ether 4a:12:c4:4c:32:62 brd ff:ff:ff:ff:ff:ff
    Rpf1vf3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
        link/ether ca:cb:68:0e:e2:6e brd ff:ff:ff:ff:ff:ff
    Rpf2vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
        link/ether 06:cc:ad:b4:f0:93 brd ff:ff:ff:ff:ff:ff
    

To delete the representor devices from the system, change the device back to legacy mode.

  • Change device to legacy mode:

    # devlink dev eswitch set pci/0002:1c:00.0 mode legacy
    

RVU representors can be managed using the devlink ports interface (see Documentation/networking/devlink/devlink-port.rst).

  • Show devlink ports of representors:

    # devlink port
    pci/0002:1c:00.0/0: type eth netdev Rpf1vf0 flavour physical port 0 splittable false
    pci/0002:1c:00.0/1: type eth netdev Rpf1vf1 flavour pcivf controller 0 pfnum 1 vfnum 1 external false splittable false
    pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
    pci/0002:1c:00.0/3: type eth netdev Rpf1vf3 flavour pcivf controller 0 pfnum 1 vfnum 3 external false splittable false
    

Function attributes

The RVU representor driver supports function attributes for representors. Port function configuration of the representors is supported through the devlink eswitch port.

MAC address setup

The RVU representor driver supports the devlink port function attr mechanism to set up the MAC address (refer to the Devlink Port documentation).

  • To set up the MAC address for port 2:

    # devlink port function set pci/0002:1c:00.0/2 hw_addr 5c:a1:1b:5e:43:11
    # devlink port show pci/0002:1c:00.0/2
    pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
    function:
            hw_addr 5c:a1:1b:5e:43:11
    

TC offload

The RVU representor driver implements support for offloading tc rules using port representors.

  • Drop packets with vlan id 3:

    # tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower vlan_id 3 vlan_ethtype ipv4 skip_sw action drop
    
  • Redirect IPv4 packets with vlan id 5 to eth1, after stripping the vlan header:

    # tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5 vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress redirect dev eth1
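
  • To check that such filters were accepted and offloaded, the installed filters and their statistics can be listed with standard tc (illustrative):

    # tc -s filter show dev Rpf1vf0 ingress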