Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1

Introduction

This documentation is designed for two primary audiences:

  1. Users and System Administrators: For those dealing with real-world Ethernet issues, this guide provides a practical, step-by-step troubleshooting flow to help identify and resolve common problems in Twisted Pair Ethernet at OSI Layer 1. If you’re facing unstable links, speed drops, or mysterious network issues, jump right into the step-by-step guide and follow it through to find your solution.

  2. Kernel Developers: For developers working with network drivers and PHY support, this documentation outlines the diagnostic process and highlights areas where the Linux kernel’s diagnostic interfaces could be extended or improved. By understanding the diagnostic flow, developers can better prioritize future enhancements.

Step-by-Step Diagnostic Guide from Linux (General Ethernet)

This diagnostic guide covers common Ethernet troubleshooting scenarios, focusing on link stability and detection across different Ethernet environments, including Single-Pair Ethernet (SPE) and Multi-Pair Ethernet (MPE), as well as power delivery technologies like PoDL (Power over Data Line) and PoE (Clause 33 PSE).

The guide is designed to help users diagnose physical layer (Layer 1) issues on systems running Linux kernel version 6.11 or newer, utilizing ethtool version 6.10 or later and iproute2 version 6.4.0 or later.

In this guide, we assume that users may have limited or no access to the link partner and will focus on diagnosing issues locally.
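The version requirements above can be checked before starting. The following is a minimal sketch, not part of any tool: the helper name version_ge is hypothetical, and it assumes a sort(1) that supports version sorting with -V (as in GNU coreutils).

```shell
# version_ge A B: succeeds when version A is at least version B.
# Assumption: sort(1) supports -V (version sort), as GNU coreutils does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip a suffix such as "-rc1" or "-generic" before comparing.
kernel="$(uname -r)"
kernel="${kernel%%-*}"

if version_ge "$kernel" 6.11; then
    echo "kernel $kernel: OK"
else
    echo "kernel $kernel: older than 6.11, some diagnostics may be missing"
fi
```

The same helper could be applied to the version numbers reported by ethtool --version and ip -V. The exact format of that output differs between releases, so extracting the numbers is left out of this sketch.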

Diagnostic Scenarios

  • Link is up and stable, but no data transfer: If the link is stable but there are issues with data transmission, refer to the OSI Layer 2 Troubleshooting Guide.

  • Link is unstable: Link resets, speed drops, or other fluctuations indicate potential issues at the hardware or physical layer.

  • No link detected: The interface is up, but no link is established.

Verify Interface Status

Begin by verifying that the Ethernet interface is administratively up. ethtool reports link and PHY status, but it does not show the administrative state of the interface. To check that state, use the ip command, which lists the interface flags within the angle brackets “<>” in its output.

For example, in the output <NO-CARRIER,BROADCAST,MULTICAST,UP>, the important keywords are:

  • UP: The interface is in the administrative “UP” state.

  • NO-CARRIER: The interface is administratively up, but no physical link is detected.

If the output shows <BROADCAST,MULTICAST>, this indicates the interface is in the administrative “DOWN” state.

  • Command: ip link show dev <interface>

  • Expected Output:

    4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
       link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff
    
  • Interpreting the Output:

    • Administrative UP State:

      • If the output contains “UP”, the interface is administratively up, and the system is trying to establish a physical link.

      • If you also see “NO-CARRIER”, it means the physical link has not been detected, indicating potential Layer 1 issues like a cable fault, misconfiguration, or no connection at the link partner. In this case, proceed to the Inspect Link Status and PHY Configuration section.

    • Administrative DOWN State:

      • If the output lacks “UP” and shows only states like “<BROADCAST,MULTICAST>”, it means the interface is administratively down. In this case, bring the interface up using the following command:

        ip link set dev <interface> up
        
  • Next Steps:

    • If the interface is administratively up but shows NO-CARRIER, proceed to the Inspect Link Status and PHY Configuration section to troubleshoot potential physical layer issues.

    • If the interface was administratively down and you have brought it up, repeat this verification step to confirm the new state of the interface before proceeding.

    • If the interface is up and the link is detected:

      • If the output shows “UP” and does not show “NO-CARRIER”, the interface is administratively up, and the physical link has been successfully established. If everything is working as expected, the Layer 1 diagnostics are complete, and no further action is needed.

      • If the interface is up and the link is detected but no data is being transferred, the issue is likely beyond Layer 1, and you should proceed with diagnosing the higher layers of the OSI model. This may involve checking Layer 2 configurations (such as VLANs or MAC address issues), Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and above (firewalls, services, etc.).

      • If the link is unstable (it resets or drops frequently), this may indicate a physical layer issue such as a faulty cable, interference, or power delivery problems. In this case, proceed with the next step in this guide.
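The interpretation rules above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names extract_flags and classify_link_flags and the three state labels are hypothetical, and the sample line is the one shown in this guide. Flags are matched as whole comma-separated words, so that LOWER_UP, which ip also prints on interfaces with an established link, is not mistaken for UP.

```shell
# extract_flags: read "ip link show" output on stdin and print the flag
# list found between "<" and ">" on the first line.
extract_flags() {
    sed -n '1s/^[^<]*<\([^>]*\)>.*$/\1/p'
}

# classify_link_flags FLAGS: map a flag list to one of three states.
# The state labels are hypothetical names used only in this sketch.
classify_link_flags() {
    # Wrap the list in commas so each flag can be matched as a whole word.
    flags=",$(printf '%s' "$1" | tr -d '<>'),"
    case "$flags" in
        *,UP,*) ;;
        *) echo "admin-down"; return ;;
    esac
    case "$flags" in
        *,NO-CARRIER,*) echo "up-no-carrier" ;;
        *) echo "up-carrier" ;;
    esac
}

# Sample line taken from the expected output shown in this guide.
sample='4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...'
classify_link_flags "$(printf '%s\n' "$sample" | extract_flags)"
```

On a live system, the sample line would be replaced by real output, for example:

    classify_link_flags "$(ip link show dev eth0 | extract_flags)"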

Check Power Delivery (PoDL or PoE)

If it is known that PoDL or PoE is not implemented on the system, or the PSE (Power Sourcing Equipment) is managed by proprietary user-space software or external tools, you can skip this step. In such cases, verify power delivery through alternative methods, such as checking hardware indicators (LEDs), using multimeters, or consulting vendor-specific software for monitoring power status.

If PoDL or PoE is implemented and managed directly by Linux, follow these steps to ensure power is being delivered correctly:

  • Command: ethtool --show-pse <interface>

  • Expected Output Examples:

    1. PSE Not Supported:

      If no PSE is attached or the interface does not support PSE, the following output is expected:

      netlink error: No PSE is attached
      netlink error: Operation not supported
      
    2. PoDL (Single-Pair Ethernet):

      When PoDL is implemented, you might see the following attributes:

      PSE attributes for eth1:
      PoDL PSE Admin State: enabled
      PoDL PSE Power Detection Status: delivering power
      
    3. PoE (Clause 33 PSE):

      For standard PoE, the output may look like this:

      PSE attributes for eth1:
      Clause 33 PSE Admin State: enabled
      Clause 33 PSE Power Detection Status: delivering power
      Clause 33 PSE Available Power Limit: 18000
      
  • Adjust Power Limit (if needed):

    • The available power limit may be too low for the link partner (the powered device). In that case, raise the limit as needed; ethtool takes the value in milliwatts.

    • Command: ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>

      Example:

      ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000
      ethtool --show-pse eth1
      

      Expected Output after adjusting the power limit:

      Clause 33 PSE Available Power Limit: 18000
      
  • Next Steps:

    • PoE or PoDL Not Used: If PoE or PoDL is not implemented or used on the system, proceed to the next diagnostic step, as power delivery is not relevant for this setup.

    • PoE or PoDL Controlled Externally: If PoE or PoDL is used but is not managed by the Linux kernel’s PSE-PD framework (i.e., it is controlled by proprietary user-space software or external tools), this part is out of scope for this documentation. Please consult vendor-specific documentation or external tools for monitoring and managing power delivery.

    • PSE Admin State Disabled:

      • If the PSE Admin State is disabled, enable it by running one of the following commands. For PoDL:

        ethtool --set-pse <interface> podl-pse-admin-control enable
        

        or, for Clause 33 PSE (PoE):

        ethtool --set-pse <interface> c33-pse-admin-control enable

      • After enabling the PSE Admin State, return to the start of the Check Power Delivery (PoDL or PoE) step to recheck the power delivery status.

    • Power Not Delivered: If the Power Detection Status shows something other than “delivering power” (e.g., over current), troubleshoot the PSE. Check for potential issues such as a short circuit in the cable, insufficient power delivery, or a fault in the PSE itself.

    • Power Delivered but No Link: If power is being delivered but no link is established, proceed with further diagnostics by performing Cable Diagnostics or reviewing the Inspect Link Status and PHY Configuration steps to identify any underlying issues with the physical link or settings.
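The decision flow of this step can be sketched in shell as follows. This is a rough illustration, not a definitive parser: the function names pse_field and pse_next_step are hypothetical, the hint texts are paraphrased from this guide, and the sketch assumes that ethtool prints the attribute lines in the form shown in the examples above. If both PoDL and Clause 33 lines were present, only the first match would be used.

```shell
# pse_field NAME: print the value of the first "... PSE NAME: value"
# line read from stdin, with trailing whitespace removed.
pse_field() {
    sed -n "s/^.*PSE $1: *//p" | sed 's/[[:space:]]*$//' | head -n 1
}

# pse_next_step: read "ethtool --show-pse" output on stdin and print a
# hint for the next step. Assumes the line format shown in this guide.
pse_next_step() {
    out="$(cat)"
    admin="$(printf '%s\n' "$out" | pse_field 'Admin State')"
    status="$(printf '%s\n' "$out" | pse_field 'Power Detection Status')"
    if [ -z "$admin" ]; then
        echo "no PSE attributes found: skip this step"
    elif [ "$admin" != "enabled" ]; then
        echo "admin state is '$admin': enable the PSE and re-check"
    elif [ "$status" != "delivering power" ]; then
        echo "status is '$status': troubleshoot the PSE"
    else
        echo "power delivered: continue with cable and link diagnostics"
    fi
}

# Sample input copied from the Clause 33 example in this guide.
pse_next_step <<'EOF'
PSE attributes for eth1:
Clause 33 PSE Admin State: enabled
Clause 33 PSE Power Detection Status: delivering power
Clause 33 PSE Available Power Limit: 18000
EOF
```

On a live system, the here-document would be replaced by a pipe from the real command, for example:

    ethtool --show-pse eth1 | pse_next_step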

Cable Diagnostics

Use ethtool to test for physical layer issues such as cable faults. The test results can vary depending on the cable’s condition, the technology in use, and the state of the link partner. The results from the cable test will help in diagnosing issues like open circuits, shorts, impedance mismatches, and noise-related problems.

  • Command: ethtool --cable-test <interface>

The following are the typical outputs for Single-Pair Ethernet (SPE) and Multi-Pair Ethernet (MPE):

  • Expected Output for Single-Pair Ethernet (SPE):

    Cable test completed for device eth1.
    Pair A, fault length: 25.00m
    Pair A code Open Circuit
    

    This indicates an open circuit or cable fault at the reported distance, but results can be influenced by the link partner’s state. Refer to the “Troubleshooting Based on Cable Test Results” section for further interpretation of these results.

  • Expected Output for Multi-Pair Ethernet (MPE):

    Cable test completed for device eth0.
    Pair A code OK
    Pair B code OK
    Pair C code Open Circuit
    

    Here, Pair C is reported as having an open circuit, while Pairs A and B are functioning correctly. However, if autonegotiation is in use on Pairs A and B, the cable test may be disrupted. Refer to the “Troubleshooting Based on Cable Test Results” section for a detailed explanation of these issues and how to resolve them.

For detailed descriptions of the different possible cable test results, please refer to the “Troubleshooting Based on Cable Test Results” section.
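When several interfaces or repeated runs have to be checked, the pairs that need attention can be filtered out of the test output. The following is a minimal sketch: the function name faulty_pairs is hypothetical, and it assumes the line format shown in the examples above, with a pair's fault length line appearing before its result code line.

```shell
# faulty_pairs: read "ethtool --cable-test" output on stdin and print
# "<pair>: <result>" for every pair whose result code is not OK.
# If a fault length was reported earlier for that pair, it is appended.
faulty_pairs() {
    awk '
        # Example: "Pair A, fault length: 25.00m"
        $1 == "Pair" && $3 == "fault" {
            p = $2; sub(/,$/, "", p)
            len[p] = $NF
        }
        # Example: "Pair A code Open Circuit"
        $1 == "Pair" && $3 == "code" && $4 != "OK" {
            p = $2
            sub(/^[[:space:]]*Pair [^ ]+ code /, "")
            where = (p in len) ? " at " len[p] : ""
            print p ": " $0 where
        }
    '
}

# Sample input copied from the SPE example in this guide.
faulty_pairs <<'EOF'
Cable test completed for device eth1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
EOF
```

For the SPE sample this prints "A: Open Circuit at 25.00m". On a live system, the input would come from the real command, for example:

    ethtool --cable-test eth1 | faulty_pairs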

Troubleshooting Based on Cable Test Results

After running the cable test, the results can help identify specific issues in the physical connection. However, it is important to note that cable testing results heavily depend on the capabilities and characteristics of both the local hardware and the link partner. The accuracy and reliability of the results can vary significantly between different hardware implementations.

In some cases, this can introduce blind spots in the current cable testing implementation, where certain results may not accurately reflect the actual physical state of the cable. For example:

  • An Open Circuit result might not only indicate a damaged or disconnected cable but also occur if the cable is properly attached to a powered-down link partner.

  • Some PHYs may report a Short within Pair if the link partner is in forced slave mode, even though there is no actual short in the cable.

To help users interpret the results more effectively, it could be beneficial to extend the kernel UAPI (User API) to provide additional context or possible variants of issues based on the hardware’s characteristics. Since these quirks are often hardware-specific, the kernel driver would be an ideal source of such information. By providing flags or hints related to potential false positives for each test result, users would have a better understanding of what to verify and where to investigate further.

Until such improvements are made, users should be aware of these limitations and manually verify cable issues as needed. Physical inspections may help resolve uncertainties related to false positive results.

The results can be one of the following:

  • OK:

    • The cable is functioning correctly, and no issues were detected.

    • Next Steps: If you are still experiencing issues, it might be related to higher-layer problems, such as duplex mismatches or speed negotiation, which are not physical-layer issues.

    • Special Case for BaseT1 (1000/100/10BaseT1): In BaseT1 systems, an “OK” result typically also means that the link is up and likely in slave mode, since cable tests usually only pass in this mode. For some 10BaseT1L PHYs, an “OK” result may occur even if the cable is too long for the PHY’s configured range (for example, when the range is configured for short-distance mode).

  • Open Circuit:

    • An Open Circuit result typically indicates that the cable is damaged or disconnected at the reported fault length. Consider these possibilities:

      • If the link partner is in admin down state or powered off, you might still get an “Open Circuit” result even if the cable is functional.

      • Next Steps: Inspect the cable at the fault length for visible damage or loose connections. Verify the link partner is powered on and in the correct mode.

  • Short within Pair:

    • A Short within Pair indicates an unintended connection within the same pair of wires, typically caused by physical damage to the cable.

      • Next Steps: Replace or repair the cable and check for any physical damage or improperly crimped connectors.

  • Short to Another Pair:

    • A Short to Another Pair means the wires from different pairs are shorted, which could occur due to physical damage or incorrect wiring.

      • Next Steps: Replace or repair the damaged cable. Inspect the cable for incorrect terminations or pinched wiring.

  • Impedance Mismatch:

    • Impedance Mismatch indicates a reflection caused by an impedance discontinuity in the cable. This can happen when a part of the cable has abnormal impedance (e.g., when different cable types are spliced together or when there is a defect in the cable).

      • Next Steps: Check the cable quality and ensure consistent impedance throughout its length. Replace any sections of the cable that do not meet specifications.

  • Noise:

    • Noise means that the Time Domain Reflectometry (TDR) test could not complete due to excessive noise on the cable, which can be caused by interference from electromagnetic sources.

      • Next Steps: Identify and eliminate sources of electromagnetic interference (EMI) near the cable. Consider using shielded cables or rerouting the cable away from noise sources.

  • Resolution Not Possible:

    • Resolution Not Possible means that the TDR test could not detect the issue due to the resolution limitations of the test or because the fault is beyond the distance that the test can measure.

      • Next Steps: Inspect the cable manually if possible, or use alternative diagnostic tools that can handle greater distances or higher resolution.

  • Unknown:

    • An Unknown result may occur when the test cannot classify the fault or when a specific issue is outside the scope of the tool’s detection capabilities.

      • Next Steps: Re-run the test, verify the link partner’s state, and inspect the cable manually if necessary.
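The result list above can be condensed into a lookup helper for use in scripts. This is only a sketch: the function name cable_result_hint is hypothetical, the hint texts are shortened from this guide, and the result names are matched case-insensitively because the exact spelling printed by a given ethtool version is not assumed here.

```shell
# cable_result_hint RESULT: print a short next-step hint for a cable
# test result code. Result names follow the list in this guide.
cable_result_hint() {
    code="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$code" in
        "ok")
            echo "no cable fault detected; look at link configuration or higher layers" ;;
        "open circuit")
            echo "check cable at the fault length; also check that the link partner is powered and up" ;;
        "short within pair")
            echo "check cable and connectors; some PHYs report this when the partner is in forced slave mode" ;;
        "short to another pair")
            echo "check for damaged cable or incorrect termination" ;;
        "impedance mismatch")
            echo "check for mixed cable types, splices, or cable defects" ;;
        "noise")
            echo "look for EMI sources; consider shielded cable or rerouting" ;;
        "resolution not possible")
            echo "inspect manually or use a tester with more range or resolution" ;;
        *)
            echo "re-run the test, verify the link partner state, inspect the cable" ;;
    esac
}

cable_result_hint "Open Circuit"
```

Keeping the false-positive caveats (powered-down link partner, forced slave mode) in the hint text is a deliberate choice, since a result code alone does not prove a cable defect.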

When All Else Fails...

So you’ve checked the cables, monitored the logs, disabled EEE, and still... nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t want to cooperate.

But before you throw in the towel (or the Ethernet cable), take a deep breath. It’s always possible that:

  1. Your PHY has a unique, undocumented personality.

  2. The problem is lying dormant, waiting for just the right moment to magically resolve itself (hey, it happens!).

  3. Or, it could be that the ultimate solution simply hasn’t been invented yet.

If none of the above bring you comfort, there’s one final step: contribute! If you’ve uncovered new or unusual issues, or have creative diagnostic methods, feel free to share your findings and extend this documentation. Together, we can hunt down every elusive network issue - one twisted pair at a time.

Remember: sometimes the solution is just a reboot away, but if not, it’s time to dig deeper - or report that bug!