Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1¶

Introduction¶

This documentation is designed for two primary audiences:

Users and System Administrators: For those dealing with real-world Ethernet issues, this guide provides a practical, step-by-step troubleshooting flow to help identify and resolve common problems in Twisted Pair Ethernet at OSI Layer 1. If you’re facing unstable links, speed drops, or mysterious network issues, jump right into the step-by-step guide and follow it through to find your solution.
Kernel Developers: For developers working with network drivers and PHY support, this documentation outlines the diagnostic process and highlights areas where the Linux kernel’s diagnostic interfaces could be extended or improved. By understanding the diagnostic flow, developers can better prioritize future enhancements.

Step-by-Step Diagnostic Guide from Linux (General Ethernet)¶

This diagnostic guide covers common Ethernet troubleshooting scenarios, focusing on link stability and detection across different Ethernet environments, including Single-Pair Ethernet (SPE) and Multi-Pair Ethernet (MPE), as well as power delivery technologies like PoDL (Power over Data Line) and PoE (Clause 33 PSE).

The guide is designed to help users diagnose physical layer (Layer 1) issues on systems running Linux kernel version 6.11 or newer, utilizing ethtool version 6.10 or later and iproute2 version 6.4.0 or later.

In this guide, we assume that users may have limited or no access to the link partner and will focus on diagnosing issues locally.

Diagnostic Scenarios¶

Link is up and stable, but no data transfer: If the link is stable but there are issues with data transmission, refer to the OSI Layer 2 Troubleshooting Guide.
Link is unstable: Link resets, speed drops, or other fluctuations indicate potential issues at the hardware or physical layer.
No link detected: The interface is up, but no link is established.

Verify Interface Status¶

Begin by verifying the status of the Ethernet interface to check if it is administratively up. Unlike ethtool, which provides information on the link and PHY status, it does not show the administrative state of the interface. To check this, you should use the ip command, which describes the interface state within the angle brackets “<>” in its output.

For example, in the output <NO-CARRIER,BROADCAST,MULTICAST,UP>, the important keywords are:

UP: The interface is in the administrative “UP” state.
NO-CARRIER: The interface is administratively up, but no physical link is detected.

If the output shows <BROADCAST,MULTICAST>, this indicates the interface is in the administrative “DOWN” state.

Command: ip link show dev <interface>

Expected Output:

4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
   link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff

Interpreting the Output:
- Administrative UP State:
  - If the output contains “UP”, the interface is administratively up, and the system is trying to establish a physical link.
  - If you also see “NO-CARRIER”, it means the physical link has not been detected, indicating potential Layer 1 issues like a cable fault, misconfiguration, or no connection at the link partner. In this case, proceed to the Inspect Link Status and PHY Configuration section.
- Administrative DOWN State:
  - If the output lacks “UP” and shows only states like “<BROADCAST,MULTICAST>”, it means the interface is administratively down. In this case, bring the interface up using the following command:
    ip link set dev <interface> up
Next Steps:
- If the interface is administratively up but shows NO-CARRIER, proceed to the Inspect Link Status and PHY Configuration section to troubleshoot potential physical layer issues.
- If the interface was administratively down and you have brought it up, ensure to repeat this verification step to confirm the new state of the interface before proceeding
- If the interface is up and the link is detected:
  - If the output shows “UP” and there is no `NO-CARRIER`, the interface is administratively up, and the physical link has been successfully established. If everything is working as expected, the Layer 1 diagnostics are complete, and no further action is needed.
  - If the interface is up and the link is detected but no data is being transferred, the issue is likely beyond Layer 1, and you should proceed with diagnosing the higher layers of the OSI model. This may involve checking Layer 2 configurations (such as VLANs or MAC address issues), Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and above (firewalls, services, etc.).
  - If the link is unstable or frequently resetting or dropping, this may indicate a physical layer issue such as a faulty cable, interference, or power delivery problems. In this case, proceed with the next step in this guide.

Inspect Link Status and PHY Configuration¶

Use ethtool -I to check the link status, PHY configuration, supported link modes, and additional statistics such as the Link Down Events counter. This step is essential for diagnosing Layer 1 problems such as speed mismatches, duplex issues, and link instability.

For both Single-Pair Ethernet (SPE) and Multi-Pair Ethernet (MPE) devices, you will use this step to gather key details about the link. SPE links generally support a single speed and mode without autonegotiation (with the exception of 10BaseT1L), while MPE devices typically support multiple link modes and autonegotiation.

Command: ethtool -I <interface>

Example Output for SPE Interface (Non-autonegotiation):

Settings for spe4:
    Supported ports: [ TP ]
    Supported link modes:   100baseT1/Full
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes: Not applicable
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 100Mb/s
    Duplex: Full
    Auto-negotiation: off
    master-slave cfg: forced slave
    master-slave status: slave
    Port: Twisted Pair
    PHYAD: 6
    Transceiver: external
    MDI-X: Unknown
    Supports Wake-on: d
    Wake-on: d
    Link detected: yes
    SQI: 7/7
    Link Down Events: 2

Example Output for MPE Interface (Autonegotiation):

Settings for eth1:
    Supported ports: [ TP    MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised pause frame use: Symmetric Receive-only
    Advertised auto-negotiation: Yes
    Advertised FEC modes: Not reported
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
    Link partner advertised pause frame use: Symmetric Receive-only
    Link partner advertised auto-negotiation: Yes
    Link partner advertised FEC modes: Not reported
    Speed: 100Mb/s
    Duplex: Full
    Auto-negotiation: on
    Port: Twisted Pair
    PHYAD: 10
    Transceiver: internal
    MDI-X: Unknown
    Supports Wake-on: pg
    Wake-on: p
    Link detected: yes
    Link Down Events: 1

Next Steps:
- Record the output provided by ethtool, particularly noting the master-slave status, speed, duplex, and other relevant fields. This information will be useful for further analysis or troubleshooting. Once the ethtool output has been collected and stored, move on to the next diagnostic step.

Check Power Delivery (PoDL or PoE)¶

If it is known that PoDL or PoE is not implemented on the system, or the PSE (Power Sourcing Equipment) is managed by proprietary user-space software or external tools, you can skip this step. In such cases, verify power delivery through alternative methods, such as checking hardware indicators (LEDs), using multimeters, or consulting vendor-specific software for monitoring power status.

If PoDL or PoE is implemented and managed directly by Linux, follow these steps to ensure power is being delivered correctly:

Command: ethtool --show-pse <interface>

Expected Output Examples:

PSE Not Supported:

If no PSE is attached or the interface does not support PSE, the following output is expected:
```
netlink error: No PSE is attached
netlink error: Operation not supported
```

PoDL (Single-Pair Ethernet):

When PoDL is implemented, you might see the following attributes:

PSE attributes for eth1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power

PoE (Clause 33 PSE):

For standard PoE, the output may look like this:

PSE attributes for eth1:
Clause 33 PSE Admin State: enabled
Clause 33 PSE Power Detection Status: delivering power
Clause 33 PSE Available Power Limit: 18000

Adjust Power Limit (if needed):
- Sometimes, the available power limit may not be sufficient for the link partner. You can increase the power limit as needed.
- Command: ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>
  
  Example:
```
ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000
ethtool --show-pse eth1
```
  Expected Output after adjusting the power limit:
```
Clause 33 PSE Available Power Limit: 18000
```
Next Steps:
- PoE or PoDL Not Used: If PoE or PoDL is not implemented or used on the system, proceed to the next diagnostic step, as power delivery is not relevant for this setup.
- PoE or PoDL Controlled Externally: If PoE or PoDL is used but is not managed by the Linux kernel’s PSE-PD framework (i.e., it is controlled by proprietary user-space software or external tools), this part is out of scope for this documentation. Please consult vendor-specific documentation or external tools for monitoring and managing power delivery.
- PSE Admin State Disabled:
  - If the PSE Admin State: is disabled, enable it by running one of the following commands:
    ethtool --set-pse <devname> podl-pse-admin-control enable
    or, for Clause 33 PSE (PoE):
    
    ethtool --set-pse <devname> c33-pse-admin-control enable
  - After enabling the PSE Admin State, return to the start of the Check Power Delivery (PoDL or PoE) step to recheck the power delivery status.
- Power Not Delivered: If the Power Detection Status shows something other than “delivering power” (e.g., over current), troubleshoot the PSE. Check for potential issues such as a short circuit in the cable, insufficient power delivery, or a fault in the PSE itself.
- Power Delivered but No Link: If power is being delivered but no link is established, proceed with further diagnostics by performing Cable Diagnostics or reviewing the Inspect Link Status and PHY Configuration steps to identify any underlying issues with the physical link or settings.

Cable Diagnostics¶

Use ethtool to test for physical layer issues such as cable faults. The test results can vary depending on the cable’s condition, the technology in use, and the state of the link partner. The results from the cable test will help in diagnosing issues like open circuits, shorts, impedance mismatches, and noise-related problems.

Command: ethtool --cable-test <interface>

The following are the typical outputs for Single-Pair Ethernet (SPE) and Multi-Pair Ethernet (MPE):

For Single-Pair Ethernet (SPE): - Expected Output (SPE):
```
Cable test completed for device eth1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
```
This indicates an open circuit or cable fault at the reported distance, but results can be influenced by the link partner’s state. Refer to the “Troubleshooting Based on Cable Test Results” section for further interpretation of these results.
For Multi-Pair Ethernet (MPE): - Expected Output (MPE):
```
Cable test completed for device eth0.
Pair A code OK
Pair B code OK
Pair C code Open Circuit
```
Here, Pair C is reported as having an open circuit, while Pairs A and B are functioning correctly. However, if autonegotiation is in use on Pairs A and B, the cable test may be disrupted. Refer to the “Troubleshooting Based on Cable Test Results” section for a detailed explanation of these issues and how to resolve them.

For detailed descriptions of the different possible cable test results, please refer to the “Troubleshooting Based on Cable Test Results” section.

Troubleshooting Based on Cable Test Results¶

After running the cable test, the results can help identify specific issues in the physical connection. However, it is important to note that cable testing results heavily depend on the capabilities and characteristics of both the local hardware and the link partner. The accuracy and reliability of the results can vary significantly between different hardware implementations.

In some cases, this can introduce blind spots in the current cable testing implementation, where certain results may not accurately reflect the actual physical state of the cable. For example:

An Open Circuit result might not only indicate a damaged or disconnected cable but also occur if the cable is properly attached to a powered-down link partner.
Some PHYs may report a Short within Pair if the link partner is in forced slave mode, even though there is no actual short in the cable.

To help users interpret the results more effectively, it could be beneficial to extend the kernel UAPI (User API) to provide additional context or possible variants of issues based on the hardware’s characteristics. Since these quirks are often hardware-specific, the kernel driver would be an ideal source of such information. By providing flags or hints related to potential false positives for each test result, users would have a better understanding of what to verify and where to investigate further.

Until such improvements are made, users should be aware of these limitations and manually verify cable issues as needed. Physical inspections may help resolve uncertainties related to false positive results.

The results can be one of the following:

OK:
- The cable is functioning correctly, and no issues were detected.
- Next Steps: If you are still experiencing issues, it might be related to higher-layer problems, such as duplex mismatches or speed negotiation, which are not physical-layer issues.
- Special Case for `BaseT1` (1000/100/10BaseT1): In BaseT1 systems, an “OK” result typically also means that the link is up and likely in slave mode, since cable tests usually only pass in this mode. For some 10BaseT1L PHYs, an “OK” result may occur even if the cable is too long for the PHY’s configured range (for example, when the range is configured for short-distance mode).
Open Circuit:
- An Open Circuit result typically indicates that the cable is damaged or disconnected at the reported fault length. Consider these possibilities:
  - If the link partner is in admin down state or powered off, you might still get an “Open Circuit” result even if the cable is functional.
  - Next Steps: Inspect the cable at the fault length for visible damage or loose connections. Verify the link partner is powered on and in the correct mode.
Short within Pair:
- A Short within Pair indicates an unintended connection within the same pair of wires, typically caused by physical damage to the cable.
  - Next Steps: Replace or repair the cable and check for any physical damage or improperly crimped connectors.
Short to Another Pair:
- A Short to Another Pair means the wires from different pairs are shorted, which could occur due to physical damage or incorrect wiring.
  - Next Steps: Replace or repair the damaged cable. Inspect the cable for incorrect terminations or pinched wiring.
Impedance Mismatch:
- Impedance Mismatch indicates a reflection caused by an impedance discontinuity in the cable. This can happen when a part of the cable has abnormal impedance (e.g., when different cable types are spliced together or when there is a defect in the cable).
  - Next Steps: Check the cable quality and ensure consistent impedance throughout its length. Replace any sections of the cable that do not meet specifications.
Noise:
- Noise means that the Time Domain Reflectometry (TDR) test could not complete due to excessive noise on the cable, which can be caused by interference from electromagnetic sources.
  - Next Steps: Identify and eliminate sources of electromagnetic interference (EMI) near the cable. Consider using shielded cables or rerouting the cable away from noise sources.
Resolution Not Possible:
- Resolution Not Possible means that the TDR test could not detect the issue due to the resolution limitations of the test or because the fault is beyond the distance that the test can measure.
  - Next Steps: Inspect the cable manually if possible, or use alternative diagnostic tools that can handle greater distances or higher resolution.
Unknown:
- An Unknown result may occur when the test cannot classify the fault or when a specific issue is outside the scope of the tool’s detection capabilities.
  - Next Steps: Re-run the test, verify the link partner’s state, and inspect the cable manually if necessary.

Verify Link Partner PHY Configuration¶

If the cable test passes but the link is still not functioning correctly, it’s essential to verify the configuration of the link partner’s PHY. Mismatches in speed, duplex settings, or master-slave roles can cause connection issues.

Autonegotiation Mismatch¶

If both link partners support autonegotiation, ensure that autonegotiation is enabled on both sides and that all supported link modes are advertised. A mismatch can lead to connectivity problems or sub optimal performance.
Quick Fix: Reset autonegotiation to the default settings, which will advertise all default link modes:
```
ethtool -s <interface> autoneg on
```
Command to check configuration: ethtool <interface>
Expected Output: Ensure that both sides advertise compatible link modes. If autonegotiation is off, verify that both link partners are configured for the same speed and duplex.

The following example shows a case where the local PHY advertises fewer link modes than it supports. This will reduce the number of overlapping link modes with the link partner. In the worst case, there will be no common link modes, and the link will not be created:
```
Settings for eth0:
   Supported link modes:  1000baseT/Full, 100baseT/Full
   Advertised link modes: 1000baseT/Full
   Speed: 1000Mb/s
   Duplex: Full
   Auto-negotiation: on
```

Combined Mode Mismatch (Autonegotiation on One Side, Forced on the Other)¶

One possible issue occurs when one side is using autonegotiation (as in most modern systems), and the other side is set to a forced link mode (e.g., older hardware with single-speed hubs). In such cases, modern PHYs will attempt to detect the forced mode on the other side. If the link is established, you may notice:
- No or empty “Link partner advertised link modes”.
- “Link partner advertised auto-negotiation:” will be “no” or not present.
This type of detection does not always work reliably:
- Typically, the modern PHY will default to Half Duplex, even if the link partner is actually configured for Full Duplex.
- Some PHYs may not work reliably if the link partner switches from one forced mode to another. In this case, only a down/up cycle may help.
Next Steps: Set both sides to the same fixed speed and duplex mode to avoid potential detection issues.
```
ethtool -s <interface> speed 1000 duplex full autoneg off
```

Master/Slave Role Mismatch (BaseT1 and 1000BaseT PHYs)¶

In BaseT1 systems (e.g., 1000BaseT1, 100BaseT1), link establishment requires that one device is configured as master and the other as slave. A mismatch in this master-slave configuration can prevent the link from being established. However, 1000BaseT also supports configurable master/slave roles and can face similar issues.
Role Preference in 1000BaseT: The 1000BaseT specification allows link partners to negotiate master-slave roles or role preferences during autonegotiation. Some PHYs have hardware limitations or bugs that prevent them from functioning properly in certain roles. In such cases, drivers may force these PHYs into a specific role (e.g., forced master or forced slave) or try a weaker option by setting preferences. If both link partners have the same issue and are forced into the same mode (e.g., both forced into master mode), they will not be able to establish a link.
Next Steps: Ensure that one side is configured as master and the other as slave to avoid this issue, particularly when hardware limitations are involved, or try the weaker preferred option instead of forced. Check for any driver-related restrictions or forced modes.

Command to force master/slave mode:

ethtool -s <interface> master-slave forced-master

or:

ethtool -s <interface> master-slave forced-master speed 1000 duplex full autoneg off

Check the current master/slave status:

ethtool <interface>

Example Output:

master-slave cfg: forced-master
master-slave status: master

Hardware Bugs and Driver Forcing: If a known hardware issue forces the PHY into a specific mode, it’s essential to check the driver source code or hardware documentation for details. Ensure that the roles are compatible across both link partners, and if both PHYs are forced into the same mode, adjust one side accordingly to resolve the mismatch.

Monitor Link Resets and Speed Drops¶

If the link is unstable, showing frequent resets or speed drops, this may indicate issues with the cable, PHY configuration, or environmental factors. While there is still no completely unified way in Linux to directly monitor downshift events or link speed changes via user space tools, both the Linux kernel logs and ethtool can provide valuable insights, especially if the driver supports reporting such events.

Monitor Kernel Logs for Link Resets and Speed Drops:
- The Linux kernel will print link status changes, including downshift events, in the system logs. These messages typically include speed changes, duplex mode, and downshifted link speed (if the driver supports it).
- Command to monitor kernel logs in real-time:
```
dmesg -w | grep "Link is Up\|Link is Down"
```
- Example Output (if a downshift occurs):
```
eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx
eth0: Link is Down
```
  This indicates that the link has been established but has downshifted from a higher speed.
- Note: Not all drivers or PHYs support downshift reporting, so you may not see this information for all devices.
Monitor Link Down Events Using `ethtool`:
- Starting with the latest kernel and ethtool versions, you can track Link Down Events using the ethtool -I command. This will provide counters for link drops, helping to diagnose link instability issues if supported by the driver.
- Command to monitor link down events:
```
ethtool -I <interface>
```
- Example Output (if supported):
```
PSE attributes for eth1:
Link Down Events: 5
```
  This indicates that the link has dropped 5 times. Frequent link down events may indicate cable or environmental issues that require further investigation.
Check Link Status and Speed:
- Even though downshift counts or events are not easily tracked, you can still use ethtool to manually check the current link speed and status.
- Command: ethtool <interface>
- Expected Output:
```
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Link detected: yes
```
  Any inconsistencies in the expected speed or duplex setting could indicate an issue.
Disable Energy-Efficient Ethernet (EEE) for Diagnostics:
- EEE (Energy-Efficient Ethernet) can be a source of link instability due to transitions in and out of low-power states. For diagnostic purposes, it may be useful to temporarily disable EEE to determine if it is contributing to link instability. This is not a generic recommendation for disabling power management.
- Next Steps: Disable EEE and monitor if the link becomes stable. If disabling EEE resolves the issue, report the bug so that the driver can be fixed.
- Command:
```
ethtool --set-eee <interface> eee off
```
- Important: If disabling EEE resolves the instability, the issue should be reported to the maintainers as a bug, and the driver should be corrected to handle EEE properly without causing instability. Disabling EEE permanently should not be seen as a solution.
Monitor Error Counters:
- Use ethtool -S <interface> --all-groups to retrieve standardized interface statistics if the driver supports the unified interface:
- Command: ethtool -S <interface> --all-groups
- Example Output (if supported):
```
phydev-RxFrames: 100391
phydev-RxErrors: 0
phydev-TxFrames: 9
phydev-TxErrors: 0
```
- If the unified interface is not supported, use ethtool -S <interface> to retrieve MAC and PHY counters. Note that non-standardized PHY counter names vary by driver and must be interpreted accordingly:
- Command: ethtool -S <interface>
- Example Output (if supported):
```
rx_crc_errors: 123
tx_errors: 45
rx_frame_errors: 78
```
- Note: If no meaningful error counters are available or if counters are not supported, you may need to rely on physical inspections (e.g., cable condition) or kernel log messages (e.g., link up/down events) to further diagnose the issue.
- Compare Counters:
  - Compare the egress and ingress frame counts reported by the PHY and MAC.
  - A small difference may occur due to sampling rate differences between the MAC and PHY drivers, or if the PHY and MAC are not always fully synchronized in their UP or DOWN states.
  - Significant discrepancies indicate potential issues in the data path between the MAC and PHY.

When All Else Fails...¶

So you’ve checked the cables, monitored the logs, disabled EEE, and still... nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t want to cooperate.

But before you throw in the towel (or the Ethernet cable), take a deep breath. It’s always possible that:

Your PHY has a unique, undocumented personality.
The problem is lying dormant, waiting for just the right moment to magically resolve itself (hey, it happens!).
Or, it could be that the ultimate solution simply hasn’t been invented yet.

If none of the above bring you comfort, there’s one final step: contribute! If you’ve uncovered new or unusual issues, or have creative diagnostic methods, feel free to share your findings and extend this documentation. Together, we can hunt down every elusive network issue - one twisted pair at a time.

Remember: sometimes the solution is just a reboot away, but if not, it’s time to dig deeper - or report that bug!