BigAdmin System Administration Portal
Feature Tech Tip

Understanding PCI Express Maximum Payload Size on OpenSolaris

Alan Adamson, December 2008

The maximum performance that can be observed on a PCI Express (PCIe) fabric has a lot to do with the efficiency of the protocol. The ratio of the payload to the header and other overhead symbols determines how close the performance is to the theoretical maximum. By increasing the amount of data in each payload, one would expect to get closer to the link bandwidth limits.
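The payload-to-overhead ratio described above can be illustrated with a short calculation. This is only a sketch: the 24-byte per-packet overhead is an assumed, illustrative figure (framing, sequence number, header, and link CRC), not a number taken from this article, and the name payload_efficiency is hypothetical.

```python
# Assumed per-packet overhead in bytes. This value is illustrative only;
# the real figure depends on header size, link generation, and optional fields.
ASSUMED_OVERHEAD_BYTES = 24


def payload_efficiency(payload_bytes, overhead_bytes=ASSUMED_OVERHEAD_BYTES):
    """Return the fraction of transmitted bytes that carry payload data."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    if overhead_bytes < 0:
        raise ValueError("overhead_bytes must not be negative")
    return payload_bytes / (payload_bytes + overhead_bytes)


# Print the efficiency for each payload size the specification allows.
for size in (128, 256, 512, 1024, 2048, 4096):
    print(f"{size:5d} bytes: {payload_efficiency(size):.1%}")
```

Under this assumption the efficiency climbs toward, but never reaches, 100% as the payload grows, which matches the expectation stated above. It does not account for other link-level traffic such as acknowledgments and flow-control updates.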

By default, PCIe devices are not allowed to create payloads larger than 128 bytes. This limit is the device's maximum payload size (MPS). The PCIe Specification requires every device, upon power-on, to be able to send and receive payloads of up to 128 bytes. The MPS of each device is a software-tunable parameter and, if left untouched, remains at 128 bytes.

Though the PCIe Specification allows devices to support payloads as large as 4096 bytes, it does not require it. Valid maximum payload sizes are 128, 256, 512, 1024, 2048, and 4096 bytes. The maximum payload size capability of each device is exported via each device's Device Capabilities Register.
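As a rough illustration of how the capability might be read, the sketch below decodes the supported payload size from a raw Device Capabilities Register value. It assumes the field occupies bits 2:0 and encodes the size as a power of two starting at 128 bytes, which is the layout described in the PCIe Specification but is not spelled out in this article. The function name is hypothetical, and reading the register from configuration space is not shown.

```python
# Assumed field layout: bits 2:0 of the Device Capabilities Register hold
# the "Max Payload Size Supported" encoding (0 -> 128 bytes ... 5 -> 4096).
MPS_SUPPORTED_MASK = 0x7
HIGHEST_VALID_CODE = 5


def decode_mps_supported(devcap):
    """Return the largest payload, in bytes, that a device reports supporting."""
    code = devcap & MPS_SUPPORTED_MASK
    if code > HIGHEST_VALID_CODE:
        raise ValueError(f"reserved payload size encoding: {code}")
    # Each step in the encoding doubles the payload size.
    return 128 << code


# Decode every valid encoding to show the six payload sizes.
for code in range(HIGHEST_VALID_CODE + 1):
    print(code, decode_mps_supported(code))
```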

OpenSolaris has been modified to automatically tune each device's MPS beginning with Nevada Build 99 (and this modification is planned for a future Solaris update). The OS needs to "understand" each device's capabilities before increasing the MPS. At operating system startup, every device on each fabric is scanned for its payload capabilities. This includes the end-points and switch-port devices, as well as the Root Complex. When all devices have been scanned, the largest possible payload size that is supportable by every device is programmed into each device's Device Control Register.
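The scan-then-program sequence can be sketched as follows. This is not the OpenSolaris implementation: the device names and register values are made up, the helper names are hypothetical, and the placement of the MPS field in bits 7:5 of the Device Control Register is an assumption based on the PCIe Specification rather than on this article.

```python
VALID_SIZES = (128, 256, 512, 1024, 2048, 4096)

# Assumed field layout: bits 7:5 of the Device Control Register hold the MPS.
MPS_FIELD_SHIFT = 5
MPS_FIELD_MASK = 0x7 << MPS_FIELD_SHIFT


def common_mps(supported):
    """Return the largest payload size that every scanned device supports.

    supported maps a device name to its supported payload size in bytes.
    """
    if not supported:
        raise ValueError("fabric has no devices")
    return min(supported.values())


def with_mps(devctl, size_bytes):
    """Return a Device Control value with its MPS field set to size_bytes."""
    if size_bytes not in VALID_SIZES:
        raise ValueError(f"invalid payload size: {size_bytes}")
    code = VALID_SIZES.index(size_bytes)
    # Clear the old field, then insert the new encoding; other bits are kept.
    return (devctl & ~MPS_FIELD_MASK) | (code << MPS_FIELD_SHIFT)


# Hypothetical fabric: root complex, one switch port, one end-point.
fabric = {"root_complex": 512, "switch_port": 256, "storage_hba": 4096}
mps = common_mps(fabric)
print(mps, hex(with_mps(0x0000, mps)))
```

Taking the minimum across all devices is what guarantees that no device ever receives a payload larger than it can accept.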

Proper PCI Express Configuration

As mentioned above, the payload size used is one that every device on the fabric can support. This means a high-performance storage device capable of 4096-byte payloads may be limited if it is installed in the same fabric as a lower-performing device, such as a USB bridge, that supports payloads of only 128 bytes. In this example, the storage device will have its MPS programmed to 128 bytes rather than a higher value. Systems should be configured so that lower-performing devices do not share a fabric with higher-performing devices.
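When planning a configuration, it can help to list which devices would be held below their own capability by their neighbors. The sketch below does that for the storage device and USB bridge example. The device names and the limited_devices helper are hypothetical, and a real fabric would also include the root complex and any switch ports, whose capabilities count as well.

```python
def limited_devices(supported):
    """Return {name: (supported_bytes, programmed_bytes)} for each device
    whose programmed payload size is below what the device itself supports.

    supported maps a device name to its supported payload size in bytes.
    """
    if not supported:
        return {}
    # The fabric-wide value is the smallest capability among all devices.
    programmed = min(supported.values())
    return {
        name: (cap, programmed)
        for name, cap in supported.items()
        if cap > programmed
    }


# The storage device shares a fabric with the USB bridge, then sits alone.
shared = {"storage_controller": 4096, "usb_bridge": 128}
separate = {"storage_controller": 4096}

print(limited_devices(shared))
print(limited_devices(separate))
```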

Hot-Plugging

When a PCI Express device is hot-plugged onto an active fabric, its capabilities were not taken into account when the MPS was programmed throughout the fabric. If the device's MPS capability is equal to or greater than the value programmed throughout the fabric, the device will use the same MPS as the rest of the fabric. If the device's MPS capability is less than the fabric's MPS, further configuration is needed to prevent the device from receiving a payload that is too large for it.

When a device cannot support the MPS used by the rest of the fabric, the device's MPS is set to the highest value it supports. For writes initiated by this device, write payloads will be no larger than the device's MPS, which is smaller than the fabric's MPS and thus supportable by the rest of the fabric. For reads, the read completion payload is created by the root complex, and its size is dictated by the root complex's MPS. In this hot-plug case, a read completion payload may be too large to be received by the hot-plugged device. This can be resolved by never issuing reads for more data than the device's MPS.

By default, the largest read initiated by a device is 512 bytes. This can be changed by programming the Maximum Read Request Size (MRRS) configuration parameter. Setting the device's MRRS to the device's MPS prevents the root complex from generating read completion payloads larger than the device's MPS. Because the size of each read is limited, a side effect may be decreased read performance for the newly hot-plugged device until OpenSolaris is rebooted.
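The hot-plug handling described in this section can be summarized as a small decision function. This is a sketch of the rules as stated above, not the OpenSolaris code. The function and parameter names are hypothetical, and the 512-byte default MRRS is the value given in the text.

```python
DEFAULT_MRRS = 512  # default largest read request, in bytes, per the text


def configure_hotplugged(device_mps_cap, fabric_mps, current_mrrs=DEFAULT_MRRS):
    """Return the MPS and MRRS settings, in bytes, for a hot-plugged device.

    device_mps_cap is the largest payload the new device supports.
    fabric_mps is the value already programmed throughout the active fabric.
    """
    if device_mps_cap >= fabric_mps:
        # The device can accept anything the fabric sends: match the fabric
        # and leave the read request size alone.
        return {"mps": fabric_mps, "mrrs": current_mrrs}
    # The device is the weakest member: use its own maximum, and cap its
    # read requests so that no completion can exceed what it can receive.
    return {"mps": device_mps_cap, "mrrs": device_mps_cap}


print(configure_hotplugged(device_mps_cap=512, fabric_mps=256))
print(configure_hotplugged(device_mps_cap=128, fabric_mps=256))
```

In the second call the device ends up with both values at 128 bytes, which is the case where read performance may suffer until the next reboot rescans the fabric.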

Conclusion

When PCI Express fabrics are configured properly, OpenSolaris can increase PCIe protocol efficiency and improve performance. On some configurations, PCIe read and write bandwidth has been observed to improve by 10 to 15% just by increasing the payload size. System configurators should include a device's payload capabilities as one of the performance parameters when configuring each fabric.


About the Author

Alan Adamson is a Software Engineer in Sun's Systems Group.

