You either get to C8 or you die trying.
2023-11-02
For the past couple weeks I’ve been trying to get to the bottom of a problem that’s very near and dear to my heart: my home server’s power consumption.
This isn’t going to be some sort of tutorial for fixing a server, but maybe it’ll give you ideas as to what to look at when you’re trying to get to the bottom of an issue like this. I’m not necessarily new to Linux, but I have never really messed with PCIe or UEFI like this before. You’ve been warned. Before you send me angry emails going “but everyone already knows all of this, and those who don’t just need to do some googling”: that might be true, but I would like to preserve the sanity of whoever finds themselves in a situation like mine, because I certainly wanted to stab myself multiple times while writing this post.
TL;DR: It took a new HBA, new 10G card, a UEFI tool, a script, powertop and loads of mental duct tape to finally resolve all of this. Well, I’m not entirely sure if “resolved” is the right word to describe it.
When I set out to move away from unRAID to TrueNAS Scale and ZFS, I intentionally picked very low power components in an effort to try and save power. Here’s a parts list:
Aside from having six HDDs, nothing immediately stands out to me as being very power hungry. But still, this machine sits in my closet consuming about 90W at idle. Turning on HDD spin down brings that number down to 58W. Still nowhere close to what you’d expect from a system with a tiny i3 and no dedicated GPU. Time to investigate.
I didn’t do my testing on TrueNAS Scale, but instead used a new Ubuntu install on an MX500 250GB SSD. It’s much easier to just reinstall that in case I completely break something. The kind of thing you’ve got to account for when you’re just throwing stuff at the wall and seeing what sticks. Also Ubuntu was the OS that the folks over on the unRAID forums suggested for troubleshooting, because apparently “stuff just works there.”
Question #1: When I take this exact same system, don’t unplug anything, and just swap TrueNAS Scale for Ubuntu, do I get the same numbers?
Answer: Not quite. The power meter says 85W at idle. However I think we can all agree that this is far too much for this tiny system.
What does powertop say?
The attentive reader might notice that the CPU package does not enter C2 or any of the states below it. This is really bad. Meanwhile all the cores are chilling at C7.
Just for fun I ran sudo powertop --auto-tune. It lowered the power consumption by maybe 1W at most but also enabled auto-suspend for all of my USB devices so now my mouse and keyboard don’t work quite like they should. What a terrible idea lmao
Now I finally get to unplug stuff.
Out go the SAS Expander, 10G card and the HBA.
With the BIOS reset to defaults and nothing plugged into the computer it pulls 28W from the wall.
What does powertop say now?
Still bad. Time to change some BIOS settings.
Settings → Platform Power:
Platform Power Management: Enabled
PEG ASPM: Enabled
PCH ASPM: Enabled
DMI ASPM: Enabled
Result: 27W. Ouch. Okay, not all hope is lost. There are still more settings I can tweak.
Tweaker → Advanced CPU Settings:
CPU EIST Function: Enabled
Intel(R) Turbo Boost Technology: Disabled
C-States Control: Enabled
CPU Enhanced Halt (C1E): Enabled
C3 State Support: Enabled
C6/7 State Support: Enabled
C8 State Support: Enabled
C10 State Support: Enabled
Package C State Limit: C10

Settings → IO Ports:
Audio Controller: Disabled
OnBoard LAN Controller: Disabled
18 Watts. As you can see I disabled the on-board audio and network controllers. This board has 2.5Gb Ethernet, but I have a 10Gb PCIe card and switch so it is of no use to me. And it’s a server, so why bother with audio?
powertop reports the following:
We’re in C2, that’s progress. Come on, powertop --auto-tune, I know you can do it…
And we’re in C8! Yes!
The power meter reads 16W. I think this may just be the absolute minimum as far as power consumption goes.
A look at lspci reveals that every PCI device currently running has ASPM enabled.
user@user-Z590-D:~$ sudo lspci -vvv | grep "ASPM .*abled"
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled+ CommClk+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-

user@user-Z590-D:~$ sudo lspci -vvv | grep "ASPM"
LnkCap: Port #17, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCap: Port #21, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCap: Port #1, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCap: Port #4, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <16us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled+ CommClk+
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
LnkCap: Port #5, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCap: Port #9, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCap: Port #13, Speed 8GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
    ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
Amazing! So now we know that the system is indeed capable of running without drawing too much power on its own. Certainly a step in the right direction.
Time to plug stuff back in. With the Mellanox ConnectX-3 back in my system, power usage shoots back up to 25W. powertop shows that the system is stuck at C2. Running powertop --auto-tune gets the system down to C3 and 23-24W. Something’s definitely not right.
What does lspci say this time?
user@user-Z590-D:~$ sudo lspci -vvv | grep "ASPM .*abled"
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled+ CommClk+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
Well well well… What device could that possibly be?
user@user-Z590-D:~$ sudo lspci -vvv -s 05:00.00
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at 52100000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at 50000000 (64-bit, prefetchable) [size=8M]
    Expansion ROM at 52000000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Product Name: CX311A - ConnectX-3 SFP+
        Read-only fields:
            [PN] Part number: MCX311A-XCAT_A
            [EC] Engineering changes: A7
            [SN] Serial number: MT1621X14309
            [V0] Vendor specific: PCIe Gen3 x4
            [RV] Reserved: checksum good, 0 byte(s) reserved
        Read/write fields:
            [V1] Vendor specific: N/A
            [YA] Asset tag: N/A
            [RW] Read-write area: 109 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 253 byte(s) free
            [RW] Read-write area: 252 byte(s) free
        End
    Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Vector table: BAR=0 offset=0007c000
        PBA: BAR=0 offset=0007d000
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 116.000W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap: Port #8, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x4 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [c0] Vendor Specific Information: Len=18 <?>
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Capabilities: [148 v1] Device Serial Number 24-8a-07-03-00-5e-31-70
    Capabilities: [154 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [18c v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Kernel driver in use: mlx4_core
    Kernel modules: mlx4_core
And what’s the second device in that list? Let’s have a look and see what the Mellanox card is plugged into:
user@user-Z590-D:~$ sudo lspci -t
-[0000:00]-+-00.0
           +-02.0
           +-14.0
           +-14.2
           +-16.0
           +-17.0
           +-1b.0-[01]--
           +-1b.4-[02]--
           +-1c.0-[03]--
           +-1c.3-[04]--
           +-1c.4-[05]----00.0
           +-1d.0-[06]--
           +-1d.4-[07]--
           +-1f.0
           +-1f.4
           \-1f.5
And looking at 1c.4 gets us:
user@user-Z590-D:~$ sudo lspci -vvv -s 00:1c.4
00:1c.4 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #5 (rev 11) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 124
    Bus: primary=00, secondary=05, subordinate=05, sec-latency=0
    I/O behind bridge: 0000f000-00000fff [disabled]
    Memory behind bridge: 52000000-521fffff [size=2M]
    Prefetchable memory behind bridge: 0000000050000000-00000000507fffff [size=8M]
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
    BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 256 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap: Port #5, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <16us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s (ok), Width x4 (ok)
            TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #8, PowerLimit 25.000W; Interlock- NoCompl+
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet- LinkState-
        RootCap: CRSVisible-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR+
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
            AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd+
            AtomicOpsCtl: ReqEn- EgressBlck-
        LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee20004 Data: 0021
    Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Tiger Lake-H PCI Express Root Port
    Capabilities: [a0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
        RootCmd: CERptEn+ NFERptEn+ FERptEn+
        RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
            FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
        ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
    Capabilities: [220 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
    Capabilities: [150 v1] Precision Time Measurement
        PTMCap: Requester:- Responder:+ Root:+
        PTMClockGranularity: 4ns
        PTMControl: Enabled:+ RootSelected:+
        PTMEffectiveGranularity: Unknown
    Capabilities: [200 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
            PortCommonModeRestoreTime=40us PortTPowerOnTime=44us
        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
            T_CommonMode=40us LTR1.2_Threshold=81920ns
        L1SubCtl2: T_PwrOn=44us
    Capabilities: [a30 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [a00 v1] Downstream Port Containment
        DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
        DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- DL_ActiveErr-
        DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO ErrPtr:1f
        Source: 0000
    Kernel driver in use: pcieport
So the Mellanox ConnectX-3 has ASPM disabled and as a result of that the PCIe slot it is plugged into also reports that ASPM is disabled.
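Matching those anonymous LnkCtl lines back to a device by hand gets old quickly. Here is a rough sketch of how the grep could be replaced with something that keeps the device address next to the ASPM state. The function name aspm_by_device and both regular expressions are my own and only assume the lspci -vvv layout shown in this post (device header at the start of a line, indented LnkCtl line below it):

```python
import re

# Start of a device block, e.g. "05:00.0 Ethernet controller: Mellanox ..."
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+(.*)$")
# ASPM field of a LnkCtl line, e.g. "LnkCtl: ASPM Disabled; RCB 64 bytes, ..."
LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+(.+?)\s*;")


def aspm_by_device(lspci_text):
    """Return {address: (description, aspm_state)} for devices that have a LnkCtl line."""
    result = {}
    current = None  # (address, description) of the device block being read
    for line in lspci_text.splitlines():
        device = DEVICE_RE.match(line)
        if device:
            current = (device.group(1), device.group(2))
            continue
        link_control = LNKCTL_RE.search(line)
        if link_control and current:
            result[current[0]] = (current[1], link_control.group(1))
    return result
```

Feeding it the text of sudo lspci -vvv and filtering for entries whose state contains "Disabled" should point straight at 05:00.0 and 00:1c.4 on a system like this one.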
Before I go off on a big tangent here, let’s just keep going and try different components.
The SAS Expander is not actually a PCIe device. It merely uses the PCIe slot for power. Just for the sake of being thorough, it draws 9 watts when plugged in.
With the HBA back in the system, I’m seeing the exact same problem. Power consumption goes up more than it should, the system again can’t seem to go below C3 and lspci this time presents me with the following information:
26W total system power draw without the powertop thingy, 25W with it. We are once again stuck in C3 with nowhere to go. For the record, this card reportedly only uses about 6W.
user@user-Z590-D:~$ sudo lspci -vvv -s 02:00.0
02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
    Subsystem: Broadcom / LSI 9210-8i
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: I/O ports at 4000 [size=256]
    Region 1: Memory at 517c0000 (64-bit, non-prefetchable) [size=16K]
    Region 3: Memory at 51380000 (64-bit, non-prefetchable) [size=256K]
    Expansion ROM at 51300000 [disabled] [size=512K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
        DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s (ok), Width x4 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR-
            10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- TPHComp- ExtTPHComp-
            AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
            AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
            EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000 Data: 0000
    Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
        Vector table: BAR=1 offset=00002000
        PBA: BAR=1 offset=00003800
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [138 v1] Power Budgeting <?>
    Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
        IOVCap: Migration-, Interrupt Message Number: 000
        IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
        IOVSta: Migration-
        Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
        VF offset: 1, stride: 1, Device ID: 0072
        Supported Page Size: 00000553, System Page Size: 00000001
        Region 0: Memory at 00000000517c4000 (64-bit, non-prefetchable)
        Region 2: Memory at 00000000513c0000 (64-bit, non-prefetchable)
        VF Migration: offset: 00000000, BIR: 0
    Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap: MFVC- ACS-, Next Function: 0
        ARICtl: MFVC- ACS-, Function Group: 0
    Kernel driver in use: mpt3sas
    Kernel modules: mpt3sas
There are plenty of smart people on the internet, perhaps someone else figured out something I can’t.
Seems like this card doesn’t support ASPM at all. Well that was a waste of my time.
Here’s where things get really interesting. The official LSI user guide for the 92xx series HBAs explicitly lists ASPM as a feature, yet ASPM is set to disabled. Now why would that be? Turns out that back in 2013 someone filed a bug report stating that their system would constantly lock up during high read workloads. The solution for this turned out to be to simply disable ASPM. For any of the MPI v2.0 chipsets listed in mpi2_cnfg.h, ASPM will explicitly be disabled.
So unless I am willing to take my chances and live with the possibility of my storage array doing god knows what during rebuilds I don’t think this is worth pursuing. However, this does not apply to MPI v2.5 and MPI v2.6 devices. While there is also a patch for newer devices running that same mpt2sas/mpt3sas driver, it appears to not have gone anywhere. A shimmer of hope.
Some investigative work appears to be in order. The devices say that they are capable of ASPM at least… First, let’s see if my system even correctly supports/implements ASPM:
root@user-Z590-D:/home/user# dmesg | grep ASPM
[ 0.469814] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 16.221745] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
[ 16.253813] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
Now this is bizarre. One would assume that if they enable ASPM within the BIOS, the operating system would also be informed of that. My first thought after reading that was: “Can ASPM somehow forcibly be enabled?”
Apparently the answer to that question is “yes, sort of, but your mileage may vary”.
There is a kernel parameter for it, see here.
With a quick edit to /boot/grub/grub.cfg and the pcie_aspm=force parameter now set, we reboot the system and are surprised to learn that… nothing has changed.
root@user-Z590-D:/home/user# dmesg | grep ASPM
[ 0.118848] PCIe ASPM is forcibly enabled
[ 0.421773] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 11.801846] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
[ 11.825805] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
Might as well reverse it again.
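If you end up rebooting a lot while flipping this parameter, it can help to boil the dmesg output down to the two facts that matter. The following is only a sketch: the helper name aspm_boot_summary is made up, and the two substrings it looks for are simply the messages my kernel printed above, so other kernel versions may word them differently.

```python
def aspm_boot_summary(dmesg_text):
    """Summarise what the kernel log says about ASPM at boot.

    The substrings are taken from the dmesg output in this post and are
    an assumption about the kernel's wording, not a stable interface.
    """
    lines = dmesg_text.splitlines()
    return {
        # pcie_aspm=force was seen on the kernel command line
        "forced": any("PCIe ASPM is forcibly enabled" in line for line in lines),
        # the firmware's FADT tells the OS that ASPM is not supported
        "fadt_blocks_aspm": any(
            "FADT declares the system doesn't support PCIe ASPM" in line
            for line in lines
        ),
    }
```

On the boot shown above this would report both values as true, which is exactly the confusing situation: the parameter was picked up, but the FADT complaint is still there.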
In the process of figuring out what the hell was happening here I did also stumble across this wiki article and these two sections: “Enabling ASPM with enable_aspm” and “Enabling ASPM with setpci”. To test these methods I used the Mellanox card.
Interestingly enough the script didn’t work properly when I tried it. I don’t even quite get why.
root@user-Z590-D:/home/user/Downloads# ./aspm.sh
Root complex:
00:1c.4 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #5 (rev 11)
0x50 : 0x43 --> 0x41 ... [SUCCESS]
Endpoint:
(standard_in) 1: syntax error
setpci: Unknown register "".
Try `setpci --help' for more information.
setpci: Unknown register "".
Try `setpci --help' for more information.
./aspm.sh: line 174: printf: 0x: invalid hex number
[... this repeats a dozen times ...]
Long loop while looking for ASPM word for 05:00.0
Instead of trying to debug some ancient shell script I might as well just re-write the entire thing. Have a look: GitHub - 0x666690/ASPM/aspm.py
root@user-Z590-D:/home/user/Documents/GitHub/ASPM# python3 aspm.py
00:1c.4 PCI bridge: Intel Corporation Tiger Lake-H PCI Express Root Port #5 (rev 11)
0x34 points to 0x40
Value at 0x40 is 0x10
Found the byte at: 0x40
Adding 0x10 to the register...
Final register reads: 0x40
Byte to patch: 0x50
Byte is set to 0x40 -> ASPM_DISABLED
Value doesn't match the one we want, setting it!
Byte is set to 0x43 -> ASPM_L1_AND_L0s
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
0x34 points to 0x40
Value at 0x40 is 0x1
Value is not 0x10! Reading the next byte...
0x41 points to 0x48
Value at 0x48 is 0x3
Value is not 0x10! Reading the next byte...
0x49 points to 0x9c
Value at 0x9c is 0x11
Value is not 0x10! Reading the next byte...
0x9d points to 0x60
Value at 0x60 is 0x10
Found the byte at: 0x60
Adding 0x10 to the register...
Final register reads: 0x40
Byte to patch: 0x70
Byte is set to 0x40 -> ASPM_DISABLED
Value doesn't match the one we want, setting it!
Byte is set to 0x43 -> ASPM_L1_AND_L0s
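For anyone wondering what that log is actually doing: it walks the PCI capability list until it finds the PCI Express capability (ID 0x10), then goes 0x10 bytes further to the Link Control register, whose lowest two bits select the ASPM mode. The following is a stripped-down sketch of that walk and not the actual aspm.py. It works on a plain bytes object holding the config space (for example the contents of /sys/bus/pci/devices/0000:05:00.0/config read as root), the function names are mine, and it deliberately does not write anything back to the device:

```python
PCI_CAP_PTR = 0x34      # config space offset holding the first capability pointer
PCI_CAP_ID_EXP = 0x10   # capability ID of the PCI Express capability
PCI_EXP_LNKCTL = 0x10   # Link Control register, relative to that capability
ASPM_MASK = 0x03        # bits 1:0 of Link Control select the ASPM mode
ASPM_NAMES = {0: "ASPM_DISABLED", 1: "ASPM_L0s", 2: "ASPM_L1", 3: "ASPM_L1_AND_L0s"}


def find_link_control(config):
    """Walk the capability list and return the offset of the Link Control register."""
    pos = config[PCI_CAP_PTR] & 0xFC  # the two lowest pointer bits are reserved
    seen = set()                      # guards against a looping capability list
    while pos and pos not in seen:
        seen.add(pos)
        cap_id, next_ptr = config[pos], config[pos + 1]
        if cap_id == PCI_CAP_ID_EXP:
            return pos + PCI_EXP_LNKCTL
        pos = next_ptr & 0xFC
    raise LookupError("no PCI Express capability found")


def with_aspm(value, mode):
    """Return the Link Control low byte with its ASPM bits replaced by mode (0-3)."""
    return (value & ~ASPM_MASK & 0xFF) | (mode & ASPM_MASK)
```

With the Mellanox card's layout from the log (0x40, 0x48, 0x9c, then 0x60), find_link_control ends up at 0x70, and with_aspm(0x40, 3) gives the 0x43 that the script writes.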
If everything works as expected, two things should happen:
lspci should report ASPM as being enabled
powertop should show that we’re in C6/7/8 and power consumption should decrease
Unfortunately only the former is the case.
# Left out all the unimportant bits...
root@user-Z590-D:/home/user# lspci -vvv -s 05:00.0
05:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
    Capabilities: [60] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 116.000W
        LnkCap: Port #8, Speed 8GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
We will revisit this later.
You know what, let’s go back to the beginning. The error message said something about FADT, but what does that even mean? As with most things in life, the internet has the answer.
This is where I have to openly admit that I don’t know a damn thing about ACPI tables. The only ones I am somewhat familiar with are the DSDT and the SSDT, but only because those were the ones giving 15-year-old me trouble getting my Hackintosh to go to sleep. Good times.
The OSDev Wiki doesn’t tell us where exactly the table reveals information about ASPM, so it’s time to read some documentation.
It appears that this is done through a single bit in IAPC_BOOT_ARCH, the FADT’s IA-PC Boot Architecture Flags: bit 4, “PCIe ASPM Controls”. If the bit is set, the firmware is telling the OS that it must not enable ASPM. Now I wonder what that particular bit is set to on my system…
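To make the bit fiddling a little more concrete, here is a small sketch of how that flag could be checked in a raw FADT dump. It assumes the layout from the ACPI specification as I read it: the two-byte IAPC_BOOT_ARCH field sits at byte offset 109 and the ASPM flag is bit 4. The function name is made up for this example.

```python
import struct

IAPC_BOOT_ARCH_OFFSET = 109  # byte offset of the 2-byte boot architecture flags
PCIE_ASPM_CONTROLS = 1 << 4  # bit 4: when set, the OS must not enable ASPM


def fadt_forbids_aspm(fadt):
    """Return True if the raw FADT (signature 'FACP') has the ASPM bit set."""
    if fadt[:4] != b"FACP":
        raise ValueError("not a FADT: signature is not 'FACP'")
    (flags,) = struct.unpack_from("<H", fadt, IAPC_BOOT_ARCH_OFFSET)
    return bool(flags & PCIE_ASPM_CONTROLS)
```

Something like fadt_forbids_aspm(open("facp.dat", "rb").read()) would then answer the question without having to disassemble the table first.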
Time to dump that table:
user@user-Z590-D:~/Desktop$ sudo apt-get install acpica-tools
user@user-Z590-D:~/Desktop$ sudo acpidump -b -n FACP
user@user-Z590-D:~/Desktop$ iasl -d facp.dat
Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20200925
Copyright (c) 2000 - 2020 Intel Corporation
File appears to be binary: found 237 non-ASCII characters, disassembling
Binary file appears to be a valid ACPI table, disassembling
Input file facp.dat, Length 0x114 (276) bytes
ACPI: FACP 0x0000000000000000 000114 (v06 ALASKA A M I 01072009 AMI 01000013)
Acpi Data Table [FACP] decoded
Formatted output: facp.dsl - 10157 bytes
Let’s have a look:
Legacy Devices Supported (V2) : 1
8042 Present on ports 60/64 (V2) : 0
VGA Not Present (V4) : 0
MSI Not Supported (V4) : 0
PCIe ASPM Not Supported (V4) : 1
CMOS RTC Not Present (V5) : 0
[06Fh 0111 1] Reserved : 00
Sigh… there we have the answer. The thing is though, I’ve seen this board act the right way when nothing was plugged in. Can I just patch this and pretend that everything is okay?
Yes.
Now there are two ways of doing it:
by supplying patched ACPI tables to the OS, typically done through a patched initramfs. A good tutorial for that can be found here.
by patching the tables in memory before we even get to the bootloader.
Since I don’t plan on using this Ubuntu install until the end of time and would like to go back to TrueNAS Scale eventually, I think I’m gonna go with the second route. The simplest way of doing things would be to simply have a USB flash drive that is completely separate from the main boot drive for me to get into a UEFI shell, run a program that will patch the table, and then jump to the actual bootloader that will load TrueNAS.
I have little to no experience writing UEFI drivers, but luckily there’s already a tool out there for patching a different part of the ACPI tables for me to build upon.
Big big thank you to James Swineson for his work on the S0ixEnabler.
After compiling a binary and putting it on a USB stick, we can see that this indeed patches things up.
Shell> fs0:\ASPMEnabler.efi
ASPMEnabler https://github.com/0x666690/ASPM
A modified version of: https://github.com/Jamesits/S0ixEnabler
Firmware American Megatrends Rev 327699
Table #1/14: Not RDSP
Table #2/14: Not RDSP
Table #3/14: Not RDSP
Table #4/14: Not RDSP
Table #5/14: Not RDSP
Table #6/14: Not RDSP
Table #7/14: RDSP Rev 0 @0x3B513000 | No XSDT
Table #8/14: RDSP Rev 2 @0x3B513014 | XSDT OEM ID: ALASKA Tables: 26
ACPI table #1/26: FACP Rev 6 OEM ID: ALASKA
Checking initial checksum... OK
Patching FADT table...
FADT::IaPcBootArch before: 0x11
FADT::IaPcBootArch after: 0x1
Checksum before: 0x2
Checksum after: 0x3F
Re-check... OK
FADT table patch finished
ASPMEnabler done
Shell> fs1:\EFI\ubuntu\grubx64.efi
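The patch itself boils down to two steps: clear the ASPM bit in IaPcBootArch, then fix the table checksum so the OS still accepts the table. The real tool does this in C from the UEFI shell; what follows is only an illustration of the same idea in Python on a raw table dump. The offsets are my reading of the ACPI spec (checksum at byte 9 of the header, table length at bytes 4 to 7, IAPC_BOOT_ARCH at byte 109), and patch_fadt is a hypothetical name.

```python
import struct

LENGTH_OFFSET = 4            # 4-byte little-endian table length in the ACPI header
CHECKSUM_OFFSET = 9          # 1-byte checksum in the ACPI header
IAPC_BOOT_ARCH_OFFSET = 109  # low byte of the boot architecture flags
PCIE_ASPM_CONTROLS = 0x10    # bit 4: "PCIe ASPM Not Supported"


def patch_fadt(fadt):
    """Return a copy of the FADT with the ASPM bit cleared and a valid checksum."""
    table = bytearray(fadt)
    (length,) = struct.unpack_from("<I", table, LENGTH_OFFSET)
    if table[:4] != b"FACP" or length > len(table):
        raise ValueError("not a complete FADT")
    # Clear only bit 4 and leave the other flags alone (0x11 becomes 0x01).
    table[IAPC_BOOT_ARCH_OFFSET] &= 0xFF ^ PCIE_ASPM_CONTROLS
    # All bytes of an ACPI table, checksum included, must sum to 0 modulo 256.
    table[CHECKSUM_OFFSET] = 0
    table[CHECKSUM_OFFSET] = -sum(table[:length]) & 0xFF
    return bytes(table)
```

A table patched this way could also be used for the first route mentioned earlier, supplying patched ACPI tables through the initramfs, though I have not tried that here.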
No more complaints about the FADT!
root@user-Z590-D:/home/user# dmesg | grep ASPM
[ 10.443179] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
Unfortunately this doesn’t magically fix the fact that both of the devices I have are generally incompatible with ASPM. Oh well.
You know what, I’m not taking any more chances. I’ll only buy something once at least one other person has vouched for it.
I shouldn’t have said that.
I’m not even gonna say much here and let the pictures do the talking.
Didn’t even take a proper screenshot. Just took a photo of my screen, put the card back in its box and shipped it back the day I got it.
A comment by unRAID forum member h0schi in this thread is what made me buy mine.
One thing to note though: these cards are vendor-locked (in theory). In order to use them with any of the SFP+ modules that aren’t whitelisted, you’ll need to patch it with this: GitHub - bibigon812/xl710-unlocker
Now, does this one have proper ASPM support?
Hell yes.
We’re down to about 22W without powertop --auto-tune and 19W with it. Now those are finally some numbers that I’m happy with.
root@user-Z590-D:/home/user# lspci -vvv -s 02:00.0 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01) Subsystem: Intel Corporation Ethernet Converged Network Adapter X710-2 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: Memory at 53800000 (64-bit, prefetchable) [size=8M] Region 3: Memory at 54c00000 (64-bit, prefetchable) [size=32K] Expansion ROM at 53180000 [disabled] [size=512K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [70] MSI-X: Enable+ Count=129 Masked- Vector table: BAR=3 offset=00000000 PBA: BAR=3 offset=00001000 Capabilities: [a0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x4,ASPM L1, Exit Latency L1 <16us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl:ASPM L1 Enabled ; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x4 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, 
EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [e0] Vital Product Data Product Name: XL710 40GbE Controller Read-only fields: [PN] Part number: [EC] Engineering changes: [FG] Unknown: [LC] Unknown: [MN] Manufacture ID: [PG] Unknown: [SN] Serial number: [V0] Vendor specific: [RV] Reserved: checksum good, 0 byte(s) reserved Read/write fields: [V1] Vendor specific: End Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [140 v1] Device Serial Number cc-7d-ad-ff-ff-fe-fd-3c Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 1 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV) IOVCap: Migration-, Interrupt Message Number: 000 IOVCtl: Enable- Migration- Interrupt- MSE- 
ARIHierarchy+ IOVSta: Migration- Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00 VF offset: 16, stride: 1, Device ID: 154c Supported Page Size: 00000553, System Page Size: 00000001 Region 0: Memory at 0000000053400000 (64-bit, prefetchable) Region 3: Memory at 0000000054c10000 (64-bit, prefetchable) VF Migration: offset: 00000000, BIR: 0 Capabilities: [1a0 v1] Transaction Processing Hints Device specific mode supported No steering table available Capabilities: [1b0 v1] Access Control Services ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans- Capabilities: [1d0 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Kernel driver in use: i40e Kernel modules: i40e
With MPI v2.0 devices off the table and the strong desire to get rid of my SAS expander because it’s nothing but a power hog, I ended up settling for the next best thing that can drive all of my drive bays at once, a 9305-24i. Took three weeks to get here and arrived with the slot cover bent out of shape. Oh well, whatever.
mpt3sas_cm0: LSISAS3224: FWVersion(16.00.12.00), ChipRevision(0x01), BiosVersion(18.00.03.00)
And what does lspci say?
user@user-Z590-D:~$ sudo lspci -vvv -s 02:00.0 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3 (rev 01) Subsystem: Broadcom / LSI SAS9305-24i Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: I/O ports at 4000 [size=256] Region 1: Memory at 53100000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at 53000000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x8,ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl:ASPM Disabled ; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s (ok), Width x4 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- 
SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [c0] MSI-X: Enable+ Count=96 Masked- Vector table: BAR=1 offset=0000e000 PBA: BAR=1 offset=0000f000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [1e0 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [1c0 v1] Power Budgeting <?> Capabilities: [190 v1] Dynamic Power Allocation <?> Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Kernel driver in use: mpt3sas Kernel modules: mpt3sas
ASPM not supported? Oh, you have got to be kidding me… That would really suck. Perhaps the card isn’t running the most recent firmware? Reflashing it certainly can’t hurt. There are plenty of tutorials out there on how to update these cards. Here’s what we need:
Here’s what my USB stick looks like:
EFI BOOTX64.efi (a copy of OpenShell.efi with a different name) sas3flash.efi mpt3x64.rom (the UEFI BSD & HII Configuration Utility, from the 'Signed' folder) SAS9305_24i_IT_P.bin (the firmware itself)
Time to boot to the USB…
> fs0: > FS0:\> sas3flash.efi -listall Avago Technologies SAS3 Flash Utility Version 15.00.00.00 (2016.11.17) Copyright 2008-2016 Avago Technologies. All rights reserved. SAS3FLASH: Disconnecting the EFI Driver. Adapter Selected is a Avago SAS: SAS3224(A1) Num Ctrl FW Ver NVDATA x86-BIOS PCI Addr --------------------------------------------------------------------- 0 SAS3224(A1) 16.00.12.00 10.00.00.03 08.37.02.00 00:02:00:00 Finished Processing Commands Successfully. Exiting SAS3Flash. SAS3FLASH: Reconnecting the EFI Driver. Please wait...
> FS0:\> sas3flash.efi -list Avago Technologies SAS3 Flash Utility Version 15.00.00.00 (2016.11.17) Copyright 2008-2016 Avago Technologies. All rights reserved. SAS3FLASH: Disconnecting the EFI Driver. Adapter Selected is a Avago SAS: SAS3224(A1) Controller Number: 0 Controller: SAS3224(A1) PCI Address: 00:02:00:00 SAS Address: 500062B-2-0299-1693 NVDATA Version (Default): 10.00.00.03 NVDATA Version (Persistent): 10.00.00.03 Firmware Product ID: 0x2228 (IT) Firmware Version: 16.00.12.00 NVDATA Vendor: LSI NVDATA Product ID: SAS9305-24i BIOS Version: 08.37.02.00 UEFI BSD Version: 18.00.03.00 FCODE Version: N/A Board Name: SAS9305-24i Board Assembly: 03-25699-02004 Board Tracer Number: XW84190440 Finished Processing Commands Successfully. Exiting SAS3Flash. SAS3FLASH: Reconnecting the EFI Driver. Please wait...
> FS0:\> sas3flash.efi -o -f SAS9305_24i_IT_P.bin -b mpt3x64.rom Avago Technologies SAS3 Flash Utility Version 15.00.00.00 (2016.11.17) Copyright 2008-2016 Avago Technologies. All rights reserved. SAS3FLASH: Disconnecting the EFI Driver. Advanced Mode Set Adapter Selected is a Avago SAS: SAS3224(A1) Executing Operation: Flash Firmware Image Firmware Image has a Valid Checksum. Firmware Version 16.00.12.00 Firmware Image compatible with Controller. Valid NVDATA Image found. NVDATA Major Version 10.00 Checking for a compatible NVData image... NVDATA Device ID and Chip Revision match verified. NVDATA Versions Compatible. Valid Initialization Image verified. Valid BootLoader Image verified. Beginning Firmware Download... Firmware Download Successful. Verifiying Download... Firmware Flash Successful. Resetting Adapter... Adapter Successfully Reset. NVDATA Version 10.00.00.03 Executing Operation: Flash BIOS Image Validating BIOS Image... BIOS Header Signature is Valid BIOS Image has a Valid Checksum. BIOS PCI Structure Signature Valid. BIOS Image Compatible with the SAS Controller. Attempting to Flash BIOS Image... Verifying Download... Flash BIOS Image Successful. Finished Processing Commands Successfully. Exiting SAS3Flash. SAS3FLASH: Reconnecting the EFI Driver. Please wait...
And back to Ubuntu:
> FS0:\> fs1:\EFI\ubuntu\grubx64.efi
What does lspci say now?
LnkCap: Port #0, Speed 8GT/s, Width x8,ASPM not supported LnkCtl:ASPM Disabled ; RCB 64 bytes, Disabled- CommClk+
I can’t believe that this card supposedly doesn’t support ASPM at all. By the way, it’s not specific to my particular card. I should know, I bought a second one for troubleshooting.
Also, what is going on with LnkCap?
I found this message on the OmniOS mailing list: [OmniOS-discuss] SAS 9305-16e HBA support in Illumos
This person’s HBA is very, very similar to mine and their LnkCap shows ASPM not supported, Exit Latency L0s <2us, L1 <4us. That… doesn’t make a lot of sense.
Nvidia’s official documentation shows a Mellanox adapter with ASPM not supported, Exit Latency L0s unlimited, L1 unlimited, and honestly that might even make sense. An unlimited latency for waking up out of a power saving state implies that the device simply won’t wake up once sent into that state, so it shouldn’t be used. But that HBA is just very confusing.
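Part of the confusion goes away once you look at how the Link Capabilities register is laid out. The sketch below assumes the PCIe spec layout (ASPM support in bits 11:10, L0s exit latency in bits 14:12, L1 exit latency in bits 17:15) and the labels lspci prints for each code. The register value is made up for illustration, not read from any card.

```python
# Labels as printed by lspci for each field value (assumed from the pciutils output).
ASPM_SUPPORT = {0: "not supported", 1: "L0s", 2: "L1", 3: "L0s L1"}
L0S_EXIT = ["<64ns", "<128ns", "<256ns", "<512ns", "<1us", "<2us", "<4us", "unlimited"]
L1_EXIT = ["<1us", "<2us", "<4us", "<8us", "<16us", "<32us", "<64us", "unlimited"]


def decode_lnkcap(value: int) -> dict:
    """Pull the three ASPM-related fields out of a 32-bit LnkCap value."""
    return {
        "aspm": ASPM_SUPPORT[(value >> 10) & 0x3],
        "l0s_exit": L0S_EXIT[(value >> 12) & 0x7],
        "l1_exit": L1_EXIT[(value >> 15) & 0x7],
    }


# Hypothetical value: support field 0, L0s latency code 5, L1 latency code 2.
example = (0 << 10) | (5 << 12) | (2 << 15)
print(decode_lnkcap(example))
```

The support bits and the latency fields are independent of each other, so nothing stops firmware from filling in latencies while leaving the support bits at zero. Whether those latencies mean anything in that case is a different question.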
You know what, let’s just ignore whatever the device says and see what happens.
With aspm.py I can indeed get it into L1!
root@user-Z590-D:~$ python3 aspm.py 00:1b.4 PCI bridge: Intel Corporation Device 43c4 (rev 11) 0x34 points to 0x40 Value at 0x40 is 0x10 Found the byte at: 0x40 Adding 0x10 to the register... Final register reads: 0x40 Byte to patch: 0x50 Byte is set to 0x40 -> ASPM_DISABLED Value doesn't match the one we want, setting it! Byte is set to 0x43 ->ASPM_L1_AND_L0s 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3 (rev 01) 0x34 points to 0x50 Value at 0x50 is 0x1 Value is not 0x10! Reading the next byte... 0x51 points to 0x68 Value at 0x68 is 0x10 Found the byte at: 0x68 Adding 0x10 to the register... Final register reads: 0x40 Byte to patch: 0x78 Byte is set to 0x40 -> ASPM_DISABLED Value doesn't match the one we want, setting it! Byte is set to 0x43 ->ASPM_L1_AND_L0s root@user-Z590-D:~$ lspci -vvv -s 02:00.0 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3 (rev 01) Subsystem: Broadcom / LSI SAS9305-24i Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: I/O ports at 5000 [size=256] Region 1: Memory at 53100000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at 53000000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- 
TransPend- LnkCap: Port #0, Speed 8GT/s, Width x8,ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl:ASPM L0s L1 Enabled ; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s, Width x4 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [c0] MSI-X: Enable+ Count=96 Masked- Vector table: BAR=1 offset=0000e000 PBA: BAR=1 offset=0000f000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- MultHdrRecCap- 
MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [1e0 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [1c0 v1] Power Budgeting <?> Capabilities: [190 v1] Dynamic Power Allocation <?> Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Kernel driver in use: mpt3sas Kernel modules: mpt3sas
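The steps in the script's log (follow the pointer at 0x34, look for capability ID 0x10, add 0x10, patch the low bits) can be sketched like this. It's a rough reconstruction from the log output and not the real aspm.py: it works on an in-memory copy of the config space, and the synthetic bytes below only mirror the offsets the log reports for the HBA.

```python
CAP_POINTER = 0x34    # config space offset holding the first capability pointer
PCIE_CAP_ID = 0x10    # capability ID of the PCI Express capability
LINK_CONTROL = 0x10   # Link Control register, relative to the PCIe capability
ASPM_MASK = 0b11      # bits 1:0 of Link Control select the ASPM mode

ASPM_DISABLED, ASPM_L0S, ASPM_L1, ASPM_L1_AND_L0S = 0b00, 0b01, 0b10, 0b11


def find_pcie_capability(config: bytes) -> int:
    """Walk the capability linked list until the PCI Express capability shows up."""
    offset = config[CAP_POINTER]
    seen = set()
    while offset and offset not in seen:   # a pointer of 0 ends the list
        seen.add(offset)
        if config[offset] == PCIE_CAP_ID:
            return offset
        offset = config[offset + 1]        # next-capability pointer
    raise LookupError("no PCI Express capability found")


def set_aspm(config: bytearray, mode: int) -> int:
    """Overwrite only the two ASPM bits and return the offset that was patched."""
    reg = find_pcie_capability(config) + LINK_CONTROL
    config[reg] = (config[reg] & ~ASPM_MASK & 0xFF) | mode
    return reg


# Synthetic config space mirroring the HBA log (not a dump from the real card).
config = bytearray(256)
config[0x34] = 0x50
config[0x50], config[0x51] = 0x01, 0x68   # Power Management capability, next at 0x68
config[0x68], config[0x69] = 0x10, 0x00   # PCI Express capability, list ends here
config[0x78] = 0x40                       # Link Control low byte, ASPM disabled

reg = set_aspm(config, ASPM_L1)
print(hex(reg), hex(config[reg]))
```

Masking first matters: the other bits in that byte (like the 0x40 for CommClk) have to survive the write.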
We run powertop --auto-tune again and what does the power meter say?
30W with the CPU chilling at C8! If I plug in the 10G X710-DA2 as well we end up at 34W. Those are some amazing numbers! :)
This is where I want to put everything I’ve learned together. No more Ubuntu, it’s back to TrueNAS now. Alright, I prepped my USB drive. It looks like this:
EFI BOOT BOOTX64.efi (a copy of OpenShell) startup.nsh ASPMEnabler.efi
The startup.nsh file looks like this:
echo -off echo Starting ASPMEnabler... fs0:\ASPMEnabler.efi echo Booting into GRUB... fs1:\EFI\debian\grubx64.efi
Booting from said USB drive gets us this:
Press ESC in 1 seconds to skip startup.nsh or any other key to continue... Shell> echo -off Starting ASPMEnabler... ASPMEnabler https://github.com/0x666690/ASPM A modified version of: https://github.com/Jamesits/S0ixEnabler Firmware American Megatrends Rev 327699 Table #1/14: Not RDSP Table #2/14: Not RDSP Table #3/14: Not RDSP Table #4/14: Not RDSP Table #5/14: Not RDSP Table #6/14: Not RDSP Table #7/14: RDSP Rev 0 @0x3B513000 | No XSDT Table #8/14: RDSP Rev 2 @0x3B513014 | XSDT OEM ID: ALASKA Tables: 26 ACPI table #1/26: FACP Rev 6 OEM ID: ALASKA Checking initial checksum... OK Patching FADT table... FADT::IaPcBootArch before: 0x11 FADT::IaPcBootArch after: 0x1 Checksum before: 0x2 Checksum after: 0x3F Re-check... OK FADT table patch finished ASPMEnabler done Booting into GRUB... Welcome to GRUB!
root@truenas[/home/admin]# dmesg | grep ASPM [ 1.358832] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
It works! :)
All of this is happening on a fresh TrueNAS install on my Samsung PM981 NVMe SSD. I did also have my 480GB ADATA SSD plugged in. Turns out that when you run /sbin/powertop --auto-tune with it plugged in, a small message on the TrueNAS console flashes by, telling us that things are not going well:
[ 131.844759] ahci 0000:00:17.0: port does not support device sleep
The CPU package then permanently gets itself stuck in C2, when previously it would even go down to C3. Let’s get to the bottom of this message. What’s 00:17.0?
root@truenas[/home/admin]# lspci -s 00:17.0 -vvv 00:17.0 SATA controller: Intel Corporation Device 43d2 (rev 11) (prog-if 01 [AHCI 1.0])DeviceName: Onboard - SATA Subsystem: Gigabyte Technology Co., Ltd Device b005 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 127 Region 0: Memory at 53514000 (32-bit, non-prefetchable) [size=8K] Region 1: Memory at 53518000 (32-bit, non-prefetchable) [size=256] Region 2: I/O ports at 6090 [size=8] Region 3: I/O ports at 6080 [size=4] Region 4: I/O ports at 6060 [size=32] Region 5: Memory at 53517000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee08004 Data: 0022 Capabilities: [70] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004 Kernel driver in use: ahci Kernel modules: ahci
ls -al /sys/block/sd*
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sda -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/block/sda
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdb -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sdb
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdc -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:2/end_device-0:2/target0:0:2/0:0:2:0/block/sdc
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdd -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:3/end_device-0:3/target0:0:3/0:0:3:0/block/sdd
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sde -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:4/end_device-0:4/target0:0:4/0:0:4:0/block/sde
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdf -> ../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:5/end_device-0:5/target0:0:5/0:0:5:0/block/sdf
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdg -> ../devices/pci0000:00/0000:00:17.0/ata6/host6/target6:0:0/6:0:0:0/block/sdg
lrwxrwxrwx 1 root root 0 Oct 29 01:14 /sys/block/sdh -> ../devices/pci0000:00/0000:00:14.0/usb1/1-8/1-8:1.0/host8/target8:0:0/8:0:0:0/block/sdh
Ah, so the SSD is at fault. When powertop says device sleep, what it is referring to is DEVSLP. Funnily enough, the datasheet for this particular SSD explicitly lists it as a feature.
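If you have more than a handful of disks, eyeballing those symlinks gets old. Here is a small sketch that pulls the controller out of a /sys/block symlink target. The assumption is that the last PCI address in the path is the controller the disk hangs off. The sample strings are copied from the listing for sda, sdg and sdh.

```python
import re

# Matches a full PCI address such as 0000:00:17.0 (domain:bus:device.function).
PCI_ADDRESS = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_target: str) -> str:
    """Return the last PCI address in a /sys/block symlink target."""
    addresses = PCI_ADDRESS.findall(sysfs_target)
    if not addresses:
        raise ValueError("no PCI address found in path")
    return addresses[-1]


targets = {
    "sda": "../devices/pci0000:00/0000:00:1b.4/0000:02:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/block/sda",
    "sdg": "../devices/pci0000:00/0000:00:17.0/ata6/host6/target6:0:0/6:0:0:0/block/sdg",
    "sdh": "../devices/pci0000:00/0000:00:14.0/usb1/1-8/1-8:1.0/host8/target8:0:0/8:0:0:0/block/sdh",
}

for disk, target in targets.items():
    print(disk, controller_of(target))
```

On a live system the target strings would come from os.readlink("/sys/block/sdg") and friends instead of a hardcoded dictionary.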
For once we can’t use lspci, but rather hdparm.
root@truenas[/home/admin]# /sbin/hdparm -I /dev/sdg /dev/sdg: ATA device, with non-removable media Model Number: ADATA SP550 Serial Number: 2G2220072146 Firmware Revision: P0330AA Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0 Standards: Supported: 9 8 7 6 5 Likely used: 9 Configuration: Logical max current cylinders 16383 16383 heads 16 16 sectors/track 63 63 -- CHS current addressable sectors: 16514064 LBA user addressable sectors: 268435455 LBA48 user addressable sectors: 937703088 Logical Sector size: 512 bytes Physical Sector size: 4096 bytes Logical Sector-0 offset: 0 bytes device size with M = 1024*1024: 457862 MBytes device size with M = 1000*1000: 480103 MBytes (480 GB) cache/buffer size = unknown Nominal Media Rotation Rate: Solid State Device Capabilities: LBA, IORDY(can be disabled) Queue depth: 32 Standby timer values: spec'd by Standard, no device specific minimum R/W multiple sector transfer: Max = 2 Current = 1 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 Cycle time: min=120ns recommended=120ns PIO: pio0 pio1 pio2 pio3 pio4 Cycle time: no flow control=120ns IORDY flow control=120ns Commands/features: Enabled Supported: * SMART feature set Security Mode feature set * Power Management feature set * Write cache * Look-ahead * Host Protected Area feature set * WRITE_BUFFER command * READ_BUFFER command * NOP cmd * DOWNLOAD_MICROCODE SET_MAX security extension * 48-bit Address feature set * Mandatory FLUSH_CACHE * FLUSH_CACHE_EXT * SMART error logging * SMART self-test * General Purpose Logging feature set * WRITE_{DMA|MULTIPLE}_FUA_EXT * {READ,WRITE}_DMA_EXT_GPL commands * Segmented DOWNLOAD_MICROCODE * Gen1 signaling speed (1.5Gb/s) * Gen2 signaling speed (3.0Gb/s) * Gen3 signaling speed (6.0Gb/s) * Native Command Queueing (NCQ) * Host-initiated interface power management * Phy event counters * READ_LOG_DMA_EXT equivalent to READ_LOG_EXT * DMA Setup Auto-Activate optimization 
Device-initiated interface power management * Software settings preservationDevice Sleep (DEVSLP) * SMART Command Transport (SCT) feature set * SCT Write Same (AC2) * SCT Features Control (AC4) * SCT Data Tables (AC5) * SANITIZE feature set * BLOCK_ERASE_EXT command * DOWNLOAD MICROCODE DMA command * WRITE BUFFER DMA command * READ BUFFER DMA command * Data Set Management TRIM supported (limit 8 blocks) * Deterministic read ZEROs after TRIM Security: Master password revision code = 65534 supported not enabled not locked frozen not expired: security count supported: enhanced erase 2min for SECURITY ERASE UNIT. 2min for ENHANCED SECURITY ERASE UNIT. Device Sleep: DEVSLP Exit Timeout (DETO): 220 ms (drive) Minimum DEVSLP Assertion Time (MDAT): 31 ms (drive) Checksum: correct
So it is supported, but not enabled and I don’t see a way to forcibly enable it, at least within the OS. It needs to be done from within the BIOS.
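The "supported but not enabled" part is easy to miss in that wall of text: hdparm marks enabled features with a star and leaves supported-but-disabled ones bare. A minimal sketch for checking this programmatically follows. The SAMPLE text is a hand-written excerpt in the layout I'd expect from hdparm -I (tab-indented, star in the first column), so treat the exact whitespace as an assumption.

```python
def feature_states(hdparm_output: str) -> dict:
    """Map each entry under 'Commands/features:' to True (enabled) or False."""
    states = {}
    in_section = False
    for line in hdparm_output.splitlines():
        if line.startswith("Commands/features:"):
            in_section = True
            continue
        if not in_section:
            continue
        if line and not line[0].isspace():
            break  # an unindented line starts the next top-level section
        text = line.strip()
        if not text or text.startswith("Enabled"):
            continue  # skip blank lines and the "Enabled Supported:" header
        states[text.lstrip("*").strip()] = text.startswith("*")
    return states


# Hand-written excerpt, assumed layout; not a verbatim capture.
SAMPLE = """\
Commands/features:
\tEnabled\tSupported:
\t   *\tSMART feature set
\t    \tSecurity Mode feature set
\t   *\tSoftware settings preservation
\t    \tDevice Sleep (DEVSLP)
Security:
\tMaster password revision code = 65534
"""

states = feature_states(SAMPLE)
print(states["Device Sleep (DEVSLP)"])
```

A feature that isn't in the dictionary at all isn't supported by the drive, which is a different problem from the one here.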
Settings → IO Ports → SATA And RST Configuration:
SATA 3 ADATA SP550 (480.1GB)
Software Preserve SUPPORTED
Port 3 Enabled
SATA Port 3 DevSlp Disabled → Enabled
Hot Plug Disabled
With this setting set, powertop will not throw that particular error anymore and the CPU no longer gets stuck at C2.
Now that that is out of the way, let’s get to the real star of the show, the 9305-24i.
root@truenas[/home/admin]# python3 aspm.py 00:1b.4 PCI bridge: Intel Corporation Device 43c4 (rev 11) 0x34 points to 0x40 Value at 0x40 is 0x10 Found the byte at: 0x40 Adding 0x10 to the register... Final register reads: 0x40 Byte to patch: 0x50 Byte is set to 0x40 -> ASPM_DISABLED Value doesn't match the one we want, setting it! Byte is set to 0x42 ->ASPM_L1_ONLY 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3 (rev 01) 0x34 points to 0x50 Value at 0x50 is 0x1 Value is not 0x10! Reading the next byte... 0x51 points to 0x68 Value at 0x68 is 0x10 Found the byte at: 0x68 Adding 0x10 to the register... Final register reads: 0x40 Byte to patch: 0x78 Byte is set to 0x40 -> ASPM_DISABLED Value doesn't match the one we want, setting it! Byte is set to 0x42 ->ASPM_L1_ONLY root@truenas[/home/admin]# lspci -vvv -s 02:00.0 02:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3 (rev 01) Subsystem: Broadcom / LSI SAS9305-24i Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 16 Region 0: I/O ports at 5000 [size=256] Region 1: Memory at 53100000 (64-bit, non-prefetchable) [size=64K] Expansion ROM at 53000000 [disabled] [size=1M] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [68] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25W DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- 
AuxPwr- TransPend- LnkCap: Port #0, Speed 8GT/s, Width x8,ASPM not supported ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+ LnkCtl:ASPM L1 Enabled ; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 8GT/s, Width x4 (downgraded) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range BC, TimeoutDis+ NROPrPrP- LTR- 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp- AtomicOpsCap: 32bit- 64bit- 128bitCAS- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, AtomicOpsCtl: ReqEn- LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS- LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Address: 0000000000000000 Data: 0000 Masking: 00000000 Pending: 00000000 Capabilities: [c0] MSI-X: Enable+ Count=96 Masked- Vector table: BAR=1 offset=0000e000 PBA: BAR=1 offset=0000f000 Capabilities: [100 v2] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn- MultHdrRecCap- 
MultHdrRecEn- TLPPfxPres- HdrLogCap- HeaderLog: 00000000 00000000 00000000 00000000 Capabilities: [1e0 v1] Secondary PCI Express LnkCtl3: LnkEquIntrruptEn- PerformEqu- LaneErrStat: 0 Capabilities: [1c0 v1] Power Budgeting <?> Capabilities: [190 v1] Dynamic Power Allocation <?> Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Kernel driver in use: mpt3sas Kernel modules: mpt3sas
Moment of truth, what does powertop --auto-tune do?
Success!
68W with all disks spinning.
If we now enable HDD standby (without spindown, mind you), this goes down to 55W. Seeing how at the start of this whole ordeal the server was pulling 58W with the disks spun down and not just the heads parked, I am pretty damn happy. With that said, I wouldn’t personally let the drives park their heads this often.
From this point onwards, things did not go smoothly. Instead of presenting you with pages upon pages of dmesg output I will try to keep things short.
For the purposes of stress testing I just ran ZFS scrubs the entire time.
Some notes:
L0s is not it. Setting it right after boot-up will cause the system to hang and reboot. If set without pci=nommconf (see below), the system will spam PCIe bus errors. Also, it will not drop into anything below C3.
L1 works, I think. It throws the same PCIe bus errors, albeit much less frequently.
[ 844.281175] pcieport 0000:00:1b.4: AER: Corrected error received: 0000:02:00.0 [ 844.281206] mpt3sas 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [ 844.281210] mpt3sas 0000:02:00.0: device [1000:00c4] error status/mask=00001000/00002000 [ 844.281213] mpt3sas 0000:02:00.0: [12] Timeout [ 849.878827] pcieport 0000:00:1b.4: AER: Corrected error received: 0000:02:00.0 [ 849.878852] mpt3sas 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [ 849.878857] mpt3sas 0000:02:00.0: device [1000:00c4] error status/mask=00001000/00002000 [ 849.878862] mpt3sas 0000:02:00.0: [12] Timeout
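To get a feel for how often this happens and what the status value means, here is a small sketch that decodes the error status/mask pair and tallies errors per device. The bit names assume the AER correctable error status layout from the PCIe spec (bit 12 being the replay timer timeout). The sample log lines are the two status lines from the output, one per line.

```python
import re
from collections import Counter

# Correctable error status bits (assumed from the PCIe AER register layout).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "Replay Number Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

STATUS_LINE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r".*error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode(bits: int) -> list:
    """Name every known correctable-error bit that is set in the value."""
    return [name for bit, name in CORRECTABLE_BITS.items() if bits & (1 << bit)]


def tally(log: str) -> Counter:
    """Count decoded errors per (device, error name) across a dmesg excerpt."""
    counts = Counter()
    for match in STATUS_LINE.finditer(log):
        for name in decode(int(match["status"], 16)):
            counts[(match["dev"], name)] += 1
    return counts


LOG = """\
[  844.281210] mpt3sas 0000:02:00.0: device [1000:00c4] error status/mask=00001000/00002000
[  849.878857] mpt3sas 0000:02:00.0: device [1000:00c4] error status/mask=00001000/00002000
"""

print(tally(LOG))
print(decode(0x00002000))
```

The mask value decodes to the advisory non-fatal bit, which lines up with the AdvNonFatalErr+ in the CEMsk line of the lspci output.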
With that said, adding pci=nommconf to the boot arguments in GRUB appears to have made these error messages disappear. I’ll be the first one to admit that I don’t exactly understand the consequences of setting it, but from reading this and this I feel significantly better about pci=nommconf than I do about pci=noaer, which was also a suggestion I found online, but that just hides all the errors lol
We can add it to the TrueNAS boot arguments like this:
midclt call system.advanced.update '{"kernel_extra_options": "pci=nommconf"}'
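After a reboot it is worth checking two things: that the argument actually made it onto the kernel command line, and that the corrected errors stopped showing up. Here is a rough sketch of how I would check both. The function names has_boot_arg and count_aer_corrected are made up for this example, and the second one simply assumes the AER lines look like the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: both helper names are hypothetical, not part of any tool.

# Succeeds if the given argument is present on the kernel command line.
# The optional second parameter only exists so the check can be pointed
# at a different file than /proc/cmdline.
has_boot_arg() {
    local arg="$1" cmdline_file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$arg"
}

# Reads kernel log text on stdin and prints "<count> <device>" for every
# device that shows up in an "AER: Corrected error received" line.
count_aer_corrected() {
    grep -oE 'AER: Corrected error received: [0-9a-f:.]+' \
        | awk '{ print $NF }' | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   has_boot_arg pci=nommconf && echo "pci=nommconf is active"
#   dmesg | count_aer_corrected
```

If count_aer_corrected prints nothing after a long scrub, the errors are at least no longer being logged.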
The mpt3sas driver doesn’t exactly provide a lot of information in its default configuration. Luckily there’s a parameter for it. If you wanna know about all the other parameters, just run /sbin/modinfo mpt3sas. For descriptions as to what each parameter does it might be worth looking at the official user documentation for the driver.
A reasonable option for initial testing would be
echo 0x3f8 > /sys/module/mpt3sas/parameters/logging_level
All of the pre-defined values for the logging_level can be found at the top of mptdebug.h. The value 0x3f8 is what one of the Broadcom engineers in that mpt2sas bug thread wanted output from, so I figured it would be a good choice for me as well.
It essentially turns this:
[58307.970617] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970660] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970684] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970741] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970889] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
into this:
[58307.796312] mpt3sas_cm0: Device Status Change
[58307.796337] mpt3sas_cm0: Enable tm_busy flag for handle(0x001c)
[58307.796422] mpt3sas_cm0: device status change: (internal device reset) handle(0x001c), sas address(0x300062b202991698), tag(65535)
[58307.803237] mpt3sas_cm0: Device Status Change
[58307.803261] mpt3sas_cm0: Enable tm_busy flag for handle(0x001c)
[58307.803344] mpt3sas_cm0: device status change: (internal device reset) handle(0x001c), sas address(0x300062b202991698), tag(65535)
[58307.970553] mpt3sas_cm0: Device Status Change
[58307.970555] sd 0:0:3:0: [sdd] tag#1889 CDB: Read(16) 88 00 00 00 00 02 a6 d4 f7 98 00 00 00 40 00 00
[58307.970555] sd 0:0:3:0: [sdd] tag#1885 CDB: Read(16) 88 00 00 00 00 02 a6 d4 e7 98 00 00 04 98 00 00
[58307.970559] sd 0:0:3:0: [sdd] tag#1888 CDB: Read(16) 88 00 00 00 00 02 a6 d4 f4 08 00 00 03 90 00 00
[58307.970566] mpt3sas_cm0: sas_address(0x300062b202991698), phy(5)
[58307.970567] mpt3sas_cm0: sas_address(0x300062b202991698), phy(5)
[58307.970568] mpt3sas_cm0: sas_address(0x300062b202991698), phy(5)
[58307.970575] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(6)
[58307.970576] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(6)
[58307.970577] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(6)
[58307.970581] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[58307.970584] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[58307.970587] mpt3sas_cm0: handle(0x001c), ioc_status(unknown)(0x0003), smid(1886)
[58307.970589] mpt3sas_cm0: Disable tm_busy flag for handle(0x001c)
[58307.970593] mpt3sas_cm0: request_len(602112), underflow(602112), resid(73720)
[58307.970599] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[58307.970599] mpt3sas_cm0: tag(2), transfer_count(528392), sc->result(0x000b0000)
[58307.970605] mpt3sas_cm0: Device Status Change
[58307.970605] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[58307.970609] mpt3sas_cm0: Disable tm_busy flag for handle(0x001c)
[58307.970614] mpt3sas_cm0: handle(0x001c), ioc_status(scsi ioc terminated)(0x004b), smid(1889)
[58307.970617] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970621] mpt3sas_cm0: request_len(466944), underflow(466944), resid(466944)
[58307.970625] mpt3sas_cm0: handle(0x001c), ioc_status(scsi ioc terminated)(0x004b), smid(1890)
[58307.970628] mpt3sas_cm0: tag(65535), transfer_count(0), sc->result(0x000b0000)
[58307.970635] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[58307.970640] sd 0:0:3:0: [sdd] tag#1886 CDB: Read(16) 88 00 00 00 00 02 a6 d4 ec 30 00 00 03 68 00 00
[58307.970645] mpt3sas_cm0: sas_address(0x300062b202991698), phy(5)
[58307.970646] mpt3sas_cm0: request_len(32768), underflow(32768), resid(32768)
[58307.970652] mpt3sas_cm0: tag(65535), transfer_count(0), sc->result(0x000b0000)
[58307.970660] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970663] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(6)
[58307.970667] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[58307.970671] mpt3sas_cm0: handle(0x001c), ioc_status(scsi ioc terminated)(0x004b), smid(1887)
[58307.970673] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[58307.970684] sd 0:0:3:0: [sdd] tag#1887 CDB: Read(16) 88 00 00 00 00 02 a6 d4 ef 98 00 00 04 70 00 00
[58307.970684] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970689] mpt3sas_cm0: sas_address(0x300062b202991698), phy(5)
[58307.970692] mpt3sas_cm0: request_len(446464), underflow(446464), resid(446464)
[58307.970694] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(6)
[58307.970697] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[58307.970702] mpt3sas_cm0: handle(0x001c), ioc_status(scsi ioc terminated)(0x004b), smid(1888)
[58307.970708] mpt3sas_cm0: tag(3), transfer_count(0), sc->result(0x000b0000)
[58307.970717] mpt3sas_cm0: request_len(581632), underflow(581632), resid(581632)
[58307.970723] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[58307.970731] mpt3sas_cm0: tag(65535), transfer_count(0), sc->result(0x000b0000)
[58307.970741] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970747] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[58307.970889] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[58307.970969] mpt3sas_cm0: device status change: (internal device reset complete) handle(0x001c), sas address(0x300062b202991698), tag(65535)
[58307.971006] mpt3sas_cm0: device status change: (internal device reset complete) handle(0x001c), sas address(0x300062b202991698), tag(65535)
I did later on increase the verbosity some more and settled on 0x23f8.
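Since the logging level is just a bitmask, it helps to see which bits a given value actually switches on before writing it to the driver. Below is a small sketch for that. decode_mask is a name I made up, and it deliberately prints only the raw bit values; what each bit means still has to be looked up in the header.

```shell
#!/usr/bin/env bash
# Sketch only: decode_mask is a hypothetical helper, not part of mpt3sas.

# Prints every bit that is set in the given mask, one hex value per line.
decode_mask() {
    local mask=$(( $1 )) bit
    for (( bit = 1; bit <= mask; bit <<= 1 )); do
        if (( mask & bit )); then
            printf '0x%x\n' "$bit"
        fi
    done
}

# Usage:
#   decode_mask 0x3f8     # bits 0x8 through 0x200
#   decode_mask 0x23f8    # the same bits plus 0x2000
```

In other words, 0x23f8 is 0x3f8 with exactly one more bit (0x2000) OR-ed in.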
Side note: no matter how you combine these values, don’t enable MPT_DEBUG_SCSI. It prints every read or write command to the console…
Another thing about parameters: do not turn on mpt3sas_fwfault_debug unless you have the hardware to pull data from the UART on the HBA. If the device ever gets to mpt3sas_base_hard_reset_handler it will enter mpt3sas_halt_firmware, give you a stack trace and then leave you wondering why your system panicked. Not to mention that this stack trace doesn’t exactly contain a lot of information to begin with:
[ 3179.805528] mpt3sas_cm0: mpt3sas_base_hard_reset_handler: enter
[ 3179.806662] CPU: 5 PID: 19596 Comm: kworker/u16:0 Tainted: P OE 6.1.55-debug+truenas #2
[ 3179.807792] Hardware name: Gigabyte Technology Co., Ltd. Z590 D/Z590 D, BIOS F7a 01/24/2022
[ 3179.808933] Workqueue: poll_mpt3sas0_statu _base_fault_reset_work [mpt3sas]
[ 3179.810093] Call Trace:
[ 3179.811225] <TASK>
[ 3179.812357] dump_stack_lvl+0x44/0x5c
[ 3179.813507] mpt3sas_halt_firmware.part.0+0xf/0xb3 [mpt3sas]
[ 3179.814658] mpt3sas_base_hard_reset_handler.cold+0x221/0x221 [mpt3sas]
[ 3179.815849] _base_fault_reset_work+0x292/0x2a0 [mpt3sas]
[ 3179.817008] process_one_work+0x1c4/0x380
[ 3179.818159] worker_thread+0x4d/0x380
[ 3179.819304] ? _raw_spin_lock_irqsave+0x23/0x50
[ 3179.820457] ? rescuer_thread+0x3a0/0x3a0
[ 3179.821620] kthread+0xe6/0x110
[ 3179.822763] ? kthread_complete_and_exit+0x20/0x20
[ 3179.823911] ret_from_fork+0x1f/0x30
[ 3179.825062] </TASK>
[ 3179.826252] mpt3sas_cm0 fault info from func: mpt3sas_halt_firmware
powertop --auto-tune is strongly discouraged. Using it will automatically enable all of the tunables, including those not needed to get past C2/C3, which will just give you headaches when debugging. It will also mess with the power management for each of your drives, which leads to faults, which in turn lead to the server essentially hanging itself.
Here are all the tunables I ended up setting by hand:
# Runtime PM for PCI Device Intel Corporation Ethernet Controller X710 for 10GbE SFP+
echo 'auto' > '/sys/bus/pci/devices/0000:05:00.0/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:05:00.1/power/control';
# Runtime PM for PCI Device Intel Corporation 10th Gen Core Processor Host Bridge/DRAM Registers
echo 'auto' > '/sys/bus/pci/devices/0000:00:00.0/power/control';
# Runtime PM for PCI Device Intel Corporation Tiger Lake-H SPI Controller
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.5/power/control';
# Runtime PM for PCI Device Broadcom / LSI SAS3224 PCI-Express Fusion-MPT SAS-3
#echo 'auto' > '/sys/bus/pci/devices/0000:02:00.0/power/control';
# Runtime PM for PCI Device Intel Corporation Tiger Lake-H PCI Express Root Port #9
echo 'auto' > '/sys/bus/pci/devices/0000:00:1d.0/power/control';
# Runtime PM for PCI Device Intel Corporation Tiger Lake-H Shared SRAM
echo 'auto' > '/sys/bus/pci/devices/0000:00:14.2/power/control';
# Runtime PM for I2C Adapter i2c-1 (i915 gmbus dpa)
echo 'auto' > '/sys/bus/i2c/devices/i2c-1/device/power/control';
# Autosuspend for USB device ITE Device [ITE Tech. Inc.]
echo 'auto' > '/sys/bus/usb/devices/1-13/power/control';
# Autosuspend for USB device USB DISK 2.0 [ ]
echo 'auto' > '/sys/bus/usb/devices/1-3/power/control';
# Runtime PM for PCI Device Intel Corporation Device 4385
echo 'auto' > '/sys/bus/pci/devices/0000:00:1f.0/power/control';
# Runtime PM for PCI Device of the built-in SATA controller
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/power/control';
# Runtime PM for the built-in SATA controller
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata1/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata2/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata3/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata4/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata5/power/control';
echo 'auto' > '/sys/bus/pci/devices/0000:00:17.0/ata6/power/control';
# NMI watchdog should be turned off
echo '0' > '/proc/sys/kernel/nmi_watchdog';
# VM writeback timeout
echo '1500' > '/proc/sys/vm/dirty_writeback_centisecs';
# Enable SATA link power management for the built-in sata controller
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host1/link_power_management_policy';
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host2/link_power_management_policy';
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host3/link_power_management_policy';
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host4/link_power_management_policy';
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host5/link_power_management_policy';
echo 'med_power_with_dipm' > '/sys/class/scsi_host/host6/link_power_management_policy';
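Since PCI addresses and host numbers can shift around between boots or after swapping cards, I would wrap those writes in something that checks the path first instead of blindly echoing into sysfs. A minimal sketch, assuming it runs as root; set_tunable and apply_runtime_pm are names I made up, and the device addresses in the usage comment are the ones from my machine, not yours.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Needs root for real sysfs paths.

# Writes a value to a tunable, but only if the file exists and is writable.
# Missing paths are reported on stderr and skipped instead of aborting.
set_tunable() {
    local value="$1" path="$2"
    if [ -w "$path" ]; then
        echo "$value" > "$path"
    else
        echo "skipping $path (missing or not writable)" >&2
    fi
}

# Enables runtime PM ('auto') for each PCI address passed as an argument.
apply_runtime_pm() {
    local dev
    for dev in "$@"; do
        set_tunable auto "/sys/bus/pci/devices/$dev/power/control"
    done
}

# Usage (addresses taken from the list above):
#   apply_runtime_pm 0000:05:00.0 0000:05:00.1 0000:00:1d.0
#   set_tunable 0 /proc/sys/kernel/nmi_watchdog
```

The upside is that a renumbered device produces a "skipping" line in the log rather than a silently failed write.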
Do not enable APM. Explicitly disable any kind of power management for the drives.
The system will regularly query information about the state of the drives. In dmesg that looks like this:
[32461.263672] sd 0:0:4:0: [sde] tag#5697 CDB: ATA command pass through(16) 85 06 2c 00 da 00 00 00 00 00 4f 00 c2 00 b0 00
[32461.264991] mpt3sas_cm0: sas_address(0x300062b202991699), phy(6)
[32461.266196] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(4)
[32461.267398] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[32461.268593] mpt3sas_cm0: handle(0x001d), ioc_status(success)(0x0000), smid(5698)
[32461.269796] mpt3sas_cm0: request_len(0), underflow(0), resid(0)
[32461.270994] mpt3sas_cm0: tag(0), transfer_count(0), sc->result(0x00000002)
[32461.272192] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[32461.273400] mpt3sas_cm0: [sense_key,asc,ascq]: [0x01,0x00,0x1d], count(22)
Those messages can safely be ignored.
It appears that setting mpt3sas.msix_disable=1 helps with the system’s stability. I don’t exactly understand why, but in terms of functionality we’re not losing anything so it’s whatever.
With all the correct tunables set, the system will go down as far as C10. I ended up disabling C10 because it led to fault states that the system was unable to recover from. Setting the lowest C-state to C6/7 did not change anything.
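A rough way to see which idle states the kernel exposes and how often they get used is to read the cpuidle entries in sysfs. Keep in mind that this shows the per-core idle states the kernel requests, which is not the same thing as the package C-state residency that powertop reports, so treat it as a sanity check only. list_idle_states is a made-up helper name, and the cpu0 default path is an assumption about your system.

```shell
#!/usr/bin/env bash
# Sketch only: list_idle_states is a hypothetical helper.

# Prints "<name> <usage> <time_us>" for every idle state of one CPU.
# The optional argument overrides the cpuidle directory to read from.
list_idle_states() {
    local base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}" dir
    for dir in "$base"/state*; do
        # Skip anything that is not a readable state directory.
        [ -r "$dir/name" ] || continue
        printf '%s %s %s\n' \
            "$(cat "$dir/name")" "$(cat "$dir/usage")" "$(cat "$dir/time")"
    done
}

# Usage:
#   list_idle_states
```

Running it before and after changing the BIOS limit at least shows whether the deepest state is still being entered at all.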
I’ve run into multiple fault states, namely fault_state(0x2623) and fault_state(0x5854), easily recognized by the red text in dmesg. Both of these lead to the controller resetting itself. It can take up to ten seconds for the storage pool it is attached to to return to an operational state. My ZFS array appeared to simply not care about any of that happening. No errors, no warnings, nada.
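To keep an eye on how often the controller faults, the log can be tallied by fault code. This is only a sketch: count_fault_states is a made-up name, and the pattern assumes nothing beyond the fault_state(0x....) text mentioned above.

```shell
#!/usr/bin/env bash
# Sketch only: count_fault_states is a hypothetical helper.

# Reads kernel log text on stdin and prints "<count> fault_state(0x....)"
# for every distinct fault code it finds.
count_fault_states() {
    grep -oE 'fault_state\(0x[0-9a-fA-F]+\)' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   dmesg | count_fault_states
```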
Messages about devices being reset and devices changing state are relatively common. These do not appear to mess with the rest of the system itself.
A device reset looks a little like this:
[ 2127.945563] mpt3sas_cm0: Device Status Change
[ 2127.947874] mpt3sas_cm0: Enable tm_busy flag for handle(0x0019)
[ 2127.949020] mpt3sas_cm0: device status change: (internal device reset) handle(0x0019), sas address(0x300062b202991695), tag(65535)
[ 2127.951658] mpt3sas_cm0: Device Status Change
[ 2127.952804] mpt3sas_cm0: Enable tm_busy flag for handle(0x0019)
[ 2127.953938] mpt3sas_cm0: device status change: (internal device reset) handle(0x0019), sas address(0x300062b202991695), tag(65535)
[ 2128.109209] sd 0:0:0:0: [sda] tag#2303 CDB: Read(16) 88 00 00 00 00 02 de 5b e0 88 00 00 02 08 00 00
[ 2128.112871] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.116758] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.120787] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.124813] mpt3sas_cm0: handle(0x0019), ioc_status(scsi ioc terminated)(0x004b), smid(2304)
[ 2128.128846] mpt3sas_cm0: request_len(266240), underflow(266240), resid(266240)
[ 2128.132875] mpt3sas_cm0: tag(1), transfer_count(0), sc->result(0x000b0000)
[ 2128.136884] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[ 2128.140891] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[ 2128.144736] sd 0:0:0:0: [sda] tag#2240 CDB: Read(16) 88 00 00 00 00 02 de 5b e2 90 00 00 08 00 00 00
[ 2128.147508] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.150063] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.152161] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.154248] mpt3sas_cm0: handle(0x0019), ioc_status(scsi ioc terminated)(0x004b), smid(2241)
[ 2128.155981] mpt3sas_cm0: request_len(1048576), underflow(1048576), resid(1048576)
[ 2128.157661] mpt3sas_cm0: tag(2), transfer_count(0), sc->result(0x000b0000)
[ 2128.159342] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[ 2128.160760] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[ 2128.162180] sd 0:0:0:0: [sda] tag#2296 CDB: Read(16) 88 00 00 00 00 02 de 5b da 90 00 00 05 f8 00 00
[ 2128.163596] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.164936] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.166163] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.167378] mpt3sas_cm0: handle(0x0019), ioc_status(unknown)(0x0003), smid(2297)
[ 2128.168587] mpt3sas_cm0: request_len(782336), underflow(782336), resid(99320)
[ 2128.169784] mpt3sas_cm0: tag(0), transfer_count(683016), sc->result(0x000b0000)
[ 2128.170925] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[ 2128.172079] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[ 2128.173223] sd 0:0:0:0: [sda] tag#2246 CDB: Read(16) 88 00 00 00 00 02 de 5b ee 90 00 00 04 00 00 00
[ 2128.174367] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.175514] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.176666] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.177811] mpt3sas_cm0: handle(0x0019), ioc_status(scsi ioc terminated)(0x004b), smid(2247)
[ 2128.178956] mpt3sas_cm0: request_len(524288), underflow(524288), resid(524288)
[ 2128.180104] mpt3sas_cm0: tag(65535), transfer_count(0), sc->result(0x000b0000)
[ 2128.181257] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[ 2128.182403] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[ 2128.183549] sd 0:0:0:0: [sda] tag#2245 CDB: Read(16) 88 00 00 00 00 02 de 5b ea 90 00 00 04 00 00 00
[ 2128.184703] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.185853] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.186994] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.188138] mpt3sas_cm0: handle(0x0019), ioc_status(scsi ioc terminated)(0x004b), smid(2246)
[ 2128.189283] mpt3sas_cm0: request_len(524288), underflow(524288), resid(524288)
[ 2128.190427] mpt3sas_cm0: tag(65535), transfer_count(0), sc->result(0x000b0000)
[ 2128.191571] mpt3sas_cm0: scsi_status(good)(0x00), scsi_state(state terminated no status )(0x0c)
[ 2128.192727] mpt3sas_cm0: log_info(0x3112043e): originator(PL), code(0x12), sub_code(0x043e)
[ 2128.193878] mpt3sas_cm0: Device Status Change
[ 2128.195023] mpt3sas_cm0: Disable tm_busy flag for handle(0x0019)
[ 2128.196162] mpt3sas_cm0: Device Status Change
[ 2128.196168] mpt3sas_cm0: device status change: (internal device reset complete) handle(0x0019), sas address(0x300062b202991695), tag(65535)
[ 2128.197535] mpt3sas_cm0: Disable tm_busy flag for handle(0x0019)
[ 2128.201029] mpt3sas_cm0: device status change: (internal device reset complete) handle(0x0019), sas address(0x300062b202991695), tag(65535)
[ 2128.820155] mpt3sas_cm0: Discovery: (start)
[ 2128.823241] mpt3sas_cm0: SAS Topology Change List
[ 2128.823272] mpt3sas_cm0: discovery event: (start)
[ 2128.824771] mpt3sas_cm0: Discovery: (stop)
[ 2128.826145] mpt3sas_cm0: sas topology change: (responding)
[ 2128.827734] sd 0:0:0:0: [sda] tag#2248 CDB: Read(16) 88 00 00 00 00 02 de 5b ee 90 00 00 04 00 00 00
[ 2128.827737] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.827738] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.827740] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.827741] mpt3sas_cm0: handle(0x0019), ioc_status(success)(0x0000), smid(2249)
[ 2128.827743] mpt3sas_cm0: request_len(524288), underflow(524288), resid(262144)
[ 2128.827744] mpt3sas_cm0: tag(65535), transfer_count(262144), sc->result(0x00000000)
[ 2128.827745] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[ 2128.829130] handle(0x0000), enclosure_handle(0x0001) start_phy(02), count(1)
[ 2128.830555] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[ 2128.832270] phy(02), attached_handle(0x0019): link rate change: link rate: new(0x0a), old(0x0a)
[ 2128.833698] sd 0:0:0:0: [sda] tag#2248 CDB: Read(16) 88 00 00 00 00 02 de 5b ee 90 00 00 04 00 00 00
[ 2128.835189] mpt3sas_cm0: updating handles for sas_host(0x500062b202991693)
[ 2128.836617] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.847671] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.848871] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.850066] mpt3sas_cm0: handle(0x0019), ioc_status(success)(0x0000), smid(2249)
[ 2128.851273] mpt3sas_cm0: request_len(524288), underflow(524288), resid(262144)
[ 2128.852469] mpt3sas_cm0: tag(65535), transfer_count(262144), sc->result(0x00000002)
[ 2128.853666] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[ 2128.854856] mpt3sas_cm0: [sense_key,asc,ascq]: [0x06,0x29,0x00], count(18)
[ 2128.856047] sd 0:0:0:0: Power-on or device reset occurred
[ 2128.857760] mpt3sas_cm0: discovery event: (stop)
[ 2128.886501] sd 0:0:0:0: [sda] tag#2317 CDB: ATA command pass through(12)/Blank a1 08 2e 00 01 00 00 00 00 ec 00 00
[ 2128.890755] mpt3sas_cm0: sas_address(0x300062b202991695), phy(2)
[ 2128.894975] mpt3sas_cm0: enclosure logical id(0x500062b202991693), slot(0)
[ 2128.898648] mpt3sas_cm0: enclosure level(0x0000), connector name( )
[ 2128.901549] mpt3sas_cm0: handle(0x0019), ioc_status(success)(0x0000), smid(2318)
[ 2128.903707] mpt3sas_cm0: request_len(512), underflow(0), resid(0)
[ 2128.904900] mpt3sas_cm0: tag(0), transfer_count(512), sc->result(0x00000002)
[ 2128.906087] mpt3sas_cm0: scsi_status(check condition)(0x02), scsi_state(autosense valid )(0x01)
[ 2128.907270] mpt3sas_cm0: [sense_key,asc,ascq]: [0x01,0x00,0x1d], count(22)
Someone please explain to me how the 0x0a and 0x0a in link rate change: link rate: new(0x0a), old(0x0a) are two different numbers lol
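If you want to know whether it is always the same drive getting reset, the reset messages can be tallied per SAS address. A rough sketch, with count_device_resets being a made-up name. It assumes the "internal device reset" lines look like the ones in my log, and since my log prints each of those lines twice, the numbers are log lines rather than actual reset events.

```shell
#!/usr/bin/env bash
# Sketch only: count_device_resets is a hypothetical helper.

# Reads kernel log text on stdin and prints "<count> <sas address>" for
# every "(internal device reset)" line. The "reset complete" lines are
# deliberately not matched.
count_device_resets() {
    grep -oE '\(internal device reset\) handle\(0x[0-9a-f]+\), sas address\(0x[0-9a-f]+\)' \
        | sed -E 's/.*sas address\((0x[0-9a-f]+)\)/\1/' \
        | sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage:
#   dmesg | count_device_resets
```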
Here’s my final setup:
a patched X710-DA2 (to accept my non-whitelisted SFP+ modules), which natively supports ASPM L1
a 9305-24i
ASPMEnabler running at bootup to patch the FADT
all of the previously mentioned BIOS settings enabled except for C10
pci=nommconf mpt3sas.msix_disable=1 in my boot arguments
a startup script for TrueNAS which sets all of the tunables by hand and runs my ASPM script
plenty of hopes and dreams, that way the system doesn’t kernel panic while I’m not looking
This server’s state went from “everything works but it’s a power hog” to “machine exhibits strange behaviour tolerated by the operating system but at least we’re saving power”. I’m not entirely sure how I’m supposed to feel about that. Part of me would like to go out and buy a 9600-24i for upwards of 1300€ after seeing one of the Tri-Mode MegaRAID controllers get to L1 on its own but I really don’t want to spend that sort of cash.
For now I guess this is fine. I can live with my server the way it is now. Should things ever get worse I will either update this post or take it down entirely. It’s not like anyone would notice lol
If you’ve made it this far and still want to go ahead with all of this, by all means do it. You clearly care just as little about your production environment as I do.