Linux Galaxy

Slackware 15.0 AMD RX 6000 series (NAVI 23) power and fan issues

Posted on Jun 18, 2023 by kingbeowulf



A few months ago I bought an MSI Mech Radeon RX 6650 XT ($280 US, newegg.com, now $250) and noticed a peculiar behavior whenever the monitor powered down: the amdgpu/drm driver crashed. The system had to be shut down and power cycled to restore GPU function. The same issue occurred on both systems:

Gigabyte X570 AORUS ELITE, AMD Ryzen 9 3900X
Kernel 5.15.94, Mesa-22.2.5, libdrm-2.4.115
Gigabyte 27-in 2560x1440 170 Hz IPS

Gigabyte X570 I AORUS PRO WIFI, AMD Ryzen 7 3800X
Kernel 5.15.94, Mesa-21.3.5, libdrm-2.4.109
Samsung 28-in 4K 60 Hz IPS

An example of the errors in /var/log/syslog:

[    7.004430] [drm] Loading DMUB firmware via PSP: version=0x02020017
[    7.008658] [drm] Found VCN firmware Version ENC: 1.26 DEC: 2 VEP: 0 Revision: 0
[    7.008663] amdgpu 0000:0b:00.0: amdgpu: Will use PSP to load VCN firmware
...
[82707.692420] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82712.848198] amdgpu 0000:0b:00.0: amdgpu: Failed to export SMU metrics table!
[82717.358258] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command!
[82717.358261] amdgpu 0000:0b:00.0: amdgpu: Failed to export SMU metrics table!
[82719.233218] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82719.490285] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82719.739131] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82719.987969] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82720.236750] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82720.488203] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82720.737006] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82720.985789] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82721.234601] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82721.486020] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82721.734834] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82721.983625] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82722.139761] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command!
[82722.139764] amdgpu 0000:0b:00.0: amdgpu: Failed to export SMU metrics table!
[82722.232490] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82722.483935] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82722.732707] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82722.980662] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82723.229470] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82723.289267] [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
[82725.079984] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=834657, emitted seq=834658
[82725.080195] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 1636 thread plasmashel:cs0 pid 1694
[82725.080385] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
[82726.746637] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command!
[82726.746641] amdgpu 0000:0b:00.0: amdgpu: Failed to export SMU metrics table!
[82731.754810] amdgpu 0000:0b:00.0: amdgpu: SMU: I'm not done with your previous command!
[82731.754815] amdgpu 0000:0b:00.0: amdgpu: Failed to disable gfxoff!
[82732.645299] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
[82732.902045] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
...
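When hunting for this crash in a long-running syslog, it can help to count how often each of the error signatures from the excerpt above appears. The following is a minimal sketch: the function name amdgpu_error_summary is hypothetical, and the patterns are simply copied from the log lines shown here, so other failure modes will not be caught.

```bash
#!/bin/bash
# Sketch: count occurrences of the amdgpu error signatures seen in the log excerpt.
# amdgpu_error_summary is a hypothetical helper, not part of any existing tool.
amdgpu_error_summary() {
  local log="${1:-/var/log/syslog}"
  local pattern
  for pattern in \
    'Error waiting for DMUB idle' \
    'Failed to export SMU metrics table' \
    "SMU: I'm not done with your previous command" \
    'Failed to disable gfxoff' \
    'Link training attempt' \
    'ring gfx_0.0.0 timeout' \
    'GPU reset begin'; do
    # grep -c prints the number of matching lines (0 if none); -F matches a fixed string
    printf '%6d  %s\n' "$(grep -cF -- "$pattern" "$log")" "$pattern"
  done
}
```

Usage: `amdgpu_error_summary /var/log/syslog`. Depending on the log file permissions, this may need to be run as root.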

At first I thought this was related to power management/DPM; however, turning all of that off, and even appending "amdgpu.runpm=0 pcie_aspm=off" to the kernel boot command line, did not resolve the problem. The error occurred whenever the monitors turned off or woke up, over either DisplayPort or HDMI, and also on a simple monitor power off/on or when hot-plugging the cable. The RX 6650 XT worked as expected on MS Windows 10 with the AMD Adrenalin Graphics Driver.
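Before concluding that kernel parameters such as amdgpu.runpm=0 and pcie_aspm=off have no effect, it is worth confirming that they actually reached the running kernel, whichever boot loader (LILO, elilo, or GRUB) they were added to. The kernel exposes its boot arguments in /proc/cmdline. This is a minimal sketch; has_kernel_params and the CMDLINE_FILE override (only there to make it testable) are hypothetical names.

```bash
#!/bin/bash
# Sketch: report whether each given parameter appears on the kernel command line.
# has_kernel_params and CMDLINE_FILE are hypothetical; /proc/cmdline is the real source.
has_kernel_params() {
  local cmdline_file=${CMDLINE_FILE:-/proc/cmdline}
  local -a tokens
  local want tok found rc=0
  # Split the single-line command line into whitespace-separated tokens
  read -r -a tokens < "$cmdline_file"
  for want in "$@"; do
    found=no
    for tok in "${tokens[@]}"; do
      if [ "$tok" = "$want" ]; then
        found=yes
        break
      fi
    done
    echo "$want: $found"
    # Non-zero exit status if any requested parameter is missing
    [ "$found" = yes ] || rc=1
  done
  return "$rc"
}
```

Usage: `has_kernel_params amdgpu.runpm=0 pcie_aspm=off` prints one yes/no line per parameter. It only checks for exact tokens, so a parameter given with a different value will show as missing.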

Searching the usual suspects turned up conflicting information and patches, with the problem supposedly resolved after kernel 5.14 or so, yet both systems here run 5.15.94.

And did I mention that although the GPU fan works, and can be controlled via the hwmon pwm1 interface in /sys,

/sys/class/drm/card0/device/hwmon/hwmon5/fan1_input

reads 0 (zero) no matter the fan speed (as shown by lm-sensors, gkrellm, and corectrl)? This script calculates an approximate fan RPM from the pwm1 value:

#!/bin/bash
# Estimate the GPU fan speed from the pwm1 duty cycle, since fan1_input reads 0.
# The card and hwmon numbers can differ between systems; adjust HWMON as needed.
HWMON=${HWMON:-/sys/class/drm/card0/device/hwmon/hwmon5}

rpm_max=$(cat "$HWMON/fan1_max")
pwm_max=$(cat "$HWMON/pwm1_max")

# Update interval (seconds)
UPD=${UPD:-5}

clear
echo -e "\nCurrent GPU fan speed (CTRL-C to exit):"
while true; do
  pwm1=$(cat "$HWMON/pwm1")
  # Linear estimate: rpm = rpm_max * pwm1 / pwm_max (integer arithmetic, no bc needed)
  GPU_RPM=$(( rpm_max * pwm1 / pwm_max ))
  # Clear the line and overwrite it in place
  echo -ne "\033[2K\r$GPU_RPM rpm "
  sleep "$UPD"
done
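The hwmon5 index in the path above is assigned at boot in probe order, so it can differ on another machine or change when hardware is added. A more robust option is to look for the hwmon device whose name attribute reads "amdgpu". This is a minimal sketch with a hypothetical function name; if more than one amdgpu device is present (for example an APU plus a discrete card), it returns only the first match.

```bash
#!/bin/bash
# Sketch: find the hwmon directory belonging to the amdgpu driver.
# find_amdgpu_hwmon is a hypothetical helper; the optional argument overrides the sysfs root.
find_amdgpu_hwmon() {
  local root=${1:-/sys/class/hwmon}
  local dir
  for dir in "$root"/hwmon*; do
    # Every hwmon device exposes a "name" file; amdgpu devices report "amdgpu"
    if [ -r "$dir/name" ] && [ "$(cat "$dir/name")" = amdgpu ]; then
      echo "$dir"
      return 0
    fi
  done
  # No amdgpu hwmon device found
  return 1
}
```

Its output can replace the hard-coded hwmon5 path in the fan script above.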

It is nice that GPU prices are slowly getting back to a relatively normal baseline, so last week I picked up a Sapphire Pulse Radeon RX 6800 XT Gaming ($485 US, newegg.com) for the Ryzen 9 system. Weirdly enough, this GPU does not have the monitor hotplug issue, and its fan speed is reported correctly! At this point it looks like an amdgpu or firmware issue specific to "Dimgrey Cavefish" (Navi 23, RX 6600/6650 XT), whereas "Sienna Cichlid" (Navi 21, RX 6800/6800 XT) is better supported.

The next stable release of Slackware will no doubt resolve these Navi 23 display and fan issues.



King Beowulf's Linux Adventures


Contact:

  • kingbeowulf@linuxgalaxy.org
  • mumble.linuxgalaxy.org:64738
  • Libera.chat IRC
    • ##slackware, #slackbuilds, #linuxgalaxy

Screamin' and a-streamin' !

  • https://twitch.tv/kngbwlf
  • https://www.youtube.com/@mylinuxgalaxy
