Wednesday, May 6, 2009

Zombie X

I'm having a weird and disturbing issue: my display completely freezes while I'm watching a recording in MythTV. I've also seen it happen when watching a video fullscreen with mplayer, so it seems to be tied to video playback. Sometimes the display blanks; sometimes the image freezes but remains on screen.

When this happens, my only recourse is to ssh in from my laptop, and issue the commands:

sudo /etc/init.d/xdm stop

sudo /etc/init.d/xdm start

Note that sudo /etc/init.d/xdm restart does not work, which is weird, since all that does (IIUC) is stop and then start the daemon!
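Since I end up typing the stop/start pair over ssh every time, here is a minimal sketch of a wrapper for it. The function name, the INIT_SCRIPT and PAUSE variables, and the 5-second pause between the two steps are my own assumptions, not anything xdm or Gentoo provides, and I haven't confirmed that a pause is what makes stop/start behave differently from restart.

```shell
#!/bin/sh
# Hypothetical helper: stop xdm, wait a moment, then start it again.
# INIT_SCRIPT and PAUSE are made-up knobs for this sketch; the default
# path is the one used in the commands above.
restart_display_manager() {
    init_script="${INIT_SCRIPT:-/etc/init.d/xdm}"
    pause="${PAUSE:-5}"   # assumed delay in seconds, not a measured value

    # Keep going even if "stop" complains, since X is already wedged.
    "$init_script" stop || echo "stop reported an error, continuing" >&2
    sleep "$pause"
    "$init_script" start
}
```

To use it, source the file in a root shell on the MythTV box and run restart_display_manager.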

I have also noticed the following error dump in dmesg when this occurs:

[ 6734.516861] BUG: unable to handle kernel paging request at ffffc20011105428
[ 6734.516867] IP: [] ulReadMmRegisterUlongViaAddr+0x9/0x10 [fglrx]
[ 6734.516957] PGD 12fc08067 PUD 12fc09067 PMD 12eaf5067 PTE 0
[ 6734.516962] Oops: 0000 [#4] SMP
[ 6734.516965] last sysfs file: /sys/devices/pci0000:00/0000:00:18.3/resource
[ 6734.516968] CPU 1
[ 6734.516970] Modules linked in: lirc_mceusb2 lirc_dev video1394 dv1394 raw1394 fglrx(P) ohci1394 snd_hda_intel ieee1394 r8169 snd_hwdep
[ 6734.516981] Pid: 17277, comm: X Tainted: P D 2.6.28-gentoo-r5 #6
[ 6734.516983] RIP: 0010:[] [] ulReadMmRegisterUlongViaAddr+0x9/0x10 [fglrx]
[ 6734.517046] RSP: 0018:ffff880117075c90 EFLAGS: 00010286
[ 6734.517048] RAX: 0000000000000000 RBX: ffff880099859190 RCX: 0000000000000007
[ 6734.517050] RDX: 000000000000150a RSI: 000000000000150a RDI: ffffc20011100000
[ 6734.517053] RBP: ffffc20011100000 R08: 0000000000000000 R09: ffffffffa01939a0
[ 6734.517055] R10: 00000000000fdfe0 R11: ffff88012fc08000 R12: 000000000000150a
[ 6734.517058] R13: ffff880040e3b000 R14: 000000004068646a R15: ffff880038855f00
[ 6734.517061] FS: 00007fdefc2e36f0(0000) GS:ffff88012fc03980(0000) knlGS:00000000f7de36c0
[ 6734.517063] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 6734.517066] CR2: ffffc20011105428 CR3: 0000000117076000 CR4: 00000000000006a0
[ 6734.517068] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6734.517071] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6734.517073] Process X (pid: 17277, threadinfo ffff880117074000, task ffff8800a5488740)
[ 6734.517076] Stack:
[ 6734.517077] ffffffffa00d036b 00000000fdff0000 0000000000000202 ffff880099859000
[ 6734.517081] ffff880099859000 00007fff0430a6e0 ffffffffa00ec124 000000004068646a
[ 6734.517086] ffff880099859000 ffff880099859190 00007fff0430a6e0 ffff880040e3b000
[ 6734.517090] Call Trace:
[ 6734.517093] [] ? ulReadMmRegisterUlong+0x6b/0x140 [fglrx]
[ 6734.517132] [] ? Cail_RS780_RestoreAdapterCfgRegisters+0x24/0x190 [fglrx]
[ 6734.517132] [] ? CAILExit+0x15f/0x190 [fglrx]
[ 6734.517132] [] ? firegl_cail_free+0x3f/0x70 [fglrx]
[ 6734.517132] [] ? hal_init_asic+0x45a/0x5e0 [fglrx]
[ 6734.517132] [] ? capable+0x17/0x36
[ 6734.517132] [] ? hal_init_asic+0x0/0x5e0 [fglrx]
[ 6734.517132] [] ? firegl_ioctl+0x1ea/0xf40 [fglrx]
[ 6734.517132] [] ? ip_firegl_ioctl+0x11/0x13 [fglrx]
[ 6734.517132] [] ? vfs_ioctl+0x5f/0x78
[ 6734.517132] [] ? do_vfs_ioctl+0x37c/0x3aa
[ 6734.517132] [] ? fsnotify_modify+0x62/0x6a
[ 6734.517132] [] ? sys_ioctl+0x55/0x77
[ 6734.517132] [] ? system_call_fastpath+0x16/0x1b
[ 6734.517132] Code: 0f 85 0e ff ff ff 44 89 f2 44 89 ee 48 89 df e8 8e 4d 02 00 85 c0 0f 84 40 ff ff ff e9 f3 fe ff ff 90 31 c0 48 85 ff 74 05 89 f2 <8b> 04 97 c3 66 66 90 48 85 ff 74 05 89 f1 89 14 8f c3 66 66 90
[ 6734.517132] RIP [] ulReadMmRegisterUlongViaAddr+0x9/0x10 [fglrx]
[ 6734.517132] RSP
[ 6734.517132] CR2: ffffc20011105428
[ 6734.517132] ---[ end trace 3dbdfa876caf57b1 ]---
[ 6734.521254] [fglrx:firegl_release] *ERROR* device busy: 1 0
[ 6734.521258] [fglrx] release failed with code -EBUSY


Whoa, scary!
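To check quickly over ssh whether a freeze left one of these dumps behind, a small filter on the dmesg output is enough. This is only a sketch: the function name is hypothetical, and the patterns are simply the strings that show up in the dump above.

```shell
# Hypothetical helper: keep only kernel log lines that look like part of
# an oops or that mention the fglrx module. Reads log text on stdin.
fglrx_oops_lines() {
    grep -E 'BUG:|Oops:|fglrx'
}
```

Typical use would be dmesg | fglrx_oops_lines, which prints nothing if the log is clean.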

I looked through the Gentoo ATI Radeon FAQ, but didn't find much there. I did find this thread on the Gentoo forums. The person there describes a very similar problem, which he solved by switching to the "zen-sources" kernel.

I figure it's worth a try. zen-sources is not part of the normal Gentoo portage tree; it lives in an "overlay", which is a third-party portage tree. To install it, I followed their instructions. Basically: emerge layman; layman -a zen-sources; emerge zen-sources.

Kernel config/compile/install went smoothly, and the major subsystems (network/audio/Xorg) seem to function properly. I even remembered to re-emerge ati-drivers and lirc after installing the new kernel (since those packages provide kernel modules). I'll have to wait and see about the video stability...time for bed!
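One way to double-check that a rebuilt module really matches the new kernel is to compare the running kernel release with the module's vermagic string, whose first field is the kernel release the module was built for. The helper below is a sketch; its name is made up, and it only compares two strings that you pass in.

```shell
# Hypothetical helper: succeed if a module's vermagic string was built
# for the given kernel release.
#   $1: kernel release, e.g. the output of `uname -r`
#   $2: vermagic string, e.g. the output of `modinfo -F vermagic fglrx`
vermagic_matches() {
    release="$1"
    vermagic="$2"
    # The release is everything before the first space in vermagic.
    [ "${vermagic%% *}" = "$release" ]
}

# Example call (not run here):
#   vermagic_matches "$(uname -r)" "$(modinfo -F vermagic fglrx)" \
#       || echo "fglrx was built for a different kernel"
```

The same check works for the lirc modules by swapping in their module names.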

Update: zen-sources didn't resolve the stability issues. However, I did find a number of new resources online, including this helpful opensuse.org guide and the Unofficial wiki for the ATi Linux Driver.

Following the opensuse guide, I added some lines to the Device section of my xorg.conf (everything after "try these options..."):
Section "Device"

# Driver "radeonhd"
Identifier "Card0"
Driver "fglrx"
VendorName "ATI Technologies Inc"
BoardName "Radeon HD 3200 Graphics"
BusID "PCI:1:5:0"

#try these options to improve stability:
Option "VideoOverlay" "off"
Option "OpenGLOverlay" "off"
Option "TexturedVideo" "on"
Option "BlockSignalsOnLock" "on"
Option "KernelModuleParm" "locked-userpages=0"
Option "UseFastTLS" "2"
Option "UseInternalAGPGART" "no"
Option "mtrr" "off"
Option "no_accel" "no"
Option "EnablePrivateBackZ" "no"
Option "backingstore" "true"

EndSection


Time will tell if these options make it more stable...
