Thu, 25 May 2006

Kernel upgrades, crashes

Yesterday I upgraded the kernel on our main server to the newest stable release. The server had been running a rather ancient kernel, because newer kernels had some problems with XFS on this setup. But even the older kernel crashed from time to time. The new kernel booted fine, I made a few additional tweaks, and put the server back into production use. Today, however, some problems appeared:

We found that NFS clients using volumes from this server could not lock files using fcntl(). According to tcpdump, the server simply did not respond to the RPC locking requests. However, the same kernel version on a different server, with the same line in /etc/exports, worked correctly, file locking included. I think the difference is in the NFS server utilities, which are older on the production system (RHEL3) than on the other server (FC3). However, I couldn't compile newer NFS utils, because they depend on a newer version of the Kerberos libraries. I was about to reboot the server back to the older kernel when it crashed on me. I am back on the older kernel now, and I will probably wait until we upgrade this box to RHEL4. If the problem persists then, I will probably make it a Red Hat problem.
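For anyone who wants to check the same thing on their own clients, here is a minimal sketch of an fcntl() lock test. It is not the tool I used; the function name try_lock and the default path /mnt/nfs/locktest are made up for illustration, so point it at a file on your own NFS mount.

```python
import fcntl
import os
import sys


def try_lock(path):
    """Return True if an exclusive fcntl() lock on path can be taken.

    Hypothetical helper for illustration only. Python's fcntl.lockf()
    is a wrapper around fcntl() record locking.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB: fail with an error instead of waiting
            # when another process already holds the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        # Got the lock; release it again before returning.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumed default path; replace with a file on the NFS volume.
    target = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nfs/locktest"
    print("lock ok" if try_lock(target) else "lock failed")
```

Note that when the server does not answer the locking RPCs at all, as in the case above, the call may well hang instead of returning an error, so it is probably wise to run the script under a timeout.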

While testing NFS with the newest kernel on that other box, I found that the 3ware driver does not work with the iommu=off boot option. I wonder if akpm is right about the kernel getting buggier. Moreover, it seems that the Tyan S2882 board does not boot correctly when the power-on memory test is interrupted by pressing the Escape key (either an MCE is reported, or the server just silently reboots while loading the kernel). At first I thought the server was dead, since even the original kernel refused to boot.

