For many years I'd relied on an Adaptec 5805z RAID controller to safeguard my data. I started with 8x 2TB consumer drives in a Norco 4220 chassis. 10 years later, I ran out of CPU and RAM resources and needed an upgrade.
I'll go into the specs of the new build in another post; for now, I'll focus on the drive controllers.
The previous server's RAID controller had two SFF-8087 SAS ports. Since the server had 5 backplanes, I needed a SAS expander. After doing some research, I settled on Intel's RES2SV240. It worked great and gave me the capability to use my server case to its full potential.
The Norco case comes with SFF-8087 backplanes that have multiple LEDs: blue for power, green for activity, and red for identification/error. The blue and green LEDs worked right out of the box, but after hours of troubleshooting drivers and, finally, scouring the internet, I found out the red ones aren't connected at all! In fact, each backplane has a bank of headers on the side, and if you touch one of these headers to ground, its LED illuminates!
Here's where the expander saved the day. The RES2SV240 is classified as an "enclosure," and as such it has a few extra headers on it that the Adaptec does not. Two in particular, on either side of the heatsink, are the identification headers. When activated by the drive controller or software, these go high (+5V) and can drive an LED. Since the backplane needed a ground signal instead, an array of NPN transistors converted that high signal into the ground the backplanes expected.
10 years later, it was time to retire the Adaptec 5805z and usher in a new era of software RAID with ZFS. To do so, I replaced the hardware RAID controller with an LSI SAS9211-8i HBA.
One of the first things I needed to do was flash the firmware on the LSI HBA to "IT" mode so that the controller would blindly pass the drives through to the OS.
Following a few guides online, I downloaded the most current firmware and prepared a USB stick for the UEFI firmware flash. At this point I had booted into the motherboard's built-in UEFI shell.
After copying the sas2flash.efi, 2118it.bin, and mptsas2.rom files to a USB drive formatted as FAT32, I inserted the USB drive and performed the following:
fs0:
This switched the shell over to the USB drive's filesystem.
sas2flash -o -e
This erased the existing flash on the HBA.
sas2flash -o -f firmware/2118it.bin -b bios/mptsas2.rom
This was supposed to flash the firmware, but instead it failed, telling me it could not find the files I had specified in the command. After some further research, it turned out that the firmware and BIOS files need to be at the root of the USB drive's filesystem. I had left them in their original folders, as they came out of the downloaded Broadcom ZIP file.
At this point I was a bit panicked, since I had just finished erasing the original firmware and had no idea what would happen if I rebooted or removed the USB drive. After pulling the drive and moving the files to the root of the filesystem, I reinserted it and tried to run the command again. Now the shell reported that the disk was not found. I briefly researched how to rescan for the drive from within the UEFI shell, but my searches came up empty. I had one move left: reboot.
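In hindsight, the UEFI shell does have a rescan facility that likely would have re-detected the stick without a reboot; a sketch of what should have worked, assuming the stick came back as the first filesystem:

map -r
fs0:

map -r rebuilds the shell's device mappings, so a reinserted drive should show back up under its filesystem alias.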
As the POST codes flashed along the bottom of the screen, I kept my eyes focused on the green LEDs on the HBA. They started flashing in the same rhythm as before. This gave me hope. I entered the UEFI shell and ran through the commands again:
fs0:
sas2flash -o -f 2118it.bin -b mptsas2.rom
The progress bars started flowing and before I knew it, the firmware was installed.
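For anyone following along, sas2flash can also confirm what landed on the card; listing the controllers is a quick sanity check:

sas2flash -listall

This prints each LSI controller with its firmware version, and sas2flash -list shows the full details, where the firmware product ID should now reflect IT mode.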
I ran the server like this for a few days as I configured the OS. Once I was ready, I powered down the old server and replaced the motherboard, CPU, RAM, and finally the Adaptec RAID controller with the LSI HBA.
I didn't know what to expect, but I felt that since my HBA was guaranteed to work with FreeBSD and the Intel SAS expander hadn't let me down in 10 years, I should be good to go, right?
Boy, was I wrong.
The first boot was flawless. Everything came up and was running in seconds. I slowly inserted the 8 HGST 4TB enterprise drives into the Norco case, refreshing camcontrol devlist along the way. Once all the drives were online, I proceeded to partition, geli-encrypt, and prep for the zpool:
dd if=/dev/zero of=/dev/da0 bs=512 count=1
gpart create -s gpt da0
gpart add -t freebsd-zfs -a 4k -b 1m da0
geli init -s 4096 -e AES-XTS -l 256 -K "geli.key" "/dev/da0"
geli attach -k geli.key /dev/da0
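Typing that five-command sequence eight times gets old fast; here's a minimal sh sketch of the same steps in a loop, assuming the drives really do enumerate as da0 through da7 and that a single geli.key is shared across all of them:

#!/bin/sh
# Hypothetical loop over the eight drives; match the device names against camcontrol devlist first.
for d in 0 1 2 3 4 5 6 7; do
    dd if=/dev/zero of=/dev/da${d} bs=512 count=1    # wipe any stale metadata in the first sector
    gpart create -s gpt da${d}                       # fresh GPT
    gpart add -t freebsd-zfs -a 4k -b 1m da${d}      # 4k-aligned freebsd-zfs partition
    geli init -s 4096 -e AES-XTS -l 256 -K geli.key /dev/da${d}   # prompts for a passphrase each pass
    geli attach -k geli.key /dev/da${d}              # prompts again, then exposes /dev/da${d}.eli
done

Note the sketch mirrors the commands above exactly, including geli targeting the whole disk rather than the freebsd-zfs partition.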
After 8 iterations of this, the next command created the pool:
zpool create storage raidz2 /dev/da0.eli /dev/da1.eli /dev/da2.eli /dev/da3.eli /dev/da4.eli /dev/da5.eli /dev/da6.eli /dev/da7.eli
It was very uneventful. zfs list showed me a 20TB dataset ready to go! (With raidz2, two of the eight 4TB drives go to parity, so roughly 24TB raw lands around 20TB usable after overhead.) I fired up a zfs send | ssh zfs recv pipeline to begin restoring the data from the backup server. Instantly, the LEDs were dancing away and the gigabit link between the servers was saturated... almost.
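Conceptually, the pipeline was nothing fancy. A rough sketch as run from the backup server, with the hostname, dataset, and snapshot names as hypothetical stand-ins:

zfs snapshot -r backup/storage@migrate
zfs send -R backup/storage@migrate | ssh newserver zfs recv -F storage

The -R flag replicates the snapshot along with all descendant datasets and their properties, and -F lets the receiving side overwrite the freshly created, still-empty pool.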
After a few hundred megabytes were transferred, the array would stop flashing, hang for a few seconds, then continue. I pulled up dmesg and noticed messages about CAM status: Command timeout. Eventually, ZFS started taking drives offline, complaining about too many errors. I pulled the drive in question and reinserted it. The resilvering process started but reported it was going to take DAYS to complete. Something was seriously wrong here.
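For reference, nothing exotic was needed to watch the failure unfold; the standard FreeBSD tooling tells the story:

dmesg | grep -i cam
zpool status -v storage
camcontrol devlist

The first surfaces the timeout messages, the second shows per-device read/write/checksum error counts along with resilver progress, and the third confirms whether the HBA can even still see all eight drives.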
It took a few hours of digging through various search results until I finally realized I was searching in the wrong place. The issue wasn't with the HBA!
I came across someone with the same Intel SAS expander reporting similar issues. The fix? Upgrade the firmware on the expander! This made me nervous: I had been to Intel's site many times before and had never managed to find the firmware update packages. That was a few years ago, though, and apparently they've fixed some broken links since then:
https://downloadcenter.intel.com/download/21686/RAID-Expander-RES2SV240-RES2CV240-RES2CV360-Firmware?product=53946
Using Supermicro's IPMI utility, I was able to perform the entire firmware upgrade from my desk.
The SAS expander firmware update took my performance from ~150MBps to the ~900MBps range!
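If you want to watch the numbers yourself, zpool iostat makes the before-and-after obvious; this samples pool bandwidth every 5 seconds:

zpool iostat -v storage 5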
It's been over a month now and I haven't seen a single CAM error. Looks like I'm set for another 10 years!