Reliable Bare Metal Server using TrueOS/FreeBSD

I currently have a project need for a simple FreeBSD base install that is hooked up to a NAS/SAN back end. Coming from a Solaris background, most SPARC machines (like the V220/V420) came rack mountable, with space for two primary hard drives just for the OS. You would spin up your OS install, install Solstice DiskSuite (Solaris Volume Manager), apply your secret sauce of configuration and you were away. One disk could fail and you could either hot-swap replace and resilver, or power down, boot off the disk that was still functioning and then resilver to bring the new disk online (yes, I know there are more steps than this, but that is out of scope for this article).

I wanted a similar, modern-day solution using commodity hardware and a free, open source and liberally licensed operating system.

While FreeBSD 10.x has a stable, binary update method for maintaining production machines, it doesn't allow you to follow -HEAD to get the latest technology for the project you are working on. The reason I use OpenBSD so much in production, apart from the security aspect, is that you don't have to wait long for new technology to appear in -release. For the same reason, FreeBSD -HEAD is the better fit for my needs here.
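For reference, that stable binary update path on a -RELEASE system is freebsd-update(8), which only patches the release you are tracking:

# freebsd-update fetch
# freebsd-update install

It is exactly this mechanism that is unavailable when you follow -HEAD, which is where TrueOS comes in.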

So that leaves me to choose TrueOS, made by the team that puts PC-BSD together and backed by iXsystems. TrueOS gives me all the benefits of FreeBSD -HEAD, but with the tools for simple monthly binary updates. After testing, I have found you can even skip months in the upgrade cycle, but YMMV.
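Those updates are driven by the pc-updatemanager utility that TrueOS shares with PC-BSD. From memory, checking for and applying an update looks roughly like the lines below, but treat them as a sketch and confirm against the tool's built-in help on your build:

# pc-updatemanager check
# pc-updatemanager install <update-tag>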

TrueOS has a slightly different way of setting up and managing disks than FreeBSD, but it didn't take me long to figure it out and write the simple instructions below for replacing a disk after a complete drive failure.

This guide will not go through the process of setting up TrueOS on a host; you can follow the official documentation for that. What is expected is that you have a TrueOS host built without geli(8) encryption, using a ZFS mirror across two identical drives. For this article I will show how this is done in a virtual environment; however, the same procedure has been performed on bare metal, consumer-grade SATA hardware using 2 x Samsung 850 EVO 250GB drives.

Here is the host and current drive information:

# uname -a
FreeBSD trueos 11.0-CURRENTAUG2015 FreeBSD 11.0-CURRENTAUG2015 #0 2b9b3e7(master)-dirty: Thu Jul 30 18:25:22 EDT 2015     root@avenger:/usr/obj/net/executor/builds/git/freebsd-11-current/sys/GENERIC amd64
# gpart show da0
=>     40 18874288 da0 GPT (9.0G)
       40     2048   1 bios-boot (1.0M)
     2088 14768128   2 freebsd-zfs (7.0G)
14770216   4096000   3 freebsd-swap (2.0G)
18866216     8112       - free - (4.0M)
# gpart show da1
=>     40 18874288 da1 GPT (9.0G)
       40     2048   1 bios-boot (1.0M)
     2088 14768128   2 freebsd-zfs (7.0G)
14770216   4096000   3 freebsd-swap (2.0G)
18866216     8112       - free - (4.0M)

Now shut down the host so we can remove the primary hard drive (da0):

# shutdown -p now

Remove the hard drive and replace it with one of equal size (it can be bigger, but that is out of scope for this article). Now boot the system off the second drive. You'll probably have to select an alternative boot device, as the functioning hard disk is not on the primary controller.
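Before going any further, it is worth confirming that the kernel actually sees the replacement drive. camcontrol(8) lists the attached devices; you should see entries for both da0 and da1:

# camcontrol devlist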

Now verify the new disk is online and doesn’t have a valid label:

# gpart show da0
gpart: No such geom: da0.

Since these disks should be identical, we will copy the GPT partition table from the disk we booted from to the new disk, then verify that it worked as intended:

# gpart backup da1 | gpart restore da0
# gpart show da0
=>     40 18874288 da0 GPT (9.0G)
       40     2048   1 bios-boot (1.0M)
     2088 14768128   2 freebsd-zfs (7.0G)
14770216   4096000   3 freebsd-swap (2.0G)
18866216     8112       - free - (4.0M)
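As an aside, gpart(8) backups are plain text on stdout, so it costs nothing to keep a copy of the good disk's partition table somewhere safe before you begin; the path below is just an example:

# gpart backup da1 > /root/da1-partition-table.txt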

The output below shows that we still have a degraded zpool(8) and need to replace the missing device (da0p2):

# zpool status tank
pool: tank
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
       the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
scan: resilvered 1.00G in 0h2m with 0 errors on Tue Aug 18 05:05:35 2015
config:

       NAME                      STATE     READ WRITE CKSUM
       tank                      DEGRADED     0     0     0
         mirror-0                DEGRADED     0     0     0
           3380954699551463507   UNAVAIL      0     0     0  was /dev/da0p2
           da1p2                 ONLINE       0     0     0

errors: No known data errors

Now replace the unavailable device with its newly partitioned counterpart:

# zpool replace tank da0p2 da0p2

While zpool(8) will remind you to install the bootcode on the new disk, the standard FreeBSD method is not applicable to TrueOS; we will get to the TrueOS-supported method shortly.
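Resilvering a small mirror like this one is quick, but if you are scripting the replacement, a minimal sketch for blocking until the resilver has finished is to poll the scan line of the pool status:

# while zpool status tank | grep -q 'resilver in progress'; do sleep 60; done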

Wait until resilvering has completed:

# zpool status tank
pool: tank
state: ONLINE
scan: resilvered 1.00G in 0h2m with 0 errors on Wed Aug 19 01:26:05 2015
config:

       NAME        STATE     READ WRITE CKSUM
       tank        ONLINE       0     0     0
         mirror-0  ONLINE       0     0     0
           da0p2   ONLINE       0     0     0
           da1p2   ONLINE       0     0     0

errors: No known data errors

Swap on TrueOS is encrypted and, in this disk configuration, mirrored using gmirror(8). The system will boot without fixing this, but it is best to get the swap mirror back to its redundant configuration before continuing. We need to have gmirror forget the old device and then insert the replacement:

# gmirror status
             Name   Status Components
mirror/swapmirror DEGRADED da1p3 (ACTIVE)
# gmirror forget swapmirror
# gmirror insert swapmirror da0p3

Give the new component a moment to synchronise, then confirm the mirror is COMPLETE:

# gmirror status
             Name   Status Components
mirror/swapmirror COMPLETE da1p3 (ACTIVE)
                             da0p3 (ACTIVE)
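To double-check that the encrypted swap itself is active on top of the repaired mirror, swapinfo(8) lists the swap devices in use; on my install the device is the .eli layer on top of mirror/swapmirror, though the exact name may differ on yours:

# swapinfo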

The final step is re-installing the bootloader on the new disk. TrueOS has a simple script that does this for you, without needing any switches:

# restamp-grub
Installing GRUB to da0
Installing GRUB to da1

Once completed, you can simply reboot your host to make sure everything boots and works as intended. Your system will now have redundant boot disks again, ready for production workloads.
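After the reboot, a quick sanity check using the names from this article confirms everything came back healthy; zpool status -x prints a one-line summary when all pools are fine:

# zpool status -x
all pools are healthy
# gmirror status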