Friday, May 5, 2023

Svelte Jails From Scratch

I recently upgraded one of my homelab servers to FreeBSD 14-CURRENT and updated my Ansible scripts to build the jails needed for various services. I have never relied on iocage, ezjail, or the like, and have instead typically built my jails from source. With the latest FreeBSD -CURRENT, a make installworld / make distribution into the new jail DESTDIR created a 1.3GB installation. That is a large surface area for running a single service such as nginx, so I turned to the src.conf build options to get the base system of each jail down to 85MB.

The largest part of the default 14-CURRENT userland install is by far /usr/lib/debug at 724MB. Clang/LLVM/LLDB is another large chunk. The full src.conf used to build a base userland without many unneeded bits is included below:

WITHOUT_ACCT=true
WITHOUT_ACPI=true
WITHOUT_APM=true
WITHOUT_ATM=true
WITHOUT_AUTOFS=true
WITHOUT_BHYVE=true
WITHOUT_BLUETOOTH=true
WITHOUT_BOOT=true
WITHOUT_BOOTPARAMD=true
WITHOUT_BOOTPD=true
WITHOUT_BSDINSTALL=true
WITHOUT_BSNMP=true
WITHOUT_CLANG=true
WITHOUT_CXGBETOOL=true
WITHOUT_DEBUG_FILES=true
WITHOUT_DTRACE=true
WITHOUT_EFI=true
WITHOUT_EXAMPLES=true
WITHOUT_FINGER=true
WITHOUT_FLOPPY=true
WITHOUT_FREEBSD_UPDATE=true
WITHOUT_FTP=true
WITHOUT_GAMES=true
#WITHOUT_GH_BC=true
WITHOUT_GNU_DIFF=true
WITHOUT_HAST=true
WITHOUT_HTML=true
WITHOUT_HYPERV=true
WITHOUT_INCLUDES=true
WITHOUT_INSTALLLIB=true
WITHOUT_IPFILTER=true
WITHOUT_IPFW=true
WITHOUT_ISCSI=true
WITHOUT_JAIL=true
WITHOUT_LEGACY_CONSOLE=true
WITHOUT_LIB32=true
WITHOUT_LLDB=true
WITHOUT_LOCALES=true
WITHOUT_LPR=true
WITHOUT_MAN=true
WITHOUT_MANCOMPRESS=true
WITHOUT_MLX5TOOL=true
WITHOUT_NDIS=true
WITHOUT_NETCAT=true
WITHOUT_NIS=true
WITHOUT_NLS=true
WITHOUT_NTP=true
WITHOUT_OFED=true
WITHOUT_OPENMP=true
WITHOUT_PF=true
WITHOUT_PMC=true
WITHOUT_PPP=true
WITHOUT_RADIUS_SUPPORT=true
WITHOUT_RBOOTD=true
WITHOUT_RESCUE=true
WITHOUT_ROUTED=true
WITHOUT_SENDMAIL=true
WITHOUT_SHAREDOCS=true
WITHOUT_SYSCONS=true
WITHOUT_TALK=true
WITHOUT_TESTS=true
WITHOUT_UNBOUND=true
WITHOUT_USB=true
WITHOUT_VT=true
WITHOUT_WIRELESS=true
WITHOUT_ZFS=true

Note that WITHOUT_GH_BC is broken for installworld in -CURRENT at present, so I've commented it out. nginx, isc-dhcpd, and the other packages I install in my jails add about 40MB each, so I'm pretty happy with an 85MB base system for each jail.
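For reference, the from-scratch install into a jail root looks roughly like this. The jail path and the src.conf location are assumptions for illustration, not my actual paths:

```shell
# Sketch: install the trimmed userland into a new jail root.
# /usr/local/jails/www and /etc/src-jail.conf are hypothetical paths.
J=/usr/local/jails/www
mkdir -p "$J"
cd /usr/src
make installworld DESTDIR="$J" SRCCONF=/etc/src-jail.conf
make distribution DESTDIR="$J" SRCCONF=/etc/src-jail.conf
du -sh "$J"    # check the resulting size of the base system
```

Passing the same file via SRCCONF that was used for buildworld keeps the host's /etc/src.conf untouched.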

FreeBSD has a long history of projects to provide a minimal system for embedded devices and other use cases, such as PicoBSD (deprecated -- FreeBSD 3 on a single floppy!), NanoBSD, mfsBSD, and more. Please see these projects for more robust techniques to further minimize. This post is meant purely to see how small I can get an installed jail userland from the output of a default make buildworld by just setting build variables with make installworld.

Monday, May 1, 2023

New NUC13ANKi7 13th gen

I'd like to replace some of my older 2015-2016 era FreeBSD NUCs with machines that have more DRAM to support ZFS and more Jails, and faster CPUs to speed up my builds. After some research I settled on the NUC13ANKi7. This NUC fits a lot into the smaller short-height form factor:

  • Intel Core i7-1360P
    • 12 cores (4 performance, 8 efficiency), 16 threads
    • Performance cores turbo boost up to 5.0 GHz
    • Raptor Lake architecture
  • 64GB DDR4 DRAM.
  • 1TB PCIe x4 Gen4 NVMe
  • 2.5Gb Ethernet, WiFi 6E
  • UEFI with support for HTTP Boot.

The price from SimplyNUC is $1,189 plus tax and shipping. I would have preferred DDR5 DIMMs but otherwise was pretty happy with this, especially the smaller NUC form factor.

The machine builds and runs FreeBSD 13.2 great (see dmesg gist).

Buildworld Times

To test the suitability of this new machine as a build server, I ran 270 iterations of make buildworld over a week.
  • 30 -j parallelism options (1 to 30)
  • Three source and /usr/obj configurations:
    • /usr/src and /usr/obj on local SSD
    • /usr/src on SSD and /usr/obj on tmpfs
    • /usr/src and /usr/obj on tmpfs.
  • Three iterations of each test.
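A sweep like this can be scripted in a few lines of sh. This is a sketch of the shape of the harness, not the exact script I used; the log path is hypothetical:

```shell
#!/bin/sh
# Run buildworld at each parallelism level (1-30), three times each,
# recording elapsed time per run with time(1)'s -o flag.
cd /usr/src
for j in $(seq 1 30); do
  for i in 1 2 3; do
    make cleanworld > /dev/null 2>&1
    /usr/bin/time -o /tmp/buildworld.j$j.run$i.time \
        make -j"$j" buildworld > /dev/null 2>&1
  done
done
```

That is 90 runs per storage configuration, which with three configurations gives the 270 total iterations.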

The median of the three runs for the default ZFS /usr/src and /usr/obj configuration is plotted below:

The fastest build times came in just under 26 minutes with -j21, but anything from -j16 to -j30 was within 1.3% of that elapsed time. Moving src and obj to tmpfs changed the elapsed time by less than 4.1%.

SSD Performance

diskinfo -t nvd0 shows 3.9-4.35 GB/s transfer rates to the local SSD.

diskinfo -t nvd0

nvd0
        512             # sectorsize
        1000204886016   # mediasize in bytes (932G)
        1953525168      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        PNY CS2140 1TB SSD      # Disk descr.
        PNY2201220106010B3A8    # Disk ident.
        nvme0           # Attachment
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.004330 sec =    0.017 msec
        Half stroke:      250 iter in   0.019046 sec =    0.076 msec
        Quarter stroke:   500 iter in   0.013305 sec =    0.027 msec
        Short forward:    400 iter in   0.005383 sec =    0.013 msec
        Short backward:   400 iter in   0.005009 sec =    0.013 msec
        Seq outer:       2048 iter in   0.045332 sec =    0.022 msec
        Seq inner:       2048 iter in   0.029877 sec =    0.015 msec

Transfer rates:
        outside:       102400 kbytes in   0.025866 sec =  3958865 kbytes/sec
        middle:        102400 kbytes in   0.023508 sec =  4355964 kbytes/sec
        inside:        102400 kbytes in   0.024016 sec =  4263824 kbytes/sec
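As a sanity check on the units above, diskinfo's kbytes-per-second figures convert to decimal GB/s like so (treating a kbyte as 1000 bytes for the rough conversion):

```shell
# Convert the "outside" transfer measurement (102400 kbytes in
# 0.025866 sec, i.e. 3958865 kbytes/sec) to decimal GB/s.
awk 'BEGIN { printf "%.2f GB/s\n", 102400 / 0.025866 / 1e6 }'
# prints "3.96 GB/s"
```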

IOZone

I also used the IOZone benchmark to quickly gather some SSD stats with 4k reads:
Metric            Throughput (KBytes/sec)
Initial write     1160931.00
Rewrite           1223939.88
Read              1900560.25
Re-read           3034837.25
Reverse read      1859190.25
Stride read       568050.94
Random read       143096.12
Mixed workload    290691.06
Random write      94226.63
Pwrite            1184217.75
Pread             3080741.50
Fwrite            1211297.38
Fread             1776154.00

UEFI HTTP Boot

One thing I wasn't expecting with this NUC is that it supports HTTP Boot and a UEFI shell. It seems more finicky than HTTP Boot on a Dell, and possibly only accepts HTTPS URLs. I'll try to follow up with another post covering HTTP Boot with FreeBSD in more detail.

Monday, October 24, 2016

FreeBSD on Intel NUCs

I've been away from FreeBSD for a few years but I wanted some more functionality on my home network that I was able to configure with my Synology NAS and router. Specifically, I wanted:

  • a configurable caching name server that would serve up authoritative private names on my LAN and also validates responses with DNSSEC.
  • a more configurable DHCP server so I could make the server assign specific IPs to specific MAC addresses.
  • more compute power for transcoding videos for Plex.
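With isc-dhcpd, the MAC-to-IP pinning from the second bullet looks something like this. The hostname, MAC address, and IP here are made up for illustration, not from my actual config:

```
# dhcpd.conf fragment: always hand this client the same address.
host nas {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.1.10;
}
```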

Running FreeBSD 11 on an Intel NUC seemed like an ideal solution to keep my closet tidy. As of this week, $406.63 on Amazon buys a last-generation i3 Intel NUC mini PC (NUC5I3RYH) with 8GB of RAM and 128GB of SSD storage. This was the first model I tried, since I found reports online of others using it with FreeBSD, but I was also able to get things working on the newer-generation i5-based NUC6i5SYK with 16GB of RAM and 256GB of SSD. The major issue with these NUCs is that there is no FreeBSD driver for the Intel wireless chip. I am not doing anything graphical with these boxes, so I don't know how well the graphics work, but they are great little network compute nodes.

Installation

I downloaded the FreeBSD 11 memory stick images and was pleased to see that the device booted fine off the memory stick without any BIOS configuration required. However, my installation failed trying to mount root ("Mounting from ufs:/dev/ufs/FreeBSD_Install failed with error 19."). Installation from an external USB DVD drive and over the network with PXE both proved more successful at getting me into bsdinstall to complete the installation.

I partitioned the 128GB SSD device with 8GB of swap and the rest for the root partition (UFS, Journaled and Soft Updates). After installation I edited /etc/fstab to add a tmpfs(5) mount for /tmp. The dmesg output for this host is available in a Gist on Github.
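The tmpfs(5) mount I added amounts to a single /etc/fstab line along these lines (a minimal form, with no size cap set):

```
# Device  Mountpoint  FStype  Options       Dump  Pass
tmpfs     /tmp        tmpfs   rw,mode=1777  0     0
```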

Warren Block's article on SSD on FreeBSD and the various chapters of the FreeBSD Handbook were helpful. There were a couple of tools that were also useful in probing the performance of the SSD with my FreeBSD workload:

  • The smartctl tool in the sysutils/smartmontools package allows one to read detailed diagnostic information from the SSD, including wear patterns.
  • The basic benchmark built into diskinfo -t reports that the SSD is transferring 503-510MB/second.
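For example, a quick look at the wear-related SMART attributes can be had with a one-liner like this (the device name /dev/ada0 is an assumption; yours may differ):

```shell
# Dump all SMART data and filter for wear/lifetime/power-on counters.
smartctl -a /dev/ada0 | egrep -i 'wear|lifetime|power_on'
```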
But how well does it perform in practice?

Rough Benchmarks

This post isn't meant to report a comprehensive suite of FreeBSD benchmarks, but I did run some basic tests to understand how these low-power NUCs perform in practice. To start, I downloaded the 11-stable source from Subversion and measured build times to gauge the performance of the new system. All builds were done with a minimal two-line make.conf:

MALLOC_PRODUCTION=yes
CPUTYPE?=core2

Build Speed

Build Command          Environment                          Real Time
make -j4 buildkernel   /usr/src and /usr/obj on SSD         10.06 minutes
make -j4 buildkernel   /usr/src on SSD, /usr/obj on tmpfs   9.65 minutes
make -j4 buildworld    /usr/src and /usr/obj on SSD         1.27 hours
make buildworld        /usr/src and /usr/obj on SSD         3.76 hours

Bonnie

In addition to the build times, I also wanted to look more directly at the performance of reading from flash and reading from the NFS-mounted home directories on my 4-drive NAS. I first tried Bonnie++, but ran into a 13-year-old bug in the FreeBSD NFS client. After switching to Bonnie, I was able to gather some reasonable numbers. I had to use really large file sizes for the random write test to eliminate most of the caching that was artificially inflating the results. For those who haven't seen it, Brendan Gregg's excellent blog post highlights some of the issues with file system benchmarks like Bonnie.


Average of 3 Bonnie runs with 40GB file size

Configuration   Seeks/sec   %CPU   Block Input KB/s   %CPU   Block Output KB/s   %CPU
NFS             99.2        0.9    106505             4.8    89966               7.5
SSD             8809        13.5   538671             25.3   3160917             11.3

The block input rates from my Bonnie benchmarks on the SSD were within 5% of the value provided by the much quicker and dirtier diskinfo -t test.

Running Bonnie with less than a 40GB file size yielded unreliable benchmarks due to caching at the VM layer. The following boxplot shows the random seek performance during 3 runs each at 24, 32, and 40GB file sizes. Performance starts to level off at the largest size, but with smaller file sizes the reported random seek performance is much higher.

Open Issues

As mentioned earlier, I liked the performance I got with running FreeBSD on a 2015-era i3 NUC5I3RYH so much that I bought a newer, more powerful second device for my network. The 2016-era i5 NUC 6i5SYK is also running great. There are just a few minor issues I've encountered so far:

  • There is no FreeBSD driver for the Intel Wireless chip included with this NUC. Code for other platforms exists but has not been ported to FreeBSD.
  • The memory stick booting issue described in the installation section. It is not clear if it didn't like my USB stick for some reason, or the port I was plugging into, or if additional boot parameters would have solved the issue. Documentation and/or code needs to be updated to make this clearer.
  • Similarly, the PXE install instructions were a bit scattered. The PXE section of the Handbook isn't specifically targeting new manual installations into bsdinstall. There are a few extra things you can run into that aren't documented well or could be streamlined.
  • Graphics / X11 are outside of the scope of my needs. The NUCs have VESA mounts so you can easily tuck them behind an LCD monitor, but it is not clear to me how well they perform in that role.

Wednesday, January 7, 2015

AsiaBSDCon 2014 Videos Posted (6 years of BSDConferences on YouTube)

Sato-san has once again created a playlist of videos from AsiaBSDCon. There were 20 videos from the conference held March 15-16, 2014, and the papers can be found here. Congrats to the organizers for running another successful conference in Tokyo. A full list of videos is included below. Six years ago, when I first created this channel, videos longer than 10 minutes couldn't normally be uploaded to YouTube, and we had to create a special partner channel for the content. It is great to see how the availability of technical video content about FreeBSD has grown in the last six years.

Friday, June 28, 2013

Trip Report from USENIX ATC 2013

I spent half of the week at USENIX ATC in San Jose. I previously attended in 2000, 2001, 2002, and 2004, and I have been to other more academic USENIX conferences in the intervening years such as FAST and OSDI, but I have not made it back to Annual Tech in nearly a decade.

The conference is very familiar but has definitely changed since '04 (no more terminal rooms, and the BoF board was nearly empty!). I was very happy with the caliber of the accepted papers in the main conference as well as in many of the workshops of Federated Conferences Week (HotStorage, HotCloud, etc.). There is less industry and open-source participation now, but still a variety of really interesting talks about storage, networking, operating systems, virtualization, and more from academia and (a smaller subset of) industry.

As I've previously noted on this blog, I think the BSD conferences are great, but that it is very important for the FreeBSD community to also present work at some of the broader open-source and academic systems conferences. I would be much more likely to attend EuroBSDCon if it were held adjacent to EuroSys or FOSDEM, for example. And would be more likely to attend a U.S.-based BSD conference if it were held adjacent to a USENIX or O'Reilly Strata event.

On Wednesday my team presented one of our main projects of last year, Janus: Optimal Flash Provisioning for Cloud Storage Workloads. This work describes a method for automatically segregating hot and cold storage workloads in a large distributed filesystem, formulates an optimization problem to match the available flash to different workloads in such a way as to maximize the total reads going to flash, and then places that hot data on the distributed flash devices instead of distributed disk devices.

There were a number of other really interesting talks about flash, virtualization, and distributed storage systems, but I wanted to highlight two short-papers that I think would most appeal to the FreeBSD audience here:

  • Practical and Effective Sandboxing for Non-root Users, Taesoo Kim and Nickolai Zeldovich, MIT CSAIL
    This was a nice, practical short paper about interposing system calls, using unionfs in a clever way, and borrowing some ideas from revision control to build a nice little tool.
  • packetdrill: Scriptable Network Stack Testing, from Sockets to Packets, Neal Cardwell, Yuchung Cheng, Lawrence Brakmo, Matt Mathis, Barath Raghavan, Nandita Dukkipati, Hsiao-keng Jerry Chu, Andreas Terzis, and Tom Herbert, Google
    Another practical short paper about a portable tool, which works on FreeBSD or Linux, that enables testing the correctness and performance of entire TCP/UDP/IP network stack implementations, from the system call layer to the hardware network interface, for both IPv4 and IPv6. This tool was instrumental in identifying 10 bugs in the Linux network stack and enabling the development of three new features: TCP Early Retransmit, TCP Fast Open, and TCP Loss Probes.

I'm not sure if I'll go to FAST, or USENIX ATC, or both next year, but it's likely I'll attend at least one. What other industry conferences outside of the BSDCan/EuroBSDCon circuit does the FreeBSD community congregate at these days? For folks that have been in industry 10+ years, do you go to more or less industry conferences now than in the past?

Saturday, February 4, 2012

Updated TCP Proposals and FreeBSD

There are a number of proposals for improving TCP performance coming out of Google that have some implications for FreeBSD. These proposals have taken the form of a group of IETF drafts, RFCs, patches to the Linux kernel, and research publications. A nice summary of the different initiatives is available from Let's Make TCP Faster on the Google Code Blog.

TCP Fast Open by Radhakrishnan, Cheng, Chu, Jain, and Raghavan is based on the observation that modern web services are dominated by TCP flows so short that they terminate a few round trips after handshaking. This means that the 3-way TCP handshake is a significant source of latency for such flows, and they describe a new mechanism for secure data exchange during the initial handshake to reduce some of the round-trip network transmission and associated latency for such short TCP transfers. This work shares many goals and challenges with T/TCP, which was previously in FreeBSD but suffered from some security vulnerabilities.

David Malone posted some thoughts on my Google+ post about how FreeBSD could implement the various changes. Maybe we could have some Summer of Code students work in this area this summer?