Libre software and moral absolutism

I’ve been pondering what to write in my blog now that I no longer lead Adélie Linux. There isn’t a great deal to write about my day job making libre networking software, and there’s even less to write about my “spare time” projects. (Mostly because that spare time is spent playing with my cat, my Mum, or my video games.)

I had originally planned for this blog to cover travel and photography around Oklahoma in addition to tech. For obvious reasons, this isn’t something I’m able to do at this time. There are only so many ways I can photograph the gardens around my flat, and inter-city (let alone inter-state) travel is not exactly possible in 2020.

There are actually a lot of tech subjects I would love to cover, but a lot of them revolve around non-libre software. After spending so many years in the communities I have, I felt a very real sense of shame at the thought of writing about them. However, the more I’ve thought about it, the more I realise it is not shame I feel.

It is embarrassment. It is a sense of letting my friends and colleagues down by feeling anything other than contempt for running proprietary software.

But I do. I absolutely enjoy using my iPad Air 2. And I have dozens – maybe hundreds – of articles in me about retro computing with classic versions of Windows, Mac OS, and Solaris. Game consoles are another fun hobby of mine that I want to share more widely.

So this is the dilemma I face. To continue to write nothing but articles about Linux would be to hide a part of me that is real. Is that what I want from my life? I don’t think it is.

This is not a rebuke of libre software, nor is this some admission there is nothing left to write about it. The future will always be bright with libre software, and I have plenty of articles left in me with regard to libre software, I’m sure. But I think it’s about time for me to admit to myself, and the world, that I have other technological passions as well. And it’s time to stop being embarrassed about it.

Leaving the Linux distribution community

It is with a heavy heart that I am publicly announcing my immediate retirement from the Linux distribution community. This is not a decision I have arrived at lightly.

This year has been challenging for every living being on planet Earth. For me personally, 2020 has given me a lion’s share of financial challenges. I am financially hurting in ways that I could have never imagined.

Adélie Linux and its community have always been a passion of mine, and that passion is still intact. However, it is not responsible for me to continue to act as Project Lead when I need to focus on things that will make me enough money to survive.

Without the ability to be paid to work on Adélie Linux, there is not enough time in my day to devote to it and give it the attention and care that it so rightfully deserves. It is not fair to the community to have to wait weeks (or longer) for me to review merge requests, fix bugs, respond to issues, and so on.

My heart and soul will always belong to Adélie and the wonderful community we have built together over the past six years.

I will stay around long enough to properly transfer my Lead role to someone else, who can carry Adélie to great things in the future. However, I will not be contributing much in the way of patches or code to upstreams like musl or KDE going forward.

It is my sincere hope that when I am in a better financial situation, I can resume leadership of Adélie Linux, if the community should so desire. I cannot see the future and do not know when, or if, that could happen.

Until then, I wish all of you the very best and look forward to watching Adélie Linux continue to grow, from a distance.

With much respect,

–arw

Cascading failures (or, why I did nothing this weekend)

This is a fun one.

To set the scene and provide information in temporal order, my Talos and WD Black NVMe device have never “gotten along” well. Frequently, the device would fail to train for whatever reason. Calling reboot from a Petitboot shell with fast-reset enabled was enough to fix this, so I didn’t think all that much about it.

Late in the night Thursday (or early in the morning Friday, if you prefer), I was reading a few articles before I went to bed. At 01:36:13, an animal darted in front of a car about two miles away, causing the car to crash into a power pole. This caused a serious power surge as the lines came down onto each other (and the car’s frame).

My office was protected by an APC BX1500G combination battery backup (SPS, not UPS) and surge protection unit. The power surge was severe enough that the unit failed. The lights, the Power Mac G5, and the Talos in my office went immediately dark as the alarm went off, making a continuous noise while “F04” and “See Manual” flashed on the display. This code means “Clamp Short”: the varistor that is supposed to arrest surges had become permanently ‘stuck’ in arrest mode.

My first priority was to ensure the integrity of my hardware, so I dug out a spare APC SPS and put the battery from the now-failed one in it. I powered on my Talos and while it seemed to IPL fine, lspci in Petitboot did not show my NVMe device as present. Looking in Hostboot, it now wasn’t even failing to train — the slot may as well have been unoccupied. I tried both a fast-reset and a full system power cycle to no avail.

The next day, I attempted to swap slots, thinking the PCIe slot that the NVMe device was connected to may have been damaged by the power surge. I swapped the NVMe and the sound card. The sound card worked in the slot formerly occupied by the NVMe, but the NVMe still wouldn’t come up in the slot formerly occupied by the sound card. Now came the worrying part: did the M.2 to PCIe adaptor fail, or did the NVMe media itself fail?

I went to two of our local computer stores to buy some parts. I decided that since I was already going to need to do a full disk swap (even if I could recover the data off the NVMe media), it would make sense to add SATA media as well. I had a 256 GB SATA SSD lying around that was supposed to be put in my Xeon until it failed. I found an “open box” Seagate 1 TB HDD for 30 USD at DISC Surplus Computers in Sand Springs. And I found a new 4-port Marvell chipset based SATA controller for 34 USD at Wholesale Computer Supply in Tulsa. I also had a 3-port USB 3.0 controller card sitting on a shelf since the slot it was meant to go in was occupied by a 2-slot Radeon that was used for big endian amdgpu.ko porting. I went ahead and shelved that Radeon (and the big endian porting effort, for now) and used the CPU 1 slot for SATA and the CPU 2 slot for the USB card.

I then turned my attention to recovering the NVMe media. I brought out my old Intel Reference Platform board, a DP43TF with developer firmware, and put the NVMe adaptor card with media in to it. Unfortunately, the DP43TF only has one PCIe slot larger than x1, and it’s the x16 typically used for a GPU. Since NVMe is x4, I had to find a PCI GPU. I pulled a GeForce 8400GS out of one of our Pentium 4 test boxes and attempted to boot the Adélie 1.0-BETA4 live CD.

Our Live CD does not support the JMicron PATA controller that the DP43TF’s DVD drive was connected to. I ended up using a USB optical drive, but I also could have used a SATA optical drive. The CD I was attempting to use was scratched, and it refused to finish booting (the scratched section appears to have contained OpenRC). I had to find a computer capable of burning media, which was no small task since most newer computers don’t support writing optical media and most of my computers have marginal USB support at best.

One of our community members reminded me that the PowerBook G4 has a SuperDrive, which I used to burn a fresh x86_64 BETA4 CD. Finally booted, I noticed the NVMe was present but throwing occasional controller reset errors. I’m not sure if this was due to media degradation or the fact it was a Gen3 NVMe in a Gen1 PCIe slot. At any rate, I used dd to make a full clone of the NVMe to the 1 TB Seagate disk, and then put that in the Talos. A gracious member of the Adélie Linux community donated the funds needed to replace the NVMe with a Samsung 970 EVO Pro of the same size.
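
For anyone wanting to script this kind of clone rather than call dd directly, here is a minimal Python sketch of the same idea. It is not what was actually run here (that was plain dd); the function names and the 1 MiB chunk size are assumptions made for illustration. It copies a source to a destination chunk by chunk, hashes what it read, and then re-reads the same number of bytes from the destination to confirm the copy matches.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice, similar in spirit to dd's bs=1M


def clone_image(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path; return (bytes_copied, SHA-256 hex digest of the data read)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of source
                break
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
        # Push buffered data to the device before reporting success.
        dst.flush()
        os.fsync(dst.fileno())
    return copied, digest.hexdigest()


def sha256_of_prefix(path, length, chunk_size=CHUNK_SIZE):
    """Hash only the first `length` bytes of path (the destination may be larger than the source)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def clone_and_verify(src_path, dst_path):
    """Return True when the destination's first bytes_copied bytes match what was read."""
    copied, source_hash = clone_image(src_path, dst_path)
    return sha256_of_prefix(dst_path, copied) == source_hash
```

Note that this sketch makes no attempt to retry or skip past read errors. On media throwing controller resets, a failed read would surface as an OSError and abort the copy, so a real recovery would want something more forgiving.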

Yesterday I copied the data off the Seagate 1 TB to the new Samsung NVMe. Everything is working quite well, and the Samsung is much faster than the Western Digital; 712 MB/s uncached write vs 303 MB/s on the WD. The additional space on the Seagate can be used for further testing, and for possible expansion of Adélie to more platforms — I may post more on that later. 🙂

This was a very interesting experience for me. It’s been many years since I’ve seen a cascade of failures like this: car accident breaks APC SPS, which breaks NVMe marginally, which shows an issue booting our live CD on a specific computer. It also gave me a reason to re-catalogue a lot of the hardware I have on hand for testing purposes, and to know what needs fixing and replacing. And most importantly, it made me realise I need to perform weekly backups instead of semi-annual backups.

I want to especially thank the members of the Adélie Linux community that helped with this process, not only financially but with techniques and ideas to make this go well. My workstation is better than ever, and now I can get even more done for libre software. You rock!

Speaking with authority

I’ve just spent the better part of three hours arguing on IRC about Let’s Encrypt clients. After speaking with two others, I realised that nobody I had spoken with before actually knew whether their “facts” were facts.

Different people all told me various incorrect information, such as:

  • No ACME client supports doing a manual DNS TXT record for verification bootstrapping until you have an httpd up. (acme.sh, dehydrated, and certbot all support this.)
  • LE needs IPv4 for the HTTP challenge. (It worked fine for me with an IPv6-only host. I’m not sure which it would prefer if it had the choice between v6 and v4, or if it’d use Happy Eyeballs and connect to whichever responded first.)
  • It isn’t possible to step through the process manually as a debugging aid; you have to rely on your ACME client’s debugging facilities. (https://gethttpsforfree.com/ helped a tonne.)
  • You have to be listening for the HTTP challenge on port 80. (The TLS-ALPN-01 challenge type exists which will only ever use port 443 for the challenge.)
  • Critique of how I isolated each service on a separate VM so that they would be more secure, saying it was “over-convoluted”.
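
To make the manual DNS route from the first point a little more concrete, here is a rough Python sketch of how the TXT record value for the DNS-01 challenge is derived, as I understand RFC 8555 and RFC 7638. The function names, the token, and the account key below are all made up for illustration; a real ACME client generates the account key and receives the token from the server.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding, as ACME expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 JWK thumbprint (RFC 7638): hash only the required members, sorted, no whitespace."""
    required = {"RSA": ("e", "kty", "n"), "EC": ("crv", "kty", "x", "y")}[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in required},
        sort_keys=True,
        separators=(",", ":"),
    )
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def dns01_txt_record(domain: str, token: str, account_jwk: dict):
    """Return (record name, TXT value) for a DNS-01 challenge."""
    # Key authorization is the server's token joined to the account key thumbprint.
    key_authorization = f"{token}.{jwk_thumbprint(account_jwk)}"
    value = b64url(hashlib.sha256(key_authorization.encode("ascii")).digest())
    return f"_acme-challenge.{domain}", value


if __name__ == "__main__":
    # Hypothetical placeholder key and token, not real credentials.
    example_jwk = {"kty": "EC", "crv": "P-256", "x": "AAAA", "y": "BBBB"}
    name, value = dns01_txt_record("example.org", "hypothetical-token", example_jwk)
    print(name, "TXT", value)
```

The point is that nothing here needs an httpd: the value depends only on the token and the account key, so it can be pasted into a DNS zone by hand.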

All of these people spoke with an air of authority. They sounded like they genuinely knew what they were talking about, and were trying to inform me of the limitations of ACME clients / Let’s Encrypt. Nobody actually knew the answer, but they thought they were right because it fit their experience.

When I speak to people about technology, whether in real life, on IRC, or on a mailing list, I always try to make the limitations of my knowledge clear. Many is the time I have said “I’m not sure if you can do that”, or “I don’t know if X supports Y”. And sure, on occasion I will say “I have never done Y and last I knew X couldn’t do that”. Note, however, that all of these are presented as statements from my hive of knowledge, and not presented as plain facts. The art of communication seems to be lost on far too many in the technology field.

There is no shame in not knowing the answer to something. It is certainly more helpful to say “I’m not sure you can do that”, instead of “you can’t do that”. I almost gave up on Let’s Encrypt and wrote another article on how useless it is, because I was told by people who used Let’s Encrypt that it had all of these limitations that made it seem useless, arbitrary, and ridiculous to me. (Thanks to Rich Felker of musl, and Freeyorp, for setting the record straight.)

Maybe we need a new term for this. “Organic FUD”, since it comes from the community itself? At any rate, I hope that in the future, more people note the limitations of their knowledge up-front rather than sounding authoritative about a subject they know little about.