Leaving the Linux distribution community

It is with a heavy heart that I am publicly announcing my immediate retirement from the Linux distribution community. This is not a decision I have arrived at lightly.

This year has been challenging for every living being on planet Earth. For me personally, 2020 has given me a lion’s share of financial challenges. I am financially hurting in ways that I could have never imagined.

Adélie Linux and its community have always been a passion of mine, and that passion is still intact. However, it is not responsible for me to continue to act as Project Lead when I need to focus on things that will make me enough money to survive.

Without the ability to be paid to work on Adélie Linux, there is not enough time in my day to devote to it and give it the attention and care that it so rightfully deserves. It is not fair to the community to have to wait weeks (or longer) for me to review merge requests, fix bugs, respond to issues, and so on.

My heart and soul will always belong to Adélie and the wonderful community we have built together over the past six years.

I will stay around long enough to properly transfer my Lead role to someone else, who can carry Adélie to great things in the future. However, I will not be contributing much in the way of patches or code to upstreams like musl or KDE going forward.

It is my sincere hope that when I am in a better financial situation, I can resume leadership of Adélie Linux, if the community should so desire. I cannot see the future and do not know when, or if, that could happen.

Until then, I wish all of you the very best and look forward to watching Adélie Linux continue to grow, from a distance.

With much respect,

–arw

Live from Adélie: Streaming Spotify on musl

Over the July 4th holiday weekend, I was working on a secret project. It was a resounding success and I can now announce to the world: Spotify runs on musl distributions!

This article will describe how I went about accomplishing this feat. If you just want to take Spotify for a test drive on your Adélie workstation or Void desktop, scroll to the “Instructions” heading.

Greetz

Thanks to these fine dwellers of IRC for helping make sense of the twisty mazes.

  • [[sroracle]]
  • Aerdan
  • cb
  • dalias
  • skarnet

gcompat 0.4.0: how very cash LC_MONETARY of you

The latest release version of gcompat did not get very far:

awilcox on laptop spotify % ./spotify
Segmentation fault (core dumped)

Inspecting the core file was minimally helpful:

Thread 1 "ld-musl-x86_64." received signal SIGSEGV, Segmentation fault.
0x0000000001d6ff60 in ?? ()
(gdb) bt
#0  0x0000000001d6ff60 in ?? ()
#1  0x00007fffffffd738 in ?? ()
#2  0x0000000001e94f13 in ?? ()
#3  0x00007fffffffd6d0 in ?? ()
#4  0x00007fffffffd738 in ?? ()
#5  0x0000000003e9d691 in ?? ()
#6  0x0000000003e9d698 in ?? ()
#7  0x0000000003e9d691 in ?? ()
#8  0x00007fffffffd738 in ?? ()
#9  0x00007fffffffdc40 in ?? ()
#10 0x0000000001ccd0f0 in ?? ()
#11 0x00007fffffffd7a0 in ?? ()
#12 0x0000000000000001 in ?? ()
#13 0x00007fffffffd720 in ?? ()
#14 0x0000000001e92e92 in ?? ()
#15 0x0000000003e9d691 in ?? ()
#16 0x0000000003e9d698 in ?? ()
#17 0x00007fffffffd738 in ?? ()
#18 0x00007fffffffd738 in ?? ()
#19 0x00007fffffffd760 in ?? ()
#20 0x0000000001e9dd51 in ?? ()
#21 0x00007fffffffdc40 in ?? ()
#22 0x0000000003e9b3e0 in ?? ()
#23 0x00007fffffffd7e8 in ?? ()
#24 0x00007fffffffd7b8 in ?? ()
#25 0x00007fffffffd7b8 in ?? ()
#26 0x00007fffffffd828 in ?? ()
#27 0x00007fffffffd810 in ?? ()
#28 0x0000000001e9df09 in ?? ()
#29 0x612f656d6f682f1a in ?? ()
#30 0x0000786f636c6977 in ?? ()
#31 0x0000000000000000 in ?? ()
(gdb) info registers
rax            0x54454e4f4d5f434c  6072345775086453580
rbx            0x53                83
rcx            0x53                83
rdx            0x2                 2
rsi            0x53                83
rdi            0x3e9b1a0           65647008
rbp            0x7fffffffd6f0      0x7fffffffd6f0
rsp            0x7fffffffd690      0x7fffffffd690
r8             0x0                 0
r9             0x0                 0
r10            0x1                 1
r11            0x7fffffffdb9c      140737488346012
r12            0x7fffffffd6b8      140737488344760
r13            0x7fffffffd6b0      140737488344752
r14            0x7fffffffd6a8      140737488344744
r15            0x7fffffffd6c0      140737488344768
rip            0x1d6ff60           0x1d6ff60
eflags         0x10202             [ IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

What are we trying to do? Looking at symbols present in the Spotify binary, this is actually part of the G++ runtime; specifically, std::ctype::do_tolower:

  1d6ff51:       48 8b 05 18 a8 12 02    mov    0x212a818(%rip),%rax        # 3e9a770 
  1d6ff58:       48 8b 40 70             mov    0x70(%rax),%rax
  1d6ff5c:       48 0f be cb             movsbq %bl,%rcx
=>1d6ff60:       8a 1c 88                mov    (%rax,%rcx,4),%bl
  1d6ff63:       89 d8                   mov    %ebx,%eax
  1d6ff65:       5b                      pop    %rbx
  1d6ff66:       c3                      retq

That rax value looks suspicious. Translated to ASCII, it is the little-endian representation of “LC_MONET”, the first eight bytes of the string “LC_MONETARY”. The code reads a pointer from offset 0x70 of the structure whose address it just loaded into %rax, then dereferences it. What it found at that offset was string data, not a pointer.
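Decoding a register by hand is easy to get wrong, so here is a small Python sketch that does it. It treats a 64-bit value the way x86-64 stores it in memory (little-endian) and prints the bytes as ASCII, escaping anything unprintable. The helper name `reg_to_ascii` is made up for this example.

```python
def reg_to_ascii(value):
    """Interpret a 64-bit register value as 8 bytes in little-endian (x86-64) order."""
    raw = value.to_bytes(8, "little")
    # Show printable ASCII as-is and anything else as a \xNN escape
    return "".join(chr(b) if 0x20 <= b < 0x7f else f"\\x{b:02x}" for b in raw)

print(reg_to_ascii(0x54454e4f4d5f434c))  # rax at the crash
print(reg_to_ascii(0x612f656d6f682f1a))  # backtrace "frame" #29
print(reg_to_ascii(0x0000786f636c6977))  # backtrace "frame" #30
```

The script prints `LC_MONET`, then `\x1a/home/a` and `wilcox\x00\x00`. The last two backtrace "frames" are pieces of a home directory path sitting on the stack. This suggests gdb was walking raw stack data without frame information, which is why the backtrace above is so unhelpful.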

It turns out that when libstdc++ is compiled on a glibc system, it will attempt to access the internal __ctype_* members in the locale_t of the current locale. musl’s locale_t is not ABI-compatible with glibc’s. In fact, it is only 48 bytes in length; 0x70 (or 112 bytes) is past the end of the locale object musl has provided it!
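The 0x70 offset follows directly from glibc's layout. glibc's struct __locale_struct starts with thirteen category pointers, followed by __ctype_b, __ctype_tolower and __ctype_toupper. On a 64-bit target, __ctype_tolower therefore lands at 13 × 8 + 8 = 112 = 0x70.

The sketch below mirrors both structures with Python's ctypes, assuming an LP64 target such as x86-64 where pointers are 8 bytes. The layouts are transcribed from the two libraries' headers for illustration, and the field names are shortened. Neither library exports these Python classes.

```python
import ctypes

# Field order mirrors glibc's struct __locale_struct (bits/types/__locale_t.h);
# glibc spells the names __locales, __ctype_b, __ctype_tolower, and so on.
class GlibcLocale(ctypes.Structure):
    _fields_ = [
        ("locales", ctypes.c_void_p * 13),   # struct __locale_data *__locales[13]
        ("ctype_b", ctypes.c_void_p),        # const unsigned short *__ctype_b
        ("ctype_tolower", ctypes.c_void_p),  # const int *__ctype_tolower
        ("ctype_toupper", ctypes.c_void_p),  # const int *__ctype_toupper
        ("names", ctypes.c_char_p * 13),     # const char *__names[13]
    ]

# musl's struct __locale_struct is just six category pointers
class MuslLocale(ctypes.Structure):
    _fields_ = [("cat", ctypes.c_void_p * 6)]  # const struct __locale_map *cat[6]

tolower_off = GlibcLocale.ctype_tolower.offset
print(f"glibc __ctype_tolower offset: {tolower_off:#x}")
print(f"musl locale object size: {ctypes.sizeof(MuslLocale)} bytes")
print("read lands past the end of musl's object:",
      tolower_off >= ctypes.sizeof(MuslLocale))
```

On x86-64 this reports an offset of 0x70 and a 48-byte musl object. A glibc-built libstdc++ reading __ctype_tolower from a musl locale_t is therefore reading whatever memory happens to follow the object.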

I implemented a stub locale module in gcompat, and… Spotify then tried to exec /proc/self/exe, which broke under the gcompat loader. I wrote a patch that interposes the execv* functions to catch this case. And suddenly…
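Re-executing /proc/self/exe goes wrong here because of how the loader starts the program. When a program is started by explicitly running the dynamic loader with the program path as an argument, /proc/self/exe refers to the loader, not the program. The core file later in this post shows Spotify's helper processes running as `ld-linux-x86-64.so.2 --argv0 ...`.

Below is a hypothetical Python sketch of the decision an execv* wrapper has to make in that situation. The real interposer is C code inside gcompat. The function name, paths and argument layout here are illustrative assumptions, not gcompat's actual behaviour.

```python
import os

SELF_EXE = "/proc/self/exe"

def rewrite_self_exec(path, argv, loader, program):
    """Hypothetical execv() rewrite for a process that was started through a loader.

    Re-executing /proc/self/exe would start the loader again rather than the
    program, so put the real program path back after the loader and keep the
    original arguments (argv[0] is replaced by the loader path).
    """
    if path != SELF_EXE:
        return path, list(argv)  # not a self-exec: pass through unchanged
    return loader, [loader, program, *argv[1:]]

if __name__ == "__main__":
    # Analogy: this script is run by the Python interpreter, so /proc/self/exe
    # names the interpreter binary, not the script (Linux only).
    if os.path.exists(SELF_EXE):
        print("/proc/self/exe ->", os.readlink(SELF_EXE))
    # Hypothetical paths, for illustration only
    print(rewrite_self_exec(SELF_EXE, ["spotify", "--type=utility"],
                            "/lib/ld-linux-x86-64.so.2",
                            "/usr/share/spotify/spotify"))
```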

The lights that stop me turn to stone

Slight success! We have a Spotify window!

Spotify, but only a white screen

… but only a blank white screen. After some inspection, I found that one of the many zygotes CEF was forking was segfaulting:

[158358.508029] ThreadPoolForeg[3230]: segfault at 0 ip 0000000000000000 sp 00007fe3203db448 error 14 in spotify[200000+1acd000]
[158365.067313] ThreadPoolForeg[3252]: segfault at 0 ip 0000000000000000 sp 00007f2d69c172e8 error 14 in spotify[200000+1acd000]
[158378.506832] ThreadPoolForeg[3312]: segfault at 0 ip 0000000000000000 sp 00007f52ed7c8448 error 14 in spotify[200000+1acd000]
[158383.654027] ThreadPoolForeg[3339]: segfault at 0 ip 0000000000000000 sp 00007fcb631eb2e8 error 14 in spotify[200000+1acd000]
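These kernel lines already say a lot:

- `segfault at 0 ip 0000000000000000` means both the faulting address and the instruction pointer were zero. Execution had jumped to a null address.
- `error 14` is the x86 page-fault error code, which the kernel log prints in hexadecimal.

The small Python decoder below is illustrative. It assumes the architectural meaning of the low five error-code bits: present, write, user, reserved and instruction fetch.

```python
# Low bits of the x86 page-fault error code: (bit, meaning if set, meaning if clear)
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
]

def decode_pf_error(code):
    """Translate a page-fault error code into human-readable flags."""
    flags = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags

# The kernel log prints the error code in hex, so "error 14" is 0x14
print(decode_pf_error(int("14", 16)))
```

For 0x14 this yields a user-mode instruction fetch from a page that is not present. That is consistent with a call through a null function pointer, which is what the core dump in the next step shows.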

I replaced libcef.so from the Spotify DEB package with a version-matched libcef.so from Spotify’s Open Source builds page. This gave me far more debugging symbols, and a core dump revealed:

Core was generated by `ld-linux-x86-64.so.2 --argv0 /usr/share/spotify/spotify --type=utility --field-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
[Current thread is 1 (LWP 12774)]
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f79a8a3d671 in sqlite3MallocSize () at ../../third_party/sqlite/amalgamation/sqlite3.c:26957
#2  mallocWithAlarm () at ../../third_party/sqlite/amalgamation/sqlite3.c:26891
#3  sqlite3Malloc () at ../../third_party/sqlite/amalgamation/sqlite3.c:26913
#4  0x00007f79a8aff232 in sqlite3MallocZero () at ../../third_party/sqlite/amalgamation/sqlite3.c:27118
#5  pthreadMutexAlloc () at ../../third_party/sqlite/amalgamation/sqlite3.c:25755
#6  0x00007f79a8a4e9b2 in sqlite3MutexAlloc () at ../../third_party/sqlite/amalgamation/sqlite3.c:25298
#7  chrome_sqlite3_initialize () at ../../third_party/sqlite/amalgamation/sqlite3.c:24906
#8  0x00007f79a8a350bd in EnsureSqliteInitialized () at ../../sql/initialization.cc:55
#9  0x00007f79a8a30eb2 in OpenInternal () at ../../sql/database.cc:1357
#10 0x00007f79a8a30dfa in Open () at ../../sql/database.cc:270
#11 0x00007f79a8fb8de6 in InitializeDatabase () at ../../net/extras/sqlite/sqlite_persistent_store_backend_base.cc:99
#12 0x00007f79a8fb9751 in LoadNelPoliciesAndNotifyInBackground () at ../../net/extras/sqlite/sqlite_persistent_reporting_and_nel_store.cc:1041
#13 0x00007f79a5abe25b in Invoke<void (leveldb_proto::ProtoDatabaseSelector::*)(base::OnceCallback), scoped_refptr, base::OnceCallback > () at ../../base/bind_internal.h:498
#14 MakeItSo<void (leveldb_proto::ProtoDatabaseSelector::*)(base::OnceCallback), scoped_refptr, base::OnceCallback > ()
    at ../../base/bind_internal.h:598
#15 RunImpl<void (leveldb_proto::ProtoDatabaseSelector::*)(base::OnceCallback), std::__1::tuple<scoped_refptr, base::OnceCallback >, 0, 1> () at ../../base/bind_internal.h:671
#16 RunOnce () at ../../base/bind_internal.h:640
#17 0x00007f79a7776fa0 in Run () at ../../base/callback.h:98
#18 RunTask () at ../../base/task/common/task_annotator.cc:142
#19 0x00007f79a7792862 in base::internal::TaskTracker::RunBlockShutdown(base::internal::Task*) () at ../../base/task/thread_pool/task_tracker.cc:743
#20 0x00007f79a7792062 in RunTask () at ../../base/task/thread_pool/task_tracker.cc:598
#21 0x00007f79a77d42fb in RunTask () at ../../base/task/thread_pool/task_tracker_posix.cc:23
#22 0x00007f79a7791a43 in RunAndPopNextTask () at ../../base/task/thread_pool/task_tracker.cc:450
#23 0x00007f79a7798386 in RunWorker () at ../../base/task/thread_pool/worker_thread.cc:321
#24 0x00007f79a77980f4 in base::internal::WorkerThread::RunPooledWorker() () at ../../base/task/thread_pool/worker_thread.cc:223
#25 0x00007f79a77d4a05 in ThreadFunc () at ../../base/threading/platform_thread_posix.cc:81
#26 0x00007f79ac9fe2dd in ?? ()
#27 0x00007f79aca799e8 in ?? ()
#28 0x00007f7998247ce0 in ?? ()
#29 0x0000000000000000 in ?? ()
(gdb) frame 1
#1  0x00007f79a8a3d671 in sqlite3MallocSize () at ../../third_party/sqlite/amalgamation/sqlite3.c:26957
26957     return sqlite3GlobalConfig.m.xSize(p);

Inspecting the SQLite3 code, I realised that it was somehow getting a nullptr for the malloc_usable_size pointer. Further inspection revealed that this was not exactly the case:

(gdb) disassemble 0x7f79a77d5520
Dump of assembler code for function malloc_usable_size():
   0x00007f79a77d5520 <+0>:     push   %rbp
   0x00007f79a77d5521 <+1>:     mov    %rsp,%rbp
   0x00007f79a77d5524 <+4>:     mov    %rdi,%rsi
   0x00007f79a77d5527 <+7>:     mov    0x484a76a(%rip),%rdi        # 0x7f79ac01fc98
   0x00007f79a77d552e <+14>:    mov    0x28(%rdi),%rax
   0x00007f79a77d5532 <+18>:    xor    %edx,%edx
   0x00007f79a77d5534 <+20>:    pop    %rbp
   0x00007f79a77d5535 <+21>:    jmpq   *%rax
End of assembler dump.

Looking at how the Chromium allocator works internally, the problem is load order. RTLD_NEXT lookups only search objects that come after the caller, so they cannot find symbols in libraries loaded before libcef. The output of ldd spotify showed both libm and libdl listed before libcef. musl always redirects these to libc for glibc ABI compatibility, which puts libc ahead of libcef.
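A toy model makes the load-order problem concrete. dlsym with RTLD_NEXT returns the first definition found in objects that come after the caller in the search order. The Python below is a deliberately simplified model of that rule, not musl's implementation.

The object lists are illustrative. The model assumes libc appears only once, at the position of its first alias (libm.so.6). It also assumes that removing the two aliases leaves libc somewhere after libcef.

```python
def rtld_next(search_order, exports, caller, symbol):
    """Simplified model of dlsym(RTLD_NEXT, symbol) called from `caller`:
    return the first object *after* the caller that defines the symbol."""
    start = search_order.index(caller) + 1
    for obj in search_order[start:]:
        if symbol in exports.get(obj, set()):
            return obj
    return None  # the real dlsym would return a null pointer here

exports = {"libc.so": {"malloc", "free", "malloc_usable_size"}}

# libm.so.6 and libdl.so.2 precede libcef.so and both resolve to libc,
# so libc sits ahead of libcef in the search order.
before = ["spotify", "libc.so", "libcef.so"]
# Assumed order once those two DT_NEEDED entries are removed.
after = ["spotify", "libcef.so", "libc.so"]

print(rtld_next(before, exports, "libcef.so", "malloc_usable_size"))  # None
print(rtld_next(after, exports, "libcef.so", "malloc_usable_size"))   # libc.so
```

In the "before" case the lookup finds nothing. That matches the null function pointer the allocator shim eventually jumped through.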

Using PatchELF to remove these two DT_NEEDEDs from the binary yielded a surprising result…

Music makes the people come together

Spotify on Adélie Linux
Spotify, playing “Rhinestone Eyes” by Gorillaz, on my Adélie laptop

It works! All the features I tested work: Spotify Connect, which means I can control the laptop’s playback using the iOS and Apple Watch apps; radio playback; and Bluetooth speaker support.

Instructions

You will need to download the official Spotify 64-bit DEB. I have not tested this on a 32-bit system yet, but I see no reason it won’t work. Once you have the DEB, extract its data.tar.xz archive and unpack it somewhere. Then use PatchELF on the Spotify binary like so:

$ patchelf --remove-needed libm.so.6 usr/share/spotify/spotify
$ patchelf --remove-needed libdl.so.2 usr/share/spotify/spotify
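To confirm the two entries are gone, `patchelf --print-needed usr/share/spotify/spotify` lists the ones that remain. If you would rather check without PatchELF, below is a rough Python sketch that reads the DT_NEEDED entries straight from the ELF dynamic section. It only handles 64-bit little-endian ELF files such as x86-64 binaries, and it does little validation beyond the header checks.

```python
import struct
import sys

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5

def needed_libraries(path):
    """Return the DT_NEEDED entries of a 64-bit little-endian ELF file, in order."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError(f"{path}: not a 64-bit little-endian ELF file")
    # ELF64 header: e_phoff at 0x20, e_phentsize and e_phnum at 0x36
    (phoff,) = struct.unpack_from("<Q", data, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)
    loads, dynamic = [], None
    for i in range(phnum):
        # Elf64_Phdr starts: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _, p_offset, p_vaddr, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # statically linked: nothing is NEEDED
    name_offsets, strtab_vaddr = [], None
    start, size = dynamic
    for pos in range(start, start + size - 15, 16):  # Elf64_Dyn is 16 bytes
        tag, val = struct.unpack_from("<qQ", data, pos)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            name_offsets.append(val)
        elif tag == DT_STRTAB:
            strtab_vaddr = val
    if not name_offsets or strtab_vaddr is None:
        return []
    # DT_STRTAB holds a virtual address; translate it to a file offset via PT_LOAD
    strtab = next((off + (strtab_vaddr - vaddr) for vaddr, off, filesz in loads
                   if vaddr <= strtab_vaddr < vaddr + filesz), None)
    if strtab is None:
        raise ValueError(f"{path}: DT_STRTAB is not inside any PT_LOAD segment")
    names = []
    for name_off in name_offsets:
        begin = strtab + name_off
        names.append(data[begin:data.index(b"\0", begin)].decode())
    return names

if __name__ == "__main__" and len(sys.argv) > 1:
    for lib in needed_libraries(sys.argv[1]):
        print(lib)
```

Run it with the patched binary as its argument (the script name is up to you). After the two commands above, libm.so.6 and libdl.so.2 should no longer appear in the output.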

Move the extracted usr/share/spotify directory to your system’s /usr/share directory. For better integration, also move /usr/share/spotify/spotify.desktop to /usr/share/applications, as I did. Then move the usr/bin/spotify link to /usr/bin.
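The extraction step at the start of these instructions is normally `ar x` on the DEB followed by `tar xf data.tar.xz`. A .deb is an ar archive containing debian-binary, control.tar.* and data.tar.* members.

Where neither tool is handy, here is a minimal Python sketch using only the standard library. It assumes the common 60-byte ar member header that dpkg writes. It lets tarfile detect the compression, and it uses tarfile's "tar" extraction filter where the running Python provides one.

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"
AR_HEADER_LEN = 60  # name[16] mtime[12] uid[6] gid[6] mode[8] size[10] fmag[2]

def ar_members(blob):
    """Yield (name, data) for each member of an ar archive such as a .deb."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + AR_HEADER_LEN <= len(blob):
        header = blob[pos:pos + AR_HEADER_LEN]
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        pos += AR_HEADER_LEN
        yield name, blob[pos:pos + size]
        pos += size + (size % 2)  # member data is padded to an even offset

def extract_deb_data(deb_path, dest):
    """Unpack the data.tar.* member of a .deb into dest; return the member name."""
    with open(deb_path, "rb") as f:
        blob = f.read()
    for name, data in ar_members(blob):
        if name.startswith("data.tar"):
            # "r:*" lets tarfile detect the compression (xz, gzip, ...)
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
                # Keep symlinks as tar(1) would, but use the safety filter if present
                kwargs = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
                tar.extractall(dest, **kwargs)
            return name
    raise FileNotFoundError(f"no data.tar.* member in {deb_path}")
```

Calling `extract_deb_data()` with the downloaded DEB and an empty directory produces the usr/share/spotify tree the PatchELF commands expect, relative to that directory.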

Ensure that you have the latest gcompat installed. As I write this, only Adélie has the newest version in its current repository. This week, I’ll be submitting merge requests to the distros I know ship gcompat, so that everyone has a chance to play around with the new bits.

Have fun!


Do you like running Spotify on musl? Or do you just like reading about fun hacks? Consider donating to Adélie to keep the fun going!

Cascading failures (or, why I did nothing this weekend)

This is a fun one.

To set the scene, in order: my Talos and my WD Black NVMe device have never “gotten along” well. The device would frequently fail to train for whatever reason. Calling reboot from a Petitboot shell with fast-reset enabled was enough to fix this, so I didn’t think much about it.

Late Thursday night (or early Friday morning, if you prefer), I was reading a few articles before bed. At 01:36:13, an animal darted in front of a car about two miles away, causing the car to crash into a power pole. This caused a serious power surge as the lines came down onto each other (and the car’s frame).

My office was protected by an APC BX1500G combination battery backup (SPS, not UPS) and surge protection unit. The power surge was severe enough that the unit failed. The lights, the Power Mac G5, and the Talos in my office went dark immediately, and the alarm sounded continuously while “F04” and “See Manual” flashed on the display. This code means “Clamp Short”: the varistor that is supposed to arrest surges had become permanently ‘stuck’ in arrest mode.

My first priority was to ensure the integrity of my hardware, so I dug out a spare APC SPS and put the battery from the failed one in it. I powered on my Talos, and while it seemed to IPL fine, lspci in Petitboot did not show my NVMe device as present. Looking in Hostboot, it wasn’t even failing to train anymore; the slot may as well have been empty. I tried both a fast-reset and a full system power cycle to no avail.

The next day, I tried swapping slots, thinking the PCIe slot the NVMe device was connected to might have been damaged by the power surge. I swapped the NVMe and the sound card. The sound card worked in the slot formerly occupied by the NVMe, but the NVMe still wouldn’t come up in the slot formerly occupied by the sound card. Now came the worrying part: had the M.2 to PCIe adaptor failed, or had the NVMe media itself failed?

I went to two of our local computer stores to buy some parts. Since I was already going to need a full disk swap (even if I could recover the data off the NVMe media), I decided it would make sense to add SATA media as well. I had a 256 GB SATA SSD lying around that was supposed to go in my Xeon until the Xeon failed. I found an “open box” Seagate 1 TB HDD for 30 USD at DISC Surplus Computers in Sand Springs, and a new 4-port SATA controller based on a Marvell chipset for 34 USD at Wholesale Computer Supply in Tulsa. I also had a 3-port USB 3.0 controller card sitting on a shelf, because the slot it was meant for was occupied by a 2-slot Radeon used for big-endian amdgpu.ko porting. I went ahead and shelved that Radeon (and the big-endian porting effort, for now), and used the CPU 1 slot for SATA and the CPU 2 slot for the USB card.

I then turned my attention to recovering the NVMe media. I brought out my old Intel Reference Platform board, a DP43TF with developer firmware, and put the NVMe adaptor card, with the media, into it. Unfortunately, the DP43TF has only one PCIe slot wider than x1, and it’s the x16 slot typically used for a GPU. Since the NVMe adaptor needs x4, I had to find a PCI GPU. I pulled a GeForce 8400GS out of one of our Pentium 4 test boxes and attempted to boot the Adélie 1.0-BETA4 live CD.

Our live CD does not support the JMicron PATA controller that the DP43TF’s DVD drive was connected to. I ended up using a USB optical drive, though a SATA optical drive would also have worked. The CD I was attempting to use was scratched, and it refused to finish booting (the scratched section appears to have contained OpenRC). I had to find a computer capable of burning media, which was no small task: most newer computers can’t write optical media, and most of my computers have marginal USB support at best.

One of our community members reminded me that the PowerBook G4 has a SuperDrive, which I used to burn a fresh x86_64 BETA4 CD. Once the system finally booted, I noticed the NVMe was present but throwing occasional controller reset errors. I’m not sure whether this was due to media degradation or to running a Gen3 NVMe in a Gen1 PCIe slot. At any rate, I used dd to make a full clone of the NVMe to the 1 TB Seagate disk, and then put that in the Talos. A gracious member of the Adélie Linux community donated the funds needed to replace the NVMe with a Samsung 970 EVO Pro of the same size.

Yesterday I copied the data off the Seagate 1 TB to the new Samsung NVMe. Everything is working quite well, and the Samsung is much faster than the Western Digital: 712 MB/s uncached writes versus 303 MB/s on the WD. The additional space on the Seagate can be used for further testing, and for possible expansion of Adélie to more platforms; I may post more on that later. 🙂

This was a very interesting experience for me. It’s been many years since I’ve seen a cascade of failures like this: car accident breaks APC SPS, which breaks NVMe marginally, which shows an issue booting our live CD on a specific computer. It also gave me a reason to re-catalogue a lot of the hardware I have on hand for testing purposes, and to know what needs fixing and replacing. And most importantly, it made me realise I need to perform weekly backups instead of semi-annual backups.

I want to especially thank the members of the Adélie Linux community that helped with this process, not only financially but with techniques and ideas to make this go well. My workstation is better than ever, and now I can get even more done for libre software. You rock!

Libre software funding and market abuse

I’ve just read a troubling article from the developer of Aether.

What troubles me is not so much the differences we have, which likely stem from our working in vastly different segments of libre software (he’s doing social media, and I’m in low-level systems). What troubles me is that he claims it is an economic imperative to work at FAANG or a Silicon Valley startup for a number of years before working on libre software full time, and that he does so on a false premise.

Encouraging someone to have a long-term savings and funding plan is a good idea, perhaps even a great idea. It falls apart when he states that working for startups or FAANG is the only or best way to earn that money, and then claims that you could make as much as 250,000 USD per month working at them[1]. This is flawed mathematics at best, and actively malicious to society at worst.

Most people are going to have to work at a company before founding their own, unless they have funding from external sources (be it angel investors, VC, inheritance, family and friends, etc.). This is not what I take issue with. The issue I have is the false dichotomy that you can only make good money by working at FAANG or an abusive startup. As someone who has actually worked at two different startups, I take personal issue with the way startup culture exploits its workers, investors, and society at large. This doesn’t even go into how startup culture can also be bad for business.

This abuse is ingrained in most, if not all, of Big Tech, à la FAANG. You might be able to wrestle some division of Apple, or the security research division of Netflix, out of this hole and point to them as an example of where I’m wrong. Oh, dear reader: even if you have the privilege of working in an area of the company that isn’t abusing its workers, you’re still complicit in that abuse. At best you are furthering the company’s mission and control over some part of the industry; at worst you are indirectly engaging in it yourself.

It’s time for the computer industry to rise up and work at companies that respect their workers, and society. Quit FAANG like a bad habit, and find a company to work for that doesn’t trade in the abuse of power and users as its main product. And where those don’t yet exist, it’s time to found some. At the end of the day, we are all defined by the actions we take — which side of history do you want to be on?


[1]: And I quote, “If you can make $10,000 a month from donations doing open source work, I can guarantee you that your salary in any large tech company (or even startup) would be much more — to the tune of 10x to 25x.” The job site Indeed claims, at time of writing, that the highest-paid research engineers at Google make about 246,000 USD per year; other companies pay even less. That’s 20,500 USD per month, or just about twice the amount he claims you might be able to make on donations doing ‘open source work’. And this doesn’t require you to further Google’s surveillance state.
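Spelled out with the figures from this footnote, the gap looks like this:

```latex
\begin{aligned}
10\,000 \times 10 &= 100\,000~\text{USD/month}, \qquad 10\,000 \times 25 = 250\,000~\text{USD/month} \\
\frac{246\,000~\text{USD/year}}{12} &= 20\,500~\text{USD/month} = 2.05 \times 10\,000~\text{USD/month}
\end{aligned}
```

The top Indeed figure is roughly one twelfth of the 250,000 USD per month upper bound implied by the quote.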