2025: The year of new beginnings

There have been a few positive, blog-worthy notes happening in my life the past few months.

The first is that by the grace of God, I found actual full-time employment working with a wonderful team doing important work. And, as luck would have it, they actually care about work/life balance as well. That means I will still have time to contribute to open source and work on the projects I am passionate about at the consultancy, too. It has been a joyful time, and I am very thankful to everyone involved.

The second, speaking of the consultancy: we are gearing up to launch an AGPL 3-licensed, Rails-powered e-commerce application. We hope to empower people with the best parts of Libre Software, including the ability of everyone to contribute to and audit the code base, and the ability of enterprising people to stand up their own servers. We also hope to be able to continue our efforts maintaining this software, in addition to the other work we do, by offering a hosted version for a nominal monthly fee. More details will be forthcoming in the next months on the WTI Blog, so be sure to check it out if you’re interested!

The final is a more personal note, and it is that my extended family is growing! I won’t go into exact details, but suffice to say, we feel very blessed to be welcoming more of us into the world 🙂

All in all, it has been quite a start to 2025 – and I’m hoping it gets even better from here. I am polishing up a few articles for Mac Monday and FOSS Friday, and I’ll be posting them soon. Until then, happy hacking!

An RSpec matcher for validating Rails <meta/> tags

While writing a project for Wilcox Tech that we hope to open source soon, I had reason to test the content (or, in some cases, presence) of <meta/> tags in a Rails app.

I came across a good start in dB.’s blog article, Custom RSpec Meta Tag Validator. However, the article was from 2013, did not follow the modern RSpec matcher protocol, and the matcher felt a bit off for my purposes.

What I really wanted was a matcher that felt like the assertions built into Rails, such as assert_dom or assert_select. So, using dB.’s XPath query as a starting point, I wrote a new one.

I am under the assumption that dB.’s snippet is licensed CC-BY, as is the rest of his post. I am therefore dual-licensing my matcher, as I believe this to be legal: CC-BY with my authorship, or MIT (the standard license of most Rails-related Ruby gems). To fully comply, you will need to acknowledge him as well.

Usage looks like this:

describe 'meta product properties' do
  before { assign(:item, create(:item, description: 'This is a test description.', price: 50.50)) }

  it 'has the correct Open Graph type' do
    render_item
    expect(rendered).to have_meta 'og:type', 'product'
  end

  it 'has the description set' do
    render_item
    expect(rendered).to have_meta 'og:description', 'This is a test description'
  end

  it 'has the price set' do
    render_item
    expect(rendered).to have_meta 'product:price.amount', '50.50'
  end
end

describe 'meta image properties' do
  context 'with no photos' do
    it 'does not have an og:image property' do
      assign(:item, create(:item))
      render_item
      expect(rendered).not_to have_meta 'og:image'
    end
  end

  context 'with a photo' do
    before do
      item = create(:item)
      photo = mock_item_photo_for item
      photo.description = 'My Photo Description'
      assign(:item, item)
    end

    it 'has an og:image property' do
      render_item
      expect(rendered).to have_meta 'og:image'
    end

    it 'has the og:image:alt property set to the photo description' do
      render_item
      expect(rendered).to have_meta 'og:image:alt', 'My Photo Description'
    end
  end
end

And here’s my spec/support/matchers/have_meta.rb:

# frozen_string_literal: true

class HaveMeta
  attr_accessor :expected, :actual, :key

  def initialize(*args)
    raise ArgumentError, 'Need at least one argument' if args.empty?

    @key = args.shift
    @expected = args.shift
  end

  def diffable?
    @expected.present?
  end

  def matches?(page)
    meta = page.html.at_xpath("//head/meta[@property='#{@key}' or @name='#{@key}']")
    return false if meta.blank?

    return true if @expected.blank?

    @actual = meta['content']
    @actual.include? @expected
  end

  def failure_message
    return "expected meta property '#{key}'" if @expected.blank?

    "expected '#{key}' to contain '#{@expected}' in '#{@actual}'"
  end

  def failure_message_when_negated
    return "expected not to find meta property '#{key}'" if @expected.blank?

    "expected '#{key}' to not contain '#{@expected}' in '#{@actual}'"
  end
end

# Determines if the rendered page has a given meta tag.
def have_meta(*args)
  HaveMeta.new(*args)
end

The complexities of enabling OpenCL support

Hello, and welcome back to FOSS Fridays! One of the final preparations for the release of Adélie Linux 1.0-beta6 has been updating the graphical stack to support Wayland and the latest advancements in Linux graphics. This includes updating Mesa 3D. It’s been quite exciting doing the enablement work and seeing Wayfire and Sway running on a wide variety of computers in the Lab. And as part of our effort towards enabling Wayland everywhere, we have added a lot of support for Vulkan and SPIR-V. This has the side-effect of allowing us to build Mesa’s OpenCL support as well.

With Adélie Linux, as with every project we work on at WTI, we are proud to do our very best to offer the same feature set across all of our supported platforms. When we enable a feature, we work hard to enable it everywhere. Since OpenCL is now force-enabled by the Intel “Iris” Gallium driver, in addition to the Vulkan driver, I set off to ensure it was enabled everywhere.

Once again with LLVM

Mesa’s OpenCL support requires libclc, which is an LLVM plugin for OpenCL. This library in turn requires SPIRV-LLVM-Translator, which allows one to translate between LLVM IR and SPIR-V binaries. The Translator uses the familiar Lit test suite, like other components of LLVM. Unfortunately, there were a significant number of test failures on 64-bit PowerPC (big endian):

Total Discovered Tests: 821
  Passed           : 303 (36.91%)
  Failed           : 510 (62.12%)

Further down the rabbit hole

Digging in, I noticed that we had actually skipped tests in glslang because it had one test failure on big endian platforms. And while investigating that failure, I found that the base SPIRV-Tools package was not handling cross-endian binaries correctly. That is, when run on a big endian system, it would fail to validate and disassemble little endian binaries, and when run on a little endian system, it would fail to validate and disassemble big endian binaries.

I found an outstanding merge request from a year ago against SPIRV-Tools which claimed to fix some endian issues. Applying that to our SPIRV-Tools package allowed those tools to function correctly for me. I then turned back to glslang and determined that the standalone remap tool was simply reading in the binary and assuming it would always match the host’s endianness. I’ve written a patch and submitted it in my issue report, but I am not happy with it and hope to improve it before opening a merge request.

Back to the familiar

I began looking around at the failures in SPIRV-LLVM-Translator and noticed that a lot of them seemed to revolve around unsafe assumptions on endianness. There were a lot of errors of the form:

error: line 11: Invalid extended instruction import 'nepOs.LC'

Note that this string is actually 'OpenCL.s', byte-swapped on a 32-bit stride. The SPIR-V specification defines strings as:

A string is interpreted as a nul-terminated stream of characters. All string comparisons are case sensitive. The character set is Unicode in the UTF-8 encoding scheme. The UTF-8 octets (8-bit bytes) are packed four per word, following the little-endian convention (i.e., the first octet is in the lowest-order 8 bits of the word).
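That packing rule is easy to sanity-check in plain Ruby: split the bytes into 32-bit words and reverse each word, which is exactly what a big-endian reader does to little-endian-packed words. ('OpenCL.s' is conveniently eight bytes, so the nul terminator and word padding the spec requires don't come into play in this sketch.)

```ruby
# Byte-swap a string on a 32-bit stride, as a big-endian reader
# would see a little-endian-packed SPIR-V literal string.
s = 'OpenCL.s'
swapped = s.bytes.each_slice(4).flat_map(&:reverse).pack('C*')
swapped # => "nepOs.LC"
```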

As an aside, I want to express my deep and earnest gratitude that standards bodies are still paying attention to endianness and ensuring their standards will work on the widest number of platforms. This is a very good thing, and the entire community of software engineering is better for it. This also serves as a great example of the wide scope in which we, as those engineers responsible for portability and maintainability, need to be aware and pay attention.

The standards language quoted above means that on a big endian system, the string should actually be written to disk/memory as 'nepOs.LC'. The Translator was not doing this encoding, and therefore the binaries it produced were not correct. I attempted to look at how the strings were serialised, and I believe I found the answer in lib/SPIRV/SPIRVStream.cpp, but it seemed like it would be a challenge to do things the correct way. I decided that for the moment, it would be enough to make the translator operate only on little endian SPIR-V files. After massaging the SPIRVStream.h file to swap when running on a big endian system, I significantly reduced the count of failing tests:

Total Discovered Tests: 821
  Passed           : 500 (60.90%)
  Failed           : 313 (38.12%)

However, now we had some interesting looking errors:

test/DebugInfo/X86/static_member_array.ll:50:10: error: CHECK: expected string not found in input
; CHECK: DW_AT_count {{.*}} (0x04)
         ^
<stdin>:73:33: note: scanning from here
0x00000096: DW_TAG_subrange_type [10] (0x00000091)
                                ^
<stdin>:76:2: note: possible intended match here
 DW_AT_count [DW_FORM_data8] (0x0000000400000000)
 ^

You will note that the failure is that 0x04 != 0x04'0000'0000. This is what happens if you store a 32-bit value into a 64-bit slot using bad pointer casting, which is very similar to the Clang bug I found and fixed last month. On a hunch, I decided to look at all of the reinterpret_casts used in the translator’s code base, and I hit something that seemed promising. SPIRVValue::getValue, where they were doing equally questionable things with pointers, was amenable to a quick change:

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    if (ValueSize > CopyBytes) {
      if (ValueSize == 8 && CopyBytes == 4) {
        // Point at the low-order half of the 64-bit value (bytes 4-7 on
        // big endian) and copy the 32-bit word there.
        uint8_t *foo = reinterpret_cast<uint8_t *>(&TheValue);
        foo += 4;
        std::memcpy(foo, Words.data(), CopyBytes);
        return TheValue;
      }
      assert(ValueSize == CopyBytes && "Oh no");
    }
#endif
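The DW_AT_count symptom can be reproduced with a few lines of Ruby, purely to illustrate the arithmetic (the buffer here is a stand-in, not the translator's actual memory layout): copying a 32-bit word to the start of a 64-bit slot on big endian lands it in the high-order bytes.

```ruby
value = [4].pack('N')  # the 32-bit value 0x00000004, big-endian

# Naive copy into the FIRST four bytes of an 8-byte slot: on big
# endian those are the high-order bytes, so the value is effectively
# shifted up by 32 bits.
bad = "\x00".b * 8
bad[0, 4] = value
format('%#018x', bad.unpack1('Q>'))  # => "0x0000000400000000"

# Copying into the LAST four bytes (the `foo += 4` above) puts the
# value in the low-order bytes, as intended.
good = "\x00".b * 8
good[4, 4] = value
good.unpack1('Q>')  # => 4
```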

Unfortunately, this only fixed eight tests. The last time there was an endian bug in an LLVM component, it was somewhat easier to find because I could bisect the error: there was a version where it had worked in the past, so bisecting revealed where the error was hiding. Not so with this issue: it seems to have been present since the translator was first written.

Upstream was apprised of this issue nearly a year ago, with no movement since. The amount of time it would take to do a proper root-cause analysis on a codebase of this size (65,000+ lines of LLVM-plugin-style C++) would be prohibitive given the time constraints on beta6. I’d like to work on it, but it remains a low priority unless someone wants to contract us to work on it.

The true impact of non-portable code

There is no way to conditionalise a dependency on architecture in an abuild package recipe. This means even if we condition enabling OpenCL support in Mesa on x86_64, Mesa on all other platforms would still pull the broken translator as a build-time dependency. It is likely that libclc itself would not build properly with a broken translator, meaning Mesa’s dependency graph would be incomplete on every other architecture due to libclc as well.

Unfortunately, this means I had to make the difficult decision to disable OpenCL globally, including on x86_64. We never had OpenCL support, so this isn’t a great loss for our users, and we can always come back to it later. However, a few of Mesa’s drivers require OpenCL: namely, the Intel Vulkan driver, and the Intel “Iris” Gallium driver, which supports Broadwell and newer. On these systems, using the iGPU will fall back to software rendering only. We cannot offer hardware acceleration on these systems until we enable OpenCL, for reasons that are not entirely clear to me. It was possible before, and while I understand the performance enhancements with writing shaders this way, not providing a fallback option is really binding us here.

It is my sincere hope that software authors, both corporate and individual, begin to realise the importance of portability and maintainability. If the SPIRV-LLVM-Translator was written with more portability in mind, this wouldn’t be an issue. If the Intel driver in Mesa was written with fewer dependencies in mind, this wouldn’t be an issue. The future is bright if we can all work together to create good, clean code. I greatly look forward to that future.

Until then, at least software rendering should work on Broadwell and newer, right?

Fear and loathing in kernel building

After a long and somewhat unreasonable delay, I have returned to bringing the Adélie Linux kernel package up to date with the latest LTS release, which at the time of this writing is 6.6.58.

Presently, we use the 5.15 LTS branch. I am hoping to see us land the 6.6 branch so that we can have support for newer hardware, features, and devices. There is also hope that there will be significant DRM improvements, allowing a better desktop experience for everyone.

Unfortunately, when it came time to build the x86_64 package, the build failed. The kernel now requires elfutils to build for x86_64, even with CPU security issue mitigations disabled – and we don’t want to disable them anyway.

The elfutils library, being part of the GNU project, relies heavily on APIs that are only available in the GNU libc. It is not possible to build elfutils on a musl system without multiple shim libraries, in addition to patching out other behaviour that cannot be stubbed.

I have always been somewhat mistrustful of including software in the critical path that is not maintained and audited. And building the kernel is the most critical path in a distribution.

For this reason, if we must include an argp implementation, I want to make sure it is the best possible implementation we can have.

“Choice”: slim to none

I found a number of libraries that implement the argp interface, but all of them present significant challenges:

  • libargp: Based on gnulib code. Last commit: 9 years ago. Does not accept issues on GitHub, and links to a Bitbucket repository that has been removed. 9 years ago, gnulib didn’t support musl at all. In addition, the lack of ability to contact upstream isn’t great.
  • argp-standalone (Niels Möller edition): Based on glibc, which is what we are trying to emulate anyway. Last release: 20 years ago, which is approaching legal drinking age in the US. Pass.
  • argp-standalone (Érico edition): Based on glibc again. Last commit: 2 years ago. Okay, reasonable. The issues and pull requests are piling up, though: the build system isn’t generated in the released tar files, the install target doesn’t work, a shared library isn’t supported, and more.
  • argp-standalone (org edition): Somehow, we are now three forks deep. Last commit: just three months ago! It uses Meson as a build system, and it seems to care about portability… but it fails to support non-English locales. It also appears to have possible issues building with GCC 14, which could cause problems in the future. They self-identified that they should make a new release in June 2023, which is great, but they never actually did.

Honestly, the last option in that list isn’t so bad. Translation support could always be added later. However, these packages need to be added to our very core critical path of early packages built for the whole system. For that reason, we need to be excessively picky about:

  • Quality of implementation — is this trustworthy enough to be at the very centre of our dependency graph?
  • Dependencies — the Meson build system is great, but that introduces either Python or Muon into the very early graph, before the kernel is even built, which means kernel headers aren’t available.
  • Infrequency of updates — realistically, since changing packages this deep in the graph necessitates rebuilds of everything, updates cannot be done frequently.

And it is for that reason I am annoyed at this situation. The kernel has introduced a build time dependency that, at least on musl libc systems, presents a lot of uncomfortable challenges.

Oh well, it could be worse. It could be the Rust compiler, which means making Rust, LLVM, and all of their dependencies part of the early graph, and meaning that Rust compiler updates have to pass through the Platform Group and cause full system rebuilds!

I’ll see myself out now.