pleroma.debian.social

Oh, look, a rabbit hole.

For a bit of context: I've been playing with go-away lately, trying to move the passive detection I built in Caddy to it. Progress is being made.

But then I came across a thing that feels like it would be even closer to what I had in mind. Different tradeoffs, but ones that feel less pricey, perhaps.

So now I'm down the rabbit hole of figuring out whether my gut feeling is right.

What if...

example.com {
  iocaine /run/iocaine.socket
  reverse_proxy whatever
}

In other words, what if iocaine (or a separate service, but in this case, iocaine itself feels more practical) could do the classification too?

And what if that classification was somewhat programmable?

I could then build a Caddy module that serializes the request headers, sends them over, and lets iocaine decide whether it wants to respond, or lets caddy pass through to the next handler.

"What do you mean by 'somewhat programmable'?"

filtermap main(request: Request) {
  if request.user_agent.contains("GPTBot") {
    accept garbage.generate(request)
  } else {
    reject
  }
}

So, first things first: let's build a small crate that lets me implement the classification rules I use now (but better), and see how convenient that is, and how fast.

Then figure out how to wrap that in a Caddy module.

Did I say iocaine /run/iocaine.socket?

How about this:

reverse_proxy iocaine:port {
  @fallback status 421
  handle_response @fallback {
    reverse_proxy whatever
  }
}

Looky. No Caddy module needed at all.

I like this rabbit hole. It is comfy. There are books here, ambient lighting, and a soothing deep voice is calling out to me to explore deeper.

It promises iocaine 3.0.

Or maybe 2.2, because there's nothing breaking here.

This feels like alien technology. My classification rules will be so much simpler, and more correct. And best of all:

EASILY SHAREABLE AND REUSABLE

Like, I can make a number of helper functions, collected in a package, and anyone can import that package, and use whichever functions they like.

Or they can just import my entire classification as-is.

Or whatever. The possibilities are endless.

Now I "just" need to verify this would work as I imagine it would.

Eh. Not going as smoothly as I had hoped. The idea is solid, though. But I might need to fiddle with some implementation details.

So, Big Reveal™, I guess: I've been looking at roto as a language to write classification rules for iocaine in.

There's some very neat things in there, but I'm having trouble working with strings.

So, I guess I'll go look for a language I can use for classification purposes, which I can embed in Rust. I don't have requirements as high as NLnet Labs did for Roto.

I'm fine with dynamically typed languages, and it is okay if it is not the fastest. Chances are, it will be faster than my Caddyfile contraption no matter what. So what I'm looking for is a language that feels fine, and one where the embedding part is reasonable too.

What I am not looking for is embedding a non-Rust language in Rust. PyO3, mlua and the like are not an option. I want a rust-y thing.

@algernon on the basis of the penultimate toot, I’d recommend looking at crm114. But it likely fails the criteria in the toot I’m replying to

Currently looking at: Rhai, Rune, and Dyon.

Unsure about Dyon; it feels a bit too complex for my tastes, and I found the documentation of Rhai and Rune more approachable. Down to two.

Rhai's docs say: "No first-class functions – Code your functions in Rust instead, and register them with Rhai". On the other hand: functions.

Rune feels more like the thing I'm looking for.

But these are just gut feelings. Based on how the embedded language looks and feels, all three would be acceptable. So I'll look at how to add them to iocaine, and see which one fits best.

I'll start with Rune.

...and I'm exploring Rhai instead, because the Rune book doesn't show a minimal "how to embed Rune" example. The first such example includes termcolor, and says it can be made simpler if that's not needed, but doesn't show how.

Rhai's documentation has an embedding guide. That helps a ton.

I think Rhai will work.

let user_agent = headers.get("user-agent");

user_agent.contains("GPTBot") ||
    (user_agent.starts_with("Mozilla/") && user_agent.contains("Chrome"))

If the script evaluates to true, iocaine will serve it garbage. If not, iocaine returns a 421, and the reverse proxy can serve the real stuff.
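For illustration, here's the true/421 split as a plain-Rust sketch (my approximation, not iocaine's actual code; the real handler also renders the garbage body):

```rust
/// Plain-Rust rendition of the example Rhai script above, for illustration.
fn classify(user_agent: &str) -> bool {
    user_agent.contains("GPTBot")
        || (user_agent.starts_with("Mozilla/") && user_agent.contains("Chrome"))
}

/// true → iocaine serves garbage; false → 421 Misdirected Request,
/// and the reverse proxy falls through to the real content.
fn status_for(serve_garbage: bool) -> u16 {
    if serve_garbage { 200 } else { 421 }
}

fn main() {
    assert_eq!(status_for(classify("GPTBot/1.0")), 200);
    assert_eq!(status_for(classify("curl/8.13.0")), 421);
}
```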

I now need a couple more helpers on the rust-side (regexp matching, for example), and then I can implement the "stdlib" on the Rhai side.

Why on the Rhai side? Because I want the stdlib to be separate, so that I can improve and release it independently, and a stdlib update shouldn't need an iocaine rebuild, just a restart (or later, once I implement it, a reload).

Ooof. Initial implementation works, but if the classifier script is in play, there's a massive slowdown. From ~93k req/sec down to ~38k req/sec.

The good news is, if no classifier is set, the speed is indistinguishable from the speed of version 2.1.0.

Nevertheless, I'll try to make it a little faster.

Up to 50k req/sec now, and I'm not done yet.

Hrm. Not sure I can make it much faster tonight... that would require more lifetime juggling than I have the capacity for.

Hmm!

A simple classifier that always returns true (or false) is barely slower than running without one (~86k vs ~93k req/sec). So it might be the helper functions that slow things down.

That would make sense, actually. And I have ways to make them faster.

Hrm. Or maybe I won't... user_agent == "fasthttp" drops it down to ~50k req/sec. The regexp matcher (which I suspected to be the culprit) doesn't make much of a difference it seems.

OTOH, if I return false, and do not generate the garbage, speed goes up to 73k req/sec.

That... makes sense, too.

Time to sleep. I pushed the feature/scripting branch meanwhile.

There are no docs or tests yet, but it's reasonably straightforward: put a Rhai script somewhere, set [server].classifier to its path, and you're done.

The script needs to return a bool (true to make iocaine generate garbage, false to make it signal the reverse proxy to serve the real stuff), and has access to the headers variable. It has a .get(name) method with which one can retrieve any header.

Strings also have a few extra methods on top of what Rhai provides out of the box: .matches(pattern), .index_of(pattern), and .capture(pattern, group_name).

They kinda do what you would expect.
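For a feel of what "what you would expect" means, here are plain-Rust sketches of two of those helpers (my approximations, not iocaine's implementations; `.capture` needs a regex engine and is omitted):

```rust
/// `.index_of(pattern)`: index of the first occurrence, or -1 if absent.
fn index_of(haystack: &str, needle: &str) -> i64 {
    haystack.find(needle).map(|i| i as i64).unwrap_or(-1)
}

/// `.matches(pattern)`: approximated here with a substring check;
/// the real helper matches a regexp.
fn matches(haystack: &str, pattern: &str) -> bool {
    haystack.contains(pattern)
}

fn main() {
    let ua = "Mozilla/5.0 AppleWebKit/537.36 Chrome/120.0";
    // The kind of ordering check the classifier cares about:
    assert!(index_of(ua, "Chrome/") > index_of(ua, "AppleWebKit/"));
    assert_eq!(index_of(ua, "GPTBot"), -1);
    assert!(matches(ua, "Chrome"));
}
```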

These should be enough to re-build my Caddyfile classifier in Rhai. Then, I'll have to benchmark which one is faster, I guess. That will be a bit rough, but I'll figure something out.

If iocaine's built-in classifier is faster, then I won't care much (for now) about speeding it up further.

I'm half-tempted to extend this further, and make it not just a classifier, but move some of the decision logic into this script, too.

Like, the templates are currently responsible for dispatching based on... whatever they wanna dispatch on. But what if it was the Rhai script that would do that?

If I want to make it possible to return custom HTTP statuses, perhaps even add custom headers and whatnot, then letting the Rhai script control more of iocaine would make sense.

BUT! That would likely slow things down considerably. I guess I'll stick to classification for now.

What I will have to do, though, is change the script response from bool to Verdict, where Verdict would be an enum-like thing.

Basically, I want to be able to send information to the reverse proxy, via response headers. So I need Verdict::Bad(REASON) and Verdict::Good(REASON). What REASON should be, I don't know yet. Maybe just a string.

But this is a problem for after-sleep. Or during-sleep. One of those two.

...and I have a few ideas for the stdlib, too.

Great fun will be had tomorrow! Can't wait!

Small optimizations were made, speed is up to 51k req/sec. Still considerably slower than the ~93k without the classifier script. Will have to compare it against the Caddy-based classifier soonish.

I also implemented a verdict module, so the script can return true / false or verdict::GOOD / verdict::BAD, or even verdict::good("reason") / verdict::bad("reason"). If a reason is given, iocaine will send it upstream in the x-iocaine-reason header.
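A minimal sketch of what such a verdict type could look like on the Rust side (my guess at the shape, not the actual iocaine code):

```rust
/// Hypothetical verdict type: Bad gets garbage, Good gets a 421 so the
/// proxy serves the real content; either may carry a reason.
#[derive(Debug, PartialEq)]
enum Verdict {
    Good(Option<String>),
    Bad(Option<String>),
}

impl Verdict {
    /// Should iocaine generate garbage for this request?
    fn is_bad(&self) -> bool {
        matches!(self, Verdict::Bad(_))
    }

    /// Value for the x-iocaine-reason header, if a reason was given.
    fn reason(&self) -> Option<&str> {
        match self {
            Verdict::Good(Some(r)) | Verdict::Bad(Some(r)) => Some(r),
            _ => None,
        }
    }
}

fn main() {
    let v = Verdict::Bad(Some("ai-robots-txt".to_string()));
    assert!(v.is_bad());
    assert_eq!(v.reason(), Some("ai-robots-txt"));
    assert_eq!(Verdict::Good(None).reason(), None);
}
```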

Next steps:

  1. Tests
  2. Benchmarks
  3. Benchmarks against Caddy
  4. Commit history cleanup
  5. Merge into main

running 7 tests
test means_of_production::test::test_classifier_regex_error ... ok
test means_of_production::test::test_classifier_bad_return_type ... ok
test means_of_production::test::test_classifier_static_verdict_const ... ok
test means_of_production::test::test_classifier_static_bool ... ok
test means_of_production::test::test_classifier_static_verdict_with_reason ... ok
test means_of_production::test::test_classifier_header_get ... ok
test means_of_production::test::test_classifier_regexps ... ok

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured; 12 filtered out; finished in 0.01s

flan_brick

     Running benches/classifier.rs (target/release/deps/classifier-fe88b2cd73d9304d)
Timer precision: 30 ns
classifier             fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ classifier_branchy  164.1 µs      │ 554.2 µs      │ 169.2 µs      │ 173 µs        │ 28795   │ 28795
╰─ classifier_static   177.5 ns      │ 12.71 µs      │ 183.8 ns      │ 186.7 ns      │ 1591640 │ 25466240

flan_brick

Although, now I wonder what makes classifier_branchy so slow here. Might look into that later.

Ok, now comes the hard part: benchmarking against Caddy.

Initial results are unfortunately not very promising, though the classification script I'm testing right now is rather primitive, a single regexp, rather than the complicated mapping stuff I do on Eru.

regexp matching via Rhai is slow, though, that's for sure.

I guess I'll have to benchmark with my entire classification engine, to have fair numbers.

I has a sad. My elaborate Caddyfile-based classifier is still almost twice as fast as even the most trivial iocaine-based classifier.

I suspect that serializing & deserializing all headers all the time adds considerable overhead.

And I discovered another downside: if iocaine is doing the classification, then implementing rate limits for the maze alone becomes a whole lot more complicated.

On the flip side, the iocaine-classifier is far more expressive. It's just a tad slow.

You know what... I'll try to rewrite my classifier in Rhai anyway, to see how much slower that is.

That'll give me more datapoints about where to optimize, too.

Yikes. We're entering 4k req/sec territory. That is prohibitively slow, unfortunately.

There's a non-negligible Caddy overhead here (about 1k req/sec, slightly less, maybe), but bombardiering the classifier-enabled iocaine is still slow as a snail.

Maybe I should profile it... but I'll go with a gut feeling first.

An annoying part of this benchmarking stuff is that compiling iocaine in release mode takes an eternity and a half.

sigh

Unfortunately, gut feeling was wrong. I figured that passing all of the headers into the Rhai scope would be costly, so I tried passing the user agent only - but that didn't make a difference as far as speed goes.

At this point, I can't switch, even if the scripting interface is more expressive, because this kind of slowing down is not acceptable.

So on to the next idea: what if I used something else than Rhai? I can give Roto another try.

I have a slightly better understanding how it all fits into iocaine, so maybe I can make that work.

@algernon I admire your ability to go down a wrong path, spend 2 days of effort on it, realize it won't work, and try again without rage quitting

@wolf480pl There's no reason to rage: I verified that the idea works, and that I can let iocaine do the classification without having to write a Caddy module. This is a huge leap forward, because I've been trying to do this for the past month.

Yes, the language I chose to embed ended up being prohibitively slow. But that's a tiny implementation detail. The bigger takeaway is that the idea works, and I don't need a Caddy module. That concludes a month of pondering!

I do need to find something faster, but... two days compared to the past month is a drop in the bucket. I made huge progress! I'm gonna drink a celebratory coffee. flan_coffee

@algernon hmm okay

But if you know the idea works, what motivates you to keep going?

@algernon well ok I guess there's the suspense of "will I be able to get >50k req/s out of it"

@wolf480pl @algernon why would you stop at this point?

@wolf480pl I want to replace my Caddyfile-based classification, because while it works, there's multiple problems with it:

  • It is almost completely unshareable.
  • There are things I can't do within the constraints of a Caddyfile, which I could if I used a more suitable programming language.

The motivation is to move the classification out of Caddyfile, and make it at least as performant as that was, because that lets me do things I can't do now, and also lets me share my setup more easily, in a way that people can mix and match various pieces together.

@algernon while I see how scripting can be a fun engineering challenge, I'd be better served as a user by compatibility with anubis' rulesets syntax:

  • if you released any scripting capabilities, I'd just use it to translate these rules
  • pure-rust evaluation of a limited set of conditions should perform better
  • I cannot think of a lot of use-cases that cannot be matched by a config file

@xavier Various importers are next on the roadmap =)

The reason I'm aiming at an embedded language is because I'd like to write rules like this:

if user_agent.index_of("Chrome/") > user_agent.index_of("AppleWebKit/") {
  return Verdict::BAD
}

...and subtleties like this. Yes, this can be expressed in a yaml config file too, but that quickly ends up in a yaml soup, a huge primordial mess, much like my current Caddyfile.

My gut's telling me that I can achieve acceptable speeds with an embedded language too, and can save myself from yaml. If that's doable, I can still bolt various translators on top to make the user experience nicer too.

But if I fail to achieve good speeds with a language, I will fall back to Anubis-style rulesets.

@baturkey @algernon because the mystery has been solved

@wolf480pl @baturkey That's only part of it, though. While the mystery is solved, the goal has not been achieved yet.

Also, someone stole my brakes. And I got high from inhaling too much Rust. So I'm just standing here, flailing my arms while iocaine compiles in release mode, and every time minor progress is made I'll do a little dance.

(If you're questioning my sanity, please tell it to come home, I haven't seen it in a few decades.)

Initial hackery with Roto: incredibly naive implementation that rejects everything, and rebuilds the entire runtime on each request: ~25k req/sec (in release mode).

Same thing but building the runtime once, and then just calling the function for each request: 49k req/sec in debug mode. 277k req/sec in release mode.
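The "build once, call per request" pattern is the usual shape for embedding a script engine; a generic stdlib-only sketch, with a plain closure standing in for the compiled Roto function:

```rust
use std::sync::OnceLock;

// A closure stands in for the compiled script function here; the point is
// only that the expensive setup runs exactly once, and each request is
// just a function call.
static CLASSIFIER: OnceLock<Box<dyn Fn(&str) -> bool + Send + Sync>> = OnceLock::new();

fn classifier() -> &'static (dyn Fn(&str) -> bool + Send + Sync) {
    CLASSIFIER
        .get_or_init(|| {
            // Imagine runtime construction + script compilation here.
            Box::new(|ua: &str| ua.contains("GPTBot"))
        })
        .as_ref()
}

fn main() {
    // Per-request path: no rebuild, just a call into the cached function.
    assert!(classifier()("GPTBot/1.0"));
    assert!(!classifier()("curl/8.13.0"));
}
```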

This is promising. But the question remains: can I build it in a way that lets me implement the rules I want? That's where I bled out last night when I first tried Roto.

Strings are still a bit of a bitch with Roto.

PROGRESS!

TEHEHEHEHE.

filtermap main(request: Request) {
  let user_agent = request.header("user-agent");

  if user_agent.equals("curl/8.13.0") {
    reject "curl"
  } else {
    reject "not curl"
  }
}

74k req/sec in debug mode. The language is a bit more verbose, less nice than Rhai, but god damn, it is fast.

Huh, interesting. ~99k req/sec against the same script when bombardiering iocaine directly. ~23k req/sec when bombardiering through Caddy.

Just made sure, classification on that path is disabled. That's a significant overhead.

Disabled all other snippets: ~38k. Similar thing with the Caddyfile classifier: ~49k.

This is interesting, because I know that iocaine can be much faster: bombardiering it directly is ~99k. So this overhead is on the Caddy side.

http://localhost:38081 {
	reverse_proxy 127.0.0.1:42069 {
		@fallback status 421
		handle_response @fallback {
			header content-type "text/plain"
			respond "ok" 200
		}
	}
}

This should not incur such overhead.

I'll deal with Caddy later. There's still some work left to do on the iocaine side.

Right. Implemented regexp matching: a simple thing that brought Rhai down to ~9k does ~92k req/sec with Roto, when attacking iocaine directly.

The iocaine side of this will be good enough, based on these observations. The next part (after I implemented all the things I need for the classifier) will be to figure out if I can make Caddy faster.

Because if my classifier is faster, but the Caddy overhead is slower than the speed gain, then I'm still fucked.

The good news is: I have a few ideas.

But time for bedtime stories now.

     Running benches/classifier.rs (target/release/deps/classifier-d294cf21366ac898)
Timer precision: 20 ns
classifier             fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ classifier_branchy  76.3 µs       │ 419.7 µs      │ 79.67 µs      │ 82.09 µs      │ 60590   │ 60590
╰─ classifier_static   31.21 ns      │ 3.773 µs      │ 56.74 ns      │ 49.3 ns       │ 1482685 │ 94891840

Nice. Branchy is up from about 28.7k samples in 5s to 60k.

This is just the decision making, no garbage generation involved.

First idea worked, somewhat, we're up to 41k req/sec. Still too slow, though.

I mean, iocaine itself is pretty fast. But the reverse_proxy + handle_response part seems to be slowing things down considerably.

D'oh.

I screwed up the rules, and the tests weren't fair! The Caddyfile test was serving good content to bombardier, the iocaine classifier was serving garbage. So I had the garbage generation tax on top of it.

Fixed the rules, and things feel a bit better now, up to 52k req/sec, ~2-3k faster than the Caddyfile classifier.

Mind you, this isn't my entire classification system ported yet, so I expect it will go down a little. But this is finally, finally, finally looking viable.

I think I managed to port over my classifier. Let's see how it works...

called `Option::unwrap()` on a `None` value

Err. Oops. But it gets worse!

thread 'tokio-runtime-worker' panicked at core/src/panicking.rs:221:5:
unsafe precondition(s) violated: ptr::copy_nonoverlapping requires that both pointer arguments are aligned and non-null and the specified memory ranges do not overlap
stack backtrace:
thread 'main' panicked at cargo-auditable/src/cargo_auditable.rs:40:39:
called `Option::unwrap()` on a `None` value

I might have fubared something up a little badly.

Seems to be related to the unix socket stuff. Can't repro on TCP sockets.

Hrm, repro'd on TCP too, but only if proxied through Caddy. Huh.

...nah. bombardier blew it up too.

Ok. Let's walk back a bit.

Interesting. It's not the unix listener stuff. Ok. This will be some fun debugging I guess.

OOOF.

Massive L. Managed to make things work, somewhat, and the big ai.robots.txt regexp slows things down to ~10k req/sec.

Not fine. But I have an idea.

Rewrote the ai.robots.txt matching to use .contains(), no regexps. 60k req/sec in debug mode.

So the regexps are... a Problem.
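The `.contains()` rewrite boils down to a substring scan over a needle list; a sketch (needle list abbreviated and illustrative, not the full ai.robots.txt set):

```rust
/// Illustrative only: a few bot-name fragments; the real list comes from
/// ai.robots.txt and is much longer.
const BOT_NEEDLES: &[&str] = &["GPTBot", "CCBot", "ClaudeBot", "Bytespider"];

/// Substring-based matching instead of one big alternation regexp.
fn is_known_bot(user_agent: &str) -> bool {
    BOT_NEEDLES.iter().any(|n| user_agent.contains(n))
}

fn main() {
    assert!(is_known_bot("Mozilla/5.0 (compatible; GPTBot/1.2)"));
    assert!(!is_known_bot("curl/8.13.0"));
}
```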

I tried pre-compiling the regexp in Rust, but that leads to crashes all over the place.

thread 'tokio-runtime-worker' panicked at /home/algernon/.cargo/registry/src/index.crates.io-6f17d22bba15001f/regex-syntax-0.8.5/src/hir/mod.rs:2001:9:
misaligned pointer dereference: address must be a multiple of 0x8 but is 0x3a3a736e69746c69

Hm.

While a single Regex can be freely used from multiple threads simultaneously

(Source)

New idea: I'll keep regexp support, but will rewrite my rules to not rely on regexps. Most of them don't need to be regexps anyway.

And, I can maybe leverage aho_corasick... now that'd be a nice big win.

Please tell me my gut feeling is wrong.

Phew. My gut was wrong. Not completely wrong, but wrong nevertheless.

I tried reproducing the regexp crash without jemalloc - the gut feeling was that the regex crate and jemalloc don't play well. But no, that's not it!

❯ cargo run -q --no-default-features -- -c tmp/config/classifier.toml
free(): double free detected in tcache 2
thread 'main' panicked at cargo-auditable/src/cargo_auditable.rs:40:39:
called `Option::unwrap()` on a `None` value

This is without trying to share the Regex instance, too, and that feels a bit iffy.

Right. Sleepy time. New thread tomorrow.

A few walls were hit today, but I remain hopeful.