pleroma.debian.social

ok *whew* I finally did it! I implemented convolution reverb as a vulkan compute shader, and the results seem to be correct. I have it convolving the audio up front at the moment, but it seems to be reasonably fast and the results seem more or less correct. I'm using SDL3 to verify the output. It doesn't look like it'll be too crazy to rework it such that the stream is generated live.

it turns out the main difficulty working with vulkan is accidentally breaking your laptop in half

@aeva yes... accidentally...

@aeva Their logo suggests that it's working as intended!

White-type Vulkan logo on a red field. A large swoosh shape, as of a hammer or sword, travels left to right and seems to slice the entire word Vulkan in twain!

@aeva that's a severe difficulty though

@aeva that poor laptop

@aeva you what. that's really cool

@artemis I got it running live now, as in the processing happens concurrently with playback (and the processing is faster than playback). The input stream is still a wave file though so it remains to be seen if this can be made low enough latency with a live audio stream to be incorporated into a modular synth and still be useful for live play

I reworked it so the convolution shader processes the audio in tandem with playback, so I'm *very* close to getting this working with live audio streams.

But more importantly, I used this to convolve my song "strange birds" with a choir-ish fanfare sound effect from a game I used to play as a kid and the result is like the grand cosmos opened up before me and I'm awash in the radiant light of the universe. Absolutely incredible.

I want to power through and get this into a state where I can use it with live instruments, but I am completely exhausted 😴

@aeva how's the latency of the process?

@oblomov the compute shader running synchronous with the program is currently set up for a stable cadence of 15 milliseconds with room for significant optimization. Average is 8 milliseconds, but I have it targeting 15 to leave some wiggle room for timing variance caused by Linux's scheduler. I do not yet know how much additional latency is caused by the audio system or SDL3 or my use thereof, but it shouldn't be too hard to quantify, and I have ideas for workarounds if it's problematic.

@aeva nice

I reworked some things and now my audio convolving compute shader can convolve ~11 milliseconds worth of audio samples with an average processing time of ~7 milliseconds. That's with one channel at a sample rate of 22050 Hz. At 44100 Hz, the average processing time is a paltry ~8 milliseconds.

also sometime in the last week I made it so it can operate entirely on a live input stream from SDL3 rather than a wave file, so in theory I can incorporate this into a modular setup now, but the results are higher latency than I'd like, and SDL3 doesn't give you much control over audio latency.

Apparently my best frame time can get as low as 3 ms. I think vulkan should let me VK_QUEUE_GLOBAL_PRIORITY_REALTIME this program, but sadly vulkan is being a coward about it.

@aeva do people usually use the GPU for audio? This seems like a really good idea for applications like live convolution!

ok the problem I'm having with latency now is that the audio latency in the system grows over time and I'm not sure why. like it starts snappy and after running for a short while it gets super laggy :/

I'm guessing it's because SDL3 can and will resize buffers as it wants to, whereas I'd rather it just go crazy if it underruns.

@benetherington I'm under the impression that it is somewhat unusual, but not unheard of

@aeva Very cool, glad I ran across this thread. GPUs have magic smoke like all electronics but I’m sure it’s dark magic (smoke) after trying to work with shaders in any capacity. Smushing audio streams in there is delightfully forbidden.

@aeva as someone whose previous job was on realtime audio stuff this would drive me absolutely bonkers

@benetherington there's a lot of fiddly bits to it for sure

@JoshJers I'm hoping there's some way to tell it to not do that

@aeva I mean you're using a single lane on a single thread of your boofing GPU, right? So it's basically 68000 perf, but annoying.

@aeva @JoshJers I remember when we grew the buffers on our D3D8 drivers and the throughput increased by about 10% which was nice but also the latency grew to around 3 seconds which is sub-optimal when playing Quake but totally fine for the Quake benchmark.

@aeva this is a surprise to me

@mcc I've noticed before that it will happily let me buffer minutes of audio in advance for output

@mcc and the docs sorta say it can

@mcc this also might be pipewire's fault though. I've seen something similar happen using pipewire to route stuff instead of just using alsa directly

@aeva I was curious about this buffer-resize as I've done some audio code in the past, but not with SDL3(yet). that would be frustrating to write well-timed audio code if it dynamically resized things on the fly rather than doing audio dropouts in buffer exhaustion.

Looks like there are hints you can set to force a buffer size in frames https://wiki.libsdl.org/SDL3/SDL_HINT_AUDIO_DEVICE_SAMPLE_FRAMES

Not sure if this could help? It doesn't seem to promise this will work, so may still need to fetch the actual buffer size gotten back.

@eggboycolor my program sets that hint. best I can tell it does nothing

What I want to do is have a fixed size buffer for input and output, enough that I can have the output double or triple buffered to smooth over hitches caused by linux. If my program can't keep up I don't want it to quietly allocate more data, I want it to scream at me LOUDLY and HORRIBLY, so that I'll rejigger my program until it is perfect.

What actually happens is it (sdl? poopwire?) just infinitybuffers so it never hitches and I get a second of latency after a little bit

@aeva wouldn't it hitch the first time the game fails to fill the buffer, causing it to make the buffer larger?

@cancel what game

@aeva I'm using game in place of application. whatever the application is. the user of SDL.

@cancel it's not hitching, but latency is growing. like, I start the system, and press a key on my instrument and it's ok. I noodle around for a bit and it's fine but the delay gets longer and longer until I press a button and there's this noticeable pause before anything plays. It is really weird.

@aeva darn. hope you're able to figure out something there :< maybe audio backend specific, if you don't mind diving through sdl's code/statically linking stuff with symbols to do more debugging, but yeah these sorts of latency things are indeed messy if they (or the backend they wrap) don't give easy direct control over buffers.

either way curious to know what comes of it, since it's been really cool following your thread especially re:applying Vulkan for real-time audio!

@aeva 🤔 my impression is that, if the layers outside of your code are dynamically growing in order to deal with some latency deadline not being met, you'd hear at least one glitch in the audio each time it has to grow. otherwise, how would it know the application/usage layer failed to fill the output audio buffer in time?

maybe something else is happening? which audio API are you using? it looks like SDL has multiple.

@eggboycolor I'll probably have to do something like that, though I might just rip out the SDL stuff and hammer ALSA directly

@aeva alternatively, it's possible that it does something extremely paranoid like hidden double buffering, where it wants your usage code to keep *two* buffers full, and if it drops down to only one, it starts growing the chain or buffer size. but this seems crazy and i doubt it.

@cancel I believe SDL is using pipewire by default. If I set it to ALSA it prints an error but otherwise behaves the same, suggesting it falls back to pipewire. If I set it to jack it freezes.

FWIW I have also seen this behavior with pipewire.

@cancel The scenario you are imagining only works if everything is running perfectly synchronously except the application. For example it could be that the recording device is feeding frames just a tad too fast. Or maybe SDL3 is gradually accumulating frames that do not exist via imperfect resampling. Or perhaps pipewire is complete dog shit. who knows

@cancel it is almost certainly doing that because the docs say it does

@aeva that's insane and i don't have any suggestion except to use something else for audio. if that's true they have made the problem way more complicated than it actually is.

(i am a professional programmer who does audio, including having done commercial embedded projects using linux with custom audio stuff)

@aeva the last audio thing i worked on in linux i shipped to the client with 5ms of audio latency from user input to dac output. i used alsa and it was like 200 or 300 lines of code to set up.

@aeva toot chains like this one remind me that programming is mostly about fighting bureaucracy, but as robot wars.

@cancel I will likely end up doing that. I don't suppose you know any good tutorial resources for working with alsa?

@lritter personally I object to the common characterization of programmers as bureaucrats. other programmers might write "business logic" but not me! I do something cool for a living

@cancel ok this is wild. I decided to start calling SDL_ClearAudioStream right before putting data into the output stream. the result is distorted af, but the latency still accumulates. the problem might not be sdl3 or the problem might be on the input stream

@aeva is the SDL audio system a push-mode design?

@cancel I don't know

@aeva do you fill the audio buffer from a callback in an audio thread, or does your application code just kinda chuck audio buffers at a sink when there's time to do so?

@aeva everything I saw was trash, and I just read the official docs.

@aeva that's right! we da kool bureaucratz

I like that pipewire has an option to not be terrible ("pro audio" mode) and it doesn't work

@aeva once they hit "not a total pulseaudio shitshow" it was mission-accomplished.

99% of audio problems on linux these days are just programmers refusing to just fucking use alsa. I'm part of the problem, because I'm using SDL3 instead because the API is simple. SDL3 is part of the problem because when I tell it to just fucking use alsa it uses pipewire instead! and pipewire is part of the problem because it's just completely terrible. like, wayland terrible.

want to have low latency audio on linux? we have a tool for it, it's called STOP PILING LAYERS OF BOILERPLATE ON TOP OF ALSA YOU IDIOTS YOU ABSOLUTE FOOLS

@aeva *snores in pulseaudio*

@aeva Was surprised and horrified to learn the solution to latency in Resolve through pipewire is... ...to install pulse.

Like ffs, linux audio, disenfuck yourself.

I'm like 30% sure SDL3 is not the problem or at least not the only problem because I tried resetting the streams every frame with SDL_ClearAudioStream and it still accumulates latency (in addition to also now sounding atrocious due to missing samples).

I've also seen this happen with pipewire before in other situations, and it was resolved by bypassing pipewire.

@lritter how do you make pulse audio laugh on monday

@aeva Did you try miniaudio yet? I don't quite remember what it uses as a default, but it has a whole bunch of compile time options to exclude stuff you don't want/need.

https://miniaud.io/docs/manual/index.html#Building

@lritter you tell it a joke on tuesday 😏

@joshuaelliott wait what really?

@code_disaster this is the first I've ever heard of it

@aeva Sounds like what you need is just setting right quantum values?

@aeva@mastodon.gamedev.place I can't tell if I like pipewire because my experience with it is good or if it's just good compared to fucking pulseaudio

goddddd pulseaudio

@aud let's be real, pulse audio sets the bar real low

@dos sounds like what I need is to throw it in the garbage and just use ALSA

@aeva
You can check the pipewire buffer size and latency with pw-top (disclaimer: I wrote a UI version of it called pipecontrol)

@aeva Annoyingly yeah. (Thankfully having both seems not to cause issues elsewhere, but... it just feels like a dirty install.)

@aeva there should be a cmake option to compile out pipewire 🤔

@aeva I'm used to audio systems using EITHER a buffer/watermark mode OR a realtime mode and you never want the former for low latency.

@aeva

It's been over 20 years since audio made me switch from Linux to FreeBSD.

The new version of OSS is proprietary, what shall we do?

FreeBSD: Well, the old version is still BSDL, I guess we'll just fork it and add low-latency in-kernel sound mixing and extend it with the features OSS 4 added.

Linux: Rip that stuff out of the kernel and replace it with ALSA, which doesn't do software mixing in the kernel at all!

KDE: Wait, now two apps can't go 'ping' on Linux. Let's write a sound daemon.

GNOME: Wait, now two apps can't go 'ping' on Linux. Let's write a sound daemon.

KDE and GNOME: Oh, now KDE and GNOME apps can't go 'ping' at the same time. I guess we should agree on some standards.

PulseAudio: Hi everyone! I have come to save you from the perils of usable sound! But now you can have sound move from your speakers to USB headphones when you plug them in! Maybe! If you get the config right.

Everyone: Nooo, someone let Lennart Poettering write some code! We're doomed!

Hans Petter Selasky: Wait, that thing with moving audio sounds useful. Rewriting all of your software to do it? Less so. *Writes virtual_oss to provide a layer that lets you send audio to USB devices with userspace drivers or to different in-kernel devices*.

PipeWire: Okay everyone, we can all agree PulseAudio was a bad idea, but we've rewritten all of the code and have a migration path. I guess we're good now?

FreeBSD: Curses, hps just died. I guess he won't be fixing all the things anymore. We'll need to start maintaining virtual_oss and integrate it with the base system. Should probably also fix a bunch of issues in the kernel drivers and make sure low-latency sound mixing is reliable and robust with new hardware. By the way, software that you wrote 20+ years ago still works fine with the kernel and userspace drivers and has low-latency mixing.

@david_chisnall the amazing thing here is ALSA has had software mixing for ages. there really is no reason for any of the shit people pile on top of it anymore

@mcc I wish this had a realtime mode. The most I've found is a snide remark in https://wiki.libsdl.org/SDL3/SDL_HINT_AUDIO_DEVICE_SAMPLE_FRAMES about people who want low latency.

@aeva I think you mean the tool is called OSSv4

@aeva I’d love your opinion on JACK. Signed: embarrassed, who has yet to get his synth to be audible from his Linux machine

@jmtd best I can tell JACK is mostly there to "add skill"

@portaloffreedom ah, interesting, it seems the reason why forcing SDL3 to use ALSA behaves identically is because it's picking pipewire's man-in-the-middle virtual device 😭 this has been illuminating, thanks for the recommendation to check out pw-top

@aeva maybe i was right to stick with portaudio so long…

@mcc maybe I should give that a try

@aeva I've been leaning heavy on a Rust-specific solution of late so I'm not up to date on best practices. PortAudio feels like it's for people who Care About audio. I don't offhand remember what latency guarantees it makes.

@aeva
Found the buffer size commands here: https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/Migrate-JACK

`pw-metadata -n settings 0 clock.force-quantum <bufsize>`

works only for the current session, it's not saved in a configuration file.

@aeva
As for using alsa directly, pipewire-alsa replaces the libraries, so that won't work.
To make it work, you would need to call the kernel api directly. Maybe statically linking the old alsa api directly? Might even be enough to force loading the dynamic one.
(Pulseaudio was doing the same).
But you do encounter all of the old Linux audio problems, like being in the audio group, only one audio program at the same time, etc...

@portaloffreedom maybe I should uninstall pipewire then

@mcc @aeva I care about audio and have fond memories of PortAudio from back when I wrote audio processing in C to get great performance on modest hardware. One of these days I want to do it again, with Zig instead of C.

I just realised I have no clue what the actual latency numbers were, but they seemed on par with professional DAWs.

Caveat: This was on Mac, not Linux.

@portaloffreedom @aeva You can also set the buffer for your program only, if you use Jack:

pw-jack -p 64 ./your-thing

This is what I've been using.

@nen @portaloffreedom this starts pretty snappy but the latency still renders the whole thing unplayable after a minute or so

@aeva There was something silly with software mixing in the early days of ALSA that I’m trying to remember. I think their OSS emulation did pass through, so you could only use the software mixing if your apps all used ALSA APIs; if one did OSS then the others could only mix if your hardware supported multiple channels, which cheap AC97 CODECs didn’t. My SoundBlaster Live! did, but the driver was staggeringly bad and caused kernel panics at least once a day (it also wasn’t SMP safe and would cause kernel panics in under a minute on SMP systems).

@david_chisnall this was very enlightening to read. Thank you!

I’m here for your rant on oom-kill whenever you’re ready.

@xconde You mean the kernel task that identifies the app with the most unsaved data and kills it? It got better once you could exclude the X server…

I could also rant about the balloon drivers in Linux that don’t interlock in any way with things that demand memory and so you see the sequence of:

  1. Process page faults.
  2. Balloon driver starts getting more memory.
  3. Page fault handler decides that there are no free pages.
  4. Process receives SIGSEGV and dies.
  5. Balloon driver provides memory to the system.
  6. User tries to figure out why the process died randomly when rerunning it works fine.

@david_chisnall @aeva What are you using on FreeBSD?

@jimmysjolund @aeva

OSS, anything else is a layer on top and rarely brings anything useful. Things like VLC and musicpd happily talk OSS directly, a lot of other things use a lightweight abstraction layer like libao.

Surround sound from VLC just works on my machine, no configuration needed. Other people have buggy firmware setting that mislabel pins on the audio devices so need a bit of tweaking.

@david_chisnall @aeva Ah I misread as "audio production". I have not fiddled with BSD but perhaps Ardour etc is available?

@jimmysjolund @aeva

Not something I’m familiar with, but you might be interested in this article from folks in that space.

Ardour is available in packages.

@david_chisnall @aeva Thanks! I have some reading to do.

@david_chisnall that was in the aughts, which was 30 years ago

@aeva I’m pretty sure 2025-2002 is not 30.

It was around then that I gave up on Linux and moved to FreeBSD. Nothing in the Linux world subsequently has made me regret that choice.

@david_chisnall linux is kinda dog shit if we're being fair. does bsd graphics yet?

@aeva FreeBSD has been able to use the same DRI/DRM graphics as Linux for many years, as well as proprietary NVIDIA drivers. I had working 3D on FreeBSD with an ATi Radeon 8500 and a Matrox G550, and pretty much everything I've owned since.

@aeva next step is to determine who is really growing the buffers: alsa, pipewire, or SDL.
I'm even tempted to ask for the source code and test it myself to help, not sure if I'll get the time to do it though...

@portaloffreedom be my guest https://github.com/aeva/convolver

you need an intel xe integrated gpu.

build it with `dotnet build` like any other normal c++ program, and an executable called convolver will appear. (you can also pick the build commands out of the csproj file i guess)

you'll also need to add a short wave file called revolver.wav to the same directory as the executable, preferably a gunshot sound but other things can work. 16-bit signed PCM mono at 22050 Hz is recommended for that file

*spaces out* so anyways, this is usually the point where I'd try to cut this down to a simple loopback with as few layers as possible and gradually build back towards my program until I either find where the fault is or have something working properly. That would mean targeting ALSA directly, except that appears to not be possible without uninstalling pipewire-alsa, which I can't do without uninstalling Steam :/

@aeva okay so this may be severely out of date advice (like … a decade or more) but if the problem is that pipewire is hogging all your physical mixing channels on your sound device, could you write an alsa.conf targeting them both at a synthetic dmix device instead of directly at the hardware?

so abnormally, this means starting with a pipewire loopback instead and seeing if all you brave defenders of the status quo are flickering my lights or not.

this makes me unhappy, but the single silver lining here is pipewire's API docs seem to be a little more newbie friendly than ALSA's

@glyph what I want to do is completely shut off pipewire temporarily so I can have one program (mine) talking to alsa directly without any virtualization and experience my back and shoulders unclenching, so that I have a ground truth reference point for what I can reasonably expect my computer to do. once I had that in hand, then I would restore pipewire and try to see if it still works with pipewire-alsa and go from there until I isolated the real problem and banished it forever

@glyph However, I did not install pipewire-alsa. As it happens, the reason why I have pipewire-alsa is because Steam wants it, and steam will uninstall itself if it is not available. I do not want to uninstall Steam. I am unhappy about this turn of events. I have been personally betrayed by Cave Johnson.

@aeva
I have an Intel arc770, would that work?

@aeva ah. I think you can also tell pipewire to talk to a null device, so you can functionally disable it without uninstalling anything

@portaloffreedom I'm guessing not because it's a discrete GPU, and the program is set up to stage audio buffers as memory that's shared between the GPU and CPU, but if you're willing to try anyway paste me `vulkaninfo --summary`

@aeva My experience is the opposite. Most of the API docs are completely nonexistent, and a large part of the API is an unholy abomination of header-only C inline defs that can't be used from other langs.

@portaloffreedom I pushed a change just now to fix how device selection handles discrete GPUs (it wouldn't have listed that it rejected it), but I also tested it with LLVM pipe just now and it does work, you just need to change the GroupSize constant in the csproj file from 32 to 8 and then it'll work.

@portaloffreedom I'll have to find a way to make that automatic later

@shinmera this is what I found https://docs.pipewire.org/page_tutorial.html

maybe newbie friendly isn't the right term, but it's probably sufficient for my needs

@aeva Yes. The examples are... OK, though they're also very lacking in explanations. The actual API doxygen is 99.9% without docstrings or explanations on anything though. I had a horrendous time trying to get output working without it either eating all my cpu or hitching horribly. Ultimately had to look at SDL2's implementation to get it working right 🫠

@aeva And I still needed a fucking C shim library to handle those horrendous inline towers. That library, despite only having 2 (two) calls in it is 32KB big (!!!)

@shinmera good to know

@aeva in any case I wish you luck and hope you conclusively figure out what is going on here

@aeva this kind of bullshit is why everything lives in containers now (and is consequently always way larger than it needs to be). ugh.

@aeva I opened up some documentation for present-day pipewire and alsa to see if I could say anything more specifically useful because I recognize this pain even if it’s been a while but then I noticed I was taking like 3 SAN damage per turn and needed to go back to my existing obligations

@mcc @aeva there is an extreme irony here in that this specific issue can only be made worse by containers (does alsa even participate in cgroups, why can I not even google an answer to that question?!?)

Love the UNIX® design philosophy where everything* is a file

*: reader’s note: pronounced as “a few things, sometimes”

@mcc @aeva
I’ve never thought much about rust before...
Is there a primer?

@aeva uploaded, ready for the weirdness of my system :D
(encrypted pastebin that expires in 1 week)
https://privatebin.app/?ee6be9121d68fbd0#-GtkwH5v2cDDxq9BbdkUxRAv2A85JNtV3A1ZShfD1jWKe

@aeva ok, I've managed to run something with llvmpipe (made a makefile if you want). I figured out something:
`pw-metadata -n settings 0 clock.force-quantum <bufsize>` will set the audio card buffer size
`pw-jack -p 64 ./your-thing` will set up the application buffer size.

I've managed to get a pretty small buffer size and keep it constant if I use them both, but I can't tell if it's an actual improvement.

In the image I've set the audio card to 32 and the convolver to 64
(QUANT is the buffer size)

@ipd @aeva …to … writing Rust?

@portaloffreedom how responsive is it after running for a few minutes?

@aeva
I get a tick every second since the beginning. I'm probably using too heavy a wav file.

ok I did it. I've got a program that writes a pipewire stream of F64 audio samples where each sample is the total elapsed time since the first frame, expressed in minutes.

I've got a second program that reads that pipewire stream, and checks the offset against its own elapsed time since the first sample processed. This program prints out the calculated drift every second.

The results are interesting.

In the first version of this, both programs just measured the time using std::chrono::steady_clock::time_point. This resulted in an oscillating drift that was well under a millisecond at its peak and nothing to be concerned about.

This is good! That means there's no place what so ever within pipewire on my computer for this specific audio setup where any intermediary buffers might be growing and adding more latency as the programs run.

This is not the interesting case.

In the second version, I changed the first program to instead calculate elapsed time as the frame number * the sampling interval, and left the second program alone.

In this version, the calculated drift is essentially the difference between the progress through the stream vs the amount of time that actually passed from the perspective of the observer. In this version, the amount of drift rises gradually. It seems the stream is advancing just a touch slower than it should.

The samples in the stream are reporting that more time has elapsed in the "recording" than actually has transpired according to the clock. The amount of drift accumulated seems to be a millisecond every few minutes.

I'm honestly not sure what to make of that.

anyways, for the curious, I put the source code for the experiment up here https://github.com/Aeva/slowtime

@aeva
Feels like a rounding or off by one error. Are you sure FramesProcessed = 1, and then (FramesProcessed + frame) is correct? What if FramesProcessed = 0?

@jannem the source is right there

@aeva yep. And no, that is correct.

@aeva please report it to pipewire if it's unexpected. They're extremely nice to work with in my experience. (Like, surprisingly good and helpful) They care about timing/sampling/latency.

also interesting is the drift is faster if I have the second program's monitor pin hooked up to my sound card, but there's still drift either way.

@aeva so, I'm looking at the code. I'm assuming FRAME_TIME_MODE is true. It looks like it uses the number of video frames rendered, the number of video frames per second, the number of audio buffers filled, the number of samples per second that are output from the audio device, and the size of that audio buffers are, and then sees that the time starts becoming misaligned over time. Meaning the number of audio samples rendered is running either too fast or too slow. Is that right?

@aeva so basically, over time,

(samples per second / samples rendered) != (frames per second / frames rendered)

Right?

@viraptor it is unexpected, but I am unsure whether it is problematic enough to bother someone about it

@cancel there's no video frames?

@cancel it's using the frame number (monotonically increasing sample index) multiplied by the sampling interval to calculate the progress through the stream in terms of play time, and then this is compared against the clock time in the other process. Since these are processed in batches, it is expected that the second program won't line up perfectly, but the fact that the delta increases over time means it is running too slow or too fast. I believe it is running too fast.

@aeva I saw

double(FramesProcessed + Frame) * Interval / 60.0

and assumed the 60 was video frames per second

@cancel I think this means that pipewire is scheduling these callbacks a touch earlier than it should on average. The error is about a millisecond every few minutes of real time, which is probably small enough that it wouldn't result in any significant pitch shifting, but I suspect that if any step in the process handles the passage of time more precisely than pipewire does then cool things might happen

@cancel oh, no, I was just putting it into minutes to make it easier to see in audacity before I wrote the second program

@aeva OK. I see.

So the comparison that's not working, then, is,

(samples rendered / samples per second) != (std::chrono::now_time() - std::chrono::start_time())

Is that right?

@cancel generally what you will see is this:

(samples rendered / samples per second) > (std::chrono::now_time() - std::chrono::start_time())

after running for a few minutes

@cancel that could potentially explain the problem of the increasing latency I was seeing. If SDL3 is handling time more precisely than pipewire and buffering the surplus it receives it might be functioning like a dam

@aeva the thing you are seeing is the clock drift between the quartz crystal clock in your audio interface and the one in your cpu

@cancel this happens even if I don't have this hooked up to any output devices

@cancel it happens *faster* if i do have it hooked up to an output. so you are correct, but there is also drift inherent in the system

@aeva not surprising, i suppose. maybe they accumulate the time delta slightly differently than your code, or they put some drift in there on purpose for simulation/headless testing purposes.

@aeva this is not really a problem that can be fixed. the clocks for the audio interface and the cpu are different, so the design of the software has to always deal with it. usually, for synchronized audio and video, the audio interface gets chosen as the authoritative clock, because audio glitches are less tolerable than an occasional frame of pulldown in the video.

@aeva what is a "monitor pin" in the context of this program?

@aeva this is kicking me repeatedly in the special interest, so: https://onlinelibrary.wiley.com/doi/10.1155/2008/583162

"Values of 1 ppm or 10−6 are typical for computer grade crystals (1 microsecond every second or 0.6 second per week)"

sound cards have independent sample clocks

1µs/s of drift between CPU clock and sound clock would be about 1ms of drift every 8-16 minutes

so it sounds like you are rattling the bars of the cage of the constraints of physically embodied computer hardware

@aeva (although nowadays I am doing very different stuff with it, the origin story of the code that eventually became <https://fritter.readthedocs.io/> was in a softphone & SIP framework called Shtoom by Anthony Baxter, whose initial version did exactly the naive thing that causes all kinds of hilarious drift problems over the course of a phone call lasting more than 3-4 minutes)

@cancel I don't need you to explain this to me

@glyph "so it sounds like you are rattling the bars of the cage of the constraints of physically embodied computer hardware" that happens a lot XD

@aeva yeah I am hesitant to even chime in here because this super does *not* seem particularly novel for you :)

@aeva for _most_ programmers encountering a problem like this I would probably have opened up here with some pablum about how there is no such thing as "real" time, clocks are all I/O devices, maybe some vague gesturing towards vector clocks, because those things are often surprises, but you are if anything much further down this road than I usually am. if you already knew more than this about average timing crystal purity metrics, or something more recent than 2008, I'm keen to know :)

@glyph this paper sounds like an exciting read XD I'll have to take a look later

@aeva re: this paper specifically, I do wonder if pointing a hair dryer or a jet of A/C air at various components on your computer affects the measurements you're taking ;)

@glyph @aeva and here is the point where *I* cannot resist sharing a paper because I think it's just the funniest way to demonstrate that at even an epistemological level we have at best a slippery grasp on time even existing

https://arxiv.org/abs/0811.3772

@SnoopJ @aeva hahaha _incredible_

@SnoopJ @aeva I read "Hilbert space" and I feel like I need to take a drink; I have low confidence that I would understand this if I had time to read it

@glyph tbh I don't know much about the crystal timers, but one fun thing about ghz processors is that by design they can't run at a perfect cadence, so as to prevent radio interference

I think my conclusions from this are

1. the latency drift I observed with my experiments with pipewire today is probably inconsequential.

2. there is probably nothing sinister about pipewire.

3. if you have a chain of nodes that are a mix of push or pull driven and have different buffering strategies, you are in the Cool Zone

4. my program is probably going to have to handle "leap samples" in some situations. I admit I wasn't expecting that, but it feels obvious in retrospect.

@aeva @glyph i'm curious about what happens if you disable spread spectrum clocking. according to the rk3588 trm there are registers you can use to fully disable ssc on the pll for the cpu.

(well, i know you don't get fcc certified)

5. the unplayable latency accumulation in my convolution experiment is a different unrelated problem, that is probably going to be solved by stripping out all the SDL3 audio stuff and replacing it with using pipewire directly. this is thankfully a minor inconvenience.

@aeva 4) yes, that is inherently going to be the case due to clock drift any time you have data traveling between multiple sources.

You have to deal with this in e.g., Ethernet land too, where the timebase at opposite sides of a link are required to be +/- 25 ppm but that level of drift still adds up over time. This is one of the things the inter-frame gap does, provides room to add/remove idle words to keep both ends in sync

@artemist @glyph my guess is mostly nothing for a while and then you get in a lot of trouble, but idk

@aeva the obfuscating factor that I would guess is most likely is that somewhere along the way the operating system has lied about some aspect of timing resources and how time is scheduled - or there was a lie before, and it's actually more honest now! From past experience I expect user processes to consistently drift behind wallclock time and to exhibit "rubberband" timing oscillations while trying to catch up.

I did a relatively deep dive into doing graphics frame timing based on "estimated frames from start based on wallclock" back in 2019, using SDL2, and I had to allow it to gradually drop frames, because the rubberband artifact was just too large and noticeable to tolerate - it's much more jarring to drop one frame and then skip two than to just drop one and pretend it's fine. I put a filter over my method of dropping frames so that the apparent deltas were perceptually more smooth, and my journey ended there.

So from my view, not having much insight into the kernel or pipewire internals, there could be a bug anywhere in that stack, or an assumption that you're still going to be behind wallclock time and you have to deal with it. The test case could be a "works for me" with a slightly different kernel. Or there could be a best practice that presumes you know you need to resample to account for this phenomenon, and maybe it was hidden before and this particular environment surfaced it to you, possibly unbeknownst to SDL.

I do remember old late 2000's versions of Ubuntu that played audio too fast - like, it was at the wrong pitch, or it oscillated. That kind of thing could reappear if the goal of pipewire is to be more transparent and low-latency.

@Triplefox "somewhere along the way the operating system has lied about some aspect of timing resources and how time is scheduled - or there was a lie before, and it's actually more honest now"

I think it would be more accurate to say that it is impossible to be perfectly precise about time on commodity computing hardware, so it's really more of a "if you can't be good be careful" kind of situation

@azonenberg I just think it's wonderful that that kind of complexity exists within the computer, and can just be a consequence of two pieces of software conceptualizing and quantifying time differently causing error to appear in different places. it is an ordinary thing, but still a delight.

@aeva I think it's actually hardware not software? Like, the sound card is running at 44.1 or 48 or whatever kHz, but its timebase may not be an exact integer multiple of whatever the CPU measures time at.

@azonenberg there's no sound card in this setup

@azonenberg like time drift happens faster if I connect the processes up to a real sound device, but it happens even if it's just two processes floating on the CPU completely disconnected from any output device

@aeva oh now that is interesting, I wonder what's causing it.

I'm used to this kind of drift but usually it's when you are dealing with two different pieces of hardware that each have their own timebase, which are a few ppm out of sync from one another

@azonenberg strictly speaking all I've done was estimate the relative drift between pipewire's throughput and std::chrono::steady_clock on my machine. given that std::chrono provides three different clocks and all of them have caveats, I don't think there's enough useful information to come to a firm conclusion

@azonenberg I think a better experiment would be to run a long audio stream and compare it with a higher quality time source (and run it long enough to overcome the wobble on reading that time source ofc)

@aeva
You're not wrong as such, but JACK does do a very nice job of routing/multiplexing.

I've stopped listening to people saying "OK this time Pipewire is production-ready for pro audio, no seriously I mean it" and started waiting for longtime JACK-enjoyers to say "yeah, they finally cracked it."

@aeva I'm a little late, but if you want to test things with ALSA or JACK without dealing with them directly, we've been pretty happy with PortAudio at work. https://www.portaudio.com/

We use it to sling 32 channels of latency-sensitive audio on Windows. (In our case with WDMKS, which ≈ ALSA on Windows.)

As far as abstractions go it's pretty thin and focused on low-level.

(There's no backend for PipeWire directly, but my understanding is PipeWire provides a JACK-compatible API that works well.)

@aeva Also I don't believe you need to uninstall PipeWire to use ALSA directly. I think you can just stop the PipeWire service.

Also I stumbled upon this when checking if PortAudio supported PipeWire: https://gitlab.freedesktop.org/pipewire/pipewire/-/wikis/Config-PipeWire#setting-buffer-size-quantum
You've maybe already seen it, but that PIPEWIRE_LATENCY environment variable might be helpful.

The linked FAQ also mentions a "Pro Audio Profile" that sounds very useful for what you're doing.

@aeva Also also: I don't think it is, but if this is at all related to your C# project and you end up liking PortAudio, here are the C# bindings I made for that work project: https://github.com/horizongir/PortAudio.NET
(Barely tested on Linux, but should work. If you can confirm it works for you I can find some time to add Linux to our CI and publish NuGet packages.)

@PathogenDavid the project is currently a C++ project but I might check it out in the future

@david_chisnall @aeva OpenBSD: Hi there! We have sndio.

@pertho @david_chisnall which bsd is the good one :3

@mcc @aeva

Or reading Rust.
At the moment I'm just watching it.

nice, pipewire has some special case stuff for filters

@aeva are there any microphone equalizers for Linux? I want to emulate the Ghub experience in windows. Without Ghub. Or Windows.

@Surlytom 🤷‍♀️

holy smokes I got it working :O!! i got my audio convolver working using the pipewire API directly!! and the latency seems to be very adequate for real time play :D

my revised opinion on pipewire is that I like that the API is wizards only. I'm a wizard, so that makes me feel special.

that or I'm just good at creating wizard problems for myself. either way I'm in a good mood.

https://github.com/Aeva/convolver/blob/c5d1ca8ec8a4aafd640def16d68e1c84bbc6b240/src/convolver.cpp#L509

@aeva I managed to make it work, but I wish I knew how to actually use it

@efi *dances around the room leaving behind a radiant after image*

@aeva *chases* owo

@efi eee kitty *pets*

@aeva *purr* <3

@aeva I think being good at both creating and solving wizard problems is the defining trait of a wizard

@aeva constantly setting up hurdles for Future You to jump over, so to speak

@aeva also probably the reason why 20th level Wizards aren't running the world

With like say a Monk or Barbarian it's pretty obvious. If you just maintain a large enough distance you're pretty safe.

But Wizards? They got Power Word Kill and Wish and Time Stop and yet somehow those poor border towns are overrun by *Giant Rats* and Goblins? Why do they let this happen?

Answer: on it on it on it, as soon as they figure out why the enchanted quill refuses to draw black runes without magenta ink

@rygorous relatable

@aeva nice work, very clean code 👌

@aeva sweet, way to go...going to give this a try just because I like to hear strange sounds come out of my guitar.

@skryking awesome! if you're on linux, clone https://github.com/aeva/convolver, build it with `dotnet build`, put a short 48000 hz mono sample wav file in the same directory as the output executable, name the sound file revolver.wav (or change the hard coded name in the source files), and run ./convolver to fire it up. you'll need to use helvum or something to hook up the inputs and outputs, and it only supports mono audio right now

@skryking if you don't have an appropriate integrated gpu, it should fall back to llvmpipe, which may or may not be sufficient to run it

@aeva thanks for the howto, was going to dig through code to try to figure out how to run it.

@skryking the csproj file also contains the build commands if you don't want to install dotnet

@aeva will it play nice with my rtx 4070 mobile?

@skryking no idea! it needs a unified memory architecture to work, as that allows the CPU and GPU to share allocations without having to do a run around with a copy queue

@aeva ok ill just give it a try.

@aeva So I got it running but I'm getting an error and just a clicking sound out of it when I feed in my guitar straight to it... should I amplify it first? is a 20 second clip too long?

@skryking that means it can't run fast enough to complete enough convolutions on time. try using a shorter wav file

@skryking I had been using this sample for the revolver https://sound-effects.bbcrewind.co.uk/search?q=07019168 trimmed down to roughly this, though if 1 second is too long you might want to try something shorter

starts at the first peak about a second in, goes for a second, and you can add a fade out to crop the fall off a little short

@skryking you could also turn the sampling rate down for the entire system. that might make latency a little worse but it'll give convolver more headroom. there's a global at the top of the file that controls the sampling rate. I found that SDL3 was doing a poor job at resampling the impulse response file (revolver.wav), so you'll probably want to resave it at the same rate. 22050 might be workable

@skryking were you able to get it working?

god damn this thing is so fucking cool. I've got it hooked up to my drum machine right now and the fm drum in particular is pretty good at isolating parts of the impulse response sample. I'm using a short sample from the Nier Automata song "Alien Manifestation" to convolve the drum machine and it sounds *amazing*. It's a shame I can't modulate the drum parameters on this machine, or I'd be doing some really wild stuff with this right now.

some small problems with this system:

1. I've had to turn down the sampling rate so I can convolve longer samples. 22050 hz works out ok though for what I've been messing with so far, so maybe it's not that big a deal. longer samples kinda make things muddy anyway

2. now I want to do multiple convolutions at once and layer things and that's probably not happening on this hardware XD

I'll probably have to switch to an fft based system for non-realtime convolution to make this practical for designing dynamic sound tracks for games that can run on a variety of hardware, otherwise I'll probably have to opt for actually recording my songs and stitching it together by some more conventional means

this thing is also really good at warming up my laptop XD

idk if I'm done playing around with this prototype yet, but I'd like to explore granular synthesis a bit soon. I think there's probably a lot of cool ways it can be combined with convolution, like having the kernel morph over time.

@aeva my math is rusty... how do you combine two impulse responses, anyway? Do you have to convolve one with the other?

@sixohsix oh i mean i want to have A × B + C × D, where × is a convolution and + is just mixing

probably first is reworking this program so i can change out the convolution kernel without restarting it or at least make it so i don't have to recompile it each time

anyways i highly recommend building your own bespoke audio synthesis pipeline from scratch, it's a lot of fun

@aeva I think the airwindows guy is using fft

@ohmrun idk who that is but i gather fft is the normal method for this

@aeva we've been meaning to, tbh

@ireneista it's very satisfying to make sounds

@aeva I built my own audio system and hate every time I have to work on it, so I guess different strokes and all that.

(fwiw:
https://shirakumo.github.io/libmixed/
https://shirakumo.github.io/cl-libmixed/
https://shirakumo.github.io/harmony/ )

@shinmera mine rewards me with magnificent sounds every time i play with it 😌

@aeva Yeah, I had much the same thought myself and I'm working towards it (slowly 😅)

Currently I have FFT based morphing for grains (grab two grain chunks, FFT into blocks, then over X frames linearly blend between the two), and FFT based convolution for filtering, so it's only a short hop to mash 'em together 🤔

@bobvodka oh cool, how's it sound?

@aeva Agreed! My DSP project is the most coding fun I've had in years, with bonus fun sounds too 🥳

The Graphics Programmer to Audio Programmer pipeline is real 😂

@aeva Mine frequently rewards me with ear-destroying noise and incomprehensible bugs

@shinmera puzzles :D

@aeva Sounds good to me.
It's nice how, over a long sample and a lot of "frames", you can hear the target sample slowly come in at various frequencies.

Should work well on smaller grains too; need to implement it in my 'grain swarm' code at some point.

@bobvodka awesome :D

Frankly @aeva ? I'd love to if I understood where to start and the involved math, it'd be a pleasure to suck at it as I do with rendering!

@aeva I have notes on shader reflection if you need them :3

@aeva something about scrolling up through this thread and the length of it make me somewhat doubt that statement…

@aeva
Someone with a great deal of experience in fft that open sources their code. What am I, chopped pork?

@ohmrun can i offer you an imaginary number in these trying times *hands you "mañanity", which is the number of days i will leave you waiting*

It occurred to me just now that I might be able to make this faster by rewriting it as a pixel shader. Each pixel in the output is an audio sample. Each PS thread reads a sample from the impulse response and the audio stream, multiplies them together, and writes out the result. To perform the dot product, the draw is instanced, and the add blend op is used to combine the results. I've also got a few ideas for variations that might be worthwhile.

Like, having the vertex shader or a mesh shader read the sample from the audio stream, have the PS read the impulse response, and stagger the draw rect. Main snag there is the render target might have to be 512x1 or something like that, or I'll have to do counter swizzling or something.

Also FP32 RGBA render targets would probably just batch 4 samples together for the sake of keeping the dimensions lower I guess.

I think this is likely to be a lot faster, because I've made a 2D convolution kernel a lot slower by rewriting it to be compute in the past 😎 but if any of y'all happen to have inside knowledge on whether IHVs are letting raster ops wither and die because AAA graphics programmers think rasterization is passé now or something absurd like that, do let me know.

@aeva The actual reason for that was almost certainly memory access patterns. Thread invocations in PS waves are generally launched and packed to have nice memory access patterns (as much as possible), compute waves and invocations launch more or less in order and micro-managing memory access is _your_ problem.

This really matters for 2D because there's lots of land mines there wrt ordering, but for 1D, not so much.

@aeva To give a concrete example: suppose you're doing some simple compute shader where all you're doing is

cur_pixel = img.load(x, y)
processed = f(cur_pixel, x, y)
img.store(x, y, processed)

and you're dispatching 16x16 thread groups, (x,y) = DispatchThreadID, yada yada, all totally vanilla, right?

@aeva well, suppose we're working in 32-thread waves internally (totally hypothetical number)

now those 32 invocations get (in the very first thread group) x=0,...,15 for y=0 and then y=1.

Say the image is R8G8B8A8 pixels and the internal image layout stores aligned groups of 4 texels next to each other and then goes to the next y, and the next 4-wide strip of texels is actually stored something like 256 bytes away or whatever.

@aeva so, x=0,..,3 y=0 are all good, these are all adjacent, straight shot, read 16 consecutive bytes, great.

x=0,...,3 y=1 in threads 16..19 are also good, these are the next 16 bytes in memory.

But if we have 256-byte cache lines (another Totally Hypothetical Number), well, those 32 bytes are all we get.

x=4,..,7 for y=0 and 1 are in the cache line at offset 256, x=8,...,11 for y=0,1 at offset 512, x=12,...,15 at offset 768.

@aeva And caches are usually built to have multiple "banks" that each handle a fraction of a cache line. Let's say our hypothetical cache has 16 16-byte banks to cover each 256B cache line.

Well, all the requests we get from that nice sequential load go into the first 2 banks and the rest gets nothing.

So that's lopsided and causes problems, and will often mean you lose a lot of your potential cache bandwidth because you only actually get that if your requests are nicely distributed over mem.

@aeva long story short, this whole thing with your thread groups being a row-major array of 16x16 pixels can kind of screw you over, if the underlying image layout is Not Like That.

This happens all the time.

Ordering and packing of PS invocations into waves is specifically set up by the GPU vendor to play nice with whatever memory pipeline, caches, and texture/surface layouts it has.

In CS, all of that is Your Job, generally given no information about the real memory layout.

Good luck!

@aeva If you do know what the real memory layout is, you can make sure consecutive invocations have nice memory access patterns, but outside consoles (where you often get those docs), eh, good luck with that.

The good news is that with 1D, this problem doesn't exist, because 1D data is sequential everywhere.

So as long as you're making sure adjacent invocations grab adjacent indices, your memory access patterns are generally fine.

(Once you do strided, you're back in the danger zone.)

@aeva also I want to emphasize that this Purely Hypothetical Example with row-major invocation layout in CS vs. a column-heavy layout in the HW is of course entirely hypothetical and in no way inspired by real events such as https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/

@rygorous @aeva
On the CPU it's generally best to organize structured data as separate contiguous arrays for each element. But with the graphics pedigree of GPUs does that still hold, or does it handle interleaved data better?

@jannem @rygorous It depends on the IHV whether SoA or AoS is better and in what situations. Usually there will be a document outlining recommendations somewhere.

@shinmera @aeva [i know nothing about audio processing so i'm like 99.9% sure that there's a good reason why the following doesn't make sense; asking the following out of curiosity]

can the ear-destruction be avoided by like... doing some kind of analysis/checks on the final sample before sending it to the audio device...? (e.g. checking & asserting that its amplitude is less than some upper bound?)

[but if it were that easy, it probably would have been the fist thing anyone would try, so]

@JamesWidman @shinmera I just try to remember to turn the volume down before testing new changes

@rygorous that sounds likely. I don't think I accounted for memory layout of the texture. I assume this is also why Epic seems to be so fond of putting everything in scan line order these days?

@rygorous so, my program as written is two linear memory reads, some basic arithmetic, and some wave ops. I think it should be pretty cache efficient, or at least I don't have any obvious ideas for making it moreso. I would think all the extra raster pipeline stuff would not be worth it, but the opportunity to move one of the loads into an earlier shader stage to effectively make it invariant across the wave and make use of the ROP to implement most of the dot product seems maybe worthwhile?

@rygorous the ROP is, like, free math, right?

@aeva @shinmera if i ever do audio programming, i will try to remember to make my program start with a giant ASCII-art splash screen that asks if the volume is set correctly before proceeding and makes me type "yes", because i would definitely forget sometimes (:

@aeva for 1D there's not much way to go wrong honestly, it's mainly a 2D (and up) problem

@aeva Not really. The "math" may be free but the queue spots are not, and you'll likely end up waiting longer in the shader to get to emit your output than you would've spent just doing the math directly

@aeva Looking at the shader you posted yesterday (?) at https://github.com/Aeva/convolver/blob/excelsior/src/convolver.cs.glsl, you're in the Danger Zone(tm)

@aeva the issue is SliceStart is derived from LaneIndex (Subgroup invocation ID) which is then multiplied by Slice

@aeva I don't know what value Slice has with the sizes you pass in, but it would be really bad if Slice works out to be some medium to large power of 2.

The issue is that the "i" loop goes in sequential samples but invocation to invocation (which is the dimension that matters), the loads inside are strided to be "Slice" elements apart.

You really want that to be the other way round. Ideally sequential loads between invocations.

@aeva so basically, try making the loop be "for (i = LaneIndex; i < SizeB; i += GROUP_SIZE)" and poof, suddenly those loads are mostly-sequential invocation to invocation instead of always hitting a few dozen cache lines

@rygorous I gave that a try earlier today, but it ended up being a wash. I think the slice sizes should be big enough for it to matter, so I'm guessing the savings are smaller than some other bottleneck elsewhere, probably on the CPU side

@aeva separately, don't want that % SizeA in there, that doesn't have to be bad but it can be, I don't know how good shader compilers are about optimizing induction variables like that

might want to keep that as an actual counter and just do (in the modified loop)

j += GROUP_SIZE;
j -= (j >= SizeA) ? SizeA : 0;

(you also need SizeA >= GROUP_SIZE now, but I don't think that changes anything in your case)

@aeva even on a GPU, if you do enough MADs per sample eventually you're going to be compute bound with this approach, but I'd be shocked if you were anywhere close to that right now.

First-order it's going to be all futzing with memory access.

@aeva I mean, you can literally do the math!

If you're on a GPU, then even on a mobile GPU from several years ago, you're in the TFLOP/s range by now for actual math.

So, ballpark 1e12 MADs per second.

48kHz stereo is ballpark 1e5 samples per second.

Math-wise, that means you can in theory do 1e7 MADs per sample, enough for brute-force direct convolution with a >3 minute IR. You're probably not doing that.

@aeva You can always do better convolution algs, but even for brute-force, the math is just not the problem for IR sizes you're likely using.

But as written in your code, you also have two loads for every MAD, and there's nowhere near that level of load bandwidth available, not even if it's all L1 hits.

Making it sequential across invocations should help noticeably. But beyond that, you'll need to save loads.

@rygorous huh. so is the ideal pattern something like out[0...N] += IR[0] * in[0...N], where the IR[0] is loaded once, and you basically just peel the first MAD for each sample being processed at once, and then do it all again for IR[1] etc. And the += would have to be an atomic add 🤔

@aeva I don't know about ideal but there definitely is some mileage to be had in loading one of the two into registers/shared memory in blocks, double-buffering the next fetch and having only one load via L1/tex in the inner loop.

That said the better FFT-based conv kinda nukes that.

Good news: FFT-based conv kinda automatically exploits all the sharing for you!

Bad news: that means you're now down to loading and using each IR FFT coeff exactly once.

@aeva It is work-efficient and gets both your load count and your mul-add count way down, but it also means what's left is kinda BW bound by construction and there's not much you can do about it.

(I mean you can make the block sizes small enough that you're still summing tons of terms again, but that's defeating the purpose.)

@rygorous ok weird thing happened just now, I gave your suggestion about changing the iterations another try and did notice the worst case runs were cheaper while the average was about the same (this is not the weird part), but then I dropped GROUP_SIZE (sets both the work group size and required subgroup size) from 32 to 16 and the average time went from 7.7 ms to 6.175 ms and the best recorded time went from 6.8 ms to 1.8 ms.

@rygorous this is on Intel. I tried setting the group size to 8 and the numbers got really nice but it stopped making sound lol

@rygorous oh no lol if I add a print statement into the hot loop on the CPU it starts making sounds again with GROUP_SIZE at 8. It looks like that might improve throughput enough to show that my synchronization with pipewire's thread is broken :3

@rygorous idk why dropping the group size did that, it didn't do that before. not sure if it's related to your change

@rygorous well, either way, it's down to about 8 ms of latency now so ty for the wisdom :)

@aeva this is a Side Question, do you have any idea how much of that is just fixed costs that don’t depend on the amount of compute at all? I‘m wondering because last time we looked at „should we run our audio processing on Not CPU“ the conclusion was a clear nope, latency to dispatch any work in 10ms batches already kills us, but likely a lot / most (?) of that was tflite not being set up for this type thing rather than the underlying systems, and we never had time to dig deeper than that

@halcy I'm seeing round trip times as low as 1ms with an average of about 6. I'm using a UMA GPU though, which lets me leave memory mapped to both the GPU and CPU. Most of my perf work is down to juicing the compute shader and bringing down the timing variance caused by synchronization and scheduler noise. Right now I have to leave 2 ms of headroom or the audio becomes unstable, so my latency is roughly 8 ms.

@halcy the rules are different if you're targeting discrete GPUs, but I'm not sure how much anymore since bus speeds are fast and we got stuff like resizable BAR now. Conventional wisdom says readback is a no-no, but I think enough has changed in the last few years to merit more investigation.

@aeva I will happily take credit if it works but must disclaim all responsibility if it does not!!!11

@halcy as more useful proof than measured numbers that can be wrong, I've been able to play using this with a live instrument.

@aeva @halcy main problem with readback is usually the way it synchronizes your CPU and GPU, not really that the readback itself is slow (ofc if you're reading something large that can be slow too). Since GPUs are big async chonkers if you just naively queue up some work and then wait for it on the CPU you're likely going to end up waiting for a lot more than you intended. But you can do high priority queues and whatnot these days if you want to juice the latency.

@rygorous wisdom was imparted, action was taken, and results were accomplished. nothing suspicious here 😎

@aeva thanks! That does seem lower than what I remember getting…. though probably still means that for the amount of computation we usually have to do in a frame, it‘s CPU all the way

unless maybe in The Future we make the model a lot bigger

@halcy I don't think audio processing on the GPU is worthwhile unless you're doing an absurdly expensive transform like what I'm doing or you're able to dispatch a large enough batch to saturate the GPU. there's a sweet spot where it is faster to do the round trip to the GPU, because time is convoluted

@aeva yeah, I think that‘s still the conclusion

which is unfortunate because I would really like to be paid to do that! but it is difficult to argue for when like, even if you implement everything very well the roundtrip already has highs that would (in presence of the rest of the audio stack) cause issues. like, right now our model runs, on a weakish desktop, with 2ms averages and 3 to 4ms highs, for a 10ms frame, and that’s already kind of as high as we dare going

@halcy well, if the end goal is to simply be paid to put audio on the gpu, then you can simply find a problem that fits the solution

@halcy haha time is convolved

@aeva (and there’s also power efficiency because Phones Exist, and there it becomes even harder still)

@halcy what sort of audio processing are you doing?

@aeva I like this gig. See, at work I am usually eventually required to put up or shut up.

@dotstdy @halcy so you do have limited bandwidth for transfers with discrete GPUs, so in theory that can also be a problem depending on how much data you want to read back. I think a lot of the superstition people have about doing any readback at all is a product of bad engine design. like, you can have everything pipelined perfectly and have multiple readbacks in the middle of the frame. you just need to check a fence to see what data is ready to process, and let it be asynchronous.

@dotstdy @halcy in the case of my audio convolver, the CPU thread that manages all of the vulkan stuff is basically just alternating between waiting for audio to buffer, running a compute dispatch, waiting for the results of the dispatch, bumping an atomic to tell the audio thread what is safe to read now, and then repeating the loop. I can probably shave the latency down a little more there by keeping the GPU hot, but the loop is tight enough that I'm not sure.

@aeva noise / echo / reverb suppression and/or speech enhancement in various different configurations, for voice calling. so essentially „run smallish neural network on audio fast“

and yeah sure we can make the task arbitrarily more complex by making the model larger but then we need to do annoying things like justify the added compute by showing a major quality improvement. maybe requirements will do it for us eventually if someone decides we must have fullband stereo audio or something

@halcy oh just make it worse and slap the "AI" sticker on it

@aeva @halcy right exactly, in an engine readback often means like "wait for downsampled depth buffer from the previous frame before starting to submit work for this frame", which can be a huge issue. most of the time your cpu and gpu executions are out of sync by a frame or so and throwing a random sync() in there is disaster. but nowadays you can even have a high prio compute pipeline that preempts whatever work is currently running (at least, if you can actually access that hw feature...)

@aeva pretty sure the ad copy is extremely AI'd up already. and unfortunately, if we make it worse, people will increasingly click the little „call audio was bad" button in the little window you sometimes get at the end of the call, making a number go down that then causes me (well, okay, our PM) to panic and stop rollout

@halcy wait that button actually does something ?!

@dotstdy @halcy yup. it would be a pain in the ass to retrofit a sufficiently crufty old engine for that, but there's some really cool stuff you could do designing for it.

Also it's worth pointing out that deep pipelines with full utilization are great if your goal is building a makeshift space heater out of the tears of AAA graphics programmers, but if your goal is juicing latency, frame time, and/or battery life, running the CPU and GPU synchronously can be advantageous.

@aeva doesn't immediately file a bug, but when you press the uhhh, idk what it looks like now, they've been messing with it, but: thumbs down or below 4 stars or sad smiley face, whatever it is, button, ideally also with a Problem Token (the little what-was-actually-bad questionnaire), then at least for media (so calling a/v, can't speak to how frontend or chat or whatever do it) in Teams/Skype it goes to a big database along with technical telemetry that is usually correlated with call quality (stuff like „percent of aborted calls"), which then feeds into a quite thorough A/B system, and if we spot regressions in what are considered Tier 1 metrics, rollout stops (no statsig positive change is okay generally if you can justify why the change fixes a rare bug, adds a useful feature or whatever. Though of course if you can move a problem token or T1 metric in the right direction, that's even better).

Mostly we catch issues before changes make it to public, though

@aeva anyways what I'm saying is you should definitely always vote 5 because that makes my KPI numbers go up

@halcy i'm reminded of this xkcd for some strange reason https://xkcd.com/958/

@halcy any time i get the urge to press one of those buttons, i'll gladly stab that 5 for you ♡

@aeva in no jokes land, please do press the buttons that most reflect the call experience, which will make my life easier by contributing to a realistic picture of where we're at and what we most need to work on and what is and is not working

@halcy the little reaction emoji thing has been broken for months, there's no overlay for it anymore so i don't see when people press it and i'm not sure if they see when i do. could you pass that along to whoever's problem that is?

also there doesn't seem to be an option to make it not use the GPU anymore which is somewhat problematic for me since i have to completely shut down teams when doing perf stuff

@aeva i can try, but these sort of reports tend to not go anywhere unless I can repro

@halcy i can try and provide more info tomorrow if you want