now that i am... writing my own agentic LLM framework thing... because if you're going to have a shitposting IRC bot you may as well go completely overkill, i have Opinions on the state of the world.
openclaw, especially, seems to be hot garbage, actually, because i was able to teach my LLM (which i trained from scratch on the highest quality artisanal IRC logs, 2003 to present, so i can assure you it is not a very good LLM) to use tools in the context of my own framework quite easily.
@ariadne say it's not true
@ariadne A shitpost bot trained on IRC logs?
Holy fucking shit you found a valid use for "AI".
@ariadne many years ago, I trained a Markov model on a decade or two of my IRC utterances to see if I could get it to replace me.
Now I'm realizing I could have described that as an early AI agent and run off with a huge pile of VC money.
@ariadne They are all quite bad and not really production-ready. Maybe support Docker at the minimum, but of course local volume mounts with mutable files. But imagine if it could scale workloads in Kubernetes, save to a database and use S3 storage.
@thomholwerda i trained it from scratch, this is peak IRC
@ariadne Did you pull in a tool use data set to fine tune on, or was this accomplished entirely through prompting? I've always been interested in how lean the models can get.
@dvshkn i generated a bunch of examples of valid and invalid JSON document fragments and then prompted it with "reply in JSON" and then a spec on what it can do.
the hardest thing has been convincing it to shut the fuck up actually.
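for a concrete idea, the example generator looks something like this (the tool names and spec here are invented for illustration, not my actual bot's tools):

```python
import json
import random

# hypothetical tool spec -- stand-ins, not the real bot's tools
TOOLS = {"weather": ["city"], "remind": ["when", "message"]}

def make_valid(tool, args):
    """A well-formed tool-call fragment the model should imitate."""
    return json.dumps({"tool": tool, "args": {a: f"<{a}>" for a in args}})

def make_invalid(tool, args):
    """A corrupted fragment for the negative examples."""
    doc = make_valid(tool, args)
    cut = random.randint(1, len(doc) - 1)
    return doc[:cut]  # any proper prefix of a JSON object fails to parse

pairs = [(make_valid(t, a), make_invalid(t, a)) for t, a in TOOLS.items()]
for good, bad in pairs:
    json.loads(good)  # valid fragments round-trip; invalid ones won't
```

the invalid fragments matter as much as the valid ones: the model has to see what a broken document looks like before it learns not to emit one.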
@ariadne It might not be well received by everyone, but would read a blog post if you do write one
@ariadne If there are plans to make its... Musings available outside of IRC, I'm bookmarking that.
@thomholwerda i have no idea how to grant it the level of autonomy that would allow it to go full bcachefs
@dvshkn *shrug* i think my opinions on commercial AI are well understood by now (namely that i am quite skeptical of it)
@dvshkn and, if anything, this exercise has only made me *more* skeptical
@ariadne The world is not ready for that.
first of all, when i began i was quite skeptical of commercial AI.
this exercise has only made me more skeptical, for a few reasons:
first: you actually can hit the "good enough" point for text prediction with very little data. 80GB of low-quality (but ethically sourced from $HOME/logs) training data yielded a bot that can compose english and french prose reasonably well. if i additionally trained it on a creative commons licensed source like a wikipedia dump, it would probably be *way* more than enough. i don't have the compute power to do that though.
second: reasoning models seem to largely be "mixture of experts" which are just more LLMs bolted on to each other. there's some cool consensus stuff going on, but that's all there is. this could possibly be considered a form of "thinking" in the framing of minsky's society of mind, but i don't think there is enough here that i would want to invest in companies doing this long term.
third: from my own experiences teaching my LLM how to use tools, i can tell you that claude code and openai codex are just chatbots with a really well-written system prompt backed by a "mixture of experts" model. it is like that one scene where neo unlocks god mode in the matrix, i see how all this bullshit works now. (there is still a lot i do not know about the specifics, but i'm a person who works on the fuzzy side of things so it does not matter).
fourth: i built my own LLM with a threadripper, some IRC logs gathered from various hard drives, a $10k GPU, a look at the qwen3 training scripts (i have Opinions on py3-transformers) and a few days of training. it is pretty capable of generating plausible text. what is the big intellectual property asset that OpenAI has that the little guys can't duplicate? if i can do it in my condo, a startup can certainly compete with OpenAI.
given these things, I really just don't understand how it is justifiable for all of this AI stuff to be some double-digit % of global GDP.
if anything, i just have stronger conviction in that now.
@ariadne it was never justifiable, but investors don't have your ability to just go play.
@ariadne I think your question in the fourth point is answered by your first point. A lot of the secret sauce is just hoarding compute.
@ariadne If you market it right*, you too can sell for a fuck ton of money to Meta.
* Shitposts better than any LLM on Moltbook 🙊
@dvshkn oh i could do it if i wanted, it would just take months to years.
@ariadne Yeah, you basically already answered it yourself, but China really destroyed the idea that there's some super secret training data that people can't get
@ariadne Having studied up a bit myself I can fill in a few pieces. Reasoning models have just been trained to chatter on in some kind of preamble that is intended to be hidden or de-emphasized in the UI, possibly wrapped in tags like <reasoning>blah blah blah</reasoning>, followed by a shorter answer. Mixture of experts is an orthogonal idea to structure the models so predictions can be run using only a subset of the experts in order to use less compute. Both ideas make models hard to train for different reasons.
@mirth sure, but the "thinking" ones do some consensus stuff to ensure it doesn't go off course
@ariadne Not at prediction time, they do another stage of training that works a bit differently but the resulting model is structurally identical to the input model. I think you're very right about the lack of defensibility though, if you wanted to catch up with the leading labs in a year or two you could probably do it with around $200M and the charisma to recruit the people who know how to do this stuff.
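a toy sketch of the top-k routing idea mirth describes, with all shapes, experts, and gating weights invented (real MoE layers route per token inside a transformer; this only shows the skip-most-of-the-experts trick):

```python
import math
import random

random.seed(0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_forward(x, experts, gate, k=2):
    """Toy mixture-of-experts routing: score every expert, run only the
    top-k, and mix their outputs. The experts we never call cost nothing,
    which is where the compute saving comes from."""
    scores = [dot(x, g) for g in gate]           # one gating score per expert
    top = sorted(range(len(experts)), key=lambda i: scores[i])[-k:]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over the chosen k only
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)                        # only k experts ever execute
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# four made-up "experts", each just a fixed elementwise scaling
experts = [lambda v, s=s: [s * vi for vi in v] for s in (0.5, 1.0, 2.0, 3.0)]
gate = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
result = moe_forward([1.0, 0.0, -1.0, 0.5], experts, gate)
```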
@ariadne where can I connect to talk to this LLM. I want to see if it retained some vintage IRC memes
This is a thoughtful, candid thread about building your own LLM and questioning the value of big AI—what surprised you most during your experiments?
@ariadne heck, even a Markov chain can be a decent shitposter. With what I know now about tf-idf (being ignorant about this was a major roadblock for calculating relevance) I'm really tempted to resurrect my python IRC atrocity from 2004 or so
@dngrs I wanted something cooler than a Markov bot, and was already researching SLM (small language model, e.g. language strictly as I/O) technology for a Siri-like thing anyway.
@ariadne I should say by "catch up" I mean to get to parity, my impression is the model research is kind of like drug development where a lot of the cost is paying for all the experiments that don't work, as a result it's much easier to catch up than to get out "ahead" whatever that means. Setting aside the ethical issues, the functional issue of how to effectively use plausible-sounding crap generators as part of reliable software systems remains unsolved.
@mirth the question is why compete with them at all? it has the same energy as the unix wars: large, proprietary models that lock people in. I would rather see a world of small, modular libre models that anyone with a weekend and a GPU can reproduce.
@mirth interesting. what I've built is a modular pipeline which takes language input, converts it into structured data, enriches that structured data with other relevant information, processes the final query into a plan (which is also structured data) and then uses that plan to formulate a response
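in rough python, the shape of that pipeline looks like this (every stage body here is a placeholder; the real stages are backed by small models and lookups):

```python
# minimal pipeline sketch -- stage internals are made up for illustration
def parse(text):
    """language input -> structured data (the real stage is model-backed)"""
    if "lights" in text:
        return {"intent": "lights", "state": "off" if "off" in text else "on"}
    return {"intent": "unknown"}

def enrich(query):
    """bolt relevant context onto the structured query"""
    return {**query, "room": "living room"}

def plan(query):
    """structured query -> structured plan (a list of steps)"""
    if query["intent"] == "lights":
        return [("lighting-control", query)]
    return []

def respond(steps):
    """plan -> reply text"""
    return "done." if steps else "no idea, sorry."

def pipeline(text):
    return respond(plan(enrich(parse(text))))
```

the point is that every hop in the chain is structured data, so each stage can be swapped out or validated independently.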
@ariadne To me it's a question of sufficient output quality, the strongest models available just barely function enough to do a little bit of general purpose instructed information processing unreliably. That will improve over time but the current stuff is very early.
The reason I'm a bit skeptical of a proliferation of weekend-sized models is that that size sacrifices the key ingredient enabling the whole LLM craze: the magical-looking ability to run plain language instructions.
@mirth i mean, i don't think that necessarily holds *if* you have the ability to build whatever you need with legos.
in many cases simply translating natural language to a specification for an expert system is enough
@ariadne I'm not sure if there's a common name in the research but I think that kind of multi-step system that puts the whole gloopy mess of linear algebra on some kind of rails is inevitably going to be necessary to make these things reliable. Even the smartest and most highly trained human specialists still rely on lookup tables and checklists and so forth to do their jobs.
@mirth back in the earlier AI wars, these were called "expert systems"
my idea is basically SLMs for I/O with other small models and tools governed by a user-generated expert system
@ariadne I think there's a lot of merit to that idea although I don't understand how to build it. As models get more powerful the harnesses required to make them write coherent code or whatever aren't getting any simpler, so I think that's a strong argument for the "small pieces in a structured formation" kind of arrangement. Big LLMs have the attracting property that a user can start with a small description and see something happen right away, I wonder how to replicate that.
@ariadne @pinskia @mirth
What they are doing is forcing competitors to do more with less. Smaller models with a clever architecture, not huge monoliths trained by brute force. Might come back to bite them sooner or later.
I'd like to see more hybrid models, where the LLM largely sticks to being the language module, and other models (possibly not even NN) specialize in other functions.
@jannem @pinskia @mirth yes, this is what i eventually want to build. a set of libre building blocks for building ethical, libre and personal agentic systems that are self-contained.
the shit Big AI is doing is not interesting to me, but SLMs and other specialized neural models legitimately provide a useful set of tools to have in the toolbox.
today, however, I just want to prove the ideas out by shitposting in IRC ;)
@ariadne
Yeah, one thing I've wondered is how much simpler a system could be if, instead of processing code itself, it took the plain english "refactor this to blah blah", processed just the language, and figured out what to tell the IDE and other tools to do for everything else.
Run a calculator instead of being one - and you have a much simpler problem to solve.
Could the reliability and ethical problems all be solved -- maybe, i dunno, but - yet another case of "tech could be cool if the harmful parts go away..."
@pixx @mirth i think small LLMs do not really have an ethical problem: i trained a 1.3B parameter LLM off of my own personal data in my apartment by simply being patient enough to wait. no copyright violations, no boiling oceans, just patience and a professional workstation GPU with 96GB RAM.
the ethical problem is with the Big AI companies who feel that the only path forward is to make bigger and bigger and bigger monolithic prediction models rather than properly engineer the damn thing.
that same ethical problem is driving the hoarding, because companies are buying the hardware to prevent their competitors from having it IMO.
ngl this matches what ive seen running small ops. the hype is way disconnected from whats actually useful day to day. the real value isnt some magic in the model, its finding what problem it actually solves for your specific situation. most companies just buying in because theyre afraid of missing out.
@ariadne I do not talk as an educated in the field, but my wild guess, the AI craze is like the evolution of cloud computing business model that some corporations are running from a decade or more.
A way to move workflow into their services even when this workflow could be done offline.
@ariadne I've been skeptical of it from the beginning as well - in part because of a delightfully weird project called Neuro. She's an AI virtual YouTuber who can autonomously stream, sing karaoke, play Minecraft, interact with guests, call and message friends on discord, talk to her chat, and more, all before the recent LLM boom. Which corporation was responsible for this marvel of modern engineering? None of them. A single British dude made her out of an osu! bot because he felt like it.
@ariadne I forgot to mention, if you haven't come across them already you might be interested in checking out the Olmo series of models. AFAIK they are actually open source in that they tell you the training data and it's available. It's not just model weights.
@ariadne the reasoning marker is disjoint from moe, it’s about meta prompting going on in the thread to add a little recursion which the model architecture itself lacks. moe is primarily a cost optimization enabling large portions of the network to be offline/uncached while processing a query.
@ariadne I wish they trained on Wikipedia!!! Then it could at least possibly return useful results, vs the regression to the mean idiot garbage as I call it.
@dvshkn tell me more
@mirth @ariadne There is a reason why we don't use natural language to tell computers what to do: Natural language isn't precise enough, and it's quite often ambiguous. Even when in the context of everyday life the text has only one reasonable meaning, you can often find one or more possible meanings that are nonsensical or silly. Fairytales often contain mischievous fairies misunderstanding human wishes on purpose. Jokes often use things like that. We invented computer languages in order for every instruction, every statement, every procedure, to have a structure that can mean only exactly one thing and nothing else.
yes, but i think natural language as an *interface* is still useful, and so SLMs are useful here because you can do things like
"please turn off the lights" --> {"action": "lighting-control", "state": "off"}
and I think weekend-sized models are perfectly fine for that.
@ariadne Basically if you find you need more general purpose training data, or something else, in the future you could pick apart what they did to make Olmo since all of that information is available. Allen institute has some other models and initiatives, too.
there are of course other paths for that which don't require models, but using a model to process natural language and translate the intent to structured data seems like an obvious path to ensuring consistency across different languages
versus, say, manually looking for specific keywords in the text to infer intent, but that requires maintaining large sets of keywords and so on and so forth and it turns into a nightmare.
that a model can guess what the intent is and represent it as structured text with a confidence score is useful.
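a minimal sketch of what consuming that structured guess could look like, assuming the model emits a confidence field (the threshold and field names here are made up):

```python
import json

CONFIDENCE_FLOOR = 0.8  # made-up threshold

def handle(model_output: str):
    """Accept the model's structured guess only when it is confident;
    anything below the floor gets punted back to the user to clarify."""
    guess = json.loads(model_output)
    if guess.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        return {"action": "clarify"}
    return {k: v for k, v in guess.items() if k != "confidence"}

handle('{"action": "lighting-control", "state": "off", "confidence": 0.93}')
# -> {"action": "lighting-control", "state": "off"}
```

the nice property is that the fuzzy part is quarantined: everything downstream of `handle` only ever sees structured data it can validate.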
@mirth @ariadne
What I'm worried about is that it turns out the market for software really doesn't care about reliability, because an app that barely works but is first to the market wins over a well-engineered app that arrives late.
It does seem like that's the case currently, but hopefully this is just the "fuck around" phase and a "find out" is coming.
@ariadne @mirth Call me old-fashioned, but I don't like to talk to my house. I'm lazy and I like to control the lights from the sofa, that's why I've got a few remote controlled switches. A few more lights are controlled by a Raspberry Pi via GPIO, and I use simple command line tools to control those, I wrote them in Python without any LLM, and I access them from any device on the LAN via SSH.
led-bar-rgb 192 128 96
...and the LED bar switches to a nice pastel orange for a nice summer sunset feeling.
*you* do not, but many people i know, including myself, want a libre voice assistant.
a voice assistant requires the ability to process arbitrary natural language and make a reasonable guess as to what the user wants.
hence the need for a model.
@ariadne @LordCaramac In my prior thinking about this kind of problem I came to the view that handling assistant-type requests at its core starts with program synthesis, but I don't have any kind of clarity on how those programs should be written. It would be an interesting exercise to make a catalog of representative queries, and try to hand-write pseudocode or Python for each just to get a sense of what the deficiencies of that approach are (I suspect the answer is a special-purpose language, unsure)
@ariadne
I just so happened to try writing a timeline summarizer yesterday. Not having the hardware and skills to train my own model specifically for this, I had to stick with pre-made MLMs (medium language models) from the Ollama repo though. Only having a ten-year-old laptop dGPU available, I had to stick to ~1.2B models to not run out of VRAM, and apparently those are considered just too small for that kind of problem (while still being slow).
Of the ones I tried, only DeepSeek-1 (general model) and LFM2.5 (supposedly optimized for summaries) ran well enough, and depending on the exact input they would sometimes produce adequate summaries, but start falling apart or fantasize on even minor input/instruction changes. (And also LFM2.5 read like it was trained by a middle manager no matter what I did?)
Apparently ~7B models are supposed to be much better for this, but I kinda wonder: Do you think a custom-made model would be able to do this even with <1B params? Or can it really not?
@curiousicae models aren't good at summarizing no matter the parameter size, they summarize by deletion without real understanding
@ariadne What are your thoughts on specialized LLMs for things like writing software or formal proofs?
@pixx @mirth @alwayscurious that's just for training. CPU inference works well enough.
@pixx @mirth @alwayscurious i'm concerned about the copyrightability of the code generated by LLMs
@pixx @mirth @alwayscurious I am concerned about both, but case law so far shows that users using the model are probably fine, while commercial AI operators are liable for operating in bad faith.
@pixx @mirth @alwayscurious and it being not copyrightable is a problem because not all jurisdictions have "public domain".
and "public domain" is also a risk to OSS licensing.
@ariadne huh, interesting. it does make me wonder if it's all just to keep juicing the chip market, or something
@ariadne @curiousicae this was 'mostly' fixed sometime in mid 2025 for the best frontier models, I had exactly the same issue and bitched incessantly.
I think there’s no public paper explaining what was done to address it unfortunately. I’m very interested to read it but everything I find is nonsense about context length and retention which is not the same.
There’s a lot of fun stuff at the edges of this using your own model, implementing CoT and distillation / fine tuning is enlightening.
@ariadne @mirth sorry to reply to you in separate threads but I missed these original posts.
You are correct about this. I worked in a tertiary way on something with formal verification and some other sorcery involved doing pretty much exactly this pre-public LLM but way after expert systems (which were before my time).
There’s fruit there that pays off, they sold the implementation.
When you say "ethically sourced from $HOME/Logs"......
@fxchip I mean there is probably at least 10GB of goatse ASCII art it got trained on
@fxchip and the first iteration really loved to talk about supernets #Superbowl but whatevs
@ariadne I've had the (mis)fortune to do some full time research on tool calling LLMs for about half a year now. I wanted to know what all the fuss is about and the org I work for decided to pay for my time, some hardware and time on the GPU cluster.
I mostly agree with your assessment. If you just want a natural language user interface (in one or two languages) a small 1B or even just 500M parameters is enough, given you use one of the more modern architectures. To train those the "chinchilla optimal" amount of tokens would be around 20B tokens. That should be doable. Think Wikipedia dump + synthetic data that teaches it how to call tools.
The hardest part that I could identify is how to match your tools and tool prompts in a way that does not clog up the context window and still get correct results more often than not. Also there are some fun tricks that you can pull to catch fuckups by the LLM (there are a lot of those) by analysing its output and the tools it called. This way you can automatically slap the LLM when it starts to bullshit and have it self-correct.
Overall there's surprisingly little magic there. Just lots of badly written tooling.
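the slap-the-LLM self-correction trick sebastian describes could be sketched like this (the retry count, error-feedback wording, and all names are invented for illustration):

```python
import json

MAX_RETRIES = 3  # arbitrary

def run_with_self_correction(model, tools, prompt):
    """Catch malformed or unknown tool calls by analysing the model's
    output, feed the error back into the prompt, and let it retry --
    an automatic slap when it starts to bullshit."""
    for _ in range(MAX_RETRIES):
        raw = model(prompt)
        try:
            call = json.loads(raw)
            tool = tools[call["tool"]]
        except (json.JSONDecodeError, KeyError, TypeError) as err:
            # append the failure so the model can self-correct next round
            prompt += f"\nyour last reply was invalid ({err!r}); emit valid JSON."
            continue
        return tool(**call.get("args", {}))
    raise RuntimeError("model never produced a usable tool call")
```

the interesting part is that the validator needs zero intelligence; plain old parsing and a dict lookup catch a large share of the fuckups.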
@sebastian yep, one of the wikipedia database dumps is what my next experiment will be
@ariadne the fact that teenager me is in there is horrifying
@ZiggyTheHamster I only took the text not the nicknames 😂
@ariadne well, at least my dumb messages won’t be attributable to me :)
@ariadne @pinskia @mirth isn't this whole AI infra grift about scalability? Big tech want to offer LLM interactions on demand for anyone. I think the process of training LLMs is only a part of the problem. You also need to run those models on some hardware.
Like you could do it on your own machine just as you run linux, but most normies don't know it's possible. They bought the idea of a subscription model for commercial AI systems just as they bought the idea that you need a subscription to watch movies online.
The famous Google "we have no moat" paper continues to hold, technically. The moat became marketing, funding and denial of access to market. As it usually does when monopolists latch on.
@ariadne Have you uploaded the code you used for this anywhere or did you only use available tools?
@jak2k I just have some patches to the scripts they use to train qwen
@ariadne What are the patches doing?
@jak2k they are just tweaks specific to my setup
@ariadne
I was in "the early days" (in my perception) playing with an LLM in python. I learnt really fast how the inference worked, and instantly i thought "huh, can i make this thing generate bash commands? And make a guardrail such that the user has to confirm those commands?"
The next week, agentic AI was marketed as a revolution getting us nearer to AGI, and i am a doubter since then.
I GOT THE SAME IDEA IN 5 MINUTES WHILE ABSOLUTELY STONED.
ERGO, THIS IS NOT A GOOD IDEA.
@wombatpandaa @ariadne Neuro is a commercial LLM. Possibly an open weights model fine tuned and run locally, but a standard commercially and unethically trained LLM nonetheless.
It's a persistent myth that Neuro was trained from scratch, Vedal never claimed that.
Neuro debuted in 2022, GPT2 was released in 2019.
For one example of how it might work, see here (this one uses Llama3): https://github.com/kimjammer/Neuro
@ariadne Would you be able to create a useful LLM coding agent with ethically-sourced training data and much less resources than they do?
Genuine question.
@alwayscurious I have no idea but that isn't a goal
@ariadne In a similar vein, a Youtuber I follow just posted a video saying "Why would anyone pay $20/mo to an LLM provider when you can download free models and run them locally?" Their business model is cooked.
@domo it wasn't $10k when i got it a few months ago