My job as a senior developer with a team of juniors is to figure out what to write, sketch a PoC as guidance, and then delegate the actual implementation to them. I'm going to look at that, explain misunderstandings or poor style choices, and guide them into implementing something that meets our standards.
I don't think LLMs can do my job yet. But I think we're getting shockingly close to them being able to do the other part. And I'm worried how we're going to get more senior developers.
@mjg59
On one hand, very much yes.
On the other, I just read an article about where we currently stand with climate change, and I don't think "where will senior programmers come from" is going to be that much of an issue.
@nicolas17
Or if not that, wars over resources and the sheer effort of keeping enough farming going will demand too many hands applied directly to those problems for people to be spending time on keyboards.
@mjg59
@mjg59 Musk tweeted that this is the Year of the Singularity, so I don't think you have to worry about the senior developers part either... 🤓
I would not have said the same thing 6 months ago - the amount of progress here is significant. And I'm not denying that the technology has resulted in massive quantities of poor quality code produced by people who aren't in a position to review it, or that the externalities of all of this are large. But capitalism isn't going to give a shit, so we're getting all of this anyway whether we like it or not
@mjg59 i think it will take learning a hard lesson for capital to come to any good conclusions about this tbh, it will be a rough time and the labor market will contract no matter what
@mjg59 do you have some way of evaluating that progress in the last 6 months in some way that is not the subjective impression of improvement?
@mjg59 every mid-to-large FOSS project is seeing their "Good First Issue"s getting sniped by 20 LLM bots. Those exist to feed new contributors into dedicated ones. If you cut the bottom rungs off the ladder, how is anyone going to be able to get to the top?
@greg yeah, exactly. I've helped people turn into senior devs, I don't know how to turn an LLM into one - embodying good taste is a different problem to generating code that meets a functional description
@mjg59 I think of it the same way a lot of programmers just use high-level languages and never look inside - I guess many juniors will be LLM-minding from a high-level task; the better ones will be the ones who figure out when it has a weird problem, or think of a better way than it did.
@mjg59 I dunno. Sure we'll get a lot of (more) terrible apps, services, etc in the short term. "AI" is sort of accelerating the "everything is unreliable slop-ware" trend that's been infecting software development for the past decade plus.
At some point I suspect there may (once again) be a market for software that's not utter garbage.
I feel bad for everyone stuck working for these awful companies (increasingly *all* companies), while the industry destroys its capability to write software.
@glyph not at all, other than my occasional requests for the robot to write code for me getting increasingly close to code I'd be willing to deploy
I think we are going to see the end of the agency model being entrusted with deep technical knowledge and concerns.
But we are also seeing the beginning of a new leaner type of business where senior specialists can now establish their own business remotely, with developers of all skill levels reporting to them from around the world.
This is a huge disruption that I think is going under the radar. Or perhaps it's intentionally underreported because it's a threat to the capitalist agenda.
@mjg59 thanks. that kind of data is really hard to come by, so I am just asking everyone with this experience :)
@mjg59 I'm obviously missing something - they are definitely not that good.
@mahadevank It involves a bunch of handholding, but honestly pretty much?
@mahadevank Definitely not for all cases - they're massively better at boilerplate than implementing a poorly written specification
@mjg59 ah ok, as long as we're talking of handholding - in my experience, it's destroying juniors - they've stopped using their brains because the tech lords have told them to focus on "higher-order" problems and not worry about code.
@mjg59 As someone who is trying to develop the high-level design intuition to become a senior dev, I honestly think that I'm going to crash out before I get there. I don't want to lead a group of agents who will build a thing for me. The prospect of that feels so unrewarding, and it's probably the thing that will take the ambient devaluation that I've felt increasing for a long time and push me over the edge.
@bersl2 oh god the idea of not getting to actually help people develop is deeply depressing
@mjg59 @greg I agree wholeheartedly with the junior pipeline problem, though I suspect that we end up with junior devs who are good at piloting the models, and learn to debug even hard problems within that context.
We didn't stop being able to computer when people stopped learning assembly or C. I hope we have a similar outcome here.
@PaulM @mjg59 Someone I respect has said *some* version of this to me every month since ChatGPT first shipped though, and I am tired of retesting various models and having them all produce the same hot garbage for my problems, while wondering if they're slowly making me psychotic as a side-effect. I keep asking this question because if *hard* evidence shows up, the kind of ROI you see on a balance sheet, I don't want to miss it.
@glyph @mjg59
that's entirely fair, and they have been getting better, but what constitutes "worth using" is pretty individual. I'm curious if you have any examples of something you'd quantify that way.
Maybe some relatively complex feature or bugfix you already wrote that you'd like to use as a benchmark for capability? Alternatively, a couple of trivial features you'd like in a personal project but haven't gotten around to building?
At a more mundane level, I suspect they could reliably alleviate a significant amount of the drudgery associated with maintaining OSS - fixing tests when dependencies are updated, etc. Nothing you can't trivially do yourself, but also in my experience painful to try to get the ADHD brain to pay attention to.
@PaulM @mjg59 At this point I am too nervous about the risks to actually touch one for anything non-trivial, and I think everyone should refrain from their use for ethical and safety reasons. One pretty robust argument in that discussion is "they're most likely actually an economic drain, even if they seem useful". But this is a tenuous argument that might become false at any moment, and if I'm not using them I won't know when that moment is.
1. Why do some people develop AI psychosis and others don't? Or does everyone eventually succumb and we just haven't used it longitudinally enough? Hormesis or linear-no-threshold?
2. How can one maintain a balance of failed-vs-successful prompts, to avoid time-wasting? Intuitive evaluation will always favor the successes.
3. If the tech *does* actually work, doesn't give me psychosis, and works more often than not with enough of an edge, de-skilling seems like a big problem.
@PaulM @mjg59 Related to 2, I am also concerned about addiction. ADHD is highly comorbid with problem gambling, and I don't want to be putting myself in a daily behavioral loop where I'm getting a little thrill from every minor success, even if I do, in some circumstances, have a demonstrable edge over the "house", which I guess in this case is pointless re-prompting with no progress.
@glyph @mjg59 for ai psychosis, my sense is that the observed outcomes are some combination of "already psychotic/already narcissist", people who are unusually susceptible to the same validation/reinforcement traps used in social media who discover the feedback loop can be instantaneous and permanently tilted in their favor, and an unfortunate subset of people who are prone to believe everything they read.
Which models they interact with, and how those are configured, makes a big difference. Some models are brokenly sycophantic, and that encourages this. Some models gladly engage in the kind of secret world government mind control "I discovered secrets the FBI needs to know about" kind of roleplay that draws susceptible people in. Training the model to refuse to go down these rabbit holes and keeping discussions factual is a hard problem, but one that modern models are much better at.
These dangers are one of the reasons that readily accessible open source model weights with near frontier capabilities worry me. I recognize that sounds hypocritical given my employer, but these systems are easier to misuse, and their snapshot-in-time nature can't benefit from ongoing safety work.
My belief is that occurrence is the product of underlying susceptibility, multiplied by unsafe model behavior. If those don't combine to meet a threshold level, people stay grounded in the real world. I don't see longitudinal use as an additional risk, although it obviously exacerbates symptoms for people who are above that threshold.
With modern models deployed with safety measures from the major providers, I think the risk is relatively low for most users.
@mjg59 "we're getting all of this anyway whether we like it or not" sounds like a slippery slop argument
@chris_evelyn @mjg59 @greg to be pedantic, computers are only sorta kinda mostly deterministic if you squint at them just right. From the perspective of any given program executing in a modern operating system, there's a whole lot happening around it which is completely opaque, even if execution mostly proceeds in an apparently sequential fashion.
@chris_evelyn @PaulM @greg I'm a kernel developer, this happens to me more than you'd think
@chris_evelyn @PaulM @greg massively depends, a *lot* of the kernel is super boilerplate and it's largely fine at that, and then you reach the point where you're dealing with CPU errata and you're going to have a bad time. I wouldn't say no to it in general (and we know chunks of Linux are already LLM developed), but I'd have several concerns around its use in more specialised areas
The way to make it work is not to use a web interface, but instead to use a tool like https://opencode.ai/ to
- generate the code
- generate the tests
- run the tests
- have it loop over 'fix any failures and try again'
- test the code yourself
By themselves, they will get things about 80% right. That's not perfect, but with that feedback loop it's enough to get something that works.
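The steps above can be sketched as a small driver loop. This is illustrative only: the `run_tests_fn` and `ask_model_fn` hooks are hypothetical stand-ins for whatever agent tool you use, not opencode's actual interface.

```c
#include <stdbool.h>

/* Hypothetical hooks: run the test suite (capturing its output into
 * `out`) and send a follow-up prompt to the model. */
typedef bool (*run_tests_fn)(char *out, int outsz);
typedef void (*ask_model_fn)(const char *failures);

/* Feed test failures back to the model until the suite passes, or
 * give up after max_rounds attempts. */
static bool agent_loop(run_tests_fn run_tests, ask_model_fn ask_model,
                       int max_rounds)
{
    char out[4096];

    for (int i = 0; i < max_rounds; i++) {
        if (run_tests(out, sizeof(out)))
            return true;    /* green suite: hand off for human review */
        ask_model(out);     /* "fix any failures and try again" */
    }
    return false;           /* never converged: escalate to a human */
}

/* Example stubs: a fake suite that starts passing on its third run. */
static int runs, prompts;

static bool fake_run_tests(char *out, int outsz)
{
    (void)out; (void)outsz;
    return ++runs >= 3;
}

static void fake_ask_model(const char *failures)
{
    (void)failures;
    prompts++;
}
```

The important part is the final "test the code yourself" step: the loop only gets you to a green test suite, not to code you'd deploy unreviewed.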
@glyph @mjg59 I also worry about addiction. As Netflix learned, hours-go-up is likely a bad thing for your business to optimize in isolation, because it's usually bad for your customers too. I know my team works hard to avoid that trap.
A lot of that concern is related to my earlier response to #1, but it can also be a great enabler of hyperfocus, which can be both very pleasurable and counterproductive.
All that said, you seem pretty convinced that using these things is like pulling the arm on a slot machine - sometimes you get a reward but a lot of the time you get garbage and have to try again a different way. They really truly are not like that these days in my experience, and have not been like that for a while. If you model them or their users that way in your reasoning, you will be making category errors.
As a user, (and maybe you'll say I have AI psychosis) the experience is more like working with a very fast, very precocious junior who has memorized half of wikipedia and is very quick with google, and who is getting better at writing code, but reasonably often needs detailed instruction or directional course correction. You don't cut their head off and ask the talent agency to send you a new one every time they give you an answer that doesn't quite match what you want, you clarify your request, or ask for a more achievable scope of work. Unlike searching google, your queries compound to vector the agent where you want it to go, conversationally, rather than standing alone individually.
@mjg59 having spent the last week debugging an intermittent timing bug in multithreaded code, which only occurred on a loaded production system, I'm wondering how an LLM will do that...
@mjg59 I say this to everyone who will listen to me.
Production of all this hardware and the building and operation of all these data centres are huge environmental issues, and while human activity was certainly extremely polluting even before, the whole content-generation business comes on top of that.
Idiotically, this might not concern companies like your employer.
But content-generation models shift power to those who own them.
That might also not concern your employer, but if it's a software corp, they're externalising their core product.
@swetland @mjg59 hilariously tho we just aren't seeing any new successful startups built on "vibed" code, it's all headlines from big corps like Microslop about how "AI" writes like 80% of their code already and blog posts about building a clone of a web page to do $thing that's been done a hundred times, but… nothing in between????? As Ed Zitron likes to ask: where are the startups??
@valpackett @mjg59 Yeah I'm pretty strongly on team "if this is such a miracle technology, why are they expending all their effort trying to convince others to use it rather than building things that couldn't possibly be built without it?"
@mjg59 @chris_evelyn @PaulM @greg examples?
@mjg59 @chris_evelyn @PaulM @greg Any examples of CPU eratta being relevant, other than the obvious security holes?
@alwayscurious @chris_evelyn @PaulM @greg "You must ensure that certain things have weird alignment otherwise the CPU will fault or return garbage" is a surprisingly common thing for CPUs to insist on and also typically not present outside kernels, so there's very little training data that embodies it
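A userspace-flavoured illustration of that constraint (a sketch, not actual kernel code): on strict-alignment CPUs, dereferencing a misaligned pointer faults, so portable code copies through `memcpy` instead, which the compiler lowers to byte loads where necessary.

```c
#include <stdint.h>
#include <string.h>

/* Safe on every architecture: never dereferences a potentially
 * misaligned pointer. On CPUs that tolerate unaligned loads this
 * compiles to a single load; on strict-alignment CPUs the compiler
 * emits byte accesses instead of letting the hardware fault. */
static uint32_t read_u32_any_alignment(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}

/* By contrast, *(const uint32_t *)p is undefined behaviour when p
 * isn't 4-byte aligned, and traps on strict-alignment hardware. */
```

The Linux kernel wraps this pattern in its `get_unaligned()`/`put_unaligned()` helpers; the point is that virtually none of the code LLMs were trained on ever has to think about it.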
@mjg59 @chris_evelyn @PaulM @greg Is this found on the big CPUs or mostly limited to embedded?
@mjg59 @chris_evelyn @PaulM @greg Can the boilerplate be replaced with a (non-LLM) code generator?
@alwayscurious @chris_evelyn @PaulM @greg there's huge piles of "What does driver initialisation look like" that could be replaced with macros except that would reduce readability
@mjg59 @chris_evelyn @PaulM @greg Could it be replaced by a YAML file or similar?
@alwayscurious @chris_evelyn @PaulM @greg in theory? But that's not really what the kernel community likes
@mjg59 @chris_evelyn @PaulM @greg They finally did that for Netlink parsing.
@alwayscurious @chris_evelyn @PaulM @greg Less common on big CPUs these days, but it's the kind of thing that early UltraSPARC and 90s MIPS had a bunch of