abi stability is another topic, and that one is a more valid reason, but i'd still say it's not that hard: use sonames properly -- but if you *know* the struct definition will be changing a lot, then it's fair (but consider structuring your api differently, be modular, and rely less on one single big struct type)
@navi @lanodan The main advantage is that you can change a getter from “return an internal field” to “return a field in an internal structure” or to a computed property, all while maintaining API compatibility.
The runtime cost is real, and ideally you want the language/compiler to optimise the “it’s just an internal field” case.
Although a concern I often have is third-party binaries for somewhat isolationist distros and proprietary software, as otherwise bumping the soname on ABI changes is typically fine.
@navi It's not really just about that. If you're trying to create a compile-time (or hell, runtime) backend system, you can snip off the opaque data into its own struct within its own file. If you've ever worked with C++, one of the worst things you'll ever deal with is header bloat, and over time as headers get pulled in, you'll end up over-including things you don't want and compile times really stack up. Trust me, a project I work on happens to take ~8 minutes to compile on a few cores and even simpler files can take 6-7 seconds each to build, because the original devs just exposed everything in each header (with occasional forward decls when possible, which can sorta do something similar to what I'm gonna describe), and it just ends up being a bit of extra work and everyone has just kept putting up with that. The solution to this is to just make private implementations of the data, so at least the C++ file won't be pulling in STL or SDL or wayland or glib headers that it doesn't need (this gets grosser for those "unity headers" which just include everything for convenience).
It's just generally good practice to not expose a lot of stuff the user doesn't need. If your struct is mostly ints and chars and a pointer here and there, it's not necessarily something you need to hide. But even then, in some more complex codebases where you DO care about api/abi stability, you shouldn't really be just giving someone a struct anyway and should have some getters for the cases where you need to do "backwards compat" or actually change how the data gets computed someday.
But it's never, to my knowledge, about "treating the user as dumb." In the case of public/private it kind of is... but... if you intend for data to never be poked at, keep it hidden, because otherwise people may screw with a struct they saw and then risk things breaking (oh, I changed the x value but forgot to update prevx and change this other thing; this wouldn't have happened if I'd used change_x(int)...)
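Roughly the kind of invariant I mean (just a tiny sketch; widget/prevx/change_x are the made-up names from the example above):

```c
/* tiny sketch: prevx must always hold the previous value of x,
 * so the setter keeps the two fields in sync in one place */
struct widget {
    int x;
    int prevx;
};

static inline void change_x(struct widget *w, int new_x)
{
    w->prevx = w->x;   /* the step people forget when poking at fields directly */
    w->x = new_x;
}
```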
I think I've only had to request data be exposed once in my life, for a library many years ago, and even then the guy just exposed a function to get at what I wanted.
most of c++'s compile-time include cost is template instantiation, and those can't really be "opaque" anyway, and a better solution to it is to modularize your api, and to use https://include-what-you-use.org/
headers should only include other headers if they *need* the definitions that are in those headers. if your types or functions don't, then you include the other header in the translation unit that does -- yes, a lot of people include random things in the header because they use it in the c file, but that doesn't make it a good practice
and yes, you can also do some sort of polymorphism by making types opaque and changing the definition based on a backend, but that's an exception rather than a rule
"hiding things from the user" is not worth the mess and runtime costs you get from it, now every structure needs to be heap allocated, now every public field needs to have a getter and setter, your library is twice as painful to use, unnecessarily slower, for a small benefit on the general case
if i intend data to be poked at, i document it, otherwise don't touch it. in c23 you can even go a step further:
```c
#ifndef mylib_private
# define mylib_private [[deprecated("private field")]]
#endif
```
now you compile your own code with `-Dmylib_private=` (so the macro expands to nothing for you), but clients of the library would need to actively bypass compiler warnings to do stupid shit, and if you really really want, pre-c23, you can even do things like wrapping private data in a `.private` struct field
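for illustration, a minimal sketch of how the macro above could look in a header (the struct and field names are made up):

```c
/* minimal sketch: fields tagged with the mylib_private macro from above warn
 * when client code touches them; the library itself builds with
 * -Dmylib_private= so the tag expands to nothing internally
 * (struct and field names are hypothetical) */
struct mylib_ctx {
    int fd;                        /* documented, fine to use */
    mylib_private int refcount;    /* private: clients get a deprecation warning */
    mylib_private void *scratch;   /* private */
};
```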
making the whole struct opaque doesn't just impact "users can't touch it", it impacts your whole api, adds runtime constraints and limitations, and complicates memory management in many cases
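to make the contrast concrete, a rough sketch of the two api shapes i'm talking about (all names are hypothetical):

```c
/* rough sketch of the two api shapes being contrasted (all names made up) */

/* fully opaque: the definition lives only inside the library, so the caller
 * can only ever hold a pointer and has to go through the library's allocator */
struct conn;
struct conn *conn_new(void);
void         conn_free(struct conn *c);
int          conn_get_fd(const struct conn *c);

/* transparent: the caller can put it on the stack, in an array, inside
 * another struct, and read fields directly */
struct point {
    int x;
    int y;
};
void point_init(struct point *p);
```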
@navi > headers should only include other headers if they *need* the definitions that are in those headers.
that was... quite literally my point. I am explaining to you why people _need_ (or rather, like) to have private implementations: because they want to cut down on stuff being exposed. This applies to both libraries and applications.
> every structure needs to be heap allocated, now every public field needs to have a getter and setter, your library is twice as painful to use, unnecessarily slower, for a small benefit on the general case
do you... actually believe this is 'slower'? like, esp on modern memory and paired with a good malloc impl and all that... a little heap redirection (and allocation) isn't going to cost that much. Unless you're in a tight for loop where you really care about how your CPU is caching, this is not something you should _actually_ be worrying about in your code, like... unless you are really tight on constraints. Seriously. Benchmark a running program and tell me how many waiting games you're playing where this is genuinely a noticeable issue. I'm no java programmer, I tell you, but there are plenty of areas where performance means a lot, and this one isn't really it...
it's also worth noting that heap allocation while keeping things private is a _deliberate_ choice by the dev, not just because they don't want you digging but quite literally because that struct changes all the time and usually you just pass it around... opaquely. It's not done because they hate the users but because they just want their data on the heap to begin with. Otherwise you'll end up with padding tricks and v2/v3 structs to work around it. It's a technique to save energy on maintenance, not performance.
> "users can't touch it"
once again, it was never really about this.
> most of c++'s compile time include costs is template instantiation, and those can't really be "opaque" anyway
I'm aware, but I don't believe I brought that up.
(sorry if this comes off harshly, i just woke up, headache etc)
that was.. quite literally my point. I am explaining to you why people need (or rather, like) to have private implementations
since when do self-contained headers require private implementations? my point is about being mindful of what your headers declare, nothing to do with whether ‘thing’ is declared at all or not
foo.h can have `struct nya;` and use `struct nya *` in its declarations, then nya.h can have `struct nya { ... };`, and now the user can allocate nya anywhere they want -- stack, heap, array, inside a hashmap -- and pass that in to, say, foo_do_thing(&nya), then foo.c includes nya.h and does whatever it needs
and so what i meant in the end was that foo.h should not include nya.h just to use struct nya *, but a lot of developers do that and unnecessarily re-export symbols
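concretely, the shape i mean is something like this (all the files collapsed into one sketch, names made up):

```c
/* --- nya.h: the full definition, for code that needs the layout --- */
struct nya {
    int id;
    char name[32];
};

/* --- foo.h: only needs to know the type exists, so no #include "nya.h" --- */
struct nya;                        /* forward declaration is enough for pointers */
void foo_do_thing(struct nya *n);

/* --- foo.c: includes both headers, since it actually touches the fields --- */
void foo_do_thing(struct nya *n)
{
    n->id++;
}

/* --- user code: can still allocate nya wherever it wants --- */
void user(void)
{
    struct nya n = { .id = 1 };
    foo_do_thing(&n);
}
```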
do you… actually believe this is ‘slower’? like esp on modern memory and being paired with a good malloc impl and all that… a little heap redirection (and allocation) isn't going to cost that much
it’s not going to cost that much, sure, but a) embedded and old hardware still exist and are still functional, and b) being “unnecessarily slower” is just one of the points
but it can also be tangibly slower, if you’re keeping an array of $thing, e.g. some library context per-connection, you could have an array of packed connection data to iterate, but instead you end up with an array of pointers and iteration times get painfully slower – and if you don’t believe me that it matters, see this talk: https://www.youtube.com/watch?v=IroPQ150F6c
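in code the difference looks roughly like this (a sketch, "conn" is just a made-up example type):

```c
#include <stddef.h>

/* made-up per-connection data */
struct conn {
    int fd;
    unsigned bytes_rx;
};

/* contiguous array: one allocation, prefetch-friendly iteration */
unsigned long total_rx_packed(const struct conn *conns, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += conns[i].bytes_rx;
    return total;
}

/* opaque/heap version: every element is a separate allocation behind a
 * pointer, so the same loop chases pointers all over the heap */
unsigned long total_rx_opaque(struct conn *const *conns, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += conns[i]->bytes_rx;
    return total;
}
```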
Otherwise you'll end up with padding tricks and v2/v3 structs to work around it. It's a technique to save energy on maintenance, not performance.
i did address this in the op:
but if you know the struct definition will be changing a lot, then it’s fair (but consider structuring your api differently, be modular and rely less on one single big struct type)
but ideally you're not breaking abi that often, and when you do, bump the SONAME and that'll make (good) build systems and distro tooling rebuild things pretty much automatically
@navi and it's important to note: there is a time and a place for private impls :) there is also a time and a place for raw structs you'd just work with. It's a design decision because sometimes one makes more sense than the other. Not everything is a fit for the pimpl or opaque struct design. You deliberately pick and choose the one you think fits, and that's up to how you plan to use the struct.
It's never a project-specific thing but just something you do when it makes sense and you're confident the data will change a lot, for example, and yes, the pointer redirection, like for x/y values, could maybe slightly possibly have a tiny performance overhead. :)
@navi Take SDL for a simpler example: there are many things that get pimpl'd where it fits and plenty of things where structs are deliberately exposed for you to fill in. It entirely depends on what you're doing and how you're doing it.
it's so painful to work with, say, libinput, having all event objects be opaque structures, when the event data itself is basically just a few numbers from hardware, and those ain't changing since hardware ain't changing, if anything would change it would be a new event, but instead, opaque structs with _get functions, just because
and while libinput is the most fresh bad example i have, there's plenty others too
stable abi is nice, but it's only really a "problem" when you can't rebuild, aka, with proprietary software
@navi I'm pretty sure I agree with you. The only reason for an opaque-blob struct is when you're making a framework and you expect the struct to be supplied by a user of the framework to pass data specific to _their_ code which will be used by the framework. Eg., supplying a cryptographic algorithm to an SSL library as a plug-in. The structs to hold parameters to it should be opaque to the SSL library but absolutely NOT opaque to the plug-in code.
though i do see quite a few newer apis giving you a concrete struct (usually the one that holds the _ops function pointers), and passing that struct back into the function pointers themselves, expecting you to use `container_of` or similar constructs to get at your actual data
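the pattern looks roughly like this (a sketch; the ops struct is hypothetical, and container_of is the usual offsetof macro à la the linux kernel, not something those libraries necessarily ship):

```c
#include <stddef.h>

/* the usual container_of: recover the outer struct from a pointer to a member */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* concrete struct the library hands back into your callbacks (hypothetical) */
struct backend_ops {
    void (*frob)(struct backend_ops *ops);
};

/* the user's own data simply embeds it */
struct my_backend {
    int fd;
    struct backend_ops ops;
};

static void my_frob(struct backend_ops *ops)
{
    struct my_backend *b = container_of(ops, struct my_backend, ops);
    (void)b->fd;   /* full access to the user's data, no getters needed */
}
```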
different solution but the same problem, my post is more about libraries that make things opaque for no reason except "but my private fields!"
@navi > but instead you end up with an array of pointers and iteration times get painfully slower
I was deliberately implying that, by the way, which is why I was referring more to "pointers." It's not cache-friendly at all, but it's not something you need to pull hair over. Hell, you could have that sort of thing in a game loop these days. In fact, in my old game engine that I made when I first started programming, this was one of the first things I learned not to do (it was for tile data, mind you, lmao), but even then, given the amount of data I was actually dealing with, it wasn't significant enough, and I mean damn, you're meddling in a hearty dose of nanoseconds at that point, which isn't great but it's not really as insanely slow as you think it is.
> since when do self-contained headers require private implementations?
never said that; in fact I later said that it's not even about that. You pick and choose what you think fits.
RE: bump SONAME
a lot of these changes tend to be internal though, which is my point, so you'd just make the functions change and, well, deprecate the behavior but still keep the functionality. That is precisely why people do this sort of thing. Bumping the soname is fine but it can get pretty obnoxious doing it once a week.
In a big, heavily moving project where even little abi breaks can cost a fortune, you have to account for this stuff; you need to provide backwards compat and consistent behavior between functions, even if internally the logic for, say, getting the x value of something changes, or libinput junk. Sometimes it can be misused, won't deny that, but it's more cautious programming if anything...
I want to reply more but i need to head off here. Thanks for chatting! ☺️
> You pick and choose what you think fits.
i'm not trying to, i'm trying to understand what your point was and talk about that, but i can't seem to figure out what it is
> bumping soname is fine but it can get pretty obnoxious doing it once a week.
well, releasing an abi break once a week would be hellish, but versioned symbols are also an option, or, try to not change things so much -- the more modular your interface is, the less any individual thing needs to change, and you end up adding new structures way more often than changing existing ones
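for reference, versioned symbols with the gnu toolchain look roughly like this (a sketch, names and version nodes made up; the LIBFOO_1/LIBFOO_2 nodes would come from a linker version script passed via -Wl,--version-script=):

```c
/* sketch of gnu symbol versioning: old binaries keep resolving to the old
 * entry point, new builds link against the new one (all names hypothetical) */

/* kept only for binaries linked against the old abi */
int foo_do_thing_v1(void *ctx);
__asm__(".symver foo_do_thing_v1,foo_do_thing@LIBFOO_1");
int foo_do_thing_v1(void *ctx) { (void)ctx; return 0; }

/* current implementation, marked as the default version with @@ */
int foo_do_thing_v2(void *ctx, unsigned flags);
__asm__(".symver foo_do_thing_v2,foo_do_thing@@LIBFOO_2");
int foo_do_thing_v2(void *ctx, unsigned flags) { (void)ctx; (void)flags; return 0; }
```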
> I want to reply more but i need to head off here. Thanks for chatting! ☺️
good luck out there, thanks as well
So you also end up with quite a few FOSS projects providing pre-built packages or even non-packaged binaries to fill the void.
i remember people downloading .deb and .rpm out of a developer's website and it was painful
last i heard of community repos for debian, they were trying to copy the AUR and MAKEPKG files -- this... says something about debian's recipe format
But at the same time, I hate the debian mess-of-files approach and I'm probably not the only one. Whereas ebuild, abuild, and rpm spec range from okay to sometimes pretty comfy.
@navi Indeed. For that case the library should be exposing a type and using visibility to control access. Unless you're dealing with a language like C without visibility control, and most of the time you don't actually have a technical need to prevent access to private fields beyond telling the developer "this is private, don't mess with what you know not of".
c23 has a way to mimic visibility, by building your own code with `-Dmylib_private=` and having:
```c
#ifndef mylib_private
#define mylib_private [[deprecated("private member")]]
#endif
```
but tbh even without that it's also possible to just say "do not touch any undocumented field, you've been warned" -- it's not like c doesn't already have api constraints that live in documentation (e.g. "this pointer may/may-not be NULL"), the less of those we have the better but one step at a time with improving semantics
at least it's not debian :D
So you get the downsides of sometimes arcane gentoo eclasses, with also the downside of "okay… where does this stuff come from now??"
and yeah modern RPM is quite good compared to what it used to be like: build.opensuse.org/projects/home:mia/packages/vapoursynth-plugin-placebo/files/vapoursynth-plugin-placebo.spec?expand=1
`BuildSystem: meson` tells rpmbuild all it needs to know, the same as `inherit meson` in gentoo, though in gentoo meson.eclass also adds meson as a dependency, and a c++ compiler is included by default in @system, so the only BDEPEND we'd have would be the 3 libraries
Otherwise you end up with effectively having the python-pip problem of having to unpack and evaluate to know the dependencies.
imo the best use for that kind of introspection is to fill a template, reduce how much a developer has to type and streamline the process of making a package, could even be automated to make a bump in response to say, a new release detected via rss
but the thing here is that a package maintainer should still look over the result before pushing to the repo and building, and we do not evaluate dependencies at build time, but beforehand
Otherwise it seems more like an attack vector on packagers, since running a meson file can mean executing arbitrary commands. (same for setup.py, hence my earlier example of python-pip)
since meson.build is not turing complete, i'm pretty sure you could build a full dependency graph without evaluating anything, and noting in the output what the condition would be
and then for the sake of gentoo, we'd match `get_option` conditions to useflags, any condition that queries meson.{host,build}_machine we can also map, since those are semantic, and anything else either assume true, or leave to the discretion of the packager
@whynothugo @lanodan @navi get someone interested in them who already is a packager to do that
(though uploading to Debian proper is much more effort, it’s a commitment to stable support and all that)
this is not a bad thing, even if it means random upstreams cannot just get their code in… which has proven to also not be a bad thing, mind you
@navi @toast @whynothugo @mia You could have restricted meson but it would be incomplete, like there's still annoying libraries without a .pc file and other kinds of weird stuff.
i remember PPAs being hell and breaking all the time, though -- community repos like the AUR or GURU at least allow other users to swoop in and fix the package, so while i never heard of debusine before, i'm wary of this just being the same pain as PPAs all over again
dependency(), find_program(), and cc.find_library(), are the 3 calls we'd need to look for, any other sort of "manual" dependency would be out-of-scope for this tool, as the goal is not to automatically generate a perfect meson package, but to get 90% of the way there in the most common cases
And I'm not saying upstream should make packages, in fact it's effectively what happens with third-party Debian packages, which typically means garbage packages as upstream might not daily-drive Debian.
Instead it should typically be more like experienced Debian admins, that might already have messed with packages.
@navi @lanodan @whynothugo that’s what OOP proponents say, but honestly, how often do you do that when you have a good design upfront?
@lanodan @navi @whynothugo the cost of rebuilding the entire r-deps cycle (it’s often not a tree) is high though, especially if things in the middle break for unrelated reasons but the whole bunch has to move to testing as one to stay installable
GURU has less chance of issues happening, as it's submissions to a dev branch that are then reviewed by trusted committers for suspicious changes before the branch is pushed to HEAD. (They can comment/ping on some commits for improvements, but aren't required to.)
Part of me still prefers the more thorough review process (as for example done for proxied maintainers in Gentoo, and for Alpine testing & community repos), but it also doesn't scale well given an active community.
Often feels like a hack but I've often seen the cycles being broken by splitting the package recipes (which can also have the nice effect of an isolated package when binary seeds are involved).
@lanodan @whynothugo @navi yeah, it’s annoying. There’s a very strong movement towards declarative-only for 90+% of cases, which has upsides but is annoying for packagers.
And debhelper is magic of its own (and cdbs black magic). I only really understood Debian packaging when someone posted the "goodbye" Debian package, a pun on the "hello" packaging example for debhelper.
https://debr.mirbsd.org/repos/wtf/dists/sarge/wtf/Pkgs/ has a few examples of packages done in the "goodbye" style (doing the actions manually, with the corresponding debhelper calls as comments), but debhelper’s benefit is that it can do more, e.g. it gained the ability to limit the timestamp in PNGs to at most the source date for reproducible-builds.
I think one should know both extremes and a bit in the middle.
Ofc Debian initial packaging of mksh took me two whole days*, a quarter day for pkgsrc, and another quarter day for a number of other build systems. But the integration is worth it.
*) and I’ve still not read the library packaging policy because I don’t package any C libraries atm…
@navi @whynothugo @lanodan Packages should be done by the distro, never by upstream, and I will die on this hill
@whynothugo @lanodan @navi yeah.
We got something like that with @freexian ’s recent debusine repos for packagers and @wouter ’s already in long-term use extrepo framework.
Not sure if a wild west of add-on repos is a good idea in general… though having offered one for over a decade and a half by now, I can’t lean myself out of the window.
It beats upstream-provided "files in .deb format" anyway.
@whynothugo @navi @lanodan once they all are built you don’t need to break the cycle for rebuilds, merely take care to not recurse too far.
Famous example of a justified circle is the krb5 - openldap - cyrus-sasl2 one.
@mirabilos @whynothugo @navi I'm on the side of things where the current recipes should effectively be clean enough that a rebuild from some sort of seed is pretty much guaranteed. (Like AFAIK Alpine does that before releases, and NetBSD allows building its base from a simple toolchain + a few common unix utilities)
One reason being to make sure you can introduce new architectures (or change system ABIs, like time_t 64 migration is effectively that), another being audits of builds.
I've seen way too many packages where I had to dig in the version control log and found out they just grabbed a random binary from another distro or upstream binaries as bootstrap, and of course that part doesn't always get archived.
char pointer aliasing for byte-wise access is allowed, and as long as you `container_of` with the right type, the effective type of the memory stays constant, and the inner pointer is obviously part of the same allocated object
so i never found anything that would make it UB, even though there's nothing explicitly allowing it either
packages breaking once they’re in tumbleweed is rare but if it’s something that OBS can detect then you will get emailed about it
@lanodan @navi @whynothugo but reality isn’t like that. Update GCC, boom, hidden issue with a program surfaces, or a library whose packager did not disable the -Werror upstream added fails its configure tests. Fix a bug in the shell, boom, script relying on the bug breaks. Etc.
There are full archive rebuilds to catch some of that, but fixing takes time.
By the way, src:musescore2 again doesn’t build because cmake (this time 4.0 -> 4.2), can you help? ;-)
@lanodan @navi @whynothugo bootstrap is a whole other problem and being worked on, but once you have reproducible-builds (a much less hard problem), many of the problems with bootstrapping vanish.
New architectures tend to start by cross-compilation these days, not random binaries from other distros or upstream. But if upstreams make it hard (Kotlin, Rust, …) it’s hard.
@mirabilos @navi @whynothugo Well yeah, stuff breaks sometimes, and some are accidental.
Others are more like tech debt which ought to be addressed, like for -Werror there's tooling against that (like Gentoo got QA checks based on -frecord-gcc-switches)
yes. visibility in OOP is about that. not trusting their users
>use sonames properly
yeah but it's also pretty much a unix-only thing
yes. visibility in OOP is about that. not trusting their users
oop languages have private keywords, this is about c, and in c you need to be aware of the semantics of any api you use anyway, modern standard versions are slowly adding syntax to express those semantics (in c23 you can even make a pseudo-“private” member using the [[deprecated("reason")]] attribute), but we’re not there yet
c is a language where you have to trust your users otherwise they’ll already break things anyway
yeah but it’s also pretty much a unix-only thing
windows doesn’t usually have many third-party system dependencies, the standard there is to ship dlls with your exe – but even there libraries usually have a naming scheme of $libname-$so_version.dll when they’re versioned
and aside from windows, all other platforms i see people targeting either don’t have enough of an operating system to have runtime libraries, or have them and support sonames
@whynothugo @navi @lanodan that’s roughly the average timeframe for rebuilds of Debian, too
>c is a language where you have to trust your users otherwise they’ll already break things anyway
yeah but also i've seen cases where people either repurposed supposedly "unused" fields or added new ones without recompiling the library, of course, and that expectedly led to memory corruption.
not saying this should be a general rule to use opaque types but I can understand why some would choose to not trust their users
yes I meant windows, but being able to swap the DLL that's shipped with the program is, I think, a cool feature. opaque types in this case just help with the backwards compatibility burden. (and I think that's basically what you said earlier, my bad)
and imo the negatives of "everything's opaque" are worse than this one specific reason to do it (there are good reasons to do it, just not the ones i mentioned in my post)
swapping so/dlls is cool, and i honestly believe that a well-made library that has its data structures actually designed (rather than just ad-hoc reactively modified) is not only often better and faster, but also won't have noticeable breakages -- the same way that well-designed functions don't need their parameters changed often
it's more work, but it's also a way better end result, imo
If you dig deep enough, there's always a binary that was used for bootstrapping that isn't available anymore.
How far you need to go for that is what matters, IMO. Debian's process for adding a new port to the archive is:
- someone builds the initial chroot using cross compilers or vendor binaries or something.
- these are used to start the port on ports.debian.net
@whynothugo @navi @mirabilos
- the build daemons churn away at building unstable constantly, both to prove that the hardware can keep up and that the toolchain is not riddled with bugs
- after a few years of this, the architecture is added to the official archive. packages from ports are used to build a minimal chroot
- the build system is configured to recompile those packages
- now they build the rest of unstable
@mirabilos @navi @whynothugo
End result is that you have several years worth of build logs on buildd.debian.org that you can go and audit, as well as binaries on snapshot.debian.org.
And also the core packages are all reproducible, so you can verify independently.
@mirabilos @navi @whynothugo
See https://github.com/fosslinux/live-bootstrap/ for a start.
was basically some 3-step process of some potato VM (like a forth machine, uxntal, etc) that "a person could implement in about two weeks," with the sole job of booting a larger image (smalltalk/lisp style) where the actual tools lived. (stage 3 is basically just "and then we added the optimizing compilers" etc.) something somewhere still has to write stage 1, but there's always punch cards and assemblers i guess.
i think people just don't consider it to be a problem that needs solving because for the most part hardware flexibility is completely gone. you're on an ARM or AMD and that's it.
Yes, I agree about putting at least some thought into data types, but in my case it was and still is a lot of very legacy code, and I wish things had gone differently almost three decades ago.
but when i get chances to write new code, i'm always applying the concepts so that at least in 30 years from now, i hopefully have less to complain about
okay i wanna ask to please get untagged from this bootstrapping thread (can't mute the thread as there's other subthreads of this post i'm still following, sorry)
And yet, since 2000, Debian has never had less than 6 ports, and for much of that time the counter was at 11.
Looks like it will be 8 for forky...
@whynothugo @lanodan @navi @mirabilos
@lanodan I'm not sure how well GURU will scale, it's already a lot of work, even if the devs hardly review most things.
@navi @wouter @lanodan @whynothugo only for amd64 tho
And platforms beyond those, like say sparc and macppc, sadly tend to lack software support, plus are getting harder and harder to obtain for new developers.
That said, that bit is more about providing a path that avoids binaries, solving issues like say the C compiler being ultimately derived from some unknown binary (which could still be archived as part of the artifacts the packages build from).
For the other arches you could then cross-compile to obtain a seed.
@lanodan @whynothugo @wouter @navi Debian archives a whole lot of things.
I also still have my slink CDs.
@icedquinn @whynothugo @wouter @lanodan @navi that’s so annoying.
Porting to all the different architectures has pointed out actual bugs in code.
@lanodan @whynothugo @wouter they’re for x86 though.
Totally useless if you’re on a new arch, say shoort64, which did not exist until last year, and want to bootstrap it, which also means you cannot just build older versions until you’re there because they won’t have support for that new architecture.
That’s why I see “the bootstrapping problem” as not as relevant for bootstrapping ports, as long as you’ve got reproducible builds and can cross-bootstrap.
(IIRC from the discussions in #debian-ports, things like rustc are cross-built regularly because some arch doesn’t keep up or the mandatory in-between version had a bug for that arch. And Helmut has done enough work on cross-building that an import of an arch into dpo, debian proper, or both could conceivably be cross-built instead of relying on maintainer-provided binaries.)
solving issues like say the C compiler being ultimately derived from some unknown binary
With reproducible builds, it suffices if you can later cross-build it via a different chain, say a BSD with pcc → GCC 3.x → GCC 4.4 → GCC 9.5 → current GCC → cross compiler to the OS and arch you’re interested in. And a $vendor_unix with a similar chain but from a vendor C or C++ compiler. If you eventually arrive at the bytewise same binaries, you’ll have won.
@lanodan @whynothugo @wouter (Ada is more of a problem; GCC 3.x GNAT cannot build GCC 4.x GNAT, so there’s a discontinuity there.)