IRC Logs of #wayland on irc.freenode.net for 2024-09-02

07:59 wlb: weston Merge request !1606 opened by Marius Vlad (mvlad) build: bump to version 13.0.95 for the RC3 release https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1606
08:06 wlb: weston/main: Marius Vlad * build: bump to version 13.0.95 for the RC3 release https://gitlab.freedesktop.org/wayland/weston/commit/186937dc9528 meson.build
08:06 wlb: weston Merge request !1606 merged \o/ (build: bump to version 13.0.95 for the RC3 release https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1606)
08:17 wlb: weston New tag: 13.0.95 https://gitlab.freedesktop.org/wayland/weston/tags/13.0.95
08:36 wlb: wayland.freedesktop.org Merge request !89 opened by Marius Vlad (mvlad) releases: add weston 13.0.95 release https://gitlab.freedesktop.org/wayland/wayland.freedesktop.org/-/merge_requests/89
08:37 wlb: wayland.freedesktop.org/main: Marius Vlad * releases: add weston 13.0.95 release https://gitlab.freedesktop.org/wayland/wayland.freedesktop.org/commit/6d92664ccfe0 releases.html
08:37 wlb: wayland.freedesktop.org Merge request !89 merged \o/ (releases: add weston 13.0.95 release https://gitlab.freedesktop.org/wayland/wayland.freedesktop.org/-/merge_requests/89)
14:43 emersion: kennylevinsen: the website doesn't contain the UTF-8 thing because that commit hasn't been released yet
14:47 davidre: Isn't that kinda an important break, if someone ferried non utf8 in a custom protocol
14:47 kennylevinsen: yeah, that makes sense
14:47 mclasen: kennylevinsen: no idea what you are talking about with differentiating utf8 encoded vs decodable
14:47 mclasen: either it is valid utf8, or it isn't
14:47 mclasen: that is a pretty black-and-white thing
14:48 emersion: davidre: some implementations already killed clients based on UTF-8 encoding
14:48 emersion: and we don't know of any such custom protocol
14:48 kennylevinsen: mclasen: no? any sequence of bytes &0x7F can decode as utf-8, but that doesn't mean it was ever utf-8
14:48 emersion: can always revert it if it makes sense to
14:48 mclasen: there is no difference between "can be decoded as utf8" and "is utf8"
14:48 kennylevinsen: Maybe I'm nitpicking, but it still feels like a bogus validation
14:49 mclasen: if it can be decoded as utf8, it is utf8
14:49 mclasen: there's no secret 'essence of utf8' that you could miss
14:50 kennylevinsen: mclasen: by that definition, /dev/random &0x7F is a UTF-8 string. I only define a utf-8 string as "text that was *encoded* as UTF-8"
14:50 psykose: it would indeed be a utf8 string
14:50 kennylevinsen: and if you want to ensure that "clients are using the right charset", then you care about the latter
14:50 mclasen: whether decoding yields a meaningful Unicode string is an entirely different question
14:53 kennylevinsen: that's not how I read the spec: "the string must be encoded in UTF-8" - if you have to encode in UTF-8, it must yield a meaningful unicode string
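The "/dev/random &0x7F" point above can be checked with a short Python sketch. The helper name `mask_to_seven_bits` is hypothetical, and `os.urandom` stands in for /dev/random. The sketch only shows that 7-bit bytes always pass a strict UTF-8 decode; it says nothing about whether the result is meaningful text.

```python
import os


def mask_to_seven_bits(data: bytes) -> bytes:
    """Clear the high bit of every byte (the '& 0x7F' from the discussion)."""
    return bytes(b & 0x7F for b in data)


# os.urandom is used here as a stand-in for reading /dev/random.
masked = mask_to_seven_bits(os.urandom(64))

# Every byte is now in the ASCII range, and ASCII is a subset of UTF-8,
# so a strict decode cannot raise UnicodeDecodeError.
text = masked.decode("utf-8", errors="strict")
print(len(text) == len(masked))
```

A validator sees only that the bytes are well-formed; it cannot tell whether they were ever produced by encoding text.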
14:54 mclasen: which spec is this?
14:54 kennylevinsen: that was just xdg_toplevel::set_title
14:56 kennylevinsen: (wording aside, I imagine everybody would agree that a client is buggy and not adhering to spec if what it sends does not actually decode as a meaningful unicode string)
14:56 vyivel: did we run out of sheds to color
14:56 kennylevinsen: ("meaningful" in this case meaning "the unicode string the client actually wanted the user to see")
14:56 kennylevinsen: vyivel: it's regarding a MR to add UTF-8 validation to libwayland
14:57 vyivel: i'm aware
14:57 vyivel: you can validate data to match a format, you can't validate the intent, doesn't make it less useful
14:57 mclasen: the spec could be more precise, for sure
14:58 kennylevinsen: My word isn't final anyway, I just don't see any point in having a utf-8 validator in libwayland and would just let it be up to compositors to decide if they want to strip or render mojibake :/
14:58 mclasen: for most people, "utf-8 encoded" is pretty clear
14:59 kennylevinsen: I wholeheartedly accept that I may at times be pedantic in non-standard ways, and don't always know when that is the case. :)
14:59 mclasen: you could say "utf-8 encoded Unicode string"
15:00 mclasen: but what constitutes a valid unicode string is a good deal less clear than what utf-8 is
15:01 mclasen: and really, it's up to the compositor to handle the data safely, and use its best judgement when I send it something that may be valid utf8, but maybe not valid Unicode
15:07 kennylevinsen: I would just extend that and say that the client is required to send a utf-8 encoded unicode string as now, and not doing so is broken - but the compositor can decide whether it wants to show the garbage it got or fail the client
15:07 soreau: Is this about fixing a client sending '\U000xyz00' as a char in the title through the wayland socket currently?
15:07 jadahl: isn't "garbage" a fluid thing? what was "garbage" when the compositor was written became valid when the client was written
15:08 jadahl: with new code points being added all the time
15:09 kennylevinsen: (garbage referring to whatever the result of rendering the borked text is, mojibake/tofu/whatever inclusive)
15:12 mclasen: jadahl: that was my point. Unicode is fluid
15:12 davidre: Valid utf8 zalgo window titles :)
15:12 mclasen: not saying it is garbage...
15:12 kennylevinsen: I do sometimes find these kinds of discussions where you realize your quite firm mental model of something differs a lot from everyone else quite intriguing. Teaches you a lot about how people see things differently.
15:14 mclasen:hates food metaphors in software, so not going with 'tofu'
15:14 kennylevinsen: hah
15:14 kennylevinsen: do we have another word for the unicode codepoint square?
15:15 mclasen: hexbox is the more descriptive name
15:15 kennylevinsen: that kind of sounds like something you'd use to curse someone, but fair
15:16 kennylevinsen: mojibake is pretty standard for garbage displayed when encoding and decoding charset isn't matched
15:16 davidre:uploaded an image: (20KiB) < https://matrix.org/_matrix/media/v3/download/kde.org/5e90fbe51b443da327509a45fd033a38acffb7811830625954381168640/Screenshot_2024-09-02_05%3A16%3A31.png >
15:17 kennylevinsen: mmmm
15:17 davidre: Looks like our task manager needs a bit of clipping :D
15:29 jadahl: davidre: blurring the line of what's garbage and not :P
15:37 DemiMarie: As the one who made the MR: My main concern is that a compositor might pass a non-UTF-8 string to an API that requires UTF-8 and create a security hole. For example, there are many glib APIs that require valid UTF-8. Also, earlier validation is better for security.
15:38 kennylevinsen: DemiMarie: I'm actually slightly concerned that this would be thought to be sufficient for security
15:39 kennylevinsen: e.g., skipping normalization and such
15:41 kennylevinsen: would be good enough for strictly utf-8-parsing related bugs though
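The normalization concern can be illustrated with a small Python sketch. The two sample strings are assumptions chosen for illustration; `unicodedata.normalize` is from the Python standard library. Both strings are valid UTF-8 and look the same when rendered, yet their bytes differ, which a pure UTF-8 validity check would not notice.

```python
import unicodedata

# Two spellings of the same visible character "é" (illustrative samples).
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# Both encode to well-formed UTF-8, but the byte sequences are different.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")
print(composed_bytes != decomposed_bytes)

# Only after normalizing (NFC here) do the two strings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Anything that compares or deduplicates such strings still needs its own check at the point of use.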
15:47 kennylevinsen: I do personally prefer explicitly treating foreign data as untrusted and let checks belong at time of use depending on need, as I feel it reduces risk of misjudging guarantees
16:03 DemiMarie: kennylevinsen: It's not sufficient for security on its own.
16:04 DemiMarie: There are, however, quite a few APIs that require valid UTF-8 or else undefined behavior results.
16:04 vyivel: ^
16:06 DemiMarie: Also, if a protocol requires something, implementations of that protocol should check that something.
16:06 kennylevinsen: glib has its own built-in utf-8 validation function, and in the case of the Rust example from the MR you even have to explicitly go out of your way to call unsafe conversions to make it blindly assume utf-8...
16:07 DemiMarie: Rust is nice and encodes "is this known to be valid UTF-8?" into the type system. Most languages are not so nice.
16:07 kennylevinsen: Our protocols always require much more than we can validate, so I'm not really caught on that aspect. It's just bugs, and some get caught by validation, some get caught by user eyeballs, etc.
16:08 kennylevinsen: DemiMarie: I know, it's just that Rust's str was the specific example in the MR :)
16:08 emersion: undefined behavior if input isn't valid utf8? never heard of such a cursed api
16:09 DemiMarie: emersion: glib has quite a few of them
16:09 DemiMarie: kennylevinsen: Not my example.
16:10 kennylevinsen: I am aware
16:11 DemiMarie: Validating this in libwayland prevents an entire class of vulnerabilities.
16:12 DemiMarie: It also ensures that sending invalid UTF-8 will be detected by every compositor, making "it works on my machine" issues less likely.
16:12 kennylevinsen: I don't believe in duct tape security
16:13 kennylevinsen: detection is true, my only problem there is that it will probably mainly affect and be reproduced by minority users
16:14 DemiMarie: Is that because of incorrectly encoded data?
16:15 DemiMarie: Also, what do you mean by "duct tape security"?
16:17 kennylevinsen: I think the most likely bug here would be "client includes a string from a document that is foreign encoding in its title" - to trip it you need a user to use that client with non-english, non-utf8 content, which is an increasingly rare occurrence.
16:18 DemiMarie: I see.
16:18 DemiMarie: One option would be to replace invalid bytes with U+FFFD REPLACEMENT CHARACTER.
16:19 kennylevinsen: (because utf-8 is widely adopted, not because of non-english of course - e.g., my language requires either UTF-8 or ISO-8859-1 to encode the last 3 letters of its alphabet, so you need not only an ISO-8859-1 document, but one whose title also used those three letters)
16:20 DemiMarie: Such conversion belongs on the client side, though.
16:21 kennylevinsen: yes but that's the kind of bug you're trying to catch
16:21 kennylevinsen: a non-malicious bug would be that it accidentally took that title and used it, byte for byte
16:21 kennylevinsen: but the likelihood of this check catching such flaw is tiny
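The bug scenario described above, and the U+FFFD option mentioned earlier, can be sketched in Python. The title text and the helper names `is_valid_utf8` and `lossy_title` are hypothetical; the letters "æøå" are only an example of characters that ISO-8859-1 encodes as single bytes above 0x7F.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (strict decode)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def lossy_title(data: bytes) -> str:
    """Decode as UTF-8, replacing invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# A document title taken byte for byte from an ISO-8859-1 source.
latin1_title = "Titel: æøå".encode("iso-8859-1")
# The same title with ASCII-only content passes unnoticed.
ascii_title = "Title: abc".encode("iso-8859-1")

print(is_valid_utf8(latin1_title))  # the non-ASCII bytes are not valid UTF-8
print(is_valid_utf8(ascii_title))   # pure ASCII is valid in both encodings
print(lossy_title(latin1_title))
```

This matches the point made in the discussion: the check only trips when the title actually contains non-ASCII characters from the foreign encoding.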
16:22 DemiMarie: I see
16:23 kennylevinsen: what I mean by duct tape security is trying to cover up broken software by doing a few checks in front of it and hoping that does the trick, vs. protecting the actual usage properly
16:23 DemiMarie: I see
16:24 kennylevinsen: (in my line of work I'm constantly dealing with security products placed in front of broken stuff, which gives a false sense of security by covering a few bugs but not fixing underlying issues)
16:25 kennylevinsen: (the pain of devsecops with corp IT and separate security departments, but that's off-topic)
16:26 kennylevinsen: (Not to say that early checks cannot be good, but a check for the check’s sake is not)
16:37 dottedmag: IMHO "here's a bag of bytes, they are valid utf-8 sequences" is no worse than "here's a bag of bytes, the spec says it's utf-8, knock yourself out validating it", and having multiple checkers in every implementation, all slightly different.
16:41 mclasen: DemiMarie: it will be detected by every compositor using libwayland-server. Wayland is defined by the protocol, not the library...
17:11 Company: you guys need a testsuite that fails if the compositor doesn't reject invalid utf8
17:11 Company: and then you do a webpage that validates all the compositors against it
17:12 Company: and then you let social media make sure that all compositors are valid
18:22 bl4ckb0ne: sounds like a lot of work
18:29 kennylevinsen: Compliance by social media shaming is certainly a very modern approach
18:45 DemiMarie: Company: Or libwayland can do the validation for every compositor.
18:46 Company: that still needs testing
18:48 mclasen: for every compositor using libwayland
18:49 Company: you also need to test the other thing
18:49 Company: that the compositors correctly accept utf-8
18:49 Company: and do the right thing with codepoints that they have no glyphs for and fun stuff like that
18:49 DemiMarie: mclasen: true, but that is still much less work than checking in each and every request & event handler.
18:50 DemiMarie: Company: that is still needed, yes.
18:50 zamundaaa[m]: Demi: fwiw I don't understand the pushback in here, having this check be done by the library seems purely like a good thing to me
18:51 DemiMarie: zamundaaa: thank you.
18:51 DemiMarie: Should libwayland validate enums as well?
18:54 zamundaaa[m]: If they're also validated on the client side, probably
18:55 zamundaaa[m]: Otherwise it's likely better to have the compositor emit the relevant protocol error instead, so the client knows what's going on
19:37 DemiMarie: Does that mean that checks should be symmetric between marshalling and unmarshalling?
19:56 wlb: weston Merge request !1607 opened by Jonas Ådahl (jadahl) xdg-shell: Allow poup grab on non-grabbing popup https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1607 [libweston-desktop]
20:26 DemiMarie: zamundaaa: What if the unmarshalling code could emit a helpful protocol error, such as "Bad enum value X for version Y"?
20:27 zamundaaa[m]: Yeah, that should work
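A rough Python sketch of the kind of check being discussed, assuming a per-enum table that maps each value to the protocol version that introduced it. The table `EXAMPLE_ENUM_SINCE` and the function `check_enum` are hypothetical and are not libwayland API; the error text follows the wording suggested in the message above.

```python
from typing import Mapping, Optional

# Hypothetical enum table: value -> protocol version that introduced it.
EXAMPLE_ENUM_SINCE = {0: 1, 1: 1, 2: 3}


def check_enum(value: int, bound_version: int,
               since: Mapping[int, int]) -> Optional[str]:
    """Return an error message for an unknown or too-new value, else None."""
    introduced = since.get(value)
    if introduced is None or introduced > bound_version:
        return f"Bad enum value {value} for version {bound_version}"
    return None


print(check_enum(2, 1, EXAMPLE_ENUM_SINCE))  # value added in version 3
print(check_enum(2, 3, EXAMPLE_ENUM_SINCE))  # accepted at version 3
```

A strict check like this would reject any protocol that deliberately carries values outside its declared enum, so it would need a way to opt out.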
20:34 jadahl: there were at times ideas to allow extending enums in other protocol extensions. this idea in particular (early xdg-shell) was about storing them in an array, but doesn't seem unthinkable that there could be protocols that do this to plain enums too. error:ing out clients doing this on an IPC level would regress that
21:10 Company: jadahl: I would assume you'd use an int in that case
21:11 Company: xx-color kinda is in that situation where it has named vs manually specified primaries
22:04 emersion: wl_shm allows values outside of enum