• Re: Unicode...

    From James Kuyper@3:633/10 to All on Wednesday, December 31, 2025 18:04:59
    On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:
    On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

    Sorry for the delay in my response. We spent the last week in Orlando,
    without access to usenet.

    Could you identify which document guarantees that every Unicode locale
    contains "UTF-8"?

    How else would it work? ...

    I think you're missing the point of my question. Take a look at the list
    from my message - none of the locales on my system have names that
    contain "UTF-8". Most of them contain "utf8" instead, and many have neither.
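
Because the spelling varies ("UTF-8", "utf8", and so on), a test for UTF-8 is more robust when it compares a normalized codeset name than when it searches for one exact substring. The following is only a sketch: `codeset_is_utf8` is a hypothetical helper, not part of any standard library, and it assumes that ignoring letter case and the `-` and `_` separators is enough to cover the spellings in use.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a codeset name denotes UTF-8,
   ignoring letter case and '-' / '_' separators; 0 otherwise. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "utf8";
    size_t i = 0;

    assert(codeset != NULL);
    for (; *codeset != '\0'; ++codeset) {
        unsigned char c = (unsigned char)*codeset;
        if (c == '-' || c == '_')
            continue;               /* skip separators */
        if (want[i] == '\0' || tolower(c) != want[i])
            return 0;               /* extra or mismatched character */
        ++i;
    }
    return want[i] == '\0';         /* matched all of "utf8" */
}
```

On a POSIX system the argument would typically be the string returned by `nl_langinfo(CODESET)` after `setlocale(LC_ALL, "")`, or the part of a locale name after the `.`. Locales that use UTF-16 or GB18030 are Unicode locales too, yet they correctly fail this test.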

    As I mentioned in my last message, there are also many different
    possible encodings for Unicode. MS has used both UCS-2 and UTF-16.
    Chinese government systems use GB18030. Would you expect locales using
    those encodings to have names that contained "UTF-8"?


    ...Bytes have to be 8-bit.

    Incorrect - the only requirement is that CHAR_BIT >= 8. There are real
    systems where CHAR_BIT == 16. There have been real machines where
    CHAR_BIT==9 would have been the most reasonable option.

    I'm not sure why you mentioned that, however. Why do you think it's
    relevant? UCS-2 and UTF-16 have no problem existing on machines with
    8-bit bytes; they just occupy two such bytes. UTF-32 would occupy 4 of them.
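
The byte counts above follow from a rounding-up division of the code unit width by the byte width. A minimal sketch, with hypothetical helper names; the byte width is passed as a parameter so that the cases CHAR_BIT == 9 and CHAR_BIT == 16 can be tried on an ordinary 8-bit-byte machine:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of bytes of width char_bit bits needed
   to hold one code unit of unit_bits bits (rounds up). */
unsigned bytes_per_code_unit(unsigned unit_bits, unsigned char_bit)
{
    assert(char_bit >= 8);   /* the C standard requires CHAR_BIT >= 8 */
    return (unit_bits + char_bit - 1) / char_bit;
}

/* Same computation for the implementation compiling this code. */
unsigned bytes_per_code_unit_here(unsigned unit_bits)
{
    return bytes_per_code_unit(unit_bits, CHAR_BIT);
}
```

With 8-bit bytes, a 16-bit code unit (UCS-2, UTF-16) takes 2 bytes and a 32-bit one (UTF-32) takes 4. With 16-bit bytes the same code units take 1 and 2 bytes respectively.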


    --- PyGate Linux v1.5.2
    * Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
  • From Lawrence D’Oliveiro@3:633/10 to All on Wednesday, December 31, 2025 23:11:26
    On Wed, 31 Dec 2025 18:04:59 -0500, James Kuyper wrote:

    On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:

    ...Bytes have to be 8-bit.

    Incorrect - the only requirement is that CHAR_BIT >= 8. There are
    real systems where CHAR_BIT == 16. There have been real machines
    where CHAR_BIT==9 would have been the most reasonable option.

    Those are sizes of “characters”, not “bytes” though, are they.

  • From James Kuyper@3:633/10 to All on Wednesday, December 31, 2025 18:36:35
    On 2025-12-31 18:11, Lawrence D’Oliveiro wrote:
    On Wed, 31 Dec 2025 18:04:59 -0500, James Kuyper wrote:

    On 2025-12-24 01:17, Lawrence D’Oliveiro wrote:

    ...Bytes have to be 8-bit.

    Incorrect - the only requirement is that CHAR_BIT >= 8. There are
    real systems where CHAR_BIT == 16. There have been real machines
    where CHAR_BIT==9 would have been the most reasonable option.

    Those are sizes of “characters”, not “bytes” though, are they.

    No. In the C standard, a byte is a unit of measurement for memory. The
    size of a byte is CHAR_BIT bits, which is implementation-defined. A
    character is something that you can store in memory. A byte is required
    to be large enough to store any member of the basic character set of the
    execution environment. Since the basic character set need only contain
    96 characters, and CHAR_BIT is required to be >=8, that's not an onerous
    requirement.
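
The relationship between bytes and bits described here can be checked directly. The sketch below assumes a C11 or later compiler (for `static_assert`); `bits_in_object` is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Checked at compile time: guarantees the C standard makes everywhere. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof counts in bytes; char is one byte");

/* Hypothetical helper: width in bits of an object whose size in bytes
   is size_in_bytes, on the implementation compiling this code. */
size_t bits_in_object(size_t size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}
```

Since `sizeof(char)` is 1 by definition, `bits_in_object(sizeof(char))` is simply CHAR_BIT: 8 on most current systems, 16 on the implementations mentioned earlier in the thread.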


