Forum: Jacob's Hideout BBS

Re: this girl calls c ugly

From James Kuyper@3:633/10 to All on Friday, May 29, 2026 15:57:07

On 2026-05-29 12:10, Janis Papanagnou wrote:

On 2026-05-29 13:19, Bart wrote:

...

* What is the order here: a ^ b | c

(a^b)|c

Personally I don't think that there's a prevalent definition
how these should be ordered.

I'm not sure what you mean by "prevalent definition". Ordinarily, I'd
expect the C standard to qualify - it definitely defines the order, and
the very purpose of a language standard is to prevail over non-standard alternatives. However, I'm sure you're aware of the C standard, and made
that comment anyway, so I presume you mean something different by it.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Friday, May 29, 2026 22:00:30

On 2026-05-29 20:09, Bart wrote:

You actually said this:

You continue your trollish stance to cherry-pick words without
understanding or trying to understand what's been expressed.

The insight appears to me that you're taking communication in a
similar way as you "design" your languages; focusing on personal
*syntax* preferences instead of the more important *semantics*.

Despite we're talking in your native language (and not mine) you
obviously completely miss or deliberately ignore that there's a
difference between "it makes perfectly sense" and "it's perfect".
(I said the former, you put the latter in my mouth. And to not
get identified as a liar you're squirming with such moves. Gee!)

Or are you again just confused about the difference, and despite you
have already been advised to quote what I think can neither be
misrepresented nor misinterpreted (since it doesn't contain common
word patterns that obviously confuse you)
>> What I would say is that operator precedences are in "C"
>> "sensibly and appropriately defined, modulo the bit-ops".
you're still playing your stupid game; you ignored that. I suggest
to try to map this statement to either of the above two statements,
the one I said and the one you (wrongly) attributed, and see which
one fits. (Hint: the former.)

[...]

Dan Cross:

Programmers _should_ absolutely learn the rules. But in C,
there are many of them, and some of them are deceptively subtle.

JP:

We agreed.

Bart, you are incapable of understanding semantics and associating
context.

Janis


From Janis Papanagnou@3:633/10 to All on Friday, May 29, 2026 22:03:04

On 2026-05-29 21:49, Bart wrote:

On 29/05/2026 20:28, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

or would you continue to post contrived examples that make it appear
as confusing as possible?

Examples are examples. Do you want me to post one that didn't illustrate
an issue? It necessarily has to be contrived.

Bart, what do you want?

Today I was just replying to this post today that I found annoying:

JP:

Unsurprisingly; since exactly *that* was the obvious (and single)
issue with C's precedence definitions.

That is, the suggestion, made several times by JP, that there is only
one thing wrong.

The C-precedence rules have one issue. The rest is a sensible choice.

(The C-language has more issues, but that was not the topic here.)

Janis


From BGB@3:633/10 to All on Friday, May 29, 2026 15:16:51

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

But, not really an "easy" way to avoid bloat, other than to write
code specifically for what cases are relevant; while also avoiding
needless duplication and copy paste (where, overuse of copy/paste
can also lead to bloat; along with turning the code into an ugly mess).

Hmm.. - as said, during the very early days there were issues; I
recall on one platform duplication of template code in more than
one source unit. And/or some environmental hacks (of the compiler)
to deposit template code for linking. In the later days I've not
seen such immature things anymore.

Possibly, a lot could depend on how one is counting things as well.

In a lot of cases when using GCC, I end up using:
   -ffunction-sections -fdata-sections -Wl,-gc-sections

On many targets, "-fdata-sections" can lead to noticeably larger and
slower code because it effectively eliminates section anchor
optimisations. It does not negatively affect x86 AFAICS, because x86
does not use section anchors.

<https://godbolt.org/z/zeoq41Y7d>

With -fsection-anchors (enabled with optimisation on targets that
support it - generally RISCy load/store architectures), program-lifetime
variables are kept together in a lump (as though they were in a struct)
and often addressed by a pointer to that pretend struct. Thus if a
function accesses two variables "a" and "b", instead of having to load
the addresses of each of "a" and "b" into separate registers, it loads
an "anchor" into one register and accesses the variables with reg+offset
addressing.
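As a rough illustration of the "pretend struct" idea, here is a hand-written analogy in C. It is not what GCC literally emits, and the names `anchor_block`, `vars`, `sum_ab` and `sum_globals` are made up for this sketch. Grouping the variables in a real struct gives a function one base address plus fixed offsets to work with:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "lump" of program-lifetime variables. */
struct anchor_block {
    int a;
    int b;
};

static struct anchor_block vars = { 1, 2 };

/* One base pointer (the "anchor") reaches both members as base+offset,
   instead of needing a separately loaded address per variable. */
static int sum_ab(const struct anchor_block *anchor)
{
    assert(anchor != NULL);
    return anchor->a + anchor->b;
}

/* Convenience wrapper that uses the file-scope block. */
int sum_globals(void)
{
    return sum_ab(&vars);
}
```

With "-fdata-sections" each variable goes into its own section, so a fixed relative layout like this can no longer be assumed between separate variables.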

I've seen "-fdata-sections" used regularly in embedded systems - it is almost always a bad idea.

("-ffunction-sections" is often very helpful to reduce code image size,
so keep that one.)

Both seem to help on x86, x86-64, and also on RISC-V, at making GCC's
output at least sorta space-comparable to my own compilers.

The merit of "-fdata-sections" is mostly that it eliminates unused
global variables; whereas "-ffunction-sections" eliminates unreachable functions.

Neither is needed with my own compiler, which compiles things in a way
such that it eliminates anything that is unreachable.

Both posed an issue initially when porting ROTT, because in some cases
it relied on the ability to go out-of-bounds for one array to access
data in another array. I ended up reworking some of these cases though
to use a single larger array.

Have noted though that GCC targeting RISC-V still tends to produce
fairly large binaries even with "-Os". Its code for the basic subset
(RV64G) does tend to be a little faster than what BGBCC generates, but
also a fair bit more bulky. Though, the final ELF file ends up bigger
still, as a significant chunk of the file ends up needing to hold ELF
related metadata (comparably, PE/COFF can end up much leaner here).

Though, on the other side, with modern MSVC, despite the relative
leanness of the PE/COFF format, MSVC tends to produce binaries with much larger ".text" sections.

This issue was a lot less with VS2008 though, which tended to generate less-bloated binaries (with code-size more competitive with GCC).

Also in modern MSVC, there is little distinction between "/O1" and
"/Os", both being more space-efficient than "/O2" (though, "/O2" is
usually faster, but also more prone to misguided attempts at auto-vectorization).

Because otherwise it likes wasting code space by retaining unreachable
functions.

Using "static inline" functions also carries a risk because they can
end up duplicated across multiple translation units, or in multiple
places within the same translation unit, so is best used sparingly.

Usually you would only use static inline functions for small functions
in headers, where they are a better choice than function-like macros. In
a C file, there is rarely much point in declaring a function "inline" -
optimising compilers will inline or not as they see fit, without regard
for "inline". "static" on its own is, of course, always a good idea for
functions or data that is not "exported" by the current translation
unit, and will often make generated code smaller.

How much or how little duplication of code there will be within one translation unit will depend on compiler settings and the rest of the
code, and not on whether or not you use "inline".
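One concrete reason a small static inline function can beat a function-like macro is argument evaluation. The sketch below uses made-up names (`SQUARE_MACRO`, `square_fn`, `next_value`) and counts how many times the argument expression runs in each form:

```c
#include <assert.h>

/* Function-like macro: the argument text is pasted in twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Header-style helper: the argument is evaluated exactly once. */
static inline int square_fn(int x)
{
    return x * x;
}

static int calls;   /* counts evaluations of next_value() */

static int next_value(void)
{
    return ++calls;
}

/* Number of times the argument expression runs with the macro. */
int evaluations_with_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());
    return calls;
}

/* Number of times the argument expression runs with the function. */
int evaluations_with_function(void)
{
    calls = 0;
    (void)square_fn(next_value());
    return calls;
}

/* Self-check; call from main() or a test harness. */
void check_square(void)
{
    assert(evaluations_with_macro() == 2);
    assert(evaluations_with_function() == 1);
    assert(square_fn(7) == 49);
}
```

The function version also gets normal type checking on its parameter, which the macro does not.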

OK.

But, yeah, small functions are usually better than macros in at least
that the compiler can avoid duplicating them (or maybe merge them
between translation units when it notices that the contents are identical).

As for assembler:
Main reasons not to use assembler for everything:
  Needlessly verbose;
  Non-portable.

However, often one can still end up writing C code that looks like
assembler sometimes, as this is often an effective way to optimize
things.

Say, for example:
  v0=cs[0];
  v2=cs[2];
  v1=cs[1];
  v3=cs[3];
  ct[0]=v0;
  ct[2]=v2;
  ct[1]=v1;
  ct[3]=v3;
Vs:
  ct[0]=cs[0];
  ct[1]=cs[1];
  ct[2]=cs[2];
  ct[3]=cs[3];

Because the extra variables can help sidestep latency from the
load instructions and staggering stores can avoid penalties of two
adjacent stores to the same cache-line in some cache architectures.
Where, in the latter case, the compiler may fail to as effectively
avoid the load-latency or realize the need to stagger the stores for
best performance, ...
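For anyone who wants to compare compiler output for the two forms, here they are as complete functions. This is only a sketch: the element type `uint32_t` and the function names are assumptions, and whether the two compile to different machine code depends entirely on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>

/* Explicit temporaries: all four loads are written before any store,
   and the stores alternate between elements. */
void copy4_staggered(uint32_t *ct, const uint32_t *cs)
{
    uint32_t v0, v1, v2, v3;
    v0 = cs[0]; v2 = cs[2]; v1 = cs[1]; v3 = cs[3];
    ct[0] = v0; ct[2] = v2; ct[1] = v1; ct[3] = v3;
}

/* Direct form: each load is immediately followed by its store. If ct and
   cs might overlap, the compiler has to keep that order unless it can
   prove otherwise. */
void copy4_plain(uint32_t *ct, const uint32_t *cs)
{
    ct[0] = cs[0];
    ct[1] = cs[1];
    ct[2] = cs[2];
    ct[3] = cs[3];
}

/* Self-check: for non-overlapping buffers both forms copy the same data. */
void check_copy4(void)
{
    const uint32_t src[4] = { 10, 20, 30, 40 };
    uint32_t d1[4] = { 0 }, d2[4] = { 0 };
    copy4_staggered(d1, src);
    copy4_plain(d2, src);
    for (int i = 0; i < 4; i++) {
        assert(d1[i] == src[i]);
        assert(d2[i] == d1[i]);
    }
}
```

Note that the two are only guaranteed to behave the same when the source and destination do not overlap.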

That might be the case for a very simplistic compiler. With an
optimising compiler, these extra variables will quickly be eliminated.
If the compiler has a good scheduling model of the device, it will do
whatever instruction scheduling works best for that processor. If the
model is not good enough, it will be suboptimal. I would not, however,
expect any difference in the generated code for the two code snippets.

Sometimes this kind of "manual optimisation" is helpful when you have to
try to get efficient results from a weak compiler, however.

Possibly, but this sort of thing can help with both BGBCC and with MSVC IME.

While BGBCC does use shuffling to reorder instructions, it may
fail to do so in some cases:
If the instructions end up mapped to the same CPU register;
If its heuristics can't prove non-alias.

Though, in the simple example given, it could (probably) turn the latter
into the former, but "better" to write code such that things are in
closer to the optimal order by default.

Note that using different variables with overlapping scopes reduces the likelihood of the compiler assigning both to the same register, which is
a much more real risk if relying on implicit temporaries (whose lifetime
only exists within a single expression).

But, in my case, a lot of this comes down to trying to tweak the
compilers' internal register allocation heuristics for best results (and
the tight balance between how many registers to save/restore for the
function, vs avoiding assigning short-lived temporaries to the same
register too quickly and hindering the instruction-scheduling).

Arguably, could make sense to instead do the reordering at the 3AC
level, rather than reordering at the level of ISA instructions, but this
is just sorta how I ended up doing things (and one can know the
effective timing latency of a CPU instruction a lot more easily than a
3AC op).

Usual strategy is to try to limit how much code is written, and also
to avoid doing things in ways that result in too much code, or too
much cruft.

Best to avoid both copy paste when reasonable, and sticking anything
non-trivial in macros.

We avoided macros if possible.

They are de-facto for constants and similar, but for longer stuff is
better avoided.

Macros are rarely the best way to define constants. They are needed if
you are using the constants for pre-processor stuff like conditional
compilation. But generally you get clearer code, better typing, and
potentially several other benefits from using alternative choices like
"enum" (even for stand-alone integer constants), "static const"
variables, and in C23, "constexpr" variables. There's no doubt that a
lot of code /does/ use macros for constants, but I view it as a relic of
the past rather than good coding practice.
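A small sketch of the alternatives side by side, using made-up names (`BUF_SIZE_MACRO`, `BUF_SIZE`, `SCALE`). The C23 "constexpr" form is left out so the example also builds with older compilers:

```c
#include <assert.h>

#define BUF_SIZE_MACRO 64          /* macro: usable in #if */
enum { BUF_SIZE = 64 };            /* integer constant expression */
static const double SCALE = 0.5;   /* typed, scoped object */

/* Conditional compilation is the case where a macro is still needed. */
#if BUF_SIZE_MACRO < 16
#error "BUF_SIZE_MACRO too small"
#endif

/* An enum constant can size an ordinary (non-variable-length) array. */
int scaled_capacity(void)
{
    char buf[BUF_SIZE];
    return (int)(sizeof buf * SCALE);
}

/* Self-check; call from main() or a test harness. */
void check_constants(void)
{
    assert(BUF_SIZE == BUF_SIZE_MACRO);
    assert(scaled_capacity() == 32);
}
```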

They are traditional...

Like:
static const double M_PI = 3.14159265358979;

Could also make sense, but people don't usually do this, they usually
use macros...

In BGBCC, both can be handled as constants, just they end up being
handled at different stages:
#define: Constant ends up inlined in the preprocessor/parse stage;
const: Constant shows up in the "reducer" (which evaluates constant expressions).

Where, as noted, BGBCC's pipeline looks kinda like:
Toplevel:
Ingest each named source file;

Then, in the C case, per translation unit:
Preprocess;
Parse;
Frontend Compile + Reduce;
This does an AST walk, but at each stage,
invokes the reducer to see if it can perform AST level rewrites;
Reducer can also implement some edge-case features.
So, is mostly necessary, vs an optional optimization thing.
Emits output as a Stack IL.
May be output to a file, or used as input to next stage.
The stack IL partly resembles a mix of JVM and .NET bytecode.
The IL ops themselves operate more like in .NET bytecode.
This serves the role of static libraries and object files.
For a static library, all the stack IL gets blobbed together.
So, every translation unit ends up effectively appended on.

Middle Stage (processes IL Blobs):
Processes Stack IL, translates to 3AC (loosely SSA form);
Builds a big table of all global declarations, etc.

Backend:
Walks call-graph to determine dependencies;
Unreachable functions/globals/etc are marked as culled.
Ranks all the functions and variables by priority;
Sorts them into roughly priority order;
Then does shuffling to try to density-optimize globals;
Swaps globals when doing so would allow more memory density.
May also apply random shuffling and clustering heuristics.
Then, compiles each function:
Figure out stack-frame layout,
how many registers to reserve,
etc.
Emit machine code for 3AC ops;
Try to shuffle instructions to improve instruction scheduling;
Or, if a variable:
Figure out whether it goes in ".data" or ".bss"
If initialized, deal with initialization stuff;
...
Or, an ASM Blob:
Assemble it.
Or, Ingests contents that go into ".rsrc" section;
May involve image and audio converters, etc;
BGBCC uses different resource sections from Windows though.
...
Output:
Gets its input as a set of sections, symbols, and relocs;
Figures out layout within the output image (eg, PE/COFF);
Figures out how much space it needs for base relocs, etc.
Builds up a table of "initial base-relocs"
Splats the sections into the image buffer;
Applies relevant relocs;
Sorts base relocs by RVA;
Generates actual ".reloc" section contents.
Fill in PE/COFF headers and similar;
If applicable, LZ4 compress the image.
I tend to store EXE's in LZ4 compressed form,
the image is decompressed during load.
This format leaves the initial PE/COFF headers uncompressed.
Need the headers to figure out where to load the image.
Else would need a temporary buffer to decompress into.

Typical loader process:
Look at headers;
Figure out where to load to, etc;
Read in (or decompress) image contents;
Apply base relocs;
Pull in any DLLs, etc;
Go.
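The "apply base relocs" step can be sketched as below. This is a deliberately simplified model, not the actual BGBCC loader or the real PE/COFF ".reloc" layout (which groups typed entries into per-page blocks). Here each entry is assumed to be just the RVA of a 32-bit absolute address, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adjust every 32-bit absolute address listed in reloc_rvas by the
   difference between the actual and the preferred load address. */
void apply_base_relocs(uint8_t *image, size_t image_size,
                       const uint32_t *reloc_rvas, size_t count,
                       uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* wraps modulo 2^32 */

    for (size_t i = 0; i < count; i++) {
        uint32_t rva = reloc_rvas[i];
        uint32_t value;

        assert((size_t)rva + sizeof value <= image_size);
        memcpy(&value, image + rva, sizeof value);  /* unaligned-safe read */
        value += delta;
        memcpy(image + rva, &value, sizeof value);
    }
}

/* Self-check: one address at RVA 4, image moved from 0x00400000
   to 0x10000000. */
void check_relocs(void)
{
    uint8_t image[16] = { 0 };
    const uint32_t relocs[1] = { 4 };
    uint32_t value = 0x00401000u;

    memcpy(image + 4, &value, sizeof value);
    apply_base_relocs(image, sizeof image, relocs, 1,
                      0x00400000u, 0x10000000u);
    memcpy(&value, image + 4, sizeof value);
    assert(value == 0x10001000u);
}
```

If the image loads at its preferred base, the delta is zero and the loop changes nothing.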

The LZ4 compression is mostly because:
Loader is often IO bound;
May save memory in some cases;
LZ4 decompression is faster than more IO;
It also seems to be effective against program binaries (*).

*: I have my RP2 format, which generally does better for general purpose
data compression, but slightly worse for compressing program binaries,
so LZ4 has mostly won here. Also generally don't want a "stronger"
compressor, like Deflate, both because an Inflater is a much bigger
chunk of code, and also much slower than LZ4.

Can note that BGBCC also mostly takes over the role of the "resource
compiler" as well, so can process resources. These are generally listed
as a text file of entries to import, giving an internal "lumpname",
external filename, and a tag to specify which file conversions to apply.

I am using a very different resource section type than Windows though,
in that I just sorta replaced it with a modified version of the Quake
WAD2 format (not to be confused with PAK, where PAK serves a different
role). Note that the WAD2 directory in this case uses RVA's and not
WAD-file offsets (so, effectively, it is integrated into the PE/COFF
image, not just a WAD file that was shoved in).

Generally, one can access lumps from C land with declarations like:
extern unsigned char __rsrc_lumpname[];

Typically, formats used internally are things like BMP and WAV. Though,
when using BMP, it is typically 16 color or 256 color to avoid wasting
space. Sometimes monochrome or 4 color. One downside of BMP is that for
a full 256-color palette it needs 1K of memory just for the palette,
tempting to consider a non-standard variant that uses RGB555 for the
palette (thus reducing it to 512 bytes). For small images it is often
smaller to store them as 16-bit hi-color to avoid the space penalty of
the color palette.
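The palette arithmetic can be checked with a few lines of C. The bit layout used by `pack_rgb555` (5 bits per channel, red in the high bits, top bit unused) is an assumption for illustration only; no particular layout is specified above.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into 15 bits, dropping the low 3 bits
   of each channel. Layout assumed: 0RRRRRGGGGGBBBBB. */
uint16_t pack_rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

/* Palette size in bytes for a given entry count and entry size. */
unsigned palette_bytes(unsigned entries, unsigned bytes_per_entry)
{
    return entries * bytes_per_entry;
}

/* Self-check; call from main() or a test harness. */
void check_palette(void)
{
    assert(palette_bytes(256, 4) == 1024);  /* 4-byte entries: 1K */
    assert(palette_bytes(256, 2) == 512);   /* RGB555 entries: 512 bytes */
    assert(pack_rgb555(255, 255, 255) == 0x7FFF);
    assert(pack_rgb555(255, 0, 0) == 0x7C00);
}
```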

There are already non-standard BMP variants though, like BMP with LZ compression. A lot depends on what is needed for a particular use-case.

For WAV typical formats are 2 or 4 bit ADPCM at 8/11/16 kHz.
2 bit ADPCM: 16/22/32 kbps.
4 bit ADPCM: 32/44/64 kbps.
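These figures are just bits per sample times sample rate. A quick check in C, assuming mono audio, ignoring any per-block header overhead, and reading "11 kHz" as the common 11025 Hz rate:

```c
#include <assert.h>

/* Bits per second for ADPCM with one fixed-size code per sample. */
unsigned adpcm_bitrate(unsigned bits_per_sample, unsigned sample_rate_hz)
{
    return bits_per_sample * sample_rate_hz;
}

/* Self-check against the figures quoted above. */
void check_adpcm(void)
{
    assert(adpcm_bitrate(2, 8000) == 16000);
    assert(adpcm_bitrate(2, 16000) == 32000);
    assert(adpcm_bitrate(4, 8000) == 32000);
    assert(adpcm_bitrate(4, 16000) == 64000);
    /* Assuming "11 kHz" means 11025 Hz: */
    assert(adpcm_bitrate(2, 11025) == 22050);
    assert(adpcm_bitrate(4, 11025) == 44100);
}
```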

Have found encoder-side tricks to make ADPCM more compressible with LZ4. Basically, it tries to do a reverse LZ search when encoding and encodes
audio following patterns when the pattern would be a "close enough" match.

Had also experimented before with using some trickery involving FIR
filters and lookup tables to improve perceptual quality of 8kHz/2b ADPCM
to try to make it sound "less like total crap". But, this requires
additional metadata and a more complex process to decode (and to get
best results with this will result in worse audio quality if the audio
is just naively decoded as 8kHz/2b ADPCM without the filters).

But, yeah, with these tricks can reduce the effective bitrate (when LZ4 compressed) down to around 8-12 kbps. Note that while entropy coding
could help more, it is modest, and the most effective strategy (range
coding) being mostly too slow to be worthwhile.

Also, of the things I have tested, ADPCM was still the front runner for "actually passable" audio quality in this domain (to me, some of the
modern cellphone codecs sound like unintelligible broken garbage, and
require much more complex decoders, not worth the bother).

Only thing I have found that gets much lower bitrate is, say:
One divides the audio up into chunks of 64 samples (1/125 second for 8kHz);
Pick the top 4 square-waves from between 1 and 4 kHz;
Encode the phase and intensity of each square wave.

Typically, the strategy was to break it into 4 half-octaves and pick the highest peak in each half-octave; and then totally ignore everything
below 1kHz. If the frequency and amplitude are encoded in around 16 bits
each, this achieves an effective bitrate of 8 kbps.

Though, another strategy was 8 quarter octaves and pick the top 4 loudest.

But, audio quality is worse than 2b ADPCM.

Can push it to 4kbps by only encoding the top 2 waveforms.
But, then speech sounds robotic and borderline unintelligible.
Note that dropping to 62.5 Hz sampling also makes speech unintelligible.
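The 8 kbps and 4 kbps figures follow from the chunk rate. The sketch below reads "16 bits each" as 16 bits per encoded wave, which is an assumption, but it is the reading under which the quoted numbers come out. Function names are made up.

```c
#include <assert.h>

/* Chunks per second for a given sample rate and chunk length. */
double chunks_per_second(double sample_rate_hz, double samples_per_chunk)
{
    assert(samples_per_chunk > 0.0);
    return sample_rate_hz / samples_per_chunk;
}

/* Bits per second when each chunk stores `waves` entries of
   `bits_per_wave` bits. */
double codec_bitrate(double sample_rate_hz, double samples_per_chunk,
                     int waves, int bits_per_wave)
{
    return chunks_per_second(sample_rate_hz, samples_per_chunk)
           * waves * bits_per_wave;
}

/* Self-check against the figures quoted above. */
void check_codec_bitrate(void)
{
    assert(chunks_per_second(8000.0, 64.0) == 125.0);
    assert(codec_bitrate(8000.0, 64.0, 4, 16) == 8000.0);  /* 4 waves */
    assert(codec_bitrate(8000.0, 64.0, 2, 16) == 4000.0);  /* 2 waves */
}
```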

While traditionally, this used sine-waves (sinewave synthesis) I had
better results with square waves (simpler/cheaper, also better results audio-wise). Computational cost for decoding is fairly modest (mostly
some "for()" loops and fixed-point arithmetic).

Though, effective bitrate may be lower, because it seems that speech
encoded this way is often LZ compressible as well (and can be helped
along with pattern matching tricks).

...

But, yeah, generally want images and audio to be fairly compact when
shoving them inside an EXE or DLL, for more general asset data,
generally better to use an external file.

I had often used a custom "WAD4" format here, which is kinda like "WAD2
but with longer names and a directory tree". It then exists as a lower
cost option to the ZIP format (while semi-popular, ZIP is a
high-overhead format to be used this way).

Also can use WAD4 as a sort of VFS packaging.

But, things can be considered in relative terms:
Like, C++ may carry various penalties vs C.

I don't find C++ carries noticeable penalties compared to C, for my
embedded work. But I do disable exceptions and RTTI - exceptions may
have very little run-time time overhead, but the unwind tables can be
significant when code size is important in small systems.

Yes, that is the main thing.
They carry zero performance penalty in practice;
But, have a non-zero penalty for image size.

Not enough to be a deal-breaker towards using them if they are used, but enough that one wants them disabled if not used...


From Janis Papanagnou@3:633/10 to All on Friday, May 29, 2026 22:17:54

On 2026-05-29 20:18, Bart wrote:

On 29/05/2026 17:10, Janis Papanagnou wrote:

On 2026-05-29 13:19, Bart wrote:

Further, here: 'a * b + c' the multiplication is done first, but here:

   a *= b += c

It is done second.

You understand that '=', '*=', and '*' are three different things,
don't you?

I hope you understand that '=' should have low precedence. And that
it makes sense to evaluate that from right to left. Do you follow?

"C" obviously decided to have them all, =, +=, *=, etc. in a single
group, and thus evaluated from right to left. - Easy rule, easy to
memorize. - And that is actually what you are demanding from many
other operators, to put them in a single group. - But here you are
complaining about it!

Of course the rules for those combined (sort of two-address) operators
could have been defined differently, in an own group with other rules.
(Algol 68 had done that, actually; the semantics are like "apply these
operations from left to right", indicating an incremental modification
of the underlying value.)
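To make the two readings concrete, here is a small C sketch. The C behaviour follows from the right-to-left grouping of assignment operators. The left-to-right version is only a hand-written emulation of the alternative rule described above, not Algol 68 code, and all names are made up.

```c
#include <assert.h>

struct abc { int a, b, c; };

/* C: assignment operators group right to left,
   so this is a += (b += (c += d)). */
struct abc chain_right_to_left(int a, int b, int c, int d)
{
    a += b += c += d;
    struct abc r = { a, b, c };
    return r;
}

/* Emulation of the left-to-right rule: only a is updated. */
struct abc chain_left_to_right(int a, int b, int c, int d)
{
    a += b;
    a += c;
    a += d;
    struct abc r = { a, b, c };
    return r;
}

/* a *= b += c is a *= (b += c): the addition happens first. */
int mul_after_add(int a, int b, int c)
{
    a *= b += c;
    return a;
}

/* Self-check; call from main() or a test harness. */
void check_chains(void)
{
    struct abc r = chain_right_to_left(1, 2, 3, 4);
    struct abc l = chain_left_to_right(1, 2, 3, 4);

    assert(r.a == 10 && r.b == 9 && r.c == 7);  /* b and c also change */
    assert(l.a == 10 && l.b == 2 && l.c == 3);  /* b and c untouched */
    assert(mul_after_add(2, 3, 4) == 14);       /* 2 * (3 + 4) */
}
```

For a chain of additions, a ends up with the same value under both rules; the difference is in what happens to b and c.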

For those who don't know, the behaviour of this C code:

Those who have read my post already know that, since that was
what I explained as a possible alternative rule for these sorts
of operators. (It's still quoted above.) Folks here are capable
of understanding that without your echoing post.

   a += b += c += d

is very different from the equivalent Algol68:

   a +:= b +:= c +:= d

This only modifies 'a'.

I explained already in my post that there's a difference. (Are you
so proud of having understood that that you want to repeat it? -
"Look Ma, no hands!")

(And neither is "perfect"; both are sensible choices. - Not sure
you understand that.)

[...]

You seem to like making it 100% about me. How about stopping making it always so personal.

What you expose here (about your personality) is nothing new, and
it's about your personality; you obviously aren't really interested
to know or understand or learn the facts.

You had asked, even insisted for answers to your samples because
you obviously weren't intellectually capable of understanding the
topic, and all you posted is this reply! - Pathetic!

Janis


From Janis Papanagnou@3:633/10 to All on Friday, May 29, 2026 22:34:08

On 2026-05-29 21:57, James Kuyper wrote:

On 2026-05-29 12:10, Janis Papanagnou wrote:

On 2026-05-29 13:19, Bart wrote:

...

* What is the order here: a ^ b | c

(a^b)|c

Personally I don't think that there's a prevalent definition
how these should be ordered.

I'm not sure what you mean by "prevalent definition". Ordinarily, I'd
expect the C standard to qualify - it definitely defines the order, and
the very purpose of a language standard is to prevail over non-standard alternatives. However, I'm sure you're aware of the C standard, and made
that comment anyway, so I presume you mean something different by it.

It was about Bart's confusion; he seems to be looking for something
"naturally" understandable, like the common * and / have precedence
over + and - , which is known by most non-IT people from basic math.

There is no such commonly known ("prevalent") definition for the bit
operations. So we need to look that up in appropriate documents to
get to know their evaluation order. - That was what I intended to
express.

Sorry if that was unclear, and thanks for asking to clarify that.

How "C" defines their precedence can of course be read in any book
about "C", there's not even a "C Standard" document necessary.

In addition I gave an explanation why they decided to have these
operators in three separated precedence groups, and hinted on what
was the rationale for the same order as the boolean && and || .
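For reference, a short C sketch of how the bit operators group. In C, & binds tighter than ^, which binds tighter than |, and all three bind less tightly than ==, which is the usual trap. The function names are made up for the example, and some compilers warn about the unparenthesised forms.

```c
#include <assert.h>

/* a ^ b | c groups as (a ^ b) | c. */
unsigned xor_then_or(unsigned a, unsigned b, unsigned c)
{
    return a ^ b | c;
}

/* Trap: == binds tighter than &, so this is x & (mask == 0). */
int mask_is_clear_wrong(unsigned x, unsigned mask)
{
    return x & mask == 0;
}

/* What was almost certainly intended. */
int mask_is_clear(unsigned x, unsigned mask)
{
    return (x & mask) == 0;
}

/* Self-check; call from main() or a test harness. */
void check_bitops(void)
{
    assert(xor_then_or(1, 1, 1) == 1);       /* (1 ^ 1) | 1 */
    assert((1u ^ (1u | 1u)) == 0);           /* the other grouping differs */
    assert(mask_is_clear(4, 1) == 1);        /* bit 0 of 4 is clear */
    assert(mask_is_clear_wrong(4, 1) == 0);  /* 4 & (1 == 0) is 0 */
}
```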

Janis


From Bart@3:633/10 to All on Friday, May 29, 2026 21:47:40

On 29/05/2026 21:17, Janis Papanagnou wrote:

On 2026-05-29 20:18, Bart wrote:

(Are you
so proud of having understood that that you want to repeat it? -
"Look Ma, no hands!")

What you expose here (about your personality) is nothing new, and
it's about your personality; you obviously aren't really interested
to know or understand or learn the facts.

you obviously weren't intellectually capable of understanding the
topic, and all you posted is this reply!

- Pathetic!

It doesn't look like a civil discussion is possible here, so long as you
keep up the personal insults.

I thank you for those replies but there doesn't seem any point in taking
this further.


From Keith Thompson@3:633/10 to All on Friday, May 29, 2026 13:56:35

Bart <bc@freeuk.com> writes:

On 29/05/2026 20:28, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:
or would you continue to post contrived examples that make it appear
as confusing as possible?

Examples are examples. Do you want me to post one that didn't
illustrate an issue? It necessarily has to be contrived.

Bart, what do you want?

Today I was just replying to this post today that I found annoying:

JP:

Unsurprisingly; since exactly *that* was the obvious (and single)
issue with C's precedence definitions.

That is, the suggestion, made several times by JP, that there is only
one thing wrong.

I note your refusal to address most of what I wrote.

Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

Please do not waste everyone's time by asking questions when you're
not interested in the answers. Please do not assume that anyone
can tell whether one of your questions is sincere, figurative,
or rhetorical.

Bart, what do you want?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Bart@3:633/10 to All on Friday, May 29, 2026 22:14:32

On 29/05/2026 21:00, Janis Papanagnou wrote:

On 2026-05-29 20:09, Bart wrote:

You actually said this:

You continue your trollish stance to cherry-pick words without
understanding or trying to understand what's been expressed.

The insight appears to me that you're taking communication in a
similar way as you "design" your languages; focusing on personal
*syntax* preferences instead of the more important *semantics*.

Despite we're talking in your native language

English is my second language, technically.

(and not mine) you
obviously completely miss or deliberately ignore that there's a
difference between "it makes perfectly sense" and "it's perfect".
(I said the former, you put the latter in my mouth.

It's paraphrasing. Here is your quote again:

(The point is that - with the exception of & ^ | - the ranking
makes perfect[ly] sense and should be easily usable without doubt
by a concept-knowing programmer."

I've isolated the 'ly' as that is incorrect grammar and have ignored it.

I don't know what impression someone can take from this other than you
think it's all dandy apart from that one exception.

So, what am I missing? Did you mean the rest is all fine, considering
this is C, but it is not perfect?

In that case, what /would/ be perfect in your view? Assume a fantasy
language where anything is possible.

>> What I would say is that operator precedences are in "C"
>> "sensibly and appropriately defined, modulo the bit-ops".
you're still playing your stupid game; you ignored that. I suggest
to try to map this statement to either of the above two statements,
the one I said and the one you (wrongly) attributed, and see which
one fits. (Hint: the former.)

OK, so why don't you list all the things you think are amiss with C
operator precedences. Apart from that exception.


From Bart@3:633/10 to All on Friday, May 29, 2026 22:54:08

On 29/05/2026 21:56, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 29/05/2026 20:28, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:
or would you continue to post contrived examples that make it appear
as confusing as possible?

Examples are examples. Do you want me to post one that didn't
illustrate an issue? It necessarily has to be contrived.

Bart, what do you want?

Today I was just replying to this post today that I found annoying:

JP:

Unsurprisingly; since exactly *that* was the obvious (and single)
issue with C's precedence definitions.

That is, the suggestion, made several times by JP, that there is only
one thing wrong.

I note your refusal to address most of what I wrote.

Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

I'm saying that there is no value in those extra levels, some people
think there is, and I'm arguing about that. I was replying to tTh.

As for my question, what /is/ the point? I'm still waiting!

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

Bart, what do you want?

What answer do you want from me? As I said it was a reply to JP. You
didn't need to step in.


From Keith Thompson@3:633/10 to All on Friday, May 29, 2026 15:52:14

Bart <bc@freeuk.com> writes:

On 29/05/2026 21:56, Keith Thompson wrote:

[...]

I note your refusal to address most of what I wrote.

Upthread, you asked a question:

And then the point becomes, if you always add the
parentheses, what was the point of having that particular
precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

Then your question was unclear. The only reasonable interpretation
I could see for your question, quoted above, is that you wanted
to know why Dennis Ritchie chose the specific precedence rules
that he chose when he was designing the C language in the 1970s.
(The precedence rules have been stable since then.)

What did you mean to ask? Was your question meant to be rhetorical?
Did you just mean to let us know that you don't like C's precedence
rules? I think we all knew that.

[...]

As for my question, what /is/ the point? I'm still waiting!

I see that (what *is* the point) as a different question from what
you wrote earlier (what *was* the point).

The rules are what they are.

I honestly don't have any strong opinions about what the rules
*should* be.

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

Parentheses are not always used. Some programmers know the
precedence rules well enough, and expect their readers to know them
well enough, that they don't need to add parentheses. I don't bother
with parentheses in `a = b + c` or `a + b * c`. Others might not
bother with parentheses in more obscure cases where I would use them.

C compilers must implement the rules as specified in the standard.
Future editions of the standard are unlikely to reorder the
precedence rules, since that would quietly break existing code.

C programmers may or may not choose to remember and/or take advantage
of all the precedence rules. I haven't memorized all of them myself.

[snip]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Lawrence D'Oliveiro@3:633/10 to All on Friday, May 29, 2026 23:18:42

On Fri, 29 May 2026 12:19:04 +0100, Bart wrote:

* Why do bitwise & | ^ need their own level anyway

So that you can do shifting and masking with minimal parentheses.

* Why do << >> have their own level anyway

So that shift expressions can use common arithmetic operators with
minimal parentheses.

Further, here: 'a * b + c' the multiplication is done first, but
here:

a *= b += c

It is done second.

That kind of thing is disallowed in Python, for some reason.
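
For readers following along, a minimal sketch of the grouping being discussed (not taken from any of the posts; the function names are made up for illustration). Assignment operators associate right to left, so `a *= b += c` groups as `a *= (b += c)`, whereas in `a * b + c` the multiplication binds first:

```c
#include <assert.h>

/* Returns the final value of a after the chained compound assignment. */
int compound_chain(int a, int b, int c)
{
    a *= b += c;            /* groups as a *= (b += c) */
    return a;
}

/* The same computation with the grouping written out step by step. */
int compound_chain_explicit(int a, int b, int c)
{
    b += c;
    a *= b;
    return a;
}

/* For contrast: here the multiplication is performed first. */
int mul_then_add(int a, int b, int c)
{
    return a * b + c;
}

/* Illustrative self-check; call it from main() to run the comparison. */
void demo_compound(void)
{
    assert(compound_chain(2, 3, 4) == 14);           /* 2 * (3 + 4) */
    assert(compound_chain_explicit(2, 3, 4) == 14);
    assert(mul_then_add(2, 3, 4) == 10);             /* (2 * 3) + 4 */
}
```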

From Lawrence D'Oliveiro@3:633/10 to All on Friday, May 29, 2026 23:20:49

On Fri, 29 May 2026 08:09:53 +0200, Bonita Montero wrote:

There's no language where the users are so detail focussed and open
to new features [than C++].

But they still don't have "try-finally".

From Bart@3:633/10 to All on Saturday, May 30, 2026 01:26:47

On 30/05/2026 00:18, Lawrence D'Oliveiro wrote:

On Fri, 29 May 2026 12:19:04 +0100, Bart wrote:

* Why do bitwise & | ^ need their own level anyway

So that you can do shifting and masking with minimal parentheses.

Can you give examples?

Because you can do 'a << b & c' without << >> needing their own private
level; it only needs to be lower than bitwise ops.

For example they could have the same level as * and / as they
essentially do the same thing.

* Why do << >> have their own level anyway

So that shift expressions can use common arithmetic operators with
minimal parentheses.

Again, examples?

Further, here: 'a * b + c' the multiplication is done first, but
here:

a *= b += c

It is done second.

That kind of thing is disallowed in Python, for some reason.

I disallow it too (in my stuff). It's too confusing, no matter that it is
100% unambiguous according to some arcane language rules.

From James Kuyper@3:633/10 to All on Friday, May 29, 2026 20:31:50

On 2026-05-29 18:52, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

...

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

The answer, of course, is that the condition of your "if" clause is not
true. In the overwhelming majority of the cases, people do not use
parentheses to clarify the order of evaluation that is guaranteed by C's grammar rules. They only use them in the cases where they feel that
there's a significant chance of confusion. Of course, that depends upon
your audience. If I was required to write code in such a way that you
would have trouble misunderstanding it, I'd write

a = m*x + b;

as

a = ((m*x)+b);

I internalized C's grammar rules a long time ago (which causes problems
on the rare occasions when they've changed them). The main exceptions are
the bit-wise operators which are well known as having the wrong
precedence - but I've seldom needed to use those operators.
As a result, I seldom have any confusion as to the order of evaluation,
which makes it very hard for me to realize that it might be a good idea
to put in some redundant parentheses to clarify that order for other
people. That means that some of my code is probably more cryptic than it
should be.

From Bart@3:633/10 to All on Saturday, May 30, 2026 02:03:54

On 30/05/2026 01:31, James Kuyper wrote:

On 2026-05-29 18:52, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

...

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

The answer, of course, is that the condition of your "if" clause is not
true. In the overwhelming majority of the cases, people do not use parentheses to clarify the order of evaluation that is guaranteed by C's grammar rules. They only use them in the cases where they feel that
there's a significant chance of confusion.

Those are the cases we're talking about! That is:

<< >> & | ^

Maybe add == != and < <= >= > if someone wants to take advantage of
their different levels, but I guess 99% wouldn't even know about that.

Most of the rest, there tends to be agreement across languages:

school arithmetic group - comparisons - logical and/or

I haven't included ?: as that's too weird.

From Keith Thompson@3:633/10 to All on Friday, May 29, 2026 19:02:36

Bart <bc@freeuk.com> writes:

On 30/05/2026 01:31, James Kuyper wrote:

On 2026-05-29 18:52, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

...

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

The answer, of course, is that the condition of your "if" clause is
not true. In the overwhelming majority of the cases, people do not
use parentheses to clarify the order of evaluation that is guaranteed
by C's grammar rules. They only use them in the cases where they feel
that there's a significant chance of confusion.

Those are the cases we're talking about! That is:

<< >> & | ^

Maybe add == != and < <= >= > if someone wants to take advantage of
their different levels, but I guess 99% wouldn't even know about that.

Most of the rest, there tends to be agreement across languages:

school arithmetic group - comparisons - logical and/or

I haven't included ?: as that's too weird.

So what is your question? I had thought that you meant to ask why
Ritchie defined the precedences that way, but apparently that's
not what you meant.

Do you even have a question? Is there anything anyone could tell
you that you don't think you already know?

If you have a question, can you restate it in unambiguous terms?
If not, what are we talking about?

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Lawrence D'Oliveiro@3:633/10 to All on Saturday, May 30, 2026 03:49:45

On Fri, 29 May 2026 05:20:20 -0500, BGB wrote:

To be excluded from being syntactic sugar, it needs to be something
that is not generally possible to express within the base language.

So, for example: Things like operator overloading or classes are
syntactic sugar IMO, as what they do can be expressed in C, even if
a lot less pretty (or far from an idiomatic style).

I would not consider exceptions or RTTI as syntactic sugar, because
these involve things that do not map to native C.

But surely *anything* that "is not generally possible to express
within the base language" would "not map to native C".

From Lawrence D'Oliveiro@3:633/10 to All on Saturday, May 30, 2026 04:25:31

On Sat, 30 May 2026 01:26:47 +0100, Bart wrote:

On 30/05/2026 00:18, Lawrence D'Oliveiro wrote:

On Fri, 29 May 2026 12:19:04 +0100, Bart wrote:

* Why do bitwise & | ^ need their own level anyway

So that you can do shifting and masking with minimal parentheses.

Can you give examples?

You haven't done much bit manipulation, have you?

Extracting RGB components from a pixel:

const unsigned int
    r = pixel >> 16 & 255,
    g = pixel >> 8 & 255,
    b = pixel & 255;

Combining RGBA components into a pixel:

colors[i] =
        channel[0] << 24
    |
        channel[1] << 16
    |
        channel[2] << 8
    |
        channel[3];
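
As a rough, self-contained version of the two fragments above (a sketch only: the 0xRRGGBB layout, the `uint32_t` types and the function names are assumptions made for illustration, not taken from the linked source), the groupings can be checked directly. `>>` and `<<` bind tighter than `&` and `|`, so no parentheses are required:

```c
#include <assert.h>
#include <stdint.h>

/* Extract 8-bit channels from a pixel assumed to be laid out as 0xRRGGBB.
   pixel >> 16 & 255 groups as (pixel >> 16) & 255. */
uint32_t red_of(uint32_t pixel)   { return pixel >> 16 & 255; }
uint32_t green_of(uint32_t pixel) { return pixel >> 8 & 255; }
uint32_t blue_of(uint32_t pixel)  { return pixel & 255; }

/* Combine four 8-bit channels; each shift is done before the ORs. */
uint32_t pack4(uint32_t c0, uint32_t c1, uint32_t c2, uint32_t c3)
{
    return c0 << 24 | c1 << 16 | c2 << 8 | c3;
}

/* Illustrative self-check; call it from main() to run the comparison. */
void demo_rgb(void)
{
    assert(red_of(0x123456u) == 0x12);
    assert(green_of(0x123456u) == 0x34);
    assert(blue_of(0x123456u) == 0x56);
    assert(pack4(0x12, 0x34, 0x56, 0x78) == 0x12345678u);
    /* The fully parenthesized spelling gives the same result. */
    assert(pack4(1, 2, 3, 4) == (((1u << 24) | (2u << 16)) | (3u << 8) | 4u));
}
```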

* Why do << >> have their own level anyway

So that shift expressions can use common arithmetic operators with
minimal parentheses.

Again, examples?

From the same code module, putting together a subpicture image
consisting of 2 bits per pixel:

pixbuf[bufpixels / 4] |= histogram[histindex].index << bufpixels % 4 * 2;

<https://bitbucket.org/ldo17/dvd_menu_animator/src/master/spuhelper.c>
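
A hedged sketch of the 2-bits-per-pixel packing idea in that line, for anyone who wants to try it in isolation. The helper names, the buffer type and the extra `& 3u` mask are assumptions added here and are not part of the linked module. The shift count `n % 4 * 2` groups as `(n % 4) * 2`, and the whole count is evaluated before the shift because `%` and `*` bind tighter than `<<`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OR a 2-bit value for pixel number n into a packed buffer
   (four pixels per byte, lowest-numbered pixel in the low bits). */
void put_2bpp(uint8_t *buf, size_t n, unsigned value)
{
    buf[n / 4] |= (value & 3u) << n % 4 * 2;
}

/* Read the 2-bit value back: (buf[n / 4] >> ((n % 4) * 2)) & 3. */
unsigned get_2bpp(const uint8_t *buf, size_t n)
{
    return buf[n / 4] >> n % 4 * 2 & 3;
}

/* Illustrative self-check; call it from main() to run the comparison. */
void demo_2bpp(void)
{
    uint8_t buf[2] = { 0, 0 };
    const unsigned values[5] = { 1, 2, 3, 0, 2 };

    for (size_t n = 0; n < 5; n++)
        put_2bpp(buf, n, values[n]);

    assert(buf[0] == 0x39);   /* 1 | 2 << 2 | 3 << 4 | 0 << 6 */
    assert(buf[1] == 0x02);
    for (size_t n = 0; n < 5; n++)
        assert(get_2bpp(buf, n) == values[n]);
}
```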

From BGB@3:633/10 to All on Saturday, May 30, 2026 01:04:14

On 5/29/2026 7:58 AM, Bonita Montero wrote:

Am 29.05.2026 um 11:15 schrieb BGB:

Like, if one doesn't care that the compiler takes a long time to run
and the EXE is needlessly large, maybe OK, not great if one does care...

Binary size doesn't matter with Windows.

On a typical modern desktop PC...

Does still get annoying if it is needlessly large for no good reason.

Even if, yeah, modern PC will not care much about loading a 50 or 100MB
EXE file...

Having to spend minutes or more waiting for the compiler would
seriously hurt momentum for many tasks.

Use C++20 modules and parallel builds.

Possibly, I have my reasons, and not all of my current development is
limited to PC class systems.

Say, for example, if the Boot ROM requires keeping everything under 32K.

C++ was designed for large scale program development.
With 32K-systems you can stick with C.

There are intermediate options, where one's RAM is measured in MB.

Or, basically, say imagine writing software on something where CPU
speeds and RAM sizes are basically similar to what things were like in
the 1990s.

Comparably, a desktop PC is much faster, and with almost limitless RAM.

Well, and one thing I am often messing with:
Well, I am using a PE/COFF variant...
But, the OS is not on Windows, and the ISA is not x86 based, ...
Still has EXE's and DLL's though.

Can't use C++ there, because no (native) C++ compiler exists.
Well, except if using GCC to generate RV64G; can run RV64G on it;
But, RV64G's performance is lacking, and the ELF files are bloated.
Kinda sucks when a significant part of the binary is just metadata.

Then again, the PE variant is non-standard:
No MZ stub;
LZ4 compression;
Mostly using 64-byte section alignment;
And a FileOffset==RVA constraint, ...
Various structures have been tweaked;
...

Well, and the format is itself an offshoot of a WinCE variant of the
format rather than from mainline Windows. Well, imagine an OS that sort
of took design inspiration both from WinCE and the Unix family OS's.

And ended up with a CLI experience kinda similar to Cygwin. And a very
crap attempt at making a GUI (that mostly just launches with a terminal
window that can be used to start other programs).

Well, a basic view of my crappy GUI can be seen here: https://www.youtube.com/watch?v=HAyMDRzxYzY

Though, the point of this video was more Doom but with the musical notes replaced with DTMF-like tones (but still having octaves and similar so
it at least sorta sounds like music). As sort of a bit of hackery with
the MIDI playback code (tweaking out the FM synthesis to play DTMF
tones rather than the normal FM instruments).

Well, that and another recent-ish video of Doom modified to sorta
resemble the monochrome style of "Return of the Obra Dinn" (well, and
also this Doom port also has a 3D glasses mod, and 3D+Obra which
actually works pretty OK, ...).

...

From Bonita Montero@3:633/10 to All on Saturday, May 30, 2026 11:18:21

Am 30.05.2026 um 01:20 schrieb Lawrence D'Oliveiro:

But they still don't have "try-finally".

There's RAII:

#pragma once
#include <utility>
#include <concepts>
#include "nui.hpp"

template<std::invocable Fn>
struct defer final
{
    defer( Fn &&fn, bool enabled = true ) :
        m_fn( std::forward<Fn>( fn ) ),
        m_enabled( enabled )
    {
    }
    defer( defer const & ) = delete;
    void operator =( defer const & ) = delete;
    ~defer() noexcept( std::is_nothrow_invocable_v<Fn> )
    {
        if( m_enabled ) [[likely]]
            m_fn();
    }
    template<typename ... Fns>
    bool operator ()( defer<Fns> &... additional )
        noexcept( std::is_nothrow_invocable_v<Fn> && (std::is_nothrow_invocable_v<Fns> && ...) )
    {
        if( !m_enabled ) [[unlikely]]
            return false;
        m_enabled = false;
        m_fn();
        return (additional() && ...);
    }
    template<typename ... Fns>
    void disable( defer<Fns> &... additional ) noexcept
    {
        m_enabled = false;
        (additional.disable(), ...);
    }
    template<typename ... Fns>
    void enable( defer<Fns> &... additional ) noexcept
    {
        m_enabled = true;
        (additional.enable(), ...);
    }
private:
    NO_UNIQUE_ADDRESS Fn m_fn;
    bool m_enabled;
};

From Bart@3:633/10 to All on Saturday, May 30, 2026 12:01:27

On 30/05/2026 05:25, Lawrence D'Oliveiro wrote:

On Sat, 30 May 2026 01:26:47 +0100, Bart wrote:

On 30/05/2026 00:18, Lawrence D'Oliveiro wrote:

On Fri, 29 May 2026 12:19:04 +0100, Bart wrote:

* Why do bitwise & | ^ need their own level anyway

So that you can do shifting and masking with minimal parentheses.

Can you give examples?

You haven't done much bit manipulation, have you?

Extracting RGB components from a pixel:

const unsigned int
    r = pixel >> 16 & 255,
    g = pixel >> 8 & 255,
    b = pixel & 255;

This merely requires <<'s precedence to be lower than &.

It doesn't need & | ^ to be distinct (only one is used here anyway).

It doesn't need << >> to be in a distinct group from multiply or add groups.

Combining RGBA components into a pixel:

colors[i] =
        channel[0] << 24
    |
        channel[1] << 16
    |
        channel[2] << 8
    |
        channel[3];

Exactly the same applies here. But if one of those | was & or ^, then
you might start needing parentheses.

* Why do << >> have their own level anyway

So that shift expressions can use common arithmetic operators with
minimal parentheses.

Again, examples?

From the same code module, putting together a subpicture image
consisting of 2 bits per pixel:

pixbuf[bufpixels / 4] |= histogram[histindex].index << bufpixels % 4 * 2;

This is a better example, as is this from your link:

pixbuf[nr_buf_pixels] = colors[srcpix >> src_pix_index * 2 & 3];

This can indeed be written with fewer parentheses given the priority of
<< relative to * and &.

But it is also not clear because the part after >> is sprawling. You'd
want it like this:

pixbuf[nr_buf_pixels] = colors[srcpix >> (src_pix_index * 2) & 3];

Now there is less analysis to do to establish the span of the shift-count.

These are examples from MZLIB:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b >> 4)];

C's precedence rules say that many of those parentheses are not strictly needed, which means the following are exactly equivalent:

crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];
crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b >> 4];

So why were they added? Could it be that they make things clearer?

Remove ambiguity in the mind of the reader? Lead to fewer surprises
when a new term needs to be added?

With the original, NOBODY NEEDS TO CARE what the hell the precedences
of >> ^ & are with respect to each other. Port the fragment to a language
with slightly different rules and it would still work.

Post that fragment somewhere, and people will know what it means
*without needing to know which exact language it is*.

This is why I think it is pointless to devote 4 dedicated levels to
<< >> & | ^, and poor to rely on them for the meaning of your code.
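
The claimed equivalence of the two spellings can be checked mechanically. The following is only a sketch under stated assumptions: the 16-entry table is generated here from the reflected CRC-32 polynomial 0xEDB88320 rather than copied from the library, and the names `nibble_table`, `crc32_parens` and `crc32_bare` are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t nibble_table[16];

/* Build the 16-entry table for the reflected CRC-32 polynomial. */
static void init_nibble_table(void)
{
    for (uint32_t i = 0; i < 16; i++) {
        uint32_t c = i;
        for (int k = 0; k < 4; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        nibble_table[i] = c;
    }
}

/* Nibble-at-a-time CRC-32, parenthesized as in the quoted fragment. */
uint32_t crc32_parens(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    init_nibble_table();
    while (n--) {
        uint8_t b = *p++;
        crc = (crc >> 4) ^ nibble_table[(crc & 0xF) ^ (b & 0xF)];
        crc = (crc >> 4) ^ nibble_table[(crc & 0xF) ^ (b >> 4)];
    }
    return ~crc;
}

/* The same loop relying only on C's precedence rules. */
uint32_t crc32_bare(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    init_nibble_table();
    while (n--) {
        uint8_t b = *p++;
        crc = crc >> 4 ^ nibble_table[crc & 0xF ^ b & 0xF];
        crc = crc >> 4 ^ nibble_table[crc & 0xF ^ b >> 4];
    }
    return ~crc;
}

/* Illustrative self-check; call it from main() to run the comparison. */
void demo_crc(void)
{
    static const uint8_t msg[] = "123456789";
    assert(crc32_parens(msg, 9) == crc32_bare(msg, 9));
    assert(crc32_parens(msg, 9) == 0xCBF43926u);  /* standard CRC-32 check value */
}
```

Compilers with parenthesis warnings enabled may flag the unparenthesized version, which is in keeping with the readability argument being made in the post.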

From Bart@3:633/10 to All on Saturday, May 30, 2026 12:12:29

On 30/05/2026 03:02, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 30/05/2026 01:31, James Kuyper wrote:

On 2026-05-29 18:52, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

...

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

The answer, of course, is that the condition of your "if" clause is
not true. In the overwhelming majority of the cases, people do not
use parentheses to clarify the order of evaluation that is guaranteed
by C's grammar rules. They only use them in the cases where they feel
that there's a significant chance of confusion.

Those are the cases we're talking about! That is:

<< >> & | ^

Maybe add == != and < <= >= > if someone wants to take advantage of
their different levels, but I guess 99% wouldn't even know about that.

Most of the rest, there tends to be agreement across languages:

school arithmetic group - comparisons - logical and/or

I haven't included ?: as that's too weird.

So what is your question? I had thought that you meant to ask why
Ritchie defined the precedences that way, but apparently that's
not what you meant.

You seem to have a problem with context. I was replying to JK who was
replying to a quote of mine from within one of your posts (maybe he's killfiled me so couldn't respond directly).

There was no question posted. He suggested that most of the time,
parentheses are not used and gave examples using * and +.

My original remarks were about the widespread use of parentheses to
clarify the grouping of operators with the more obscure priorities, and
my reply addressed that.

See also the examples I posted a few minutes ago involving >> & and ^.

That is, if you are interested in my point, which I doubt. You seem more intent on some personal campaign.

From David Brown@3:633/10 to All on Saturday, May 30, 2026 13:52:43

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

But, not really an "easy" way to avoid bloat, other than to write
code specifically for what cases are relevant; while also avoiding
needless duplication and copy paste (where, overuse of copy/paste
can also lead to bloat; along with turning the code into an ugly
mess).

Hmm.. - as said, during the very early days there were issues; I
recall on one platform duplication of template code in more than
one source unit. And/or some environmental hacks (of the compiler)
to deposit template code for linking. In the later days I've not
seen such immature things anymore.

Possibly, a lot could depend on how one is counting things as well.

In a lot of cases when using GCC, I end up using:
   -ffunction-sections -fdata-sections -Wl,-gc-sections

On many targets, "-fdata-sections" can lead to noticeably larger and
slower code because it effectively eliminates section anchor
optimisations. It does not negatively affect x86 AFAICS, because x86
does not use section anchors.

<https://godbolt.org/z/zeoq41Y7d>

With -fsection-anchors (enabled with optimisation on targets that
support it - generally RISCy load/store architectures), program-
lifetime variables are kept together in a lump (as though they were in
a struct) and often addressed by a pointer to that pretend struct.
Thus if a function accesses two variables "a" and "b", instead of
having to load the addresses of each of "a" and "b" into separate
registers, it loads an "anchor" into one register and accesses the
variables with reg+offset addressing.

I've seen "-fdata-sections" used regularly in embedded systems - it is
almost always a bad idea.

("-ffunction-sections" is often very helpful to reduce code image
size, so keep that one.)

Both seem to help on x86, x86-64, and also on RISC-V, at making GCC's
output at least sorta space-comparable to my own compilers.

The merit of "-fdata-sections" is mostly that it eliminates unused
global variables; whereas "-ffunction-sections" eliminates unreachable functions.

That is the point of them, yes. "-ffunction-sections" can be useful at removing unused code from more general code. For microcontrollers,
SDK's and manufacturers' driver code will normally contain a large
number of functions that can be eliminated in this way, saving a lot of
code space.

However, in practice, "-fdata-sections" rarely eliminates a significant
amount - most programs do not have large amounts of statically-allocated
data that is not used. Gcc, and I think most other compilers, put the
static lifetime data for each translation unit in its own section, so if
no data from a translation unit is used it will be eliminated at link
time even with -fno-data-sections. And of course it makes no difference
for heap data or stack data.

In my testing, "-ffunction-sections" is absolutely worth using (on
targets where code space is relevant - there's no need for PC software).
On some targets, it may mean a few lost opportunities for shorter
jump/call instructions between functions in the same translation unit,
but the cost is rarely anything more than a slightly longer link time.
But "-fdata-sections" typically gives almost no ram space savings, and
makes code bigger and slower.

As I noted, gcc on x86 does not support section anchors, so there is not likely to be much code cost for -fdata-sections.

Where section anchors shine - and where -fdata-sections therefore has
cost - is when a function needs to access more than one piece of static lifetime data defined in the same translation unit (or another
translation unit if you are using LTO). That happens a lot in embedded
ARM programming at least. I don't know about RISC-V. If the target
normally uses a "small data section" for ram (I know this is common on PowerPC), then there is, in effect, a program-wide section anchor
already. So it is possible that relatively few targets have section anchors - but the 32-bit ARM on gcc is a vastly popular choice in the
embedded world, so it is important to understand the cost of this
compiler flag for that target at least.
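
A small test case of the kind described, for anyone who wants to look at the generated code themselves. This is only a sketch: the variable and function names are invented, and no compiler output is reproduced here. One plausible way to compare would be to compile it for a 32-bit ARM target (for example with an `arm-none-eabi-gcc` toolchain at `-O2 -S`) once with and once without `-fdata-sections`, and see whether both variables are reached from a single base address:

```c
#include <assert.h>

/* Two static-lifetime variables defined in the same translation unit.
   With section anchors they can share one base register; with
   -fdata-sections each lands in its own section. */
static int counter_a = 1;
static int counter_b = 2;

/* Accesses both variables, which is the case where anchors matter. */
int bump_both(void)
{
    counter_a += 1;
    counter_b += 2;
    return counter_a + counter_b;
}
```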

Neither is needed with my own compiler, which compiles things in a way
such that it eliminates anything that is unreachable.

[...]

That might be the case for a very simplistic compiler. With an
optimising compiler, these extra variables will quickly be eliminated.
If the compiler has a good scheduling model of the device, it will do
whatever instruction scheduling works best for that processor. If the
model is not good enough, it will be suboptimal. I would not,
however, expect any difference in the generated code for the two code
snippets.

Sometimes this kind of "manual optimisation" is helpful when you have
to try to get efficient results from a weak compiler, however.

Possibly, but this sort of thing can help with both BGBCC and with MSVC
IME

I don't tend to think of MSVC as a highly optimising compiler - but it
is not a tool I have much use for, as it does not handle the targets I
need. When I have sometimes looked at the generated code on godbolt, it
has not impressed me at all. So it could well fall into the "helpful
when using a weaker compiler" category.

Usual strategy is to try to limit how much code is written, and
also to avoid doing things in ways that result in too much code, or
too much cruft.

Best to avoid both copy paste when reasonable, and sticking
anything non-trivial in macros.

We avoided macros if possible.

They are de-facto for constants and similar, but for longer stuff is
better avoided.

Macros are rarely the best way to define constants. They are needed
if you are using the constants for pre-processor stuff like
conditional compilation. But generally you get clearer code, better
typing, and potentially several other benefits from using alternative
choices like "enum" (even for stand-alone integer constants), "static
const" variables, and in C23, "constexpr" variables. There's no doubt
that a lot of code /does/ use macros for constants, but I view it as a
relic of the past rather than good coding practice.

They are traditional...

Like:
  static const double M_PI = 3.14159265358979;

Could also make sense, but people don't usually do this, they usually
use macros...

They should not do so (IMHO, of course). Yes, macros are traditional -
but there are no plus sides to using them for this kind of thing.
(There are no plus sides to using all-caps either, but people do that too.)
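
A minimal sketch of the macro-free alternatives mentioned, assuming nothing beyond standard C (the names are invented for the example). The C23 `constexpr` form is left out so that the fragment also compiles with older compilers. Note that before C23 a `static const double` is not a constant expression in C, so it cannot be used where one is required, while an `enum` constant can:

```c
#include <assert.h>

/* Integer constant without a macro: usable in array bounds and case labels. */
enum { buffer_size = 64 };

/* Typed, file-scoped constant instead of a #define. */
static const double pi = 3.14159265358979;

static unsigned char buffer[buffer_size];

double circle_area(double r)
{
    return pi * r * r;
}

unsigned long buffer_bytes(void)
{
    return (unsigned long)sizeof buffer;
}
```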

(I'm snipping all the details of your own C compiler, because there is
very little I can comment on.)

But, things can be considered in relative terms:
Like, C++ may carry various penalties vs C.

I don't find C++ carries noticeable penalties compared to C, for my
embedded work. But I do disable exceptions and RTTI - exceptions may
have very little run-time time overhead, but the unwind tables can be
significant when code size is important in small systems.

Yes, that is the main thing.
  They carry zero performance penalty in practice;
  But, have a non-zero penalty for image size.

Not enough to be a deal-breaker towards using them if they are used, but enough that one wants them disabled if not used...

Agreed.

(I could also note that I make heavy use of templates in C++ code - it
often leads to smaller and faster results.)

From David Brown@3:633/10 to All on Saturday, May 30, 2026 14:07:02

On 29/05/2026 19:53, Bart wrote:

On 29/05/2026 18:29, tTh wrote:

On 5/29/26 15:22, Bart wrote:

Something you might do when you have time (as I'm busy), is to
analyse the expressions in some C codebases, and isolate those where
removal of parentheses that group terms, would result in exactly the
same shape of expressions, and are therefore redundant.

   This is a strange exercise. When I write a complex expression,
   I sometimes use redundant parentheses for the clarity of
   my intentions about this computation. I think of
   those extra (()) as a sort of in-line comment.

Sure, but some here like to say that such expressions, if they still
work without parentheses, are unambiguous anyway.

They are obviously unambiguous to compilers, and they are unambiguous to people who either know the precedence rules, or are able to look them up
to be sure of them at the time. For those that don't know the rules and
would rather guess randomly, they might misinterpret the expressions but
the expressions themselves are still unambiguous. So yes, complex
expressions without parentheses /are/ unambiguous.

However, being unambiguous does not mean people will not make mistakes
when reading or writing them, or that they can read and write them
correctly without effort. As you say, people are not compilers.

Parentheses can certainly reduce the cognitive effort for people reading
or writing complex expressions, and can significantly reduce the risk of errors. Extra local variables for sub-expressions can do this too
(especially when the language has good scoping rules and allows
variables to be declared when you need them).

So it is wrong to suggest that expressions written in C without extra parentheses are somehow "ambiguous" - but it is correct to say that
adding extra parentheses (within reason) can often help the readability
of code.

And this applies equally in all languages, no matter what precedence the operators have, or how many levels there are. I can certainly agree
with you that C would have been slightly nicer if the bitwise operators
and equality operators were at different precedences. There are a
number of changes I would have preferred - some of which you would agree
with, some not. But even if C were to have those changes overnight, it
would not change anything about what I wrote above. Regardless of the operator precedence, expressions are not ambiguous, but parentheses or splitting into sub-expressions can make code clearer.

From Dan Cross@3:633/10 to All on Saturday, May 30, 2026 12:29:15

In article <10vd1tu$ekvl$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 29/05/2026 21:56, Keith Thompson wrote:

[snip]
Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

I'm saying that there is no value in those extra levels, some people
think there is, and I'm arguing about that. I was replying to tTh.

As for my question, what /is/ the point? I'm still waiting!

To clarify: the question is, what is the point of those levels?

How is that different from asking "why C is like that"?

Of course, I want the answer to be that there isn't any point if
parentheses will be used anyway.

There is a point, but it is history. That is, the "point" of
those precedence levels is the history and evolution of the
language.

In PL/1 and early C, `|` and `&` were logical operators. The
short-circuiting `||` and `&&` came later, but the low
precedence for `|` and `&` was already baked in.

That's the point: the precedence reflects the original use as
boolean operators, not how things evolved for use almost purely
as bitwise operators.
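
To make the consequence concrete, here is a small sketch (the function names are invented for the example). Because `==` binds tighter than `&`, an expression in the older logical style such as `a == b & c == d` groups the way a boolean test needs, while a bit test written as `x & mask == 0` does not group the way a reader might expect:

```c
#include <assert.h>

/* Logical-style use: groups as (a == b) & (c == d). */
int both_equal(int a, int b, int c, int d)
{
    return a == b & c == d;
}

/* Intended bit test: are all bits of mask clear in x? */
int bits_clear(unsigned x, unsigned mask)
{
    return (x & mask) == 0;
}

/* Without the parentheses this is x & (mask == 0). */
int bits_clear_unparenthesized(unsigned x, unsigned mask)
{
    return x & mask == 0;
}

/* Illustrative self-check; call it from main() to run the comparison. */
void demo_bitwise_precedence(void)
{
    assert(both_equal(1, 1, 2, 2) == 1);
    assert(both_equal(1, 1, 2, 3) == 0);
    assert(bits_clear(4, 1) == 1);                  /* bit 0 of 4 is clear */
    assert(bits_clear_unparenthesized(4, 1) == 0);  /* 4 & (1 == 0) is 0 */
}
```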

- Dan C.

From Janis Papanagnou@3:633/10 to All on Saturday, May 30, 2026 14:40:57

On 2026-05-30 13:52, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

We avoided macros if possible.

They are de-facto for constants and similar, but for longer stuff is
better avoided.

Macros are rarely the best way to define constants. They are needed
if you are using the constants for pre-processor stuff like
conditional compilation. But generally you get clearer code, better
typing, and potentially several other benefits from using alternative
choices like "enum" (even for stand-alone integer constants), "static
const" variables, and in C23, "constexpr" variables. There's no
doubt that a lot of code /does/ use macros for constants, but I view
it as a relic of the past rather than good coding practice.

They are traditional...

Like:
   static const double M_PI = 3.14159265358979;

Could also make sense, but people don't usually do this, they usually
use macros...

They should not do so (IMHO, of course). Yes, macros are traditional -
but there are no plus sides to using them for this kind of thing. (There
are no plus sides to using all-caps either, but people do that too.)

Because in the early days Cpp constants were used, and Cpp stuff was often capitalized[*]. Our C++ coding rules back then had mandated lowercase
also for constants, but strangely some folks were so used to uppercase
Cpp literals that they disliked to write constants (as other objects)
in lowercase, and stated opinions were sometimes heated like religious
topics.

I wonder what lexical convention regular "C" (or C++) programmers here
use for constants nowadays.

Curiously I inspected my latest C-source to see what convention I've
actually followed recently. But I noticed that I had no hard constants
used at all; all parameters came from a configuration file and through
the command line interface. (That makes sense, I guess.)

Janis

[*] Strangely there were C-function-macros that were written lowercase,
though.

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Saturday, May 30, 2026 13:56:48

On 30/05/2026 13:29, Dan Cross wrote:

In article <10vd1tu$ekvl$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 29/05/2026 21:56, Keith Thompson wrote:

[snip]
Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

I'm saying that there is no value in those extra levels, some people
think there is, and I'm arguing about that. I was replying to tTh.

As for my question, what /is/ the point? I'm still waiting!

To clarify: the question is, what is the point of those levels?

How is that different from asking "why C is like that"?

My question is actually independent of C or its history.

I accept those levels exist. I was asking do they currently serve a
useful purpose.

If not, people can choose to ignore them when writing C code, for
example like this where all () are technically superfluous:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
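For anyone who wants to check the "technically superfluous" claim, here is a minimal sketch. It relies on `>>` and `&` both binding more tightly than `^` in C. The table contents and the function names are placeholders made up for illustration (this is not a real CRC-32 nibble table); only the shape of the expression comes from the post. Expect a compiler with `-Wparentheses` enabled to warn about the second form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-entry table; the values are arbitrary placeholders. */
static const uint32_t s_crc32[16] = {
    0x00u, 0x11u, 0x22u, 0x33u, 0x44u, 0x55u, 0x66u, 0x77u,
    0x88u, 0x99u, 0xAAu, 0xBBu, 0xCCu, 0xDDu, 0xEEu, 0xFFu
};

/* The line as posted, with explicit parentheses. */
static uint32_t step_with_parens(uint32_t crcu32, uint8_t b)
{
    return (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
}

/* The same line relying only on precedence: >> and & bind tighter than ^. */
static uint32_t step_without_parens(uint32_t crcu32, uint8_t b)
{
    return crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];
}

/* Compare both forms over a small range of inputs. */
static void check_equivalence(void)
{
    for (uint32_t crc = 0; crc < 256; crc++)
        for (unsigned b = 0; b < 256; b++)
            assert(step_with_parens(crc, (uint8_t)b) ==
                   step_without_parens(crc, (uint8_t)b));
}
```

Calling `check_equivalence()` from a test program exercises every low-nibble combination; it says nothing about which form is easier to read, which is the actual point of contention in the thread.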

And they can choose to not adopt them when devising new languages,
however many still do faithfully recreate the same pattern, with a few
notable exceptions such as Go lang.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Saturday, May 30, 2026 16:36:12

On 30/05/2026 14:40, Janis Papanagnou wrote:

On 2026-05-30 13:52, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

We avoided macros if possible.

They are de-facto for constants and similar, but for longer stuff
is better avoided.

Macros are rarely the best way to define constants.  They are needed
if you are using the constants for pre-processor stuff like
conditional compilation.  But generally you get clearer code, better
typing, and potentially several other benefits from using
alternative choices like "enum" (even for stand-alone integer
constants), "static const" variables, and in C23, "constexpr"
variables.  There's no doubt that a lot of code /does/ use macros
for constants, but I view it as a relic of the past rather than good
coding practice.

They are traditional...

Like:
   static const double M_PI = 3.14159265358979;

Could also make sense, but people don't usually do this, they usually
use macros...

They should not do so (IMHO, of course).  Yes, macros are traditional
- but there are no plus sides to using them for this kind of thing.
(There are no plus sides to using all-caps either, but people do that
too.)

Because in the early days Cpp constants were used, and Cpp stuff was
often capitalized[*]. Our C++ coding rules back then mandated lowercase
also for constants, but strangely some folks were so used to uppercase
Cpp literals that they disliked writing constants (like other objects)
in lowercase, and the stated opinions were sometimes as heated as on
religious topics.

I wonder what lexical convention regular "C" (or C++) programmers here
use for constants nowadays.

Curiously I inspected my latest C-source to see what convention I've
actually followed recently. But I noticed that I had no hard constants
used at all; all parameters came from a configuration file and through
the command line interface. (That makes sense, I guess.)

Janis

[*] Strangely there were C-function-macros that were written lowercase, though.

I think there is a reasonable case for all-caps for macros that are
doing something "weird", as a warning to users. You know that it's
risky trying to write "MAX(a++, b++)", as it might evaluate one or both
of the parameter expressions twice. But if you also use all-caps for well-behaved macros, that dilutes the warning effect.
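A small sketch of the double-evaluation hazard mentioned above, assuming the usual textbook definition of `MAX` (the post does not show one). The function `max_int` and the `demo` routine are hypothetical names added for comparison.

```c
#include <assert.h>

/* Textbook function-like macro: each argument appears twice in the expansion. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

static void demo(void)
{
    int x = 1, y = 5;
    /* Expands to ((x++) > (y++) ? (x++) : (y++)): y is incremented twice. */
    int m = MAX(x++, y++);
    assert(m == 6 && x == 2 && y == 7);

    int p = 1, q = 5;
    /* Each argument is evaluated once before the call. */
    int n = max_int(p++, q++);
    assert(n == 5 && p == 2 && q == 6);
}
```

The macro version is still well defined here because `?:` introduces a sequence point after its first operand, but the result is not what a reader skimming the call site would expect.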

I use all-caps for define names that I expect to come from outside the
source files - like a command line flag "-DPROG_VARIANT=2" in a
makefile, and that kind of thing. That, to me, counts as a "weird" macro.

I am happy to use macros where they make sense, but I would not use a
macro if a static const, enum, static inline function, or constexpr
variable will do just as well.
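As a rough illustration of the alternatives listed above, here is one way they might look side by side. All names are invented for the example, and the C23 `constexpr` form is left out because compiler support for it still varies.

```c
#include <assert.h>
#include <stddef.h>

/* enum: an integer constant expression, usable for array sizes and case labels. */
enum { buffer_size = 64 };

/* static const: a typed object; suits values that are not plain integers. */
static const double pi = 3.14159265358979;

/* static inline function in place of a function-like macro. */
static inline double circle_area(double radius)
{
    return pi * radius * radius;
}

/* The enum constant works wherever a constant expression is required. */
static char scratch[buffer_size];
static_assert(buffer_size == 64, "enum constants are integer constant expressions");

static inline size_t scratch_size(void)
{
    return sizeof scratch;
}
```

A macro would still be needed if `buffer_size` had to be tested in an `#if`, which matches the exception made earlier in the thread for conditional compilation.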

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From BGB@3:633/10 to All on Saturday, May 30, 2026 15:48:28

On 5/30/2026 6:52 AM, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

But, not really an "easy" way to avoid bloat, other than to write
code specifically for what cases are relevant; while also avoiding
needless duplication and copy paste (where, overuse of copy/paste
can also lead to bloat; along with turning the code into an ugly
mess).

Hmm.. - as said, during the very early days there were issues; I
recall on one platform duplication of template code in more than
one source unit. And/or some environmental hacks (of the compiler)
to deposit template code for linking. In the later days I've not
seen such immature things anymore.

Possibly, a lot could depend on how one is counting things as well.

In a lot of cases when using GCC, I end up using:
   -ffunction-sections -fdata-sections -Wl,-gc-sections

On many targets, "-fdata-sections" can lead to noticeably larger and
slower code because it effectively eliminates section anchor
optimisations.  It does not negatively affect x86 AFAICS, because x86
does not use section anchors.

<https://godbolt.org/z/zeoq41Y7d>

With -fsection-anchors (enabled with optimisation on targets that
support it - generally RISCy load/store architectures), program-
lifetime variables are kept together in a lump (as though they were
in a struct) and often addressed by a pointer to that pretend struct.
Thus if a function accesses two variables "a" and "b", instead of
having to load the addresses of each of "a" and "b" into separate
registers, it loads an "anchor" into one register and accesses the
variables with reg+offset addressing.
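A source-level analogy for the "pretend struct" description, offered only as a sketch: the real transformation happens in the compiler's address generation, not in C source, and the names below are invented. The first function is what the programmer writes; the second shows roughly how the accesses look once both variables hang off one base address.

```c
#include <assert.h>

/* What the programmer writes: two independent static-lifetime variables. */
static int a = 1;
static int b = 2;

static int sum_separate(void)
{
    /* Without anchors, each variable's address is typically formed separately. */
    return a + b;
}

/* Roughly how section anchors treat them: fields of one hidden block,
   reached from a single base address plus small constant offsets. */
static struct { int a; int b; } anchor_block = { 1, 2 };

static int sum_anchored(void)
{
    /* One base address in a register, then base+offset loads for each field. */
    return anchor_block.a + anchor_block.b;
}

static void check_same_result(void)
{
    assert(sum_separate() == sum_anchored());
}
```

Placing every variable in its own section, as "-fdata-sections" does, removes the guarantee that `a` and `b` end up next to each other, which is why the single-base form is no longer available to the compiler.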

I've seen "-fdata-sections" used regularly in embedded systems - it
is almost always a bad idea.

("-ffunction-sections" is often very helpful to reduce code image
size, so keep that one.)

Both seem to help on x86, x86-64, and also on RISC-V, at making GCC's
output at least sorta space-comparable to my own compilers.

The merit of "-fdata-sections" is mostly that it eliminates unused
global variables; whereas "-ffunction-sections" eliminates unreachable
functions.

That is the point of them, yes.  "-ffunction-sections" can be useful at
removing unused code from more general code.  For microcontrollers,
SDK's and manufacturers' driver code will normally contain a large
number of functions that can be eliminated in this way, saving a lot of
code space.

However, in practice, "-fdata-sections" rarely eliminates a significant
amount - most programs do not have large amounts of statically-allocated
data that is not used.  Gcc, and I think most other compilers, put the
static lifetime data for each translation unit in its own section, so if
no data from a translation unit is used it will be eliminated at link
time even with -fno-data-sections.  And of course it makes no difference
for heap data or stack data.

The main place it makes a difference is global arrays from a translation
unit that is included, but for functions that are not included.

Also functions with large static arrays.

void SomeFunc()
{
static char buf[4096];
...
}

Where, say, eliminating SomeFunc does not necessarily eliminate buf.

In my testing, "-ffunction-sections" is absolutely worth using (on
targets where code space is relevant - there's no need for PC software).
On some targets, it may mean a few lost opportunities for shorter
jump/call instructions between functions in the same translation unit,
but the cost is rarely anything more than a slightly longer link time.
But "-fdata-sections" typically gives almost no ram space savings, and
makes code bigger and slower.

As I noted, gcc on x86 does not support section anchors, so there is not
likely to be much code cost for -fdata-sections.

Where section anchors shine - and where -fdata-sections therefore has
cost - is when a function needs to access more than one piece of static
lifetime data defined in the same translation unit (or another
translation unit if you are using LTO).  That happens a lot in embedded
ARM programming at least.  I don't know about RISC-V.  If the target
normally uses a "small data section" for ram (I know this is common on
PowerPC), then there is, in effect, a program-wide section anchor
already.  So it is possible that relatively few targets have section
anchors - but the 32-bit ARM on gcc is a vastly popular choice in the
embedded world, so it is important to understand the cost of this
compiler flag for that target at least.

It depends on the way it is built.

A lot of times though (for non-relocatable static-linked binaries) it
mostly tends to use AUIPC+LD or AUIPC+ST pairs to access global
variables. There is a Global Pointer that needs to be loaded when the
binary is started, unclear what it is used for exactly.

in PIC/PIE binaries, it uses AUIPC+ADDI to get a GOT pointer and then
uses the GOT pointers to access global variables (via fetching the
address of the variable from the GOT).

Can note that BGBCC targeting RV works differently, instead using GP to
access global variables, and clustering the commonly accessed global
variables around GP (GP is initialized to point at the start of
the ".data" section for the main EXE at program startup, though in my
ABI this may actually be a copy allocated elsewhere in RAM, and not
actually pointing at the version of the section located in the original
PE image; note that the loader also applies base relocs for the data
section separately when locating it; in effect the base relocs being internally partitioned per-section, rather than the per-page
partitioning scheme used in the original PE/COFF).

For my target, I mostly end up needing to use PIE binaries with GCC, as
it needs to be able to load the binary at different locations.

However, I am using a custom C library, as I (still) haven't managed to
get the "ld-linux.so" stuff working. Not yet figured out whatever poorly-documented arcane dark magic is needed to get this part working.

As noted, my compiler's output (including for plain RISC-V) uses
PE/COFF, which was also the native format for the OS.

Note that for Linux binaries in this case it would mimic the Linux
syscall interface; though as I hadn't gotten very far with the PIE
loader, most of the syscalls are still not implemented.

My own makeshift OS has a different syscall mechanism, ironically using
the same registers, but a syscall number of -1 (Linux uses positive
syscall numbers).

They work in different ways, IIRC:
  X10..X15: Arg1..Arg6
  X16: Unused, 0
  X17: Syscall Number (always positive)

In my case, syscalls took a different form, IIRC:
  X10: Object ID (Handle)
  X11: Method Number (Integer)
  X12: Method Args List (Pointer)
  X13: Return Value (Pointer)
  X14..X16: Unused, 0 (RV)
  X17: Holds -1 (RV).
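To make the two register layouts concrete, here is a hedged sketch that models X10..X17 as a plain array and fills it both ways. Everything in it (the `demo_regs` type, the helper names) is hypothetical; it is not taken from TestKern or from the Linux sources, and it only mirrors the two lists above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of X10..X17 as a trap handler might see them. */
typedef struct {
    int64_t x[8];   /* x[0] is X10, ..., x[7] is X17 */
} demo_regs;

/* Linux-style call: arguments in X10..X15, X16 unused, number in X17. */
static demo_regs demo_linux_syscall(int64_t number, int64_t arg1, int64_t arg2)
{
    assert(number >= 0);   /* the post describes these numbers as never negative */
    demo_regs r = { { arg1, arg2, 0, 0, 0, 0, 0, number } };
    return r;
}

/* Object-method call: X10 handle, X11 method number, X12 argument list,
   X13 return slot, X14..X16 zero, X17 set to -1. */
static demo_regs demo_method_syscall(int64_t handle, int64_t method,
                                     const void *args, void *ret)
{
    demo_regs r = { { handle, method, (int64_t)(intptr_t)args,
                      (int64_t)(intptr_t)ret, 0, 0, 0, -1 } };
    return r;
}

/* A handler can tell the two conventions apart from X17 alone. */
static int demo_is_method_call(const demo_regs *r)
{
    return r->x[7] == -1;
}
```

Because the marker value -1 can never be a valid number under the first convention as described, one trap entry point could serve both styles.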

In this case, system calls and many OS APIs take the form of object
method calls, with a special range of low-numbered object IDs (Eg, 0 or
NULL) mapping to core/basic syscalls.

But, yeah, some OS APIs would take the form of objects which would be
wrapped in a VTable struct, say:
   SomeApi_Vt **api;
   (*api)->ApiMethod(api, arg1, arg2);
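A self-contained sketch of that calling pattern, assuming a single method slot that takes two ints. The names `SomeApi_Vt` and `ApiMethod` come from the snippet above; the method signature, the `demo_` implementation and the object are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface: an object is a pointer to a pointer to its VTable. */
typedef struct SomeApi_Vt SomeApi_Vt;
struct SomeApi_Vt {
    int (*ApiMethod)(SomeApi_Vt **self, int arg1, int arg2);
};

/* One concrete implementation of the method slot. */
static int demo_ApiMethod(SomeApi_Vt **self, int arg1, int arg2)
{
    (void)self;   /* a real object would keep its state behind this pointer */
    return arg1 + arg2;
}

static SomeApi_Vt demo_vtable = { demo_ApiMethod };
static SomeApi_Vt *demo_object = &demo_vtable;   /* first word points at the VTable */

static int call_through_vtable(int arg1, int arg2)
{
    SomeApi_Vt **api = &demo_object;
    assert(api != NULL && *api != NULL);
    /* Dereference once to reach the VTable, then pass the object back as self. */
    return (*api)->ApiMethod(api, arg1, arg2);
}
```

In the scheme described in the post, the function stored in the slot would presumably be one of the shared generic wrappers that turns the call into a system call, rather than a local implementation as here.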

In this case, well, there were two major ways of requesting APIs:
  Pairs of EIGHTCC values, for some public APIs
    Or as FOURCC's for shorthand (zero padded to 64 bits).
  As a UUID / GUID:
    Primarily used for local / private interfaces.

Well, people probably can't guess where this mechanism originally came
from...

Well, not exactly the same as the inspiration, as there is no IDL
compiler involved, mostly just bare C structs representing the VTables.
There is essentially a blob of generic reusable method-wrappers (and a
whole generic reusable VTable) that is shared across many of these
objects, so calling a method on an object then just sorta translates it
into the corresponding system call to invoke that method slot.

Well, and this mechanism being part of why (for RV) I stuck with an ABI variant that passes everything in X registers (separate X and F
registers would make a big ugly mess for this whenever a method has a floating-point argument).

Neither is needed with my own compiler, which compiles things in a way
such that it eliminates anything that is unreachable.

[...]

That might be the case for a very simplistic compiler.  With an
optimising compiler, these extra variables will quickly be
eliminated. If the compiler has a good scheduling model of the
device, it will do whatever instruction scheduling works best for that
processor.  If the model is not good enough, it will be suboptimal.
I would not, however, expect any difference in the generated code for
the two code snippets.

Sometimes this kind of "manual optimisation" is helpful when you have
to try to get efficient results from a weak compiler, however.

Possibly, but this sort of thing can help with both BGBCC and with
MSVC IME

I don't tend to think of MSVC as a highly optimising compiler - but it
is not a tool I have much use for, as it does not handle the targets I
need.  When I have sometimes looked at the generated code on godbolt, it
has not impressed me at all.  So it could well fall into the "helpful
when using a weaker compiler" category.

Depends on what target I am building for:
  Windows Native: Typically MSVC
  WSL: Usually GCC or Clang
    Seems to have: GCC 13.2.0; Clang 18.1.3
    RISC-V GCC: Also 13.2.0 (also via WSL)
  Linux: Typically GCC

I rarely use Cygwin much anymore, as it was mostly rendered obsolete by
WSL (on Win10 or similar).
Though, Cygwin may still be relevant on Win7 or WinXP systems.

For BGBCC, it can build both on native Windows and on Linux/WSL (though recently noted that this build was broken, mostly by GCC and Clang being
more pedantic about missing prototypes, and a few prototypes were being
missed by my function-prototype mining tool). Went and fixed this, but
haven't posted this yet.

As for optimizing in MSVC, yeah, it is in the area of not terrible, but
not super clever either.

If one expects the sort of high-level code-rewriting cleverness that GCC
or Clang often does, one will be disappointed.

But, sometimes, the main "heavy hitter" optimizations are things like constant-folding and register allocation, which it does do effectively.

Though, both MSVC and BGBCC seem to use one sort of strategy for
register allocation:

Static assign things to callee-save registers and use remaining
registers for dynamic allocation within basic-blocks. Variables with
finite non-overlapping lifetimes (that do not cross basic-block
boundaries) may potentially share a register (this more generally
applies to things like temporaries).

And, GCC and Clang use another: Assign dynamically but carry values
across basic-block boundaries along control-flow paths.

Both tend to give different patterns though, and seem to favor different
types of code.

Usual strategy is to try to limit how much code is written, and
also to avoid doing things in ways that result in too much code,
or too much cruft.

Best to avoid both copy paste when reasonable, and sticking
anything non-trivial in macros.

We avoided macros if possible.

They are de-facto for constants and similar, but for longer stuff is
better avoided.

Macros are rarely the best way to define constants.  They are needed
if you are using the constants for pre-processor stuff like
conditional compilation.  But generally you get clearer code, better
typing, and potentially several other benefits from using alternative
choices like "enum" (even for stand-alone integer constants), "static
const" variables, and in C23, "constexpr" variables.  There's no
doubt that a lot of code /does/ use macros for constants, but I view
it as a relic of the past rather than good coding practice.

They are traditional...

Like:
   static const double M_PI = 3.14159265358979;

Could also make sense, but people don't usually do this, they usually
use macros...

They should not do so (IMHO, of course).  Yes, macros are traditional -
but there are no plus sides to using them for this kind of thing. (There
are no plus sides to using all-caps either, but people do that too.)

It is more tradition...

My conventions, as noted, are sorta like:
  Macros / Constants: All caps;
  Functions:
    LIBNAME_SubSys_FirstLetterCaps  //externally callable within LIBNAME
    libname_subsys_nocaps           //usually private to a subsystem
    LIBNAME_FirstLetterCaps         //main API for a private library
    somefunction                    //C library convention
    libname_somefunction            //some OS API stuff
    libFirstLetterCaps              //GL-like, common for public APIs
    FirstLetterCaps                 //was common in Win32 API, unused
  Conventions are looser for standalone programs, but:
    somename, some_name, ...        //small programs
    S_SomeFunc  //id Software like, letter for major subsystem

In my own makeshift OS, had used OpenGL like naming for a lot of OS APIs.
  tkWhatever     //TestKern OS APIs
  tkgdiWhatever  //TKGDI: Basically Graphics/GUI stuff

As noted, some amount of these are implemented as object wrappers.
Likewise for my OpenGL implementation.

Though, OpenGL is more annoying in that it usually works via a "GetProcAddress()" type mechanism, so you need to fetch each function
pointer internally (and provide a lookup mechanism for each function).
Where, the main (static linked) part of the GL API is effectively
wrapper functions over function pointers gained via said
"GetProcAddress()" mechanism, which then go into the userland
implementation of those functions (typically, with part of the GL API
running in userland, and a backend part that runs "elsewhere", such as
in the GUI process, and is reached over an Object / COM interface).
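The wrapper-over-function-pointer arrangement described above might look roughly like the following. This is a sketch under stated assumptions: `demo_get_proc_address` stands in for whatever "GetProcAddress()" style lookup the platform provides, and `demoAdd` is an invented entry point, not a real GL function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_proc)(void);        /* generic function pointer */
typedef int (*demo_add_fn)(int, int);   /* real signature of the entry point */

/* Backend implementation that the lookup hands out. */
static int impl_demoAdd(int a, int b)
{
    return a + b;
}

/* Hypothetical lookup, standing in for a GetProcAddress()-style call. */
static demo_proc demo_get_proc_address(const char *name)
{
    if (strcmp(name, "demoAdd") == 0)
        return (demo_proc)impl_demoAdd;
    return NULL;
}

/* Statically linked wrapper: resolves the pointer on first use, then forwards. */
static int demoAdd(int a, int b)
{
    static demo_add_fn fn = NULL;
    if (fn == NULL)
        fn = (demo_add_fn)demo_get_proc_address("demoAdd");
    assert(fn != NULL);   /* the lookup is expected to know this name */
    return fn(a, b);
}
```

Application code then just calls `demoAdd(2, 3)` and never sees the lookup. The cost is one wrapper and one table entry per API function, which is the annoyance the post describes.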

FWIW, I didn't design this part of the GL API, but if I had, probably
would have just used COM objects internally.

Still better IMO to provide a nice C API wrapper over said COM objects
though, rather than go the DirectX route and be like "Hey, application
code, have fun with these here bare COM objects!".

Exposing bare COM objects and GUIDs in a public API is poor design IMO.

Well, even if in effect many of the API calls are just:
void apiDoSomething()
{ (*someapi_context)->DoSomething(someapi_context); }

This differs some from Linux APIs, which often like sharing bare
functions and variables across API boundaries (so, no real wall of
separation between the library and application in this sense).

This partly runs into an issue in that in my case, BGBCC (like MSVC)
requires being explicit about DLL imports and exports, and sharing
global variables across DLL boundaries is generally discouraged.

Note that the DLL mechanism doesn't actually support sharing global
variables directly, so if you try to share a global across a DLL
boundary, what you actually get is a hidden function-call that returns a pointer to the variable.

__declspec(dllimport) int somevar; //not actually a variable.

x=somevar;
Is more like:
x=*(int *)(__get_somevar());

But, generally discouraged.

Sharing variables across DLLs is bad practice IMO, and ideally only
sharing functions that represent a public API (and not, "whatever random
stuff happens to be in the library"). Contrast to the Linux "shared
object" approach which does tend to take more of a "share everything"
approach, and libraries tend to not maintain as strong of a
library/application separation.

Well, and then Cygwin goes and tries to fake Linux behavior on top of
DLLs (in which case a large library can also find itself running into
the hard limit on the maximum number of DLL exports).

...

Can note that I had approached C library linking in a different way from
MSVC/Windows:
  Windows:
    Main EXE and every DLL get their own static-linked C library.
    Can opt into a shared DLL for the C library, but this adds wonk.
  BGBCC+TestKern:
    Main EXE gets a static-linked C library;
      Exports a COM interface that DLLs can use;
    DLLs get a static linked C-library stub;
      Invokes main C library via a hidden COM interface.

This basically allows things like malloc/free and stdio to work across
DLL boundaries (unlike Windows where each gets their own local heap and
stdio, and trying to invoke a pointer from one DLL in another tends to
cause stuff to explode).

Granted, the DLLs effectively pulling the C library from the main EXE
via COM objects may seem a little unorthodox, but it seemed like the
best way to address my use cases.

(I'm snipping all the details of your own C compiler, because there is
very little I can comment on.)

But, things can be considered in relative terms:
Like, C++ may carry various penalties vs C.

I don't find C++ carries noticeable penalties compared to C, for my
embedded work.  But I do disable exceptions and RTTI - exceptions may
have very little run-time overhead, but the unwind tables can be
significant when code size is important in small systems.

Yes, that is the main thing.
   They carry zero performance penalty in practice;
   But, have a non-zero penalty for image size.

Not enough to be a deal-breaker towards using them if they are used,
but enough that one wants them disabled if not used...

Agreed.

(I could also note that I make heavy use of templates in C++ code - it
often leads to smaller and faster results.)

Curious...

I had tended to use the "write everything one off for the task at hand" approach, but this is a higher-effort approach.

...

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Saturday, May 30, 2026 16:43:16

Bart <bc@freeuk.com> writes:

On 30/05/2026 13:29, Dan Cross wrote:

In article <10vd1tu$ekvl$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 29/05/2026 21:56, Keith Thompson wrote:

[snip]
Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

I'm saying that there is no value in those extra levels, some people
think there is, and I'm arguing about that. I was replying to tTh.

As for my question, what /is/ the point? I'm still waiting!

To clarify: the question is, what is the point of those levels?
How is that different from asking "why C is like that"?

My question is actually independent of C or its history.

I accept those levels exist. I was asking do they currently serve a
useful purpose.

That's very different from your original question, which is quoted
above. Your original question, with its use of the past tense,
seemed clearly (to me) to be about how C was originally designed.

I don't have a straightforward yes or no answer to your restated
question.

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Nothing about the current rules particularly bothers me. There are
no objective criteria for deciding what the rules *should* be.
Even having multiplication bind more tightly than addition is
fundamentally an arbitrary choice (though one that's almost
universally recognized, even outside the context of programming
languages).

Of course all C implementations must implement the expression
syntax as it's defined by the standard, and any changes in future
editions of the standard would be impractical. As a programmer,
I don't have to be as strict; I can add parentheses when writing
code, and I can look up the rules as needed when reading code.

If not, people can choose to ignore them when writing C code,
for example like this where all () are technically superfluous:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Yes, they can, and I personally tend to agree that they should.

And they can choose to not adopt them when devising new languages,
however many still do faithfully recreate the same pattern, with a few notable exceptions such as Go lang.

When designing a new language, there are real advantages in strictly
imitating C's rules, just because so many programmers are familiar
with them. (It would have been silly for C++ or Objective-C to
change the precedence rules, even to improve them.) But there
are also real advantages in using precedence rules that are better
(e.g., simpler) than C's. It depends on the nature of the language.
It could be an interesting discussion for comp.lang.misc.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lawrence D'Oliveiro@3:633/10 to All on Sunday, May 31, 2026 00:29:40

On Sat, 30 May 2026 12:01:27 +0100, Bart wrote:

It doesn't need << >> to be in a distinct group from multiply or add
groups.

But it is also not clear because the part after >> is sprawling.

It's a counterexample to your claim that "<< >> [don't need] to be in
a distinct group", isn't it?

You'd want it like this:

Because they are in a distinct group, you don't need it like this.

Remove ambiguity in the mind of the reader? Lead to fewer
surprises when a new term needs to be added?

The new terms will most likely fit into the existing ones in the
natural way.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Sunday, May 31, 2026 03:37:50

On 2026-05-31 01:43, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[...]

[...]

C's operator precedence rules are complicated and arguably flawed.

I'd say that just the (known) flaw makes them (slightly) complicated;
so you need to remember that "flaw" (or "inconsistency") to be safe.
The rest is completely sensible. And even if one doesn't have a table
to look up the precedences they mostly can be derived (presuming one
has a feeling for the underlying logic of these things or experiences
from other related areas).

They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Nothing about the current rules particularly bothers me. There are
no objective criteria for deciding what the rules *should* be.

There are. (What I called above as "derived underlying logic".) Some
aspects have already been formulated in this and other threads here.
But maybe not obvious to recognize without background in mathematics,
logic, or CS.

Even having multiplication bind more tightly than addition is
fundamentally an arbitrary choice

(Now opinions are getting really strange; in the above stated sense.)

(though one that's almost
universally recognized, even outside the context of programming
languages).

[...]

If not, people can choose to ignore them when writing C code,
for example like this where all () are technically superfluous:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Yes, they can, and I personally tend to agree that they should.

The more complex the expressions are the more structure they need.

IMO, the parentheses above make precedence clear (if unknown!), but
are not contributing to readability. It would have made more sense
to separate the sub-expression within the [...] into an object of its
own to enhance readability and to more easily understand what's going on.

To emphasize; not the precedences are the problem above, but the
complexity of the expression in connexion with lack of structuring.
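One possible reading of that suggestion, as a sketch: give the subscript its own named object so the statement splits into two simpler steps. The table values and the names `crc_step` and `index` are placeholders chosen for the example, not part of any posted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-entry table; the values are arbitrary placeholders. */
static const uint32_t s_crc32[16] = {
    0x00u, 0x11u, 0x22u, 0x33u, 0x44u, 0x55u, 0x66u, 0x77u,
    0x88u, 0x99u, 0xAAu, 0xBBu, 0xCCu, 0xDDu, 0xEEu, 0xFFu
};

static uint32_t crc_step(uint32_t crcu32, uint8_t b)
{
    /* Step 1: combine the low nibbles of the state and the input byte. */
    const uint32_t index = (crcu32 & 0xF) ^ (b & 0xF);
    assert(index < 16);

    /* Step 2: shift the state down one nibble and fold in the table entry. */
    return (crcu32 >> 4) ^ s_crc32[index];
}
```

An optimising compiler would be expected to generate the same code for this as for the one-line form, so the split is purely for the reader.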

[...]

When designing a new language, there are real advantages in strictly imitating C's rules, just because so many programmers are familiar
with them.

Huh? - How that? - Are you saying here that practically only C-like
languages are in common use? - But even if so; there's quite some
languages with differing precedence rules, not C-based, and without
such a flaw like the one being discussed. - When designing a *new*
language I'd certainly choose one of the sensible precedence rules,
and just without those obvious flaws. (And not use "C" as base, of
course.)

(It would have been silly for C++ or Objective-C to
change the precedence rules, even to improve them.) But there
are also real advantages in using precedence rules that are better
(e.g., simpler) than C's.

Or - with reference to that flaw - just more consistent.

Consistent systems are inherently simpler, in the sense of easier to
understand and thus more straightforward to use. A precondition for
that is, as said, at least a basic understanding of such things.

It depends on the nature of the language.
It could be an interesting discussion for comp.lang.misc.

Janis

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Saturday, May 30, 2026 19:53:40

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-05-31 01:43, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[...]

[...]
C's operator precedence rules are complicated and arguably flawed.

I'd say that just the (known) flaw makes them (slightly) complicated;
so you need to remember that "flaw" (or "inconsistency") to be safe.
The rest is completely sensible. And even if one doesn't have a table
to look up the precedences they mostly can be derived (presuming one
has a feeling for the underlying logic of these things or experiences
from other related areas).

Reasonable, but I feel the need to say that that's your personal
opinion. You seem to think that C's precedence rules have one and
only one flaw, and a set of rules with that flaw corrected would
be ideal.

I don't even necessarily disagree, but others are likely to have
different opinions, and those opinions might be perfectly valid.

I don't want to make a huge deal out of this. I honestly don't have
a strong opinion myself. I usually find dealing with the rules
as they exist to be a much better use of my time and attention --
and I don't mean that as a criticism of anyone who chooses to think
about alternatives.

They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.
Nothing about the current rules particularly bothers me. There are
no objective criteria for deciding what the rules *should* be.

There are. (What I called above as "derived underlying logic".) Some
aspects have already been formulated in this and other threads here.
But maybe not obvious to recognize without background in mathematics,
logic, or CS.

Even having multiplication bind more tightly than addition is
fundamentally an arbitrary choice

(Now opinions are getting really strange; in the above stated sense.)

Mathematical notation almost universally has multiplication binding
more tightly than addition. It's consistent because the consistency
itself has big advantages so that you can write x + y * z (or x +
y × z) and everyone knows what you mean. Strict left-to-right
evaluation would also have been a valid choice. (I don't know the
history, but it probably goes back several centuries.)

(though one that's almost
universally recognized, even outside the context of programming
languages).
[...]

If not, people can choose to ignore them when writing C code,
for example like this where all () are technically superfluous:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Yes, they can, and I personally tend to agree that they should.

The more complex the expressions are the more structure they need.

IMO, the parentheses above make precedence clear (if unknown!), but
do not contribute to readability. It would have made more sense
to separate the sub-expression within the [...] into its own object to
enhance readability and to more easily understand what's going on.

To emphasize: the problem above is not the precedences, but the
complexity of the expression in connection with the lack of structuring.

[...]

When designing a new language, there are real advantages in strictly
imitating C's rules, just because so many programmers are familiar
with them.

Huh? - How that? - Are you saying here that practically only C-like
languages are in common use?

Huh? No, I didn't say that at all.

I suggest that if you're designing a somewhat C-like language,
sticking to C's precedence rules has advantages due to programmer
familiarity. Even for a language that's not particularly C-like,
but that has C-like expressions, the designer might consider
following C's rules.

Or not.

- But even if so; there are quite a few
languages with differing precedence rules, not C-based, and without
a flaw like the one being discussed. - When designing a *new*
language I'd certainly choose one of the sensible sets of precedence
rules, without those obvious flaws. (And not use "C" as base, of
course.)

Certainly.

(It would have been silly for C++ or Objective-C to
change the precedence rules, even to improve them.) But there
are also real advantages in using precedence rules that are better
(e.g., simpler) than C's.

Or - with reference to that flaw - just more consistent.

Consistent systems are inherently simpler, in the sense of easier to understand and thus more straightforward to use. A precondition for
that is, as said, at least a basic understanding of such things.

Ah, but consistent with what? Internal consistency and consistency
with existing practice are not necessarily the same thing.

It depends on the nature of the language.
It could be an interesting discussion for comp.lang.misc.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Richard Harnden@3:633/10 to All on Sunday, May 31, 2026 09:12:31

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 11:14:29

On 30/05/2026 22:48, BGB wrote:

On 5/30/2026 6:52 AM, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

But, not really an "easy" way to avoid bloat, other than to write
code specifically for what cases are relevant; while also
avoiding needless duplication and copy paste (where, overuse of
copy/paste can also lead to bloat; along with turning the code
into an ugly mess).

Hmm.. - as said, during the very early days there were issues; I
recall on one platform duplication of template code in more than
one source unit. And/or some environmental hacks (of the compiler)
to deposit template code for linking. In the later days I've not
seen such immature things anymore.

Possibly, a lot could depend on how one is counting things as well.

In a lot of cases when using GCC, I end up using:
   -ffunction-sections -fdata-sections -Wl,-gc-sections

On many targets, "-fdata-sections" can lead to noticeably larger and
slower code because it effectively eliminates section anchor
optimisations. It does not negatively affect x86 AFAICS, because
x86 does not use section anchors.

<https://godbolt.org/z/zeoq41Y7d>

With -fsection-anchors (enabled with optimisation on targets that
support it - generally RISCy load/store architectures), program-
lifetime variables are kept together in a lump (as though they were
in a struct) and often addressed by a pointer to that pretend
struct. Thus if a function accesses two variables "a" and "b",
instead of having to load the addresses of each of "a" and "b" into
separate registers, it loads an "anchor" into one register and
accesses the variables with reg+offset addressing.

I've seen "-fdata-sections" used regularly in embedded systems - it
is almost always a bad idea.

("-ffunction-sections" is often very helpful to reduce code image
size, so keep that one.)

Both seem to help on x86, x86-64, and also on RISC-V, at making GCC's
output at least sorta space-comparable to my own compilers.

The merit of "-fdata-sections" is mostly that it eliminates unused
global variables; whereas "-ffunction-sections" eliminates
unreachable functions.

That is the point of them, yes. "-ffunction-sections" can be useful
at removing unused code from more general code. For microcontrollers,
SDKs and manufacturers' driver code will normally contain a large
number of functions that can be eliminated in this way, saving a lot
of code space.

However, in practice, "-fdata-sections" rarely eliminates a
significant amount - most programs do not have large amounts of
statically-allocated data that is not used. Gcc, and I think most
other compilers, put the static lifetime data for each translation
unit in its own section, so if no data from a translation unit is used
it will be eliminated at link time even with -fno-data-sections. And
of course it makes no difference for heap data or stack data.

The main place it makes a difference is global arrays from a translation unit that is included, but for functions that are not included.

Also functions with large static arrays.

void SomeFunc()
{
  static char buf[4096];
  ...
}

Where, say, eliminating SomeFunc does not necessarily eliminate buf.

Yes, if you have such code but want to eliminate it, then
-fdata-sections would definitely benefit. I have not seen such code in practice (at least not with very big static arrays, and that also was
not an essential part of the program). But of course I have only seen a microscopic part of all C code written - if you come across this sort of thing, then I appreciate your point.

(There are several ways to make this more "friendly" to builds that need
to be compact, such as putting the buffer and/or SomeFunc in a separate
file or giving it a specific section of its own.)

In my testing, "-ffunction-sections" is absolutely worth using (on
targets where code space is relevant - there's no need for PC
software). On some targets, it may mean a few lost opportunities for
shorter jump/call instructions between functions in the same
translation unit, but the cost is rarely anything more than a slightly
longer link time. But "-fdata-sections" typically gives almost no ram
space savings, and makes code bigger and slower.

As I noted, gcc on x86 does not support section anchors, so there is
not likely to be much code cost for -fdata-sections.

Where section anchors shine - and where -fdata-sections therefore has
cost - is when a function needs to access more than one piece of
static lifetime data defined in the same translation unit (or another
translation unit if you are using LTO). That happens a lot in
embedded ARM programming at least. I don't know about RISC-V. If the
target normally uses a "small data section" for ram (I know this is
common on PowerPC), then there is, in effect, a program-wide section
anchor already. So it is possible that relatively few targets have
section anchors - but the 32-bit ARM on gcc is a vastly popular choice
in the embedded world, so it is important to understand the cost of
this compiler flag for that target at least.

It depends on the way it is built.

A lot of times though (for non-relocatable static-linked binaries) it
mostly tends to use AUIPC+LD or AUIPC+ST pairs to access global
variables. There is a Global Pointer that needs to be loaded when the
binary is started, unclear what it is used for exactly.

If you have a global pointer, then it will probably be used for
gp+offset access to global data, eliminating the need for section anchors.

I have not used RISC-V, and am not familiar with its details. I can see
from godbolt that when -fdata-sections is in action and you are loading
from static lifetime variables, the compiler generates instructions like

lw a5, a_variable
lw a4, b_variable
lw a0, c_variable

When you do not have "-fdata-sections", it uses anchors :

lla a4, .LANCHOR0
lw a5, 0(a4)
lw a3, 4(a4)
lw a0, 8(a4)

From my (limited) understanding, RISC-V cannot use 32-bit absolute addressing. So the "lw a5, a_variable" must be a pseudo-instruction -
using register + offset addressing. If there is a global pointer, then presumably that is used here. Alternatively, the pseudo-instruction
might assemble to two real instructions to support the 32-bit address. I
know both techniques are used in some targets, but don't know about RISC-V.

Certainly it would surprise me if the "lw a5, a_variable" version were
more efficient than using anchors - otherwise why would gcc generate
code with anchors when given a free choice? (Perhaps gcc is not well
tuned for RISC-V code generation - I am wary of making too many
assumptions about the processor just from some simple compiler outputs.)

(clang does not, apparently, support section anchors as an optimisation technique. Both with and without -fdata-sections, on RISC-V it first
uses two instructions to load ".L_MergedGlobals" into a register and
then uses that register plus offset to access data.)

I don't tend to think of MSVC as a highly optimising compiler - but it
is not a tool I have much use for, as it does not handle the targets I
need. When I have sometimes looked at the generated code on godbolt,
it has not impressed me at all. So it could well fall into the
"helpful when using a weaker compiler" category.

Depends on what target I am building for:
  Windows Native: Typically MSVC
  WSL: Usually GCC or Clang
    Seems to have: GCC 13.2.0; Clang 18.1.3
    RISC-V GCC: Also 13.2.0 (also via WSL)
  Linux: Typically GCC

I rarely much use Cygwin anymore, as it was mostly rendered obsolete by
WSL (on Win10 or similar).
Though, Cygwin may still be relevant on Win7 or WinXP systems.

Cygwin has its own wide range of complications. If you want to use gcc targeting native Windows, msys2 and mingw-64 are probably your best bet, either compiled natively under msys2 or as a cross-compile from Linux.
But don't place too much emphasis on my advice, as I very rarely compile
C or C++ code for Windows - most of my PC target (Linux or Windows)
coding is in Python.

For BGBCC, it can build both on native Windows and on Linux/WSL (though recently noted that this build was broken, mostly by GCC and Clang being more pedantic about missing prototypes, and a few prototypes were being missed by my function-prototype mining tool). Went and fixed this, but haven't posted this yet.

As for optimizing in MSVC, yeah, it is in the area of not terrible, but
not super clever either.

If one expects the sort of high-level code-rewriting cleverness that GCC
or Clang often does, one will be disappointed.

But, sometimes, the main "heavy hitter" optimizations are things like constant-folding and register allocation, which it does do effectively.

Though, both MSVC and BGBCC seem to use one sort of strategy for
register allocation:

Static assign things to callee-save registers and use remaining
registers for dynamic allocation within basic-blocks. Variables with
finite non-overlapping lifetimes (that do not cross basic-block
boundaries) may potentially share a register (this more generally
applies to things like temporaries).

And, GCC and Clang use another: Assign dynamically but carry values
across basic-block boundaries along control-flow paths.

Both tend to give different patterns though, and seem to favor different types of code.

[...]

(I could also note that I make heavy use of templates in C++ code - it
often leads to smaller and faster results.)

Curious...

I had tended to use the "write everything one off for the task at hand" approach, but this is a higher-effort approach.

A lot of code tends to fall into the category of shuffling data around
or doing simple checks or conversions. It's also common to have wrapper functions for libraries to get something nicer, safer and more
convenient than some API that belongs in the early 1990's. Good C++
templates (and sometimes even good macros in C) can make the use of
these things far nicer, and most of the code that the templates appear
to generate inline in the caller disappears in optimisation.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 11:47:50

On 31/05/2026 03:37, Janis Papanagnou wrote:

On 2026-05-31 01:43, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[...]

If not, people can choose to ignore them when writing C code,
for example like this where all () are technically superfluous:

   crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Yes, they can, and I personally tend to agree that they should.

The more complex the expressions are the more structure they need.

IMO, the parentheses above make precedence clear (if unknown!), but
do not contribute to readability. It would have made more sense
to separate the sub-expression within the [...] into its own object to
enhance readability and to more easily understand what's going on.

To emphasize: the problem above is not the precedences, but the
complexity of the expression in connection with the lack of structuring.

This is an example of how readability depends on the reader. To me,
there is no benefit in having a sub-expression here because the
structure is clear - this is how you do table-based crc's with 4-bit
chunks. But to someone unfamiliar with CRC calculations, splitting the expression up might make it clearer. (Alternatively, a comment block
with an explanation could help.)

I /do/ think the parentheses here are helpful for readability, precisely because they emphasise the structure of the expression. You could write:

crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

but that needs significantly more cognitive effort to parse when reading
it, could be misinterpreted, and has lost all the structure that makes
it easy to see what is going on.

(I regularly use bit-manipulation and shift instructions in my code -
but I still felt it best to check the details in a precedence table
before writing that.)

The expression as originally parenthesised is thus definitely easier for
/me/ to read, and is almost exactly the way I would write it myself :

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

The only differences I would have are the names (why would anyone put
variable types into the names like "crcu32" ? We are not writing
BASIC), and I'd use a lower-case "0xf". Unlike almost every example
Bart has shown before, it even has nice spacing!

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 11:49:24

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't increase the size of the executable.

Of course. Parentheses do not affect the generated code unless they
affect the semantics of the expression. (Some people think parentheses
affect the order of evaluation, but that is not the case for most
compilers.)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Sunday, May 31, 2026 10:59:42

On 31/05/2026 01:29, Lawrence D'Oliveiro wrote:

On Sat, 30 May 2026 12:01:27 +0100, Bart wrote:

It doesn't need << >> to be in a distinct group from multiply or add
groups.

But it is also not clear because the part after >> is sprawling.

It's a counterexample to your claim that "<< >> [don't need] to be in
a distinct group", isn't it?

Sure, when an expression exactly suits how its current level works, such as:

a << b + c

WHEN you intend it to be 'a << (b + c)'. How about when you intend it to
be: '(a << b) + c'?

This is arguably more intuitive since << scales numbers in the same way
as '*'. As it is:

a << 3 + b means a << (3+b)
a * 3 + b means (a*3) + b

And also

a << 3 | b means (a<<3) | b
a << 3 + b means a << (3+b)

Both examples have a similar function but thanks to the odd priorities
are quite different when choosing between << and *, or | and +.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Sunday, May 31, 2026 11:10:31

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

Of course. Parentheses do not affect the generated code unless they
affect the semantics of the expression. (Some people think parentheses affect the order of evaluation,

They can do if they make an expression be parsed differently. Do you have
an example where they make no difference but people might think they do?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Sunday, May 31, 2026 03:45:50

Richard Harnden <richard.nospam@gmail.invalid> writes:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't increase the size of the executable.

Compilers generally remove *all* parens, necessary or not.
The output of a compiler is assembly or machine code. You almost
certainly can't tell from the generated code whether the input was,
for example, `a * b + c`, `(a * b) + c`, or `(((a) * (b)) + (c))`.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Sunday, May 31, 2026 04:02:25

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Richard Harnden <richard.nospam@gmail.invalid> writes:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

Compilers generally remove *all* parens, necessary or not.
The output of a compiler is assembly or machine code. You almost
certainly can't tell from the generated code whether the input was,
for example, `a * b + c`, `(a * b) + c`, or `(((a) * (b)) + (c))`.

I realize I missed part of the point of your question.

Adding parentheses to an expression in a way that yields
an equivalent expression almost certainly will not affect the
generated code. Any parentheses that "restate" the precedence
rules are only for the convenience of human readers.

Ideally, you should always use exactly the right number of
parentheses to optimize readability. But since humans are not
compilers, there is no one way to do that. I would probably
add parentheses to `x == y & z`, assuming I really wanted the
semantics of `(x == y) & z` for some reason, but I would find the
superfluous parentheses in `x + (y * z)` or `x = (y + z)` annoying.
(Almost as annoying as the poor choice of variable names.)

It's possible to have too few parentheses or too many.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 13:18:56

On 31/05/2026 12:10, Bart wrote:

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently. A simpler set of rules,
with fewer levels, *might* have been better. I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are. I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

Of course. Parentheses do not affect the generated code unless they
affect the semantics of the expression. (Some people think
parentheses affect the order of evaluation,

They can do if they make an expression be parsed differently.

As I said, they "do not affect the generated code unless they affect the semantics of the expression." Obviously that only applies to extra parentheses. If that's what you mean by "parsed differently", then we
agree - clearly "(a + b) * c" gives different code from "a + (b * c)".

But you might consider "(a + b) + c" to be "parsed differently" than "a
+ (b + c)", because of how a particular compiler implements its parser.
It's possible that this results in different code for a particular
compiler, but there is no difference in the meaning for the C language.

I perhaps expressed it poorly when I said extra parentheses "do not"
affect the generated code - extra parentheses do not change the meaning
of the C code, and so compilers don't have to consider them in any way
(and optimising compilers generally don't). But a compiler is, of
course, free to be influenced by them and generate code that varies with
extra parentheses, as long as the final results match those required by
the standard.

Do you have
an example where they make no difference but people might think they do?

People might think they affect the order of evaluation, such as when you
have function calls :

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and "foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be influenced by the parentheses, but the language does not require it.)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 13:24:03

On 31/05/2026 02:29, Lawrence D'Oliveiro wrote:

On Sat, 30 May 2026 12:01:27 +0100, Bart wrote:

It doesn't need << >> to be in a distinct group from multiply or add
groups.

But it is also not clear because the part after >> is sprawling.

It's a counterexample to your claim that "<< >> [don't need] to be in
a distinct group", isn't it?

No, it is not. If << and >> had been in the same group as
multiplication and division, your code (the snippets that Bart
referenced) would have had the same semantics. Other code might have different semantics, but Bart was entirely correct in this case.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Sunday, May 31, 2026 10:15:01

On 2026-05-31 05:49, David Brown wrote:
...

Of course. Parentheses do not affect the generated code unless they
affect the semantics of the expression. (Some people think parentheses affect the order of evaluation, but that is not the case for most compilers.)

I assume that last sentence is meant to apply only to parentheses which
don't change the semantics? Otherwise it seems manifestly false.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Sunday, May 31, 2026 10:24:30

On 2026-05-31 07:18, David Brown wrote:

On 31/05/2026 12:10, Bart wrote:

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

...

But you might consider "(a + b) + c" to be "parsed differently" than "a
+ (b + c)", because of how a particular compiler implements its parser.
It's possible that this results in different code for a particular
compiler, but there is no difference in the meaning for the C language.

(a + b) + c mandates adding a to b, then adding the result to c. a + (b
+ c) mandates adding b to c then adding the result to a. As far as
mathematics is concerned, that's the same thing, but in computer math it
can make a difference if one of the two results in overflow or
unnecessary loss of precision, and the other does not.

...

Do you have
an example where they make no difference but people might think they do?

People might think they affect the order of evaluation, such as when you have function calls :

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and "foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be influenced by the parentheses, but the language does not require it.)

You're correct with regard to the function calls, but the parenthesized addition must be performed first, and the other one second, which may
make a difference, for the same reasons given in my previous paragraph.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 16:29:38

On 31/05/2026 16:15, James Kuyper wrote:

On 2026-05-31 05:49, David Brown wrote:
...

Of course. Parentheses do not affect the generated code unless they
affect the semantics of the expression. (Some people think parentheses
affect the order of evaluation, but that is not the case for most
compilers.)

I assume that last sentence is meant to apply only to parentheses which
don't change the semantics? Otherwise it seems manifestly false.

Yes. I thought I was quite clear in this, given that I wrote almost
exactly that in the previous sentence (which you also quoted above).

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 17:35:49

On 31/05/2026 16:24, James Kuyper wrote:

On 2026-05-31 07:18, David Brown wrote:

On 31/05/2026 12:10, Bart wrote:

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

...

But you might consider "(a + b) + c" to be "parsed differently" than "a
+ (b + c)", because of how a particular compiler implements its parser.
It's possible that this results in different code for a particular
compiler, but there is no difference in the meaning for the C language.

(a + b) + c mandates adding a to b, then adding the result to c. a + (b
+ c) mandates adding b to c then adding the result to a. As far as mathematics is concerned, that's the same thing, but in computer math it
can make a difference if one of the two results in overflow or
unnecessary loss of precision, and the other does not.

...

Do you have
an example where they make no difference but people might think they do?

People might think they affect the order of evaluation, such as when you
have function calls :

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and
"foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be
influenced by the parentheses, but the language does not require it.)

You're correct with regard to the function calls, but the parenthesized addition must be performed first, and the other one second, which may
make a difference, for the same reasons given in my previous paragraph.

The parentheses do not dictate the order of evaluation. But you are
correct - and it's worth pointing out, so thank you for doing that -
that for floating point operations, the grouping of operations can
affect the result.

If you are talking about floating point arithmetic (I was thinking of
integer arithmetic, but did not specify), then the operations are not necessarily commutative or associative, and the compiler cannot then re-arrange the operations unless it knows that doing so does not affect
the result.
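
A small sketch of the floating point case, assuming IEEE 754 double arithmetic (the function names are hypothetical). With these inputs the two groupings give different answers, which is why the compiler may not regroup them by default.

```c
#include <assert.h>

/* Left grouping: the two large values cancel first, so the 1.0 survives. */
double group_left(double a, double b, double c)  { return (a + b) + c; }

/* Right grouping: the 1.0 is absorbed into -1e20 before the cancellation. */
double group_right(double a, double b, double c) { return a + (b + c); }

void grouping_demo(void)
{
    assert(group_left(1e20, -1e20, 1.0) == 1.0);
    assert(group_right(1e20, -1e20, 1.0) == 0.0);
}
```

Options that relax floating point rules (such as gcc's -ffast-math) allow the compiler to regroup, so the second assertion is only expected to hold without them.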

But except for specific cases, the order of evaluation - both for the
values and side-effects - of sub-expressions is unspecified. Indeed,
they are unsequenced - the evaluations can interleave.

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the
operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language - if the compiler
can get the same results without doing so, it is free to pick a
different order. "(a + b) * 0" does not need to evaluate "a", "b", or
"a + b" at all unless there is a possibility of a side-effect - and it
can perform the side-effects in any order. "a + (b + c)" can check "a"
for a trap representation and deal with that before looking at "b" and
"c" or the results of "b + c", even though it cannot (for floating point operations) re-arrange the code to do "a + b" first.

If an implementation provides additional semantics to signed integer arithmetic, such as saturating or trapping overflow, then signed integer arithmetic operations are no longer associative. But normal C undefined behaviour on overflow is fully associative (as is wrapping semantics,
for addition, subtraction and multiplication).

So for non-associative operations, parentheses can affect the semantics
- and therefore the most likely (but not required) order of evaluation
of at least some parts of the sub-expressions. However, that also then
means we are no longer talking about parentheses that do not affect the semantics of the expression, which is what this thread branch is about.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Sunday, May 31, 2026 09:04:43

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Sunday, May 31, 2026 12:46:24

On 2026-05-31 11:35, David Brown wrote:
...

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language

"The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator." (6.5.1p3)

- if the compiler

can get the same results without doing so, it is free to pick a
different order.

Correct - but "same results" is crucial; it allows you to invoke the
"as-if" rule. Otherwise, the sequencing specified by 6.5.1p3 must be
honored.

...

If an implementation provides additional semantics to signed integer arithmetic, such as saturating or trapping overflow, then signed integer arithmetic operations are no longer associative. But normal C undefined behaviour on overflow is fully associative (as is wrapping semantics,
for addition, subtraction and multiplication).

I don't follow that. I believe that overflow is guaranteed for (5 +
INT_MAX) + INT_MIN, and completely avoided by 5 + (INT_MAX + INT_MIN),
which differ only by association. Are you saying they both have the
same chance of overflowing?
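
The difference between the two groupings can be checked without ever triggering the overflow. This is only a sketch: checked_add, add_left and add_right are hypothetical helpers, and the result 4 assumes two's complement int, where INT_MAX + INT_MIN is -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Adds a and b only if the mathematical result fits in int.
   Returns false (leaving *out untouched) when it would overflow. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* (a + b) + c, reporting overflow in either addition. */
bool add_left(int a, int b, int c, int *out)
{
    int t;
    return checked_add(a, b, &t) && checked_add(t, c, out);
}

/* a + (b + c), reporting overflow in either addition. */
bool add_right(int a, int b, int c, int *out)
{
    int t;
    return checked_add(b, c, &t) && checked_add(a, t, out);
}

void overflow_demo(void)
{
    int r = 0;
    assert(!add_left(5, INT_MAX, INT_MIN, &r));   /* 5 + INT_MAX overflows */
    assert(add_right(5, INT_MAX, INT_MIN, &r));   /* no intermediate overflow */
    assert(r == 4);
}
```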

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Sunday, May 31, 2026 18:11:05

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

I don't think they are needed for the three main groups (unless you need
to override the normal behaviour):

* Arithmetic ops that everyone knows

* Comparison ops which can be considered a single level
(in C there are two, but they are rarely chained)

* Logical AND/OR ops

Most involved in coding should know the order of these groups and will
know that AND takes precedence over OR because that is common.

That leaves the following which are not used in the real world and which
are diverse across languages:

<< >> & | ^

There it makes sense to use parentheses to make things clear when any of
these appear, but only if there is more than one and they are mixed.
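
A short sketch of why mixed bit operators are worth parenthesising in C (function names are hypothetical). In C, & binds tighter than ^, which binds tighter than |, so the unparenthesised forms group as shown in the comments.

```c
#include <assert.h>

/* Parsed as a | (b & c), because & binds tighter than |. */
unsigned or_and_as_parsed(unsigned a, unsigned b, unsigned c) { return a | b & c; }
unsigned or_and_other(unsigned a, unsigned b, unsigned c)     { return (a | b) & c; }

/* Parsed as (a ^ b) | c, because ^ binds tighter than |. */
unsigned xor_or_as_parsed(unsigned a, unsigned b, unsigned c) { return a ^ b | c; }
unsigned xor_or_other(unsigned a, unsigned b, unsigned c)     { return a ^ (b | c); }

void bitop_demo(void)
{
    assert(or_and_as_parsed(1, 2, 4) == 1);   /* 1 | (2 & 4) */
    assert(or_and_other(1, 2, 4) == 0);       /* (1 | 2) & 4 */
    assert(xor_or_as_parsed(1, 1, 1) == 1);   /* (1 ^ 1) | 1 */
    assert(xor_or_other(1, 1, 1) == 0);       /* 1 ^ (1 | 1) */
}
```

gcc and clang typically warn about the unparenthesised forms when -Wall is enabled (via -Wparentheses).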

I don't think that is particularly onerous to have to write, or too much clutter to read.

I wouldn't call anyone stupid for using () in such cases; more pragmatic.

There are some odd ones such as "." (not even considered a binary
operator in some languages), and assignment, but these also commonly
behave the same way across languages.

And then there is ?: :

a > b ? c : d # (a>b)?c:d
a + b ? c : d # (a+b)?c:d

The grouping of the first is probably what is intended. But in the
second, the intent might have been (a+b)?c:d, or a+(b?c:d); we don't
know for sure that the author didn't make a mistake, or we don't know ourselves.

Another candidate for parentheses when there are leading or trailing
binary ops involved.
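
The two readings of the second example can be compared directly. A sketch with hypothetical function names; in C the unparenthesised form groups as (a+b)?c:d.

```c
#include <assert.h>

/* How C parses it: the condition is the whole sum a + b. */
int cond_as_parsed(int a, int b, int c, int d) { return a + b ? c : d; }

/* The same grouping, written out. */
int cond_explicit(int a, int b, int c, int d)  { return (a + b) ? c : d; }

/* The other plausible intent: only b selects between c and d. */
int cond_other(int a, int b, int c, int d)     { return a + (b ? c : d); }

void ternary_demo(void)
{
    assert(cond_as_parsed(1, 0, 10, 20) == 10);   /* (1 + 0) is nonzero */
    assert(cond_explicit(1, 0, 10, 20) == 10);
    assert(cond_other(1, 0, 10, 20) == 21);       /* 1 + 20 */
}
```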

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From BGB@3:633/10 to All on Sunday, May 31, 2026 13:25:51

On 5/31/2026 4:14 AM, David Brown wrote:

On 30/05/2026 22:48, BGB wrote:

On 5/30/2026 6:52 AM, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

But, not really an "easy" way to avoid bloat, other than to
write code specifically for what cases are relevant; while also
avoiding needless duplication and copy paste (where, overuse of
copy/paste can also lead to bloat; along with turning the code
into an ugly mess).

Hmm.. - as said, during the very early days there were issues; I
recall on one platform duplication of template code in more than
one source unit. And/or some environmental hacks (of the compiler)
to deposit template code for linking. In the later days I've not
seen such immature things anymore.

Possibly, a lot could depend on how one is counting things as well.

In a lot of cases when using GCC, I end up using:
   -ffunction-sections -fdata-sections -Wl,-gc-sections

On many targets, "-fdata-sections" can lead to noticeably larger
and slower code because it effectively eliminates section anchor
optimisations. It does not negatively affect x86 AFAICS, because
x86 does not use section anchors.

<https://godbolt.org/z/zeoq41Y7d>

With -fsection-anchors (enabled with optimisation on targets that
support it - generally RISCy load/store architectures), program-lifetime
variables are kept together in a lump (as though they were in a struct)
and often addressed by a pointer to that pretend struct. Thus if a
function accesses two variables "a" and "b", instead of having to load
the addresses of each of "a" and "b" into separate registers, it loads
an "anchor" into one register and accesses the variables with
reg+offset addressing.
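
The "pretend struct" can be modelled by hand in C. This is only an illustration of the idea, not what the compiler literally emits; the variable and function names are hypothetical.

```c
#include <assert.h>

/* Three file-scope variables, as in an ordinary translation unit. */
static int a_variable = 1, b_variable = 2, c_variable = 3;

int sum_separate(void)
{
    /* Without anchors, each access may need its own address loaded. */
    return a_variable + b_variable + c_variable;
}

/* Hand-written model of a section anchor: the variables sit together
   and are all reached from one base address plus a small offset. */
struct globals { int a, b, c; };
static struct globals anchored = { 1, 2, 3 };

int sum_anchored(void)
{
    const struct globals *anchor = &anchored;   /* one address load */
    return anchor->a + anchor->b + anchor->c;   /* base + offset accesses */
}

void anchor_demo(void)
{
    assert(sum_separate() == 6);
    assert(sum_anchored() == 6);
}
```

Both functions compute the same value; the difference the thread is about only shows up in the generated code on targets that support section anchors.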

I've seen "-fdata-sections" used regularly in embedded systems - it
is almost always a bad idea.

("-ffunction-sections" is often very helpful to reduce code image
size, so keep that one.)

Both seem to help on x86, x86-64, and also on RISC-V, at making
GCC's output at least sorta space-comparable to my own compilers.

The merit of "-fdata-sections" is mostly that it eliminates unused
global variables; whereas "-ffunction-sections" eliminates
unreachable functions.

That is the point of them, yes. "-ffunction-sections" can be useful
at removing unused code from more general code. For
microcontrollers, SDK's and manufacturers' driver code will normally
contain a large number of functions that can be eliminated in this
way, saving a lot of code space.

However, in practice, "-fdata-sections" rarely eliminates a
significant amount - most programs do not have large amounts of
statically-allocated data that is not used. Gcc, and I think most
other compilers, put the static lifetime data for each translation
unit in its own section, so if no data from a translation unit is
used it will be eliminated at link time even with -fno-data-sections.
And of course it makes no difference for heap data or stack data.

The main place it makes a difference is global arrays from a
translation unit that is included, but for functions that are not
included.

Also functions with large static arrays.

void SomeFunc()
{
    static char buf[4096];
    ...
}

Where, say, eliminating SomeFunc does not necessarily eliminate buf.

Yes, if you have such code but want to eliminate it, then
-fdata-sections would definitely benefit. I have not seen such code in
practice (at least not with very big static arrays, and that also was
not an essential part of the program). But of course I have only seen
a microscopic part of all C code written - if you come across this sort
of thing, then I appreciate your point.

(There are several ways to make this more "friendly" to builds that need
to be compact, such as putting the buffer and/or SomeFunc in a separate
file or giving it a specific section of its own.)
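
One way the "specific section of its own" option might look, as a sketch only. It assumes GCC or Clang targeting ELF; the macro OWN_SECTION, the section name and the function body are made up for illustration, and the buffer is moved to file scope to keep the attribute use simple.

```c
#include <assert.h>
#include <string.h>

/* Assumption: GCC or Clang on an ELF target; elsewhere the attribute
   is skipped and the buffer lands in the default section. */
#if defined(__GNUC__) && defined(__ELF__)
#define OWN_SECTION(name) __attribute__((section(name)))
#else
#define OWN_SECTION(name)
#endif

/* The buffer gets a section of its own (hypothetical name), so link-time
   garbage collection can consider it separately from other data here. */
static char somefunc_buf[4096] OWN_SECTION(".bss.somefunc_buf");

/* Copies text into the buffer (truncating if needed), returns its length. */
size_t SomeFunc(const char *text)
{
    size_t n = strlen(text);
    if (n >= sizeof somefunc_buf)
        n = sizeof somefunc_buf - 1;
    memcpy(somefunc_buf, text, n);
    somefunc_buf[n] = '\0';
    return n;
}

void section_demo(void)
{
    assert(SomeFunc("abc") == 3);
}
```

Whether the buffer is actually dropped still depends on linking with section garbage collection and on nothing else referencing it.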

I have seen this pattern sometimes, though usually in "medium old" code,
with newer code more often assuming that the stack is really big and so
can handle putting 1MB or more in a local array. Though, this is not
great on a target which doesn't have a huge stack.

In my case, I usually had 128K as the default stack size in my project.

In my testing, "-ffunction-sections" is absolutely worth using (on
targets where code space is relevant - there's no need for PC
software). On some targets, it may mean a few lost opportunities
for shorter jump/call instructions between functions in the same
translation unit, but the cost is rarely anything more than a
slightly longer link time. But "-fdata-sections" typically gives
almost no ram space savings, and makes code bigger and slower.

As I noted, gcc on x86 does not support section anchors, so there is
not likely to be much code cost for -fdata-sections.

Where section anchors shine - and where -fdata-sections therefore has
cost - is when a function needs to access more than one piece of
static lifetime data defined in the same translation unit (or another
translation unit if you are using LTO). That happens a lot in
embedded ARM programming at least. I don't know about RISC-V. If
the target normally uses a "small data section" for ram (I know this
is common on PowerPC), then there is, in effect, a program-wide
section anchor already. So it is possible that relatively few
targets have section anchors - but the 32-bit ARM on gcc is a vastly
popular choice in the embedded world, so it is important to
understand the cost of this compiler flag for that target at least.

It depends on the way it is built.

A lot of times though (for non-relocatable static-linked binaries) it
mostly tends to use AUIPC+LD or AUIPC+ST pairs to access global
variables. There is a Global Pointer that needs to be loaded when the
binary is started, unclear what it is used for exactly.

If you have a global pointer, then it will probably be used for
gp+offset access to global data, eliminating the need for section anchors.

I have not used RISC-V, and am not familiar with its details. I can see
from godbolt that when -fdata-sections is in action and you are loading
from static lifetime variables, the compiler generates instructions like

    lw a5, a_variable
    lw a4, b_variable
    lw a0, c_variable

When you do not have "-fdata-sections", it uses anchors :

    lla a4, .LANCHOR0
    lw a5, 0(a4)
    lw a3, 4(a4)
    lw a0, 8(a4)

From my (limited) understanding, RISC-V cannot use 32-bit absolute
addressing. So the "lw a5, a_variable" must be a pseudo-instruction -
using register + offset addressing. If there is a global pointer, then
presumably that is used here. Alternatively, the pseudo instruction
might assemble to two real instructions to support the 32-bit address.
I know both techniques are used in some targets, but don't know about
RISC-V.

It can use one of two strategies for these (after breaking up pseudo-instructions):
LUI a5, HiAddr //Abs32, Low 2GB only
LW a5, LoAddr(a5)
Or:
AUIPC a5, HiAddr //PC-Rel
LW a5, LoAddr(a5)

IIRC, LLA is similar, just using an ADDI as the second instruction.
But, yeah, the latter sequence would be more efficient.
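
The HiAddr/LoAddr split can be modelled in C. This sketch assumes the usual RISC-V convention that the low 12 bits are a sign-extended immediate, so the high part is rounded up when bit 11 is set; hi20, lo12 and recombine are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for LUI/AUIPC, rounded so that adding the signed
   low part lands back on the original 32-bit value. */
uint32_t hi20(uint32_t addr)
{
    return (addr + 0x800u) >> 12;
}

/* Low 12 bits, interpreted as a signed immediate in [-2048, 2047]. */
int32_t lo12(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xFFFu);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* What the two-instruction sequence computes, modulo 2^32. */
uint32_t recombine(uint32_t addr)
{
    return (hi20(addr) << 12) + (uint32_t)lo12(addr);
}

void hilo_demo(void)
{
    assert(hi20(0x12345FFFu) == 0x12346u);   /* rounded up */
    assert(lo12(0x12345FFFu) == -1);         /* 0xFFF as signed 12-bit */
    assert(recombine(0x12345FFFu) == 0x12345FFFu);
    assert(recombine(0x00000800u) == 0x00000800u);
}
```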

I would expect something different if building with -fPIC or -fPIE, but
this depends on if it is a version of GCC built with support for these
(if using a version of GCC built for non-hosted targets, it ignores
these). Where, one effectively needs different GCC builds for bare-metal
(like OS kernels) and for hosted Linux development, for whatever bizarre reason...

Certainly it would surprise me if the "lw a5, a_variable" version were
more efficient than using anchors - otherwise why would gcc generate
code with anchors when given a free choice? (Perhaps gcc is not well
tuned for RISC-V code generation - I am wary of making too many
assumptions about the processor just from some simple compiler outputs.)

It is not, it is a 2-op sequence usually.

Plain RISC-V has a bigger problem with 64-bit constants though,
generally needs to either load these from memory (more typical in GCC)
or build them in-place (which needs roughly 6 instructions in RISC-V).

Say (possible, but GCC doesn't do this):
LUI t0, ValHiA
LUI t1, valHiB
ADDI t0, t0, valLoA
ADDI t1, t1, valLoB
SLLI t1, t1, 32
ADD a0, t0, t1

In my case, I have extensions for RV that can turn a lot of this stuff
into single instructions (albeit with larger 8 and 12 byte encodings).

In some cases, it can save bytes, for example:
LW a1, Disp33s(a0)
As a 64-bit / 8-byte encoding, vs:
LUI t0, DispHi
ADD t0, t0, a0
LW a1, DispLo(t0)
Needing 12 bytes.

My own (more drastic) extensions can save more, by having a few Disp16 instructions, which can access 256K or 512K past GP within a single
32-bit instruction.

But, if/when any of this would end up in mainline RISC-V is uncertain.
Weirdly, there is a lot more emphasis there on big/fancy features (with
niche applicability), rather than on smaller things that can improve the properties of the base ISA (and that could more generally benefit nearly
all code built for the ISA).

(clang does not, apparently, support section anchors as an optimisation
technique. Both with and without -fdata-sections, on RISC-V it first
uses two instructions to load ".L_MergedGlobals" into a register and
then uses that register plus offset to access data.)

Yeah.

As noted, BGBCC mostly uses the GP register to access globals for RV
based targets; sorting them out so that the most common ones come first
and so are typically a single instruction.

This is one merit though of not using separate compilation.
However, the approach used by my compiler is much more memory intensive.

I don't tend to think of MSVC as a highly optimising compiler - but
it is not a tool I have much use for, as it does not handle the
targets I need. When I have sometimes looked at the generated code
on godbolt, it has not impressed me at all. So it could well fall
into the "helpful when using a weaker compiler" category.

Depends on what target I am building for:
   Windows Native: Typically MSVC
   WSL: Usually GCC or Clang
     Seems to have: GCC 13.2.0; Clang 18.1.3
   RISC-V GCC: Also 13.2.0 (also via WSL)
   Linux: Typically GCC

I rarely much use Cygwin anymore, as it was mostly rendered obsolete
by WSL (on Win10 or similar).
Though, Cygwin may still be relevant on Win7 or WinXP systems.

Cygwin has its own wide range of complications. If you want to use gcc targeting native Windows, msys2 and mingw-64 are probably your best bet, either compiled natively under msys2 or as a cross-compile from Linux.
But don't place too much emphasis on my advice, as I very rarely compile
C or C++ code for Windows - most of my PC target (Linux or Windows)
coding is in Python.

Yes, I had used MinGW for a while, before mostly moving over to MSVC for native Windows.

The tradeoff is mostly:
MinGW is closer to native for Windows;
Cygwin could give a closer approximation of Linux on Windows, so one can
build a lot of Linux software and use "./configure" scripts and similar.

But, as noted, Cygwin's role was mostly displaced by WSL, which
effectively runs a Linux userland on Windows.

There was WSL1, which basically mapped Linux syscalls over to the
Windows kernel, and WSL2, which runs the Linux kernel in a VM.

Though, in my case I was using WSL1 as seemingly MS had decided that my
PC can't do virtualization (and sees it as necessary for WSL2), even
despite having a CPU that can do so, and it is enabled in the BIOS.

For BGBCC, it can build both on native Windows and on Linux/WSL
(though recently noted that this build was broken, mostly by GCC and
Clang being more pedantic about missing prototypes, and a few
prototypes were being missed by my function-prototype mining tool).
Went and fixed this, but haven't posted this yet.

As for optimizing in MSVC, yeah, it is in the area of not terrible,
but not super clever either.

If one expects the sort of high-level code-rewriting cleverness that
GCC or Clang often does, one will be disappointed.

But, sometimes, the main "heavy hitter" optimizations are things like
constant-folding and register allocation, which it does do effectively.

Though, both MSVC and BGBCC seem to use one sort of strategy for
register allocation:

Static assign things to callee-save registers and use remaining
registers for dynamic allocation within basic-blocks. Variables with
finite non-overlapping lifetimes (that do not cross basic-block
boundaries) may potentially share a register (this more generally
applies to things like temporaries).

And, GCC and Clang use another: Assign dynamically but carry values
across basic-block boundaries along control-flow paths.

Both tend to give different patterns though, and seem to favor
different types of code.

[...]

(I could also note that I make heavy use of templates in C++ code -
it often leads to smaller and faster results.)

Curious...

I had tended to use the "write everything one off for the task at
hand" approach, but this is a higher-effort approach.

A lot of code tends to fall into the category of shuffling data around
or doing simple checks or conversions. It's also common to have wrapper
functions for libraries to get something nicer, safer and more
convenient than some API that belongs in the early 1990's. Good C++
templates (and sometimes even good macros in C) can make the use of
these things far nicer, and most of the code that the templates appear
to generate inline in the caller disappears in optimisation.

OK.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Sunday, May 31, 2026 19:11:57

In article <10vemqf$r5qe$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 30/05/2026 13:29, Dan Cross wrote:

In article <10vd1tu$ekvl$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 29/05/2026 21:56, Keith Thompson wrote:

[snip]
Upthread, you asked a question:

And then the point becomes, if you always add the parentheses, what
was the point of having that particular precedence level?

You've made it clear that you were never interested in an answer.

You said this:

"You're asking why C is designed the way it is. We could waste a
great deal of time and effort answering that for you. There are
numerous documents about the design and history of C, and of
its ancestor languages. I could provide you with links."

Actually I'm not asking why C is like that. We're already there.

I'm saying that there is no value in those extra levels, some people
think there is, and I'm arguing about that. I was replying to tTh.

As for my question, what /is/ the point? I'm still waiting!

To clarify: the question is, what is the point of those levels?

How is that different from asking "why C is like that"?

My question is actually independent of C or its history.

I accept those levels exist. I was asking do they currently serve a
useful purpose.

That is a distinction without a difference: I do not see how the
two can be separated from one another.

The useful purpose the C rules serve is allowing existing code
to compile unmodified; the reason that existing code was written
that way is because that's how the language was defined; the
language was defined that way due to the aforementioned history.

If not, people can choose to ignore them when writing C code, for
example like this where all () are technically superfluous:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

And they can choose to not adopt them when devising new languages,
however many still do faithfully recreate the same pattern, with a few
notable exceptions such as Go lang.
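
That the parentheses in the CRC line are technically superfluous can be checked mechanically. A sketch, with a placeholder table: the s_crc32 values below are arbitrary and NOT a real CRC-32 table, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder 16-entry table: arbitrary values, not real CRC constants. */
static const uint32_t s_crc32[16] = {
    0x00u, 0x11u, 0x22u, 0x33u, 0x44u, 0x55u, 0x66u, 0x77u,
    0x88u, 0x99u, 0xAAu, 0xBBu, 0xCCu, 0xDDu, 0xEEu, 0xFFu
};

/* The line as quoted, with all the parentheses. */
uint32_t step_with_parens(uint32_t crcu32, uint8_t b)
{
    return (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
}

/* The same line relying only on C precedence: >> over &, & over ^. */
uint32_t step_without_parens(uint32_t crcu32, uint8_t b)
{
    return crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];
}

void crc_paren_demo(void)
{
    for (uint32_t c = 0; c < 4096; c += 5)
        for (unsigned b = 0; b < 256; b++)
            assert(step_with_parens(c, (uint8_t)b) ==
                   step_without_parens(c, (uint8_t)b));
}
```

Compilers with -Wparentheses enabled will usually warn about the second form, which is some evidence for how often this grouping is misread.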

Languages should, presumably, do what makes sense for them.
Lots of languages echo parts of C's syntax where that has proven
to be convenient and popular; curly braces for grouping might be
an example there. Others have not, or have purposely discarded
parts of C syntax that have proven awkward or unpopular. An
example there might be the variable declaration syntax, or the
structure of `typedef`.

I can't think of many languages that keep the exact same parsing
rules with respect to operator precedence. You mentioned Go;
neither Rust nor Zig follow C's rules, either.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Sunday, May 31, 2026 19:34:00

In article <10vhq39$1lpo1$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

I was working on some code in a Unix-like kernel the other day
where the original author wrote, `if ((a == 0) && (b == 1))`
type expressions. The inner parentheses were totally
superfluous. I removed them.

As Tim wrote, there's obviously a balance to be struck between
excessive verbosity and extreme concision. Over time,
programmers working in a language (or a code base) do tend to
internalize that some operations are more frequently
misunderstood than others, and parenthesize accordingly.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, May 31, 2026 22:14:48

On 31/05/2026 20:25, BGB wrote:

On 5/31/2026 4:14 AM, David Brown wrote:

On 30/05/2026 22:48, BGB wrote:

On 5/30/2026 6:52 AM, David Brown wrote:

On 29/05/2026 22:16, BGB wrote:

On 5/29/2026 6:22 AM, David Brown wrote:

On 29/05/2026 12:20, BGB wrote:

On 5/29/2026 2:52 AM, Janis Papanagnou wrote:

On 2026-05-28 11:57, BGB wrote:

On 5/28/2026 2:18 AM, Janis Papanagnou wrote:

On 2026-05-28 01:49, BGB wrote:

[...]

Also functions with large static arrays.

void SomeFunc()
{
    static char buf[4096];
    ...
}

Where, say, eliminating SomeFunc does not necessarily eliminate buf.

Yes, if you have such code but want to eliminate it, then
-fdata-sections would definitely benefit. I have not seen such code in
practice (at least not with very big static arrays, and that also was
not an essential part of the program). But of course I have only seen
a microscopic part of all C code written - if you come across this
sort of thing, then I appreciate your point.

(There are several ways to make this more "friendly" to builds that
need to be compact, such as putting the buffer and/or SomeFunc in a
separate file or giving it a specific section of its own.)

I have seen this pattern sometimes, though usually in "medium old" code, with newer code more often assuming that the stack is really big and so
can handle putting 1MB or more in a local array. Though, this is not
great on a target which doesn't have a huge stack.

In my case, I usually had 128K as the default stack size in my project.

OK. My code typically has a stack of 1 KB or less per thread. It is
not inconceivable that I would have a static array like this, but it
would not be in code that is likely to be unused.

Where section anchors shine - and where -fdata-sections therefore
has cost - is when a function needs to access more than one piece of
static lifetime data defined in the same translation unit (or
another translation unit if you are using LTO). That happens a lot
in embedded ARM programming at least. I don't know about RISC-V.
If the target normally uses a "small data section" for ram (I know
this is common on PowerPC), then there is, in effect, a program-wide
section anchor already. So it is possible that relatively few
targets have section anchors - but the 32-bit ARM on gcc is a vastly
popular choice in the embedded world, so it is important to
understand the cost of this compiler flag for that target at least.

It depends on the way it is built.

A lot of times though (for non-relocatable static-linked binaries) it
mostly tends to use AUIPC+LD or AUIPC+ST pairs to access global
variables. There is a Global Pointer that needs to be loaded when the
binary is started, unclear what it is used for exactly.

If you have a global pointer, then it will probably be used for
gp+offset access to global data, eliminating the need for section
anchors.

I have not used RISC-V, and am not familiar with its details. I can
see from godbolt that when -fdata-sections is in action and you are
loading from static lifetime variables, the compiler generates
instructions like

    lw a5, a_variable
    lw a4, b_variable
    lw a0, c_variable

When you do not have "-fdata-sections", it uses anchors :

    lla a4, .LANCHOR0
    lw a5, 0(a4)
    lw a3, 4(a4)
    lw a0, 8(a4)

From my (limited) understanding, RISC-V cannot use 32-bit absolute
addressing. So the "lw a5, a_variable" must be a pseudo-instruction -
using register + offset addressing. If there is a global pointer,
then presumably that is used here. Alternatively, the pseudo
instruction might assemble to two real instructions to support the
32-bit address. I know both techniques are used in some targets, but
don't know about RISC-V.

It can use one of two strategies for these (after breaking up pseudo-instructions):
  LUI   a5, HiAddr   //Abs32, Low 2GB only
  LW    a5, LoAddr(a5)
Or:
  AUIPC a5, HiAddr   //PC-Rel
  LW    a5, LoAddr(a5)

IIRC, LLA is similar, just using an ADDI as the second instruction.
But, yeah, the latter sequence would be more efficient.

Thanks. That clears things up for me. And in particular, it shows that section anchors (and therefore no "-fdata-sections") can make a
significant difference to gcc code for RISC-V.

I would expect something different if building with -fPIC or -fPIE, but
this depends on if it is a version of GCC built with support for these
(if using a version of GCC built for non-hosted targets, it ignores
these). Where, one effectively needs different GCC builds for bare-metal (like OS kernels) and for hosted Linux development, for whatever bizarre reason...

Certainly it would surprise me if the "lw a5, a_variable" version were
more efficient than using anchors - otherwise why would gcc generate
code with anchors when given a free choice? (Perhaps gcc is not well
tuned for RISC-V code generation - I am wary of making too many
assumptions about the processor just from some simple compiler outputs.)

It is not, it is a 2-op sequence usually.

Plain RISC-V has a bigger problem with 64-bit constants though,
generally needs to either load these from memory (more typical in GCC)
or build them in-place (which needs roughly 6 instructions in RISC-V).

Say (possible, but GCC doesn't do this):
  LUI   t0, ValHiA
  LUI   t1, valHiB
  ADDI  t0, t0, valLoA
  ADDI  t1, t1, valLoB
  SLLI  t1, t1, 32
  ADD   a0, t0, t1

In my case, I have extensions for RV that can turn a lot of this stuff
into single instructions (albeit with larger 8 and 12 byte encodings).

In some cases, it can save bytes, for example:
  LW    a1, Disp33s(a0)
As a 64-bit / 8-byte encoding, vs:
  LUI   t0, DispHi
  ADD   t0, t0, a0
  LW    a1, DispLo(t0)
Needing 12 bytes.

My own (more drastic) extensions can save more, by having a few Disp16
instructions, which can access 256K or 512K past GP within a single
32-bit instruction.

But, if/when any of this would end up in mainline RISC-V is uncertain. Weirdly, there is a lot more emphasis there on big/fancy features (with niche applicability), rather than on smaller things that can improve the properties of the base ISA (and that could more generally benefit nearly
all code built for the ISA).

[...]

Cygwin has its own wide range of complications. If you want to use
gcc targeting native Windows, msys2 and mingw-64 are probably your
best bet, either compiled natively under msys2 or as a cross-compile
from Linux. But don't place too much emphasis on my advice, as I very
rarely compile C or C++ code for Windows - most of my PC target (Linux
or Windows) coding is in Python.

Yes, I had used MinGW for a while, before mostly moving over to MSVC for native Windows.

The tradeoff is mostly:
MinGW is closer to native for Windows;
Cygwin could give a closer approximation of Linux on Windows, so one can build a lot of Linux software and use "./configure" scripts and similar.

Note that MinGW and Mingw-w64 are very, very different. (And the corresponding environments and utility collections, msys and msys2, are equally different.) Mingw-w64, as I understand it, is somewhat of a
balance between old MinGW and Cygwin in being close to native for most purposes, but providing more POSIX compliance than MinGW. It is also
much newer, much better maintained, with modern language support in its
tools (last I heard, with MinGW you did not even get C99 support in the standard library). And of course it has 64-bit support.

You may well find WSL or MSVC to be a better choice for your
requirements, but don't mistake Mingw-w64 for MinGW.

But, as noted, Cygwin's role was mostly displaced by WSL, which
effectively runs a Linux userland on Windows.

There was WSL1, which basically mapped Linux syscalls over to the
Windows kernel, and WSL2, which runs the Linux kernel in a VM.

Though, in my case I was using WSL1 as seemingly MS had decided that my
PC can't do virtualization (and sees it as necessary for WSL2), even
despite having a CPU that can do so, and it is enabled in the BIOS.

From David Brown@3:633/10 to All on Sunday, May 31, 2026 22:24:51

On 31/05/2026 18:46, James Kuyper wrote:

On 2026-05-31 11:35, David Brown wrote:
...

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the
operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language

"The value computations of the operands of an operator are sequenced
before the value computation of the result of the operator." (6.5.1p3)

- if the compiler

can get the same results without doing so, it is free to pick a
different order.

Correct - but "same results" is crucial; it allows you to invoke the
"as-if" rule. Otherwise, the sequencing specified by 6.5.1p3 must be
honored.

OK.

...

If an implementation provides additional semantics to signed integer
arithmetic, such as saturating or trapping overflow, then signed integer
arithmetic operations are no longer associative. But normal C undefined
behaviour on overflow is fully associative (as is wrapping semantics,
for addition, subtraction and multiplication).

I don't follow that. I believe that overflow is guaranteed for (5 +
INT_MAX) + INT_MIN, and completely avoided by 5 + (INT_MAX + INT_MIN),
which differ only by association. Are you saying they both have the
same chance of overflowing?

No - I see now what you are saying. Overflow is never guaranteed to do anything, including to exist, because it is UB. So the compiler can
happily treat "(5 + INT_MAX) + INT_MIN" as though you had written "5 + (INT_MAX + INT_MIN)". It can freely re-arrange an expression like this
that has a potential overflow into one without risk of overflow, as long
as the same results are given for all values that do not overflow. (The overflow is not part of the observable behaviour.) But it cannot
re-arrange the other way unless it knows that intermediary overflows
have no effect. (And the compiler usually does know this.)

What I am trying to say - but described inaccurately - is that
expressions can be re-arranged by the compiler without preserving
overflow behaviour, but it must avoid introducing /new/ overflow risks
if they can affect the results. It may, however, introduce new
intermediary overflows if they do not affect the results.
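
As a rough sketch of the distinction being drawn, the helpers below check, without performing any overflowing addition, whether each grouping overflows in the abstract machine. The function names are invented for this example, and the arithmetic comment assumes a two's complement int where INT_MIN == -INT_MAX - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if a + b would overflow int; the addition itself is never done. */
static bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Does (a + b) + c overflow at either step, as written? */
static bool left_grouping_overflows(int a, int b, int c)
{
    if (add_would_overflow(a, b))
        return true;
    return add_would_overflow(a + b, c);
}

/* Does a + (b + c) overflow at either step, as written? */
static bool right_grouping_overflows(int a, int b, int c)
{
    if (add_would_overflow(b, c))
        return true;
    return add_would_overflow(a, b + c);
}

static void demo_grouping(void)
{
    /* (5 + INT_MAX) + INT_MIN: the first addition overflows. */
    assert(left_grouping_overflows(5, INT_MAX, INT_MIN));
    /* 5 + (INT_MAX + INT_MIN): INT_MAX + INT_MIN is -1, then 5 + -1 is 4. */
    assert(!right_grouping_overflows(5, INT_MAX, INT_MIN));
    assert(5 + (INT_MAX + INT_MIN) == 4);
}
```

Only the second grouping has defined behaviour in the source, which is why a compiler may move from the first form to the second freely but needs to know the target wraps harmlessly before going the other way.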

From James Kuyper@3:633/10 to All on Sunday, May 31, 2026 18:26:53

On 2026-05-31 16:24, David Brown wrote:

On 31/05/2026 18:46, James Kuyper wrote:

On 2026-05-31 11:35, David Brown wrote:

...

If an implementation provides additional semantics to signed integer
arithmetic, such as saturating or trapping overflow, then signed integer
arithmetic operations are no longer associative. But normal C undefined
behaviour on overflow is fully associative (as is wrapping semantics,
for addition, subtraction and multiplication).

I don't follow that. I believe that overflow is guaranteed for (5 +
INT_MAX) + INT_MIN, and completely avoided by 5 + (INT_MAX + INT_MIN),
which differ only by association. Are you saying they both have the
same chance of overflowing?

No - I see now what you are saying. Overflow is never guaranteed to do anything, including to exist, because it is UB. So the compiler can

I only meant that overflow was guaranteed, and that the behavior was
therefore guaranteed to be undefined. I didn't mean to imply that any particular behavior was guaranteed.

happily treat "(5 + INT_MAX) + INT_MIN" as though you had written "5 + (INT_MAX + INT_MIN)". It can freely re-arrange an expression like this
that has a potential overflow into one without risk of overflow, as long
as the same results are given for all values that do not overflow. (The overflow is not part of the observable behaviour.) But it cannot
re-arrange the other way unless it knows that intermediary overflows
have no effect. (And the compiler usually does know this.)

That's what I was mainly concerned about - if I've carefully arranged to
make sure that overflow is impossible, I'd be rather upset by a compiler
which, because "normal C undefined behaviour on overflow is fully
associative", rearranges the associations in my code to make overflow
possible. I interpreted that comment as meaning that "whether or not the behavior is undefined is fully associative". I guess that what you
actually meant was "if the behavior is undefined, the compiler is free
to rearrange the associations".

From Keith Thompson@3:633/10 to All on Sunday, May 31, 2026 15:54:47

David Brown <david.brown@hesbynett.no> writes:

On 31/05/2026 16:24, James Kuyper wrote:

On 2026-05-31 07:18, David Brown wrote:

[...]

People might think they affect the order of evaluation, such as when you
have function calls:

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and
"foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be
influenced by the parentheses, but the language does not require it.)

You're correct with regard to the function calls, but the
parenthesized addition must be performed first, and the other one
second, which may make a difference, for the same reasons given in my
previous paragraph.

The parentheses do not dictate the order of evaluation. But you are
correct - and it's worth pointing out, so thank you for doing that -
that for floating point operations, the grouping of operations can
affect the result.

The parentheses do not dictate the order of evaluation *of the
operands*. Each "+" can be evaluated (the addition performed)
only after the values of its operands are known. But regardless
of parentheses or operator precedence, the three operands foo(x),
foo(y), and foo(z) can be evaluated in any of 6 possible orders.
(It's different when you have operations like "&&", "||", and ",",
which impose additional sequence points.)
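
A small sketch of that point, with a made-up foo() that records the order in which it is called. The names call_log, call_count and sum_with_parens are assumptions for the example; only the expression shape comes from the thread.

```c
#include <assert.h>

static int call_log[3];     /* arguments in the order foo() saw them */
static int call_count = 0;

/* Hypothetical foo(): logs its argument and returns ten times it. */
static int foo(int v)
{
    call_log[call_count++] = v;
    return v * 10;
}

static int sum_with_parens(int x, int y, int z)
{
    call_count = 0;
    /* The parentheses group the additions; they do not order the calls. */
    return foo(x) + (foo(y) + foo(z));
}

static void demo_order(void)
{
    int u = sum_with_parens(1, 2, 3);
    assert(u == 60);          /* the value is fixed */
    assert(call_count == 3);  /* every call happened exactly once */
    /* call_log may hold 1, 2, 3 in any of the 6 permutations. */
}
```

The value of u is the same on every conforming implementation, while the contents of call_log are allowed to differ between compilers or even between optimisation levels.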

If you are talking about floating point arithmetic (I was thinking of
integer arithmetic, but did not specify), then the operations are not necessarily commutative or associative, and the compiler cannot then re-arrange the operations unless it knows that doing so does not
affect the result.

It's not just floating-point. Signed integer overflow is also relevant.

(INT_MIN + INT_MAX) + 1 is well defined. (INT_MIN + INT_MAX) +1
is equivalent, and is also well defined. INT_MIN + (INT_MAX +1)
has undefined behavior.

But except for specific cases, the order of evaluation - both for the
values and side-effects - of sub-expressions is unspecified. Indeed,
they are unsequenced - the evaluations can interleave.

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language - if the
compiler can get the same results without doing so, it is free to pick
a different order. "(a + b) * 0" does not need to evaluate "a", "b",
or "a + b" at all unless there is a possibility of a side-effect - and
it can perform the side-effects in any order. "a + (b + c)" can check
"a" for a trap representation and deal with that before looking at "b"
and "c" or the results of "b + c", even though it cannot (for floating
point operations) re-arrange the code to do "a + b" first.

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Tim Rentsch@3:633/10 to All on Sunday, May 31, 2026 16:08:04

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In [...] early C, `|` and `&` were logical operators. The
short-circuiting `||` and `&&` came later, but the usage low
precedence for `|` and `&` was already baked in.

That's the point: the precedence reflects the original use as
boolean operators, not how things evolved for use almost purely
as bitwise operators.

Surely even in pre-K&R C the & and | operators were used for
bitwise-and and bitwise-or as well as logical connectors.

From Keith Thompson@3:633/10 to All on Sunday, May 31, 2026 16:32:17

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In [...] early C, `|` and `&` were logical operators. The
short-circuiting `||` and `&&` came later, but the usage low
precedence for `|` and `&` was already baked in.

That's the point: the precedence reflects the original use as
boolean operators, not how things evolved for use almost purely
as bitwise operators.

Surely even in pre-K&R C the & and | operators were used for
bitwise-and and bitwise-or as well as logical connectors.

They were used for both (and that was the problem).

The "original use" being referred to is in BCPL and B, and in *very*
early C.

Reference:
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.pdf

Neonatal C

Rapid changes continued after the language had been named,
for example the introduction of the && and || operators. In
BCPL and B, the evaluation of expressions depends on context:
within if and other conditional statements that compare an
expression's value with zero, these languages place a special
interpretation on the and (&) and or (|) operators. In ordinary
contexts, they operate bitwise, but in the B statement

if (e1 & e2) ...

the compiler must evaluate e1 and if it is non-zero, evaluate e2,
and if it too is non-zero, elaborate the statement dependent on
the if. The requirement descends recursively on & and | operators
within e1 and e2. The short-circuit semantics of the Boolean
operators in such "truth-value" context seemed desirable,
but the overloading of the operators was difficult to explain
and use. At the suggestion of Alan Snyder, I introduced the &&
and || operators to make the mechanism more explicit.

Their tardy introduction explains an infelicity of C's
precedence rules. In B one writes

if (a==b & c) ...

to check whether a equals b and c is non-zero; in such a
conditional expression it is better that & have lower precedence
than ==. In converting from B to C, one wants to replace & by
&& in such a statement; to make the conversion less painful,
we decided to keep the precedence of the & operator the same
relative to ==, and merely split the precedence of && slightly
from &. Today, it seems that it would have been preferable to
move the relative precedences of & and ==, and thereby simplify
a common C idiom: to test a masked value against another value,
one must write

if ((a&mask) == b) ...

where the inner parentheses are required but easily forgotten.
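
To make the quoted idiom concrete, here is a short sketch comparing the intended test with what C actually parses when the inner parentheses are left out. The function names are invented for the example; many compilers warn about the second form.

```c
#include <assert.h>

/* The intended test: mask first, then compare. */
static int masked_equal_intended(unsigned a, unsigned mask, unsigned b)
{
    return (a & mask) == b;
}

/* Without the inner parentheses, == binds tighter than &. */
static unsigned masked_equal_as_parsed(unsigned a, unsigned mask, unsigned b)
{
    return a & mask == b;   /* parsed as a & (mask == b) */
}

static void demo_mask_precedence(void)
{
    /* 0x12 & 0x0F is 0x02, which equals b, so the intended test is true. */
    assert(masked_equal_intended(0x12u, 0x0Fu, 0x02u) == 1);
    /* 0x0F == 0x02 is 0, and 0x12 & 0 is 0, so the unparenthesized form is false. */
    assert(masked_equal_as_parsed(0x12u, 0x0Fu, 0x02u) == 0u);
}
```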

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Tim Rentsch@3:633/10 to All on Sunday, May 31, 2026 17:12:23

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In [...] early C, `|` and `&` were logical operators. The
short-circuiting `||` and `&&` came later, but the usage low
precedence for `|` and `&` was already baked in.

That's the point: the precedence reflects the original use as
boolean operators, not how things evolved for use almost purely
as bitwise operators.

Surely even in pre-K&R C the & and | operators were used for
bitwise-and and bitwise-or as well as logical connectors.

They were used for both [...]

That's all I was saying.

From Lawrence D'Oliveiro@3:633/10 to All on Monday, June 01, 2026 00:33:03

On Sun, 31 May 2026 10:59:42 +0100, Bart wrote:

How about when you intend it to be: '(a << b) + c'?

I gave real-world examples of the usage that you asked for, how about
you do the same?

From Bart@3:633/10 to All on Monday, June 01, 2026 02:26:17

On 01/06/2026 01:33, Lawrence D'Oliveiro wrote:

On Sun, 31 May 2026 10:59:42 +0100, Bart wrote:

How about when you intend it to be: '(a << b) + c'?

I gave real-world examples of the usage that you asked for, how about
you do the same?

Can do, but they wouldn't be in C. The problems are when I port to C or
port from C or simply try to understand it.

Examples:

hsum := hsum << 4 - hsum + c

lxvalue := lxvalue << 8 + (pstart+i-1)^

macro makemodrm(mode, opc, rm) = mode<<6 + opc<<3 + rm

genxrm(0xD9 + mf << 1, code, a)

am.sib := scaletable[scale]<<6 + index<<3 + base

p++^ := r>>5<<5 + g>>5<<2 + b>>6

scale := (sib>>6 + 1| 1, 2, 4, 8 |0)

hdr.usedc[i+1] := t>>4 + 1

index := r<<5 + g<<2 + b

rgb := b<<16 + g<<8 + r

shortopc := ttt<<3 + rmopc

Here, '<< >>' have same precedence as '* /'.

Notice I like to use '+' rather than '|', which in my syntax is 'ior'.
But '+ -' and 'iand ior ixor' all have the same precedence so I'd never
have to worry about it anyway.

In C, '+ -' and '& | ^' are on opposite sides of '<< >>'.
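
For readers porting such expressions, a sketch of what happens when the makemodrm example above is copied into C token for token. The function names are made up; the operand values are arbitrary.

```c
#include <assert.h>

/* Copied literally: in C, + binds tighter than <<. */
static unsigned modrm_literal(unsigned mode, unsigned opc, unsigned rm)
{
    return mode << 6 + opc << 3 + rm;  /* (mode << (6 + opc)) << (3 + rm) */
}

/* What the original expression means in a syntax where << outranks +. */
static unsigned modrm_intended(unsigned mode, unsigned opc, unsigned rm)
{
    return (mode << 6) + (opc << 3) + rm;
}

/* With | instead of +, C needs no parentheses: << binds tighter than |. */
static unsigned modrm_with_or(unsigned mode, unsigned opc, unsigned rm)
{
    return mode << 6 | opc << 3 | rm;
}

static void demo_shift_precedence(void)
{
    assert(modrm_intended(3u, 2u, 1u) == 0xD1u);  /* 192 + 16 + 1 */
    assert(modrm_with_or(3u, 2u, 1u) == 0xD1u);
    assert(modrm_literal(3u, 2u, 1u) == 12288u);  /* (3 << 8) << 4 */
}
```

The '+' and '|' versions agree here only because the three fields do not overlap.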

And yes, sometimes I need to use parentheses to override; it is no big deal.

Generally, C seems to need at least 20% more parentheses (as a
proportion of all tokens) than code written in my syntax despite all
these extra levels to help you write fewer.

Bear in mind that C uses {...} to enclose data where I'd need to use
(...), but those {} aren't counted.

From Tim Rentsch@3:633/10 to All on Sunday, May 31, 2026 19:10:13

Bart <bc@freeuk.com> writes:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

And then there is ?: :

a > b ? c : d # (a>b)?c:d
a + b ? c : d # (a+b)?c:d

The grouping of the first is probably what is intended. But in the
second, the intent might have been (a+b)?c:d, or a+(b?c:d); we don't
know for sure that the author didn't make a mistake, or we don't know ourselves.

This example is so addlebrained that it's hard to imagine anyone
being confused about it. Or that it's worth any expenditure of
thought wondering what to do about people who are.

From David Brown@3:633/10 to All on Monday, June 01, 2026 08:28:53

On 01/06/2026 00:26, James Kuyper wrote:

On 2026-05-31 16:24, David Brown wrote:

On 31/05/2026 18:46, James Kuyper wrote:

On 2026-05-31 11:35, David Brown wrote:

...

If an implementation provides additional semantics to signed integer
arithmetic, such as saturating or trapping overflow, then signed integer
arithmetic operations are no longer associative. But normal C undefined
behaviour on overflow is fully associative (as is wrapping semantics,
for addition, subtraction and multiplication).

I don't follow that. I believe that overflow is guaranteed for (5 +
INT_MAX) + INT_MIN, and completely avoided by 5 + (INT_MAX + INT_MIN),
which differ only by association. Are you saying they both have the
same chance of overflowing?

No - I see now what you are saying. Overflow is never guaranteed to do
anything, including to exist, because it is UB. So the compiler can

I only meant that overflow was guaranteed, and that the behavior was therefore guaranteed to be undefined. I didn't mean to imply that any particular behavior was guaranteed.

That's a useful distinction.

happily treat "(5 + INT_MAX) + INT_MIN" as though you had written "5 +
(INT_MAX + INT_MIN)". It can freely re-arrange an expression like this
that has a potential overflow into one without risk of overflow, as long
as the same results are given for all values that do not overflow. (The
overflow is not part of the observable behaviour.) But it cannot
re-arrange the other way unless it knows that intermediary overflows
have no effect. (And the compiler usually does know this.)

That's what I was mainly concerned about - if I've carefully arranged to
make sure that overflow is impossible, I'd be rather upset by a compiler which, because "normal C undefined behaviour on overflow is fully associative", rearranges the associations in my code to make overflow possible.

I am quite happy for the compiler to make such re-arrangements, as long
as it knows that doing so gives the same results on the target. I too
would be most upset if it made these re-arrangements when I had the flag
"-fsanitize=signed-integer-overflow" (or equivalent on other compilers) in
action and halted my program when it "discovered" an overflow bug. But
I am quite happy for it to make these re-arrangements for the code it generates - I am always happy with more efficient object code that
follows the "as if" rule.

(If the expression is floating point, or the target has unusual
capabilities, so that an overflow is detectable then of course such re-arrangements are not valid as they would break the "as-if" rule.)

I interpreted that comment as meaning that "whether or not the
behavior is undefined is fully associative". I guess that what you
actually meant was "if the behavior is undefined, the compiler is free
to rearrange the associations".

That is better phrasing than I used. Thanks.

From David Brown@3:633/10 to All on Monday, June 01, 2026 08:39:03

On 01/06/2026 00:54, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 31/05/2026 16:24, James Kuyper wrote:

On 2026-05-31 07:18, David Brown wrote:

[...]

People might think they affect the order of evaluation, such as when you
have function calls:

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and
"foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be
influenced by the parentheses, but the language does not require it.)

You're correct with regard to the function calls, but the
parenthesized addition must be performed first, and the other one
second, which may make a difference, for the same reasons given in my
previous paragraph.

The parentheses do not dictate the order of evaluation. But you are
correct - and it's worth pointing out, so thank you for doing that -
that for floating point operations, the grouping of operations can
affect the result.

The parentheses do not dictate the order of evaluation *of the
operands*. Each "+" can be evaluated (the addition performed)
only after the values of its operands are known. But regardless
of parentheses or operator precedence, the three operands foo(x),
foo(y), and foo(z) can be evaluated in any of 6 possible orders.
(It's different when you have operations like "&&", "||", and ",",
which impose additional sequence points.)

Yes. And I have seen code where the author believed that the
parentheses /did/ affect the order of evaluation of the "foo" calls. It
is definitely a misunderstanding people can make, though I of course
have no idea how often people make it.

If you are talking about floating point arithmetic (I was thinking of
integer arithmetic, but did not specify), then the operations are not
necessarily commutative or associative, and the compiler cannot then
re-arrange the operations unless it knows that doing so does not
affect the result.

It's not just floating-point. Signed integer overflow is also relevant.

(INT_MIN + INT_MAX) + 1 is well defined. (INT_MIN + INT_MAX) +1
is equivalent, and is also well defined. INT_MIN + (INT_MAX +1)
has undefined behavior.

Compilers can re-arrange integer arithmetic, despite new overflows, if
they know the result is the same. On pretty much any current processor,
a compiler generating code for integer "a + b + c" could do the
additions in any order - treating the operations as commutative and
fully associative. The final result will be the same in every case
where the original expression did not overflow (i.e., every case with
defined behaviour).
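
A minimal sketch of why the order does not matter on wrapping hardware, using unsigned arithmetic as a stand-in for the machine's add instruction (unsigned addition in C wraps modulo 2^N by definition). The helper names are assumptions for the example.

```c
#include <assert.h>
#include <limits.h>

/* Model of a wrapping hardware add; never undefined for unsigned. */
static unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}

/* (a + b) + c on the modelled hardware. */
static unsigned sum_left(int a, int b, int c)
{
    return wrap_add(wrap_add((unsigned)a, (unsigned)b), (unsigned)c);
}

/* a + (b + c): the intermediate may wrap, the final bits do not change. */
static unsigned sum_right(int a, int b, int c)
{
    return wrap_add((unsigned)a, wrap_add((unsigned)b, (unsigned)c));
}

/* (a + c) + b: a commuted order, for completeness. */
static unsigned sum_swapped(int a, int b, int c)
{
    return wrap_add(wrap_add((unsigned)a, (unsigned)c), (unsigned)b);
}

static void demo_wrapping_orders(void)
{
    /* In source terms INT_MIN + (INT_MAX + 1) is undefined, yet the
       wrapped intermediate still leads to the same final bit pattern. */
    assert(sum_left(INT_MIN, INT_MAX, 1) == 0u);
    assert(sum_right(INT_MIN, INT_MAX, 1) == 0u);
    assert(sum_swapped(INT_MIN, INT_MAX, 1) == 0u);
}
```

This models only the generated code; the C source expression with the overflowing grouping remains undefined, and a sanitizer or trapping target would make the difference visible.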

If the implementation makes overflow detectable in some way (such as by
"-fsanitize=signed-integer-overflow"), or the hardware does something
that gives different results from overflow (saturating, hardware traps),
then it's an entirely different matter.

But except for specific cases, the order of evaluation - both for the
values and side-effects - of sub-expressions is unspecified. Indeed,
they are unsequenced - the evaluations can interleave.

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the
operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language - if the
compiler can get the same results without doing so, it is free to pick
a different order. "(a + b) * 0" does not need to evaluate "a", "b",
or "a + b" at all unless there is a possibility of a side-effect - and
it can perform the side-effects in any order. "a + (b + c)" can check
"a" for a trap representation and deal with that before looking at "b"
and "c" or the results of "b + c", even though it cannot (for floating
point operations) re-arrange the code to do "a + b" first.

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

Sure. (And it's a good point to make.)

From David Brown@3:633/10 to All on Monday, June 01, 2026 09:52:08

On 31/05/2026 19:11, Bart wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood.  To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading.  Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around it. Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the
reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear expression.

The SDK also contains examples of parentheses used because it mixes
relatively rare operators (shifts and binary operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though it is a macro, there are not "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline functions
would make things clearer, as would a separation of the steps of
breaking down the original colour format into parts, scaling or
conversions, then building up the new colour format. Different named
types for the different formats would go a long way towards usability
and safety - at least using typedefs, but preferably using structs to
make real different types. And surely nicer names could have been found!
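
One possible shape for that suggestion, as a sketch only: the second macro rewritten with static inline functions and distinct struct types. The type and function names (rgb565, argb8888, scale5to8, scale6to8, rgb565_to_argb8888) are invented here and are not from the SDK; the constants are taken from the macro above.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types so the two pixel formats cannot be mixed up. */
typedef struct { uint16_t value; } rgb565;
typedef struct { uint32_t value; } argb8888;

/* Scale a 5-bit channel to 8 bits, same constants as the macro. */
static inline uint32_t scale5to8(uint32_t c5)
{
    return (c5 * 527u + 23u) >> 6;
}

/* Scale a 6-bit channel to 8 bits, same constants as the macro. */
static inline uint32_t scale6to8(uint32_t c6)
{
    return (c6 * 259u + 33u) >> 6;
}

static inline argb8888 rgb565_to_argb8888(rgb565 in)
{
    /* Step 1: break the source format into channels. */
    uint32_t r5 = (in.value >> 11) & 0x1Fu;
    uint32_t g6 = (in.value >> 5) & 0x3Fu;
    uint32_t b5 = in.value & 0x1Fu;

    /* Steps 2 and 3: scale each channel, then build the opaque result. */
    argb8888 out;
    out.value = 0xFF000000u
              | (scale5to8(r5) << 16)
              | (scale6to8(g6) << 8)
              | scale5to8(b5);
    return out;
}

static void demo_rgb565(void)
{
    rgb565 white = { 0xFFFFu };
    rgb565 black = { 0x0000u };
    assert(rgb565_to_argb8888(white).value == 0xFFFFFFFFu);
    assert(rgb565_to_argb8888(black).value == 0xFF000000u);
}
```

Each remaining pair of parentheses now has an obvious job, and passing an ARGB value where an RGB565 one is expected becomes a compile error.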

From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 02:33:39

David Brown <david.brown@hesbynett.no> writes:

On 01/06/2026 00:54, Keith Thompson wrote:

[...]

(INT_MIN + INT_MAX) + 1 is well defined. (INT_MIN + INT_MAX) +1
is equivalent, and is also well defined. INT_MIN + (INT_MAX +1)
has undefined behavior.

Oops, I forgot to delete some parentheses. I meant to write that
INT_MIN + INT_MAX + 1 is equivalent to (INT_MIN + INT_MAX) + 1.
The redundant parentheses don't impose any change in semantics.

Compilers can re-arrange integer arithmetic, despite new overflows, if
they know the result is the same. On pretty much any current
processor, a compiler generating code for integer "a + b + c" could do
the additions in any order - treating the operations as commutative
and fully associative. The final result will be the same in every
case where the original expression did not overflow (i.e., every case
with defined behaviour).

Right, good point. Since (INT_MIN + INT_MAX) + 1 is well defined,
if a compiler rearranges it so it evaluates INT_MAX + 1 as an
intermediate result, that's permitted **if** the result is the same.
(It makes more sense if the operands are variables with those values
rather than constants.) A compiler can take advantage of how the
hardware works. UB applies to the C source code, not (necessarily)
to the operations that are performed by the actual machine. If it
generates code that yields the correct result by consulting a Ouija
board, that's still conforming.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 02:42:20

David Brown <david.brown@hesbynett.no> writes:

On 31/05/2026 19:11, Bart wrote:

[...]

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around
it. Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of "pData1 == NULL" vs. "!pData1").

Yeah, I'd write that as

if (pData1 == NULL || pData2 == NULL || Length == 0U)

The fact that || binds more loosely than == is one of those things
that I arbitrarily find sufficiently intuitive.

[...]

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

In a macro definition, I'd parenthesize each occurrence of Color,
in case the argument is a more complicated expression, as well as parenthesizing the entire definition (the latter was done here).
The rest of the parentheses feel excessive, but I frankly can't be
bothered to figure out which can be omitted without hurting clarity.
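A minimal sketch of why each occurrence of the argument wants its own parentheses, assuming a caller passes an expression such as `a | b`. The macro names LOW_NIBBLE_UNSAFE and LOW_NIBBLE_SAFE are hypothetical; they only mimic the first line of the SDK macro and are not taken from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro in the SDK's style: the argument is not parenthesized. */
#define LOW_NIBBLE_UNSAFE(Color) ((Color & 0xFFU) >> 4)

/* Same macro with every occurrence of the argument parenthesized. */
#define LOW_NIBBLE_SAFE(Color) (((Color) & 0xFFU) >> 4)

static void check_macro_argument_parentheses(void)
{
    uint32_t a = 0x100U;
    uint32_t b = 0x0F0U;

    /* Safe form: ((a | b) & 0xFF) >> 4 = (0x1F0 & 0xFF) >> 4 = 0xF. */
    assert(LOW_NIBBLE_SAFE(a | b) == 0xFU);

    /* Unsafe form expands to (a | b & 0xFFU) >> 4. Because & binds more
       tightly than |, this is (a | (b & 0xFF)) >> 4 = 0x1F0 >> 4 = 0x1F. */
    assert(LOW_NIBBLE_UNSAFE(a | b) == 0x1FU);
}
```

With a plain variable as the argument the two forms agree, which is probably why the missing parentheses go unnoticed.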

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Bart@3:633/10 to All on Monday, June 01, 2026 11:12:00

On 01/06/2026 03:10, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

My point was that it could be objective, at least for too many. So (a*a)
+ (b*b) would be commonly agreed to have too many, and I was extending
that to other examples in computing.

And then there is ?: :

a > b ? c : d # (a>b)?c:d
a + b ? c : d # (a+b)?c:d

The grouping of the first is probably what is intended. But in the
second, the intent might have been (a+b)?c:d, or a+(b?c:d); we don't
know for sure that the author didn't make a mistake, or we may not
know the grouping ourselves.

This example is so addlebrained that it's hard to imagine anyone
being confused about it. Or that it's worth any expenditure of
thought wondering what to do about people who are.

I don't understand what the problem is with my examples. There can be ambiguity in the mind of the person looking at such code as to how the
first terms are grouped.

These are more or less real examples, I just simplified the terms. Here
are some from MZLIB:

return (status == MZ_OK) ? MZ_BUF_ERROR : status;

return (pL == pE) ? (l_len < r_len) : (l < r);

sym = (match_dist < 512) ? s0 : s1;

return ((pState->m_last_status == TINFL_STATUS_DONE) && (!pState->m_dict_avail)) ? MZ_STREAM_END : MZ_OK;

I believe that in the first three, all parentheses are superfluous, but
they are used anyway. Why is that?

(My preferences for ?: are that the whole thing is syntax, outside of
the precedence scheme, and that it has mandatory parentheses. That
second line would then look like this:

return (pL == pE ? l_len < r_len : l < r);

There are fewer parentheses in all, and less potential confusion. You
can even have assignments in each branch; they will not interfere with ?:.)

As for the last one, I haven't figured it out yet. But simplifying the
terms:

return ((a == b) && (!c)) ? d : e;

then the same applies: this could be:

return a == b && !c ? d : e;

However, I had to confirm this by comparing the ASTs for both.

I'd say that MZLIB is doing the right thing by not being too clever.
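Short of comparing ASTs, a brute-force check over small inputs gives some evidence about the grouping. This is only a sketch: the function names are made up for illustration, and agreement on the tested values supports the claim rather than proving it.

```c
#include <assert.h>

/* Fully parenthesized form, as in the simplified MZLIB example. */
static int with_parens(int a, int b, int c, int d, int e)
{
    return ((a == b) && (!c)) ? d : e;
}

/* Same expression relying on precedence: == above &&, && above ?:. */
static int without_parens(int a, int b, int c, int d, int e)
{
    return a == b && !c ? d : e;
}

/* a + b ? c : d groups as (a + b) ? c : d, not as a + (b ? c : d). */
static int sum_then_select(int a, int b, int c, int d)
{
    return a + b ? c : d;
}

static void check_conditional_grouping(void)
{
    for (int a = 0; a < 3; a++)
        for (int b = 0; b < 3; b++)
            for (int c = 0; c < 2; c++)
                assert(with_parens(a, b, c, 10, 20) ==
                       without_parens(a, b, c, 10, 20));

    /* (1 + 0) ? 5 : 7 is 5; the other grouping, 1 + (0 ? 5 : 7), would be 8. */
    assert(sum_then_select(1, 0, 5, 7) == 5);
}
```

Some compilers may warn about the unparenthesized forms, which is itself a hint that readers stumble over them.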


From David Brown@3:633/10 to All on Monday, June 01, 2026 12:36:39

On 01/06/2026 12:12, Bart wrote:

On 01/06/2026 03:10, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

My point was that it could be objective, at least for too many. So (a*a)
+ (b*b) would be commonly agreed to have too many, and I was extending
that to other examples in computing.

No, it is all still subjective. But the more levels of parentheses, the
more consensus you are likely to get on the subjective opinions.

To be "objective", you would have to have some kind of measure, with statistically significant results. If someone were to conduct a survey
and measure the accuracy and thinking time for people to understand expressions written in different ways with different levels of
parentheses, then there would be a basis for calling things "objective".


From Bart@3:633/10 to All on Monday, June 01, 2026 11:47:04

On 01/06/2026 08:52, David Brown wrote:

On 31/05/2026 19:11, Bart wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

From a quick grep of an SDK in a project I am working on, I saw this example :

  if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around it. Eliminating these would give :

  if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

  if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of "pData1 == NULL" vs. "!pData1").

Maybe it's due to || being a symbol; compare:

if (pData1 == NULL || pData2 == NULL || Length == 0U)

if (pData1 == NULL or pData2 == NULL or Length == 0U)

To me, || seems to draw in the terms on either side as strongly as ==.
That happens less using 'or'.

(Both are valid C if using iso646.h.)
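A small sketch of the two spellings side by side, assuming C99 or later. The function names are made up for illustration; only the conditions come from the SDK line.

```c
#include <assert.h>
#include <iso646.h>   /* defines or, and, not as macros for ||, &&, ! */
#include <stddef.h>

/* The condition written with the symbolic operator. */
static int args_invalid_symbols(const void *pData1, const void *pData2,
                                unsigned Length)
{
    return pData1 == NULL || pData2 == NULL || Length == 0U;
}

/* The same condition written with the iso646.h spelling. */
static int args_invalid_words(const void *pData1, const void *pData2,
                              unsigned Length)
{
    return pData1 == NULL or pData2 == NULL or Length == 0U;
}

static void check_both_spellings(void)
{
    int x = 0;

    assert(args_invalid_symbols(&x, &x, 1U) == args_invalid_words(&x, &x, 1U));
    assert(args_invalid_symbols(NULL, &x, 1U) == args_invalid_words(NULL, &x, 1U));
    assert(args_invalid_symbols(&x, &x, 0U) == args_invalid_words(&x, &x, 0U));
}
```

Since `or` is just a macro for `||`, the two functions compile to the same thing; the difference is purely visual.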

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear expression.

The pattern seems to be '((a || b)) || c' so maybe the author
didn't understand that || is parsed LTR anyway.

The SDK also contains examples of parentheses used because it mixes
relatively rare operators (shifts and binary operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though it is a macro, there are not "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
  ((((Color & 0xFFU) >> 4) & 0xFU) |\
  (((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
  (((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
  (((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))
#define CONVERTRGB5652ARGB8888(Color) \
  (((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
  ((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
  ((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline functions
would make things clearer, as would a separation of the steps of
breaking down the original colour format into parts, scaling or
conversions, then building up the new colour format. Different named
types for the different formats would go a long way towards usability
and safety - at least using typedefs, but preferably using structs to
make real different types. And surely nicer names could have been found!

Your examples actually look reasonable. In fact, it could probably do
with more parentheses around 'Color'... (I've just seen you've already mentioned this!)

The first part of the second has to apply 6 operations to 'Color' in
strict LTR order. Using parentheses ensures not having to worry about precedence, since the ops are '>> & * + >> <<'
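To illustrate what the parentheses buy here, a rough sketch of the red-channel term with and without them. The function names are hypothetical; the constants are the ones from the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Red channel of the RGB565 conversion, parenthesized as in the SDK macro,
   so the six operations apply strictly left to right. */
static uint32_t red_with_parens(uint32_t Color)
{
    return (((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16;
}

/* The same tokens with the parentheses removed. Precedence is *, then +,
   then the shifts, then &, so this groups as
   (Color >> 11) & (((0x1FU * 527 + 23) >> 6) << 16). */
static uint32_t red_without_parens(uint32_t Color)
{
    return Color >> 11 & 0x1FU * 527 + 23 >> 6 << 16;
}

static void check_red_channel_grouping(void)
{
    /* 0xF800 is pure red in RGB565: the 5-bit value 31 scales to 255. */
    assert(red_with_parens(0xF800U) == 0xFF0000U);

    /* Without parentheses: 0x1F & 0xFF0000 is 0. */
    assert(red_without_parens(0xF800U) == 0U);
}
```

So for this particular operator sequence the parentheses are not optional; the open question is only whether one expression is the right place for all six steps.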

The macro names seem self-explanatory too, although they could do with
some underscores.

But anything involving macros probably doesn't count; you expect () to
be heavily used in the expansion.

This is an example from Lua:

op_arith(L, l_addi, luai_numadd);

On the face of it, perfectly reasonable. But it expands to this:

{TValue*v1=(&((base+(((void)0),((((int)((((i)>>((((0+7)+8)+1)))& ((~((~(Instruction)0)<<(8)))<<(0))))))))))->val);TValue*v2=(&(( base+(((void)0),((((int)((((i)>>(((((0+7)+8)+1)+8)))&((~((~( Instruction)0)<<(8)))<<(0))))))))))->val);{StkId ra=(base+(((int) ((((i)>>((0+7)))&((~((~(Instruction)0)<<(8)))<<(0)))))));if(((((v1) )->tt_)==(((3)|((0)<<4))))&&((((v2))->tt_)==(((3)|((0)<<4))))){
lua_Integer i1=(((void)0),(((v1)->value_).i));lua_Integer i2=(((void) 0),(((v2)->value_).i));pc++;{TValue*io=((&(ra)->val));((io)->value_) .i=(((lua_Integer)(((lua_Unsigned)(i1))+((lua_Unsigned)(i2)))));((io) ->tt_=(((3)|((0)<<4))));};}else{lua_Number n1;lua_Number n2;if((((((v1)) ->tt_)==(((3)|((1)<<4))))?((n1)=(((void)0),(((v1)->value_).n)),1):((((( v1))->tt_)==(((3)|((0)<<4))))?((n1)=((lua_Number)(((((void)0),(((v1)-> value_).i))))),1):0))&&(((((v2))->tt_)==(((3)|((1)<<4))))?((n2)=(((void) 0),(((v2)->value_).n)),1):(((((v2))->tt_)==(((3)|((0)<<4))))?((n2)=(( lua_Number)(((((void)0),(((v2)->value_).i))))),1):0))){pc++;{TValue* io=((&(ra)->val));((io)->value_).n=(((n1)+(n2)));((io)->tt_=(((3)| ((1)<<4))));};}};};};

(I had fun debugging this at one time in my compiler. I've no idea how
the original developer did so.)

Not too many () in the macro definitions, but I can only see the top
level; here deeply nested macros are used.


From David Brown@3:633/10 to All on Monday, June 01, 2026 12:50:34

On 01/06/2026 11:42, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 31/05/2026 19:11, Bart wrote:

[...]

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around
it. Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

Yeah, I'd write that as

if (pData1 == NULL || pData2 == NULL || Length == 0U)

The fact that || binds more loosely than == is one of those things
that I arbitrarily find sufficiently intuitive.

Yes, the precedence levels of "==" and "||" (and "&&") are clearly intentional, and I think a lot of C programmers are happy with skipping
the parentheses here. But some people would prefer to have the sub-expressions parenthesised, and I think that is fair enough too -
it's not going to cause anyone extra difficulties in reading the line.

[...]

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

In a macro definition, I'd parenthesize each occurrence of Color,
in case the argument is a more complicated expression, as well as parenthesizing the entire definition (the latter was done here).
The rest of the parentheses feel excessive, but I frankly can't be
bothered to figure out which can be omitted without hurting clarity.

That's the problem with code like that. People will think "that's a
mess - I'll just assume / hope that it is correct". It is very
difficult to check in code reviews, or to maintain, modify or adapt, so
no one will bother figuring it out. It is "write-only" code.

But while I know there are certainly some of the parentheses that could
be removed, I am not sure that would actually improve the readability significantly. Like many people, I prefer not to rely on knowledge of
the relative precedences of shifts and bitwise operators. My preference
would be for major refactoring, not for removing (or adding) parentheses.


From David Brown@3:633/10 to All on Monday, June 01, 2026 12:55:59

On 01/06/2026 12:47, Bart wrote:

On 01/06/2026 08:52, David Brown wrote:

On 31/05/2026 19:11, Bart wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

From a quick grep of an SDK in a project I am working on, I saw this
example :

  if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around it.
Eliminating these would give :

  if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

  if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

Maybe it's due to || being a symbol; compare:

  if (pData1 == NULL || pData2 == NULL || Length == 0U)

  if (pData1 == NULL or pData2 == NULL or Length == 0U)

To me, || seems to draw in the terms on either side as strongly as ==.
That happens less using 'or'.

(Both are valid C if using iso646.h.)

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the
reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear
expression.

The pattern seems to be '((a || b)) || c' so maybe the author
didn't understand that || is parsed LTR anyway.

The SDK also contains examples of parentheses used because it mixes
relatively rare operators (shifts and binary operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though it is a macro, there are not "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
  ((((Color & 0xFFU) >> 4) & 0xFU) |\
  (((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
  (((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
  (((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))
#define CONVERTRGB5652ARGB8888(Color) \
  (((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
  ((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
  ((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline
functions would make things clearer, as would a separation of the
steps of breaking down the original colour format into parts, scaling
or conversions, then building up the new colour format. Different
named types for the different formats would go a long way towards
usability and safety - at least using typedefs, but preferably using
structs to make real different types. And surely nicer names could
have been found!

Your examples actually look reasonable. In fact, it could probably do
with more parentheses around 'Color'... (I've just seen you've already mentioned this!)

The first part of the second has to apply 6 operations to 'Color' in
strict LTR order. Using parentheses ensures not having to worry about precedence, since the ops are '>> & * + >> <<'

The macro names seem self-explanatory too, although they could do with
some underscores.

Indeed.

But anything involving macros probably doesn't count; you expect () to
be heavily used in the expansion.

I think macro definitions "count", as do how the macros are used in
code. But the full expansions do not "count" as they are not something normally read or written by the programmer. (I appreciate that you need
to see such things sometimes when implementing a compiler, and
occasionally people look at the output of a pre-processor, but in normal
use, the appearance of a macro expansion does not matter.)

This is an example from Lua:

  op_arith(L, l_addi, luai_numadd);

On the face of it, perfectly reasonable. But it expands to this:

{TValue*v1=(&((base+(((void)0),((((int)((((i)>>((((0+7)+8)+1)))& ((~((~(Instruction)0)<<(8)))<<(0))))))))))->val);TValue*v2=(&(( base+(((void)0),((((int)((((i)>>(((((0+7)+8)+1)+8)))&((~((~( Instruction)0)<<(8)))<<(0))))))))))->val);{StkId ra=(base+(((int) ((((i)>>((0+7)))&((~((~(Instruction)0)<<(8)))<<(0)))))));if(((((v1) )->tt_)==(((3)|((0)<<4))))&&((((v2))->tt_)==(((3)|((0)<<4))))){
lua_Integer i1=(((void)0),(((v1)->value_).i));lua_Integer i2=(((void) 0),(((v2)->value_).i));pc++;{TValue*io=((&(ra)->val));((io)->value_) .i=(((lua_Integer)(((lua_Unsigned)(i1))+((lua_Unsigned)(i2)))));((io) ->tt_=(((3)|((0)<<4))));};}else{lua_Number n1;lua_Number n2;if((((((v1)) ->tt_)==(((3)|((1)<<4))))?((n1)=(((void)0),(((v1)->value_).n)),1):((((( v1))->tt_)==(((3)|((0)<<4))))?((n1)=((lua_Number)(((((void)0),(((v1)-> value_).i))))),1):0))&&(((((v2))->tt_)==(((3)|((1)<<4))))?((n2)=(((void) 0),(((v2)->value_).n)),1):(((((v2))->tt_)==(((3)|((0)<<4))))?((n2)=(( lua_Number)(((((void)0),(((v2)->value_).i))))),1):0))){pc++;{TValue* io=((&(ra)->val));((io)->value_).n=(((n1)+(n2)));((io)->tt_=(((3)| ((1)<<4))));};}};};};

(I had fun debugging this at one time in my compiler. I've no idea how
the original developer did so.)

I assume the author built it up in parts. The readability is in
the source - and the source is "op_arith(L, l_addi, luai_numadd);" -
there are not too many parentheses there.

Not too many () in the macro definitions, but I can only see the top
level; here deeply nested macros are used.


From Dan Cross@3:633/10 to All on Monday, June 01, 2026 11:04:34

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

Any source code written in LISP :-)

Hey now. Some of us have programmed in Lisp professionally, and
rather enjoy it.

Lisp is often maligned for its parentheses; I don't think that's
fair. They really aren't that onerous once you start working in
it, and they're unambiguous; one may think of the structure of Lisp
code as a shorthand notation for the resulting program's AST.

(And for too few parentheses, any source code in Forth.)

No comment.

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around it.
Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the
reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear
expression.

I see code like this all the time; usually it comes from
hardware vendors (I take it this was from a BSP or something
similar?). I often wonder about vendor programming standards
when I run across things like it.

The SDK also contains examples of parentheses used because it mixes
relatively rare operators (shifts and binary operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though it is a macro, there are not "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline functions
would make things clearer, as would a separation of the steps of
breaking down the original colour format into parts, scaling or
conversions, then building up the new colour format. Different named
types for the different formats would go a long way towards usability
and safety - at least using typedefs, but preferably using structs to
make real different types. And surely nicer names could have been found!

Not to mention symbolic names for the magic constants. :-/

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.
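As a rough sketch of what that could look like for the RGB565 macro, assuming C99 and a 16-bit input: the function names and the uint16_t parameter type are my assumptions, while the scaling arithmetic is copied from the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Scale a 5-bit channel (0..31) to 8 bits, same arithmetic as the macro. */
static inline uint32_t scale5to8(uint32_t v)
{
    return (v * 527U + 23U) >> 6;
}

/* Scale a 6-bit channel (0..63) to 8 bits, same arithmetic as the macro. */
static inline uint32_t scale6to8(uint32_t v)
{
    return (v * 259U + 33U) >> 6;
}

/* Convert RGB565 to ARGB8888 with a fully opaque alpha channel. */
static inline uint32_t rgb565_to_argb8888(uint16_t color)
{
    uint32_t r5 = (color >> 11) & 0x1FU;
    uint32_t g6 = (color >> 5) & 0x3FU;
    uint32_t b5 = color & 0x1FU;

    return 0xFF000000U | (scale5to8(r5) << 16) | (scale6to8(g6) << 8)
           | scale5to8(b5);
}

static void check_rgb565_conversion(void)
{
    assert(rgb565_to_argb8888(0x0000U) == 0xFF000000U);  /* black */
    assert(rgb565_to_argb8888(0xF800U) == 0xFFFF0000U);  /* pure red */
    assert(rgb565_to_argb8888(0xFFFFU) == 0xFFFFFFFFU);  /* white */
}
```

The argument is evaluated once and has a type, so the macro-argument parenthesization question goes away entirely.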

- Dan C.


From David Brown@3:633/10 to All on Monday, June 01, 2026 14:04:18

On 01/06/2026 13:04, Dan Cross wrote:

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

Any source code written in LISP :-)

Hey now. Some of us have programmed in Lisp professionally, and
rather enjoy it.

Lisp is often maligned for its parentheses; I don't think that's
fair. They really aren't that onerous once you start working in
it, and they're unambiguous; one may think of the structure of Lisp
code as a shorthand notation for the resulting program's AST.

I did include a smiley - I know there are people here who enjoy working
with LISP, and have probably heard a few too many jokes about parentheses!

(And for too few parentheses, any source code in Forth.)

No comment.

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra parentheses for the first ||
operator, but there is a second set of extra parentheses around it.
Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the
reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear
expression.

I see code like this all the time; usually it comes from
hardware vendors (I take it this was from a BSP or something
similar?). I often wonder about vendor programming standards
when I run across things like it.

Yes, this was from a hardware vendor (who shall remain nameless to
protect the guilty - not that I have found other vendors to be much
better). They have a tendency to be obsessed with MISRA, with sticking
to C90, and with filling headers with huge Doxygen templates giving no information and obscuring the code. (I'm fine with Doxygen comments
that actually add useful information, but not a dozen lines repeating
the names and types from a function signature.)

The SDK also contains examples of parentheses used because it mixes
relatively rare operators (shifts and binary operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though it is a macro, there are not "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macro in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline functions
would make things clearer, as would a separation of the steps of
breaking down the original colour format into parts, scaling or
conversions, then building up the new colour format. Different named
types for the different formats would go a long way towards usability
and safety - at least using typedefs, but preferably using structs to
make real different types. And surely nicer names could have been found!
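A rough sketch of what the struct approach could look like for the first macro. The type and function names (argb8888_t, argb4444_t, argb8888_to_argb4444) are my own inventions for illustration, and the channel order assumes blue in the lowest bits, as the macro's masks suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper types: one struct per pixel format, so that
   passing the wrong format is a compile-time error. */
typedef struct { uint32_t raw; } argb8888_t;
typedef struct { uint16_t raw; } argb4444_t;

static inline argb4444_t argb8888_to_argb4444(argb8888_t in)
{
    /* Step 1: break the 32-bit colour into its 8-bit channels. */
    const uint32_t blue8  = (in.raw >>  0) & 0xFFu;
    const uint32_t green8 = (in.raw >>  8) & 0xFFu;
    const uint32_t red8   = (in.raw >> 16) & 0xFFu;
    const uint32_t alpha8 = (in.raw >> 24) & 0xFFu;

    /* Step 2: keep the top four bits of each channel. */
    const uint32_t blue4  = blue8  >> 4;
    const uint32_t green4 = green8 >> 4;
    const uint32_t red4   = red8   >> 4;
    const uint32_t alpha4 = alpha8 >> 4;

    /* Step 3: build the 16-bit result. */
    const argb4444_t out = {
        (uint16_t)(blue4 | (green4 << 4) | (red4 << 8) | (alpha4 << 12))
    };
    return out;
}
```

For example, the input 0xFF804020 becomes 0xF842. Passing an argb4444_t where an argb8888_t is expected no longer compiles, which a plain typedef of an integer type would not catch.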

Not to mention symbolic names for the magic constants. :-/

Names for magic constants can be good, but they are not always helpful -
if the magic number is only used once, its definition is far from its
use, and it is polluting the global name space, then it can be a lot
better to simply use the number directly and add a comment at the point
of use. But the shift-and-mask constants could be replaced by either a
struct with bit-fields, or inline functions for field extractions, or at
least separate local variables for the extracted fields.
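A possible shape for the bit-field and field-extraction idea, as a sketch only. The names rgb565_fields_t, rgb565_unpack and rgb565_pack are hypothetical. Since the in-memory layout of bit-fields is implementation-defined, the struct is used here purely as range-limited storage and the raw word is still decoded with shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked view of an RGB565 pixel.  The struct is not
   overlaid on the raw 16-bit word, because bit-field layout is
   implementation-defined. */
typedef struct {
    unsigned int blue  : 5;
    unsigned int green : 6;
    unsigned int red   : 5;
} rgb565_fields_t;

/* Extract the three fields from a packed RGB565 value. */
static inline rgb565_fields_t rgb565_unpack(uint16_t raw)
{
    rgb565_fields_t f;
    f.blue  = (raw >>  0) & 0x1Fu;
    f.green = (raw >>  5) & 0x3Fu;
    f.red   = (raw >> 11) & 0x1Fu;
    return f;
}

/* Rebuild the packed value; the casts keep the shifts unsigned. */
static inline uint16_t rgb565_pack(rgb565_fields_t f)
{
    return (uint16_t)((unsigned int)f.blue
                      | ((unsigned int)f.green << 5)
                      | ((unsigned int)f.red << 11));
}
```

Unpacking 0xF81F gives red 31, green 0, blue 31, and packing the result of an unpack returns the original value.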

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.

Indeed. But even if they don't want to use "inline", a static function
is better - the compiler will do the inlining anyway (if it makes sense according to its heuristics).
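For code bases stuck on C90, a sketch of the second macro as a plain static function. The name rgb565_to_argb8888_c90 is hypothetical; the arithmetic is copied from the macro. It avoids "inline", <stdint.h> and // comments, and relies on unsigned long being at least 32 bits wide.

```c
#include <assert.h>

/* C90-compatible sketch: declarations first, no inline, no <stdint.h>. */
static unsigned long rgb565_to_argb8888_c90(unsigned int color)
{
    /* Break the 5-6-5 value into its fields. */
    unsigned long blue5  = color & 0x1FUL;
    unsigned long green6 = (color >> 5) & 0x3FUL;
    unsigned long red5   = (color >> 11) & 0x1FUL;

    /* Scale each field to 8 bits, using the macro's constants. */
    unsigned long blue  = (blue5 * 527UL + 23UL) >> 6;
    unsigned long green = (green6 * 259UL + 33UL) >> 6;
    unsigned long red   = (red5 * 527UL + 23UL) >> 6;

    /* Build the 32-bit result with opaque alpha. */
    return blue | (green << 8) | (red << 16) | 0xFF000000UL;
}
```

Unlike the macro, the argument is evaluated exactly once and has a type, so side effects and operator precedence in the caller's expression cannot change the result.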


From Dan Cross@3:633/10 to All on Monday, June 01, 2026 18:48:37

In article <10vjsg2$259m3$3@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 13:04, Dan Cross wrote:

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

[snip]
Actual examples of too many parentheses?

Any source code written in LISP :-)

Hey now. Some of us have programmed in Lisp professionally, and
rather enjoy it.

Lisp is often maligned for its parentheses; I don't think that's
fair. They really aren't that onerous once you start working in
it, and they're unambiguous; one may think of the structure of Lisp
code as a shorthand notation for the resulting program's AST.

I did include a smiley - I know there are people here who enjoy working
with LISP, and have probably heard a few too many jokes about parentheses!

It's fine; many variants of Lisp are deserving of criticism, and
that community has a tendency to get too touchy about defending
the language's honor. People like Stallman are fond of calling
Lisp "the most powerful language" but I think that's nonsense.

A problem with many Lisp variants is that they're dynamically
typed; I once had to fix a production outage that happened when
a programmer converted a pair of integers into a triple. The
pair had been represented using a single `CONS` cell, but when
it became apparent that a triple was needed, it was changed into
a proper list. The operation for retrieving the first half of a
`CONS` cell is `CAR`; the operation for retrieving the second
half is `CDR`. Lisp hackers usually refer to the two halves as
"the CAR" and "the CDR" of the cell.

If a `CONS` cell just holds a pair of scalar values, as in this
example, these functions give back those scalars. However,
lists are built from `CONS` cells, where the CAR of the list is
the first element, and the CDR is the tail of the list, which is
itself a list.

Anyway, to access the values from the pair, the programmer used
`CAR` and `CDR`, but when the pair was converted to a list, this
was no longer correct; the first element was still accessible as
the CAR, but the CDR was now a list; to get the second element
of the list one would use `CADR` (or the better named `SECOND`).

Unfortunately, the programmer missed one place, and passed the
CDR of the list to a function that expected a `FIXNUM` and tried
to do arithmetic on it. Lisp is usually strongly typed, so you
can't just add a list to a number; that raises a "condition"
(which is like an exception, though you can often restart
the thing that raised the condition). In this program, that
resulted in an ISE and an error returned to the user.

The fix was trivial, but it struck me at the time that in a
statically typed language it would have been a compile time
error.
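For readers who don't know Lisp, here is a small model of the pair-versus-list difference, sketched in C to match the rest of the thread. Everything in it (value_t, struct cons, fixnum, list, car, cdr, cadr) is a hypothetical toy of my own, not how any real Lisp implements these.

```c
#include <assert.h>
#include <stddef.h>

struct cons;

typedef enum { TAG_FIXNUM, TAG_LIST } tag_t;

/* A value is either a fixnum or a (possibly empty) list. */
typedef struct {
    tag_t tag;
    union {
        long fixnum;
        const struct cons *cell;   /* NULL represents the empty list */
    } u;
} value_t;

struct cons { value_t car, cdr; };

static value_t fixnum(long n)
{
    value_t v;
    v.tag = TAG_FIXNUM;
    v.u.fixnum = n;
    return v;
}

static value_t list(const struct cons *cell)
{
    value_t v;
    v.tag = TAG_LIST;
    v.u.cell = cell;
    return v;
}

/* The asserts stand in for Lisp's run-time type checks. */
static value_t car(value_t v) { assert(v.tag == TAG_LIST && v.u.cell != NULL); return v.u.cell->car; }
static value_t cdr(value_t v) { assert(v.tag == TAG_LIST && v.u.cell != NULL); return v.u.cell->cdr; }
static value_t cadr(value_t v) { return car(cdr(v)); }

/* Returns 1 if the pair and the list behave as described. */
int demo_cons(void)
{
    /* The pair (1 . 2): one cell, and its CDR is a fixnum. */
    const struct cons pair = { fixnum(1), fixnum(2) };
    /* The list (1 2 3): three cells; the CDR of the first is a list. */
    const struct cons c3 = { fixnum(3), list(NULL) };
    const struct cons c2 = { fixnum(2), list(&c3) };
    const struct cons c1 = { fixnum(1), list(&c2) };

    return cdr(list(&pair)).tag == TAG_FIXNUM   /* fine to do arithmetic on */
        && cdr(list(&c1)).tag == TAG_LIST       /* arithmetic here was the bug */
        && cadr(list(&c1)).u.fixnum == 2;       /* CADR (SECOND) is the fix */
}
```

In this toy the mistake is still only visible at run time, because both shapes share one value_t type. Had the pair and the triple been distinct struct types, passing the wrong one would not have compiled, which is the point about static typing.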

(And for too few parentheses, any source code in Forth.)

No comment.

From a quick grep of an SDK in a project I am working on, I saw this
example :

if ((((pData1 == NULL) || (pData2 == NULL))) || (Length == 0U))

The number of parentheses there is so high it's hard to see that not
only is there an unnecessary extra pair of parentheses for the first ||
operator, but there is a second set of extra parentheses around it.
Eliminating these would give :

if ((pData1 == NULL) || (pData2 == NULL) || (Length == 0U))

or, with an extra space for clarity,

if ( (pData1 == NULL) || (pData2 == NULL) || (Length == 0U) )

That still leaves extra parentheses around the equality operators, but
the decision to keep or remove them is subjective (as is the choice of
"pData1 == NULL" vs. "!pData1").

But IMHO, the original line had at least two sets of completely
redundant and unhelpful parentheses which made it harder to read - the
reader is left wondering whether these parentheses are there for a
purpose and have an effect on what should have been a simple and clear
expression.

I see code like this all the time; usually it comes from
hardware vendors (I take it this was from a BSP or something
similar?). I often wonder about vendor programming standards
when I run across things like it.

Yes, this was from a hardware vendor (who shall remain nameless to
protect the guilty - not that I have found other vendors to be much
better). They have a tendency to be obsessed with MISRA, with sticking
to C90, and with filling headers with huge Doxygen templates giving no
information and obscuring the code. (I'm fine with Doxygen comments
that actually add useful information, but not a dozen lines repeating
the names and types from a function signature.)

Yes. I see all of this, and it mystifies me; I have seen how
excessive abstraction can lead to opaque code, but many times
hardware people go in the opposite direction, and one hardly
ever sees useful abstraction; for example, often the same code
sequence could be trivially extracted into a function, but it is
instead repeated multiple times, inline.

The SDK also contains examples of parentheses used because the code mixes
relatively rare operators (shifts and bitwise operators). Parentheses
around such sub-expressions are not uncommon, and can definitely be
helpful, but the quantity here makes things hard to read. Ironically,
though these are macros, there are no "safety" parentheses around the
argument in the expression.

And yes, these really are the names of the macros in this code.

#define CONVERTARGB88882ARGB4444(Color) \
((((Color & 0xFFU) >> 4) & 0xFU) |\
(((((Color & 0xFF00U) >> 8) >> 4) & 0xFU) << 4) |\
(((((Color & 0xFF0000U) >> 16) >> 4) & 0xFU) << 8) | \
(((((Color & 0xFF000000U) >> 24) >> 4) & 0xFU) << 12))

#define CONVERTRGB5652ARGB8888(Color) \
(((((((Color >> 11) & 0x1FU) * 527) + 23) >> 6) << 16) |\
((((((Color >> 5) & 0x3FU) * 259) + 33) >> 6) << 8) |\
((((Color & 0x1FU) * 527) + 23) >> 6) | 0xFF000000)

It can be argued that the parentheses themselves are not the problem
here - it is doing too much in one expression. Static inline functions
would make things clearer, as would a separation of the steps of
breaking down the original colour format into parts, scaling or
conversions, then building up the new colour format. Different named
types for the different formats would go a long way towards usability
and safety - at least using typedefs, but preferably using structs to
make real different types. And surely nicer names could have been found!

Not to mention symbolic names for the magic constants. :-/

Names for magic constants can be good, but they are not always helpful -
if the magic number is only used once, its definition is far from its
use, and it is polluting the global name space, then it can be a lot
better to simply use the number directly and add a comment at the point
of use. But the shift-and-mask constants could be replaced by either a
struct with bit-fields, or inline functions for field extractions, or at
least separate local variables for the extracted fields.

I don't mind some magic: the shift constants and the masks, for
instance, are fine. But the magic 527, 259, 23, and 33, and why
the subsequent values are shifted right by 6, could be better
explained by naming those constants.

Btw, with respect to this specific algorithm, I looked them up,
and they seem to be empirically discovered lore, though derived
from a relatively standard algorithm for projection of a
discrete value into a larger space. This stack overflow page
has some details: https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888

Anyway, I don't think the constants have to be defined far away
from the code; I'd be happy with a local `const uint32_t FOO`,
though in this case it should probably just be a comment.
Here's my offering:

#include <stdint.h>

// Converts a 16-bit RGB16 (5-6-5) value to an ARGB32
// ("ARGB8888") value.
static inline uint32_t
rgb16_to_argb(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    // Map from a 5- or 6-bit space into an 8-bit space.  A
    // 5-bit number has 32 possibilities; a 6-bit number
    // has 64.  To calculate the projected 8-bit value for
    // a k-bit number v, we can use the formula
    // v_8 = (v*(2^8-1) + (2^k-1)/2)/(2^k-1), that is,
    // (v*255 + 15)/31 (for k=5) or (v*255 + 31)/63 (for
    // k=6), with integer division.
    //
    // To replace the division by 31 or 63 with a
    // shift, the constants below were empirically
    // discovered to generate good results.  See
    // https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888
    // for details.
    const uint32_t blue  = (blue5 * 527 + 23) >> 6;
    const uint32_t green = (green6 * 259 + 33) >> 6;
    const uint32_t red   = (red5 * 527 + 23) >> 6;
    const uint32_t alpha = 0xFF000000;

    return blue | (green << 8) | (red << 16) | alpha;
}

It's longer, yes, but I'd argue it's much easier to understand.
On my compiler, it generates almost identical code, except that
some instructions are in a different order.
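Since the input ranges are tiny, the relationship between the multiply-and-shift constants and the rounded division can be checked exhaustively. A minimal sketch, with a hypothetical function name of my choosing:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the multiply-and-shift forms give the same result as the
   rounded division for every 5-bit and every 6-bit input, else 0. */
int shift_matches_division(void)
{
    uint32_t v;

    for (v = 0; v < 32; v++) {          /* 5-bit channels (red, blue) */
        if (((v * 527u + 23u) >> 6) != (v * 255u + 15u) / 31u)
            return 0;
    }
    for (v = 0; v < 64; v++) {          /* 6-bit channel (green) */
        if (((v * 259u + 33u) >> 6) != (v * 255u + 31u) / 63u)
            return 0;
    }
    return 1;
}
```

This returns 1: for all 32 and all 64 inputs the two forms agree, so the shift version is an exact replacement for the division here and not just a close approximation.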

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.

Indeed. But even if they don't want to use "inline", a static function
is better - the compiler will do the inlining anyway (if it makes sense
according to its heuristics).

Assuming the compiler they're working with is known to do so,
then I agree.

- Dan C.


From Bart@3:633/10 to All on Monday, June 01, 2026 21:04:01

On 01/06/2026 19:48, Dan Cross wrote:

In article <10vjsg2$259m3$3@dont-email.me>,

Names for magic constants can be good, but they are not always helpful -
if the magic number is only used once, its definition is far from its
use, and it is polluting the global name space, then it can be a lot
better to simply use the number directly and add a comment at the point
of use. But the shift-and-mask constants could be replaced by either a
struct with bit-fields, or inline functions for field extractions, or at
least separate local variables for the extracted fields.

I don't mind some magic: the shift constants and the masks, for
instance, are fine. But the magic 527, 259, 23, and 33, and why
the subsequent values are shifted right by 6, could be better
explained by naming those constants.

Btw, with respect to this specific algorithm, I looked them up,
and they seem to be empirically discovered lore, though derived
from a relatively standard algorithm for projection of a
discrete value into a larger space. This stack overflow page
has some details: https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888

Anyway, I don't think the constants have to be defined far away
from the code; I'd be happy with a local `const uint32_t FOO`,
though in this case it should probably just be a comment.
Here's my offering:

#include <stdint.h>

// Converts a 16-bit RGB16 (5-6-5) value to an ARGB32
// ("ARGB8888") value.
static inline uint32_t
rgb16_to_argb(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    // Map from a 5- or 6-bit space into an 8-bit space.  A
    // 5-bit number has 32 possibilities; a 6-bit number
    // has 64.  To calculate the projected 8-bit value for
    // a k-bit number v, we can use the formula
    // v_8 = (v*(2^8-1) + (2^k-1)/2)/(2^k-1), that is,
    // (v*255 + 15)/31 (for k=5) or (v*255 + 31)/63 (for
    // k=6), with integer division.
    //
    // To replace the division by 31 or 63 with a
    // shift, the constants below were empirically
    // discovered to generate good results.  See
    // https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888
    // for details.
    const uint32_t blue  = (blue5 * 527 + 23) >> 6;
    const uint32_t green = (green6 * 259 + 33) >> 6;
    const uint32_t red   = (red5 * 527 + 23) >> 6;
    const uint32_t alpha = 0xFF000000;

    return blue | (green << 8) | (red << 16) | alpha;
}

It's longer, yes, but I'd argue it's much easier to understand.
On my compiler, it generates almost identical code, except that
some instructions are in a different order.

The speed probably isn't that important. This can be table-driven: you
use those formulae once to populate some tables (and with the shifts built-in). Then the routine can be simplified to this:

uint32_t rgb16_to_argb_bc(uint16_t color) {
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    return bluetab[blue5] | greentab[green6] | redtab[red5] |
           0xFF000000;
}

On a test I did (one billion conversions cycling over 1M precalculated
random 16-bit numbers), the table version was twice as fast. Maybe a bit faster if the Alpha value is pre-added to the red-table.

(Results were merely summed, but if writing into a new buffer, then
memory access is probably more dominant.)
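The table setup isn't shown above, so here is one way it might look. This is a sketch under my own assumptions: the table sizes, init_colour_tables and rgb16_to_argb_table are hypothetical, and the formulae are the ones from the arithmetic version with the shifts folded into the entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tables: each entry is the 8-bit channel value, already
   shifted into its final position in the 32-bit result. */
static uint32_t bluetab[32];
static uint32_t greentab[64];
static uint32_t redtab[32];

/* Fill the tables once, before the first conversion. */
void init_colour_tables(void)
{
    uint32_t v;

    for (v = 0; v < 32; v++) {
        const uint32_t c8 = (v * 527u + 23u) >> 6;
        bluetab[v] = c8;
        redtab[v]  = c8 << 16;
    }
    for (v = 0; v < 64; v++)
        greentab[v] = ((v * 259u + 33u) >> 6) << 8;
}

/* Table-driven conversion; init_colour_tables() must have been called. */
uint32_t rgb16_to_argb_table(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    return bluetab[blue5] | greentab[green6] | redtab[red5] | 0xFF000000u;
}
```

The three tables take 512 bytes in total with 32-bit entries. Whether that is a good trade depends on the target, so treat this as an illustration of the idea, not a recommendation.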


From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 14:26:44

Bart <bc@freeuk.com> writes:
[...]

These are more or less real examples, I just simplified the
terms. Here are some from MZLIB:

return (status == MZ_OK) ? MZ_BUF_ERROR : status;

return (pL == pE) ? (l_len < r_len) : (l < r);

sym = (match_dist < 512) ? s0 : s1;

return ((pState->m_last_status == TINFL_STATUS_DONE) &&
(!pState->m_dict_avail)) ? MZ_STREAM_END : MZ_OK;

I believe that in the first three, all parentheses are superfluous, but
they are used anyway. Why is that?

Obviously it's because the author of the code thought it was
clearer with the parentheses (or was working under a coding standard
written by someone who thought so). I don't think there are any
deeper conclusions to be drawn. I would have written most of them
differently, but it's not a big deal.

(My preferences for ?: are that the whole thing is syntax, outside of
the precedence scheme, and that it has mandatory parentheses. That
second line would then look like this:

return (pL == pE ? l_len < r_len : l < r);

There are fewer parentheses in all, and less potential confusion. You
can even have assignments in each branch; they will not interfere with
?:.)

But the precedence scheme *is* syntax. If you prefer to think of ?:
as something other than an operator, something that doesn't
follow the same set of rules as other operators, and if that works
for you, then that's fine. But then how do you know that
return (pL == pE ? l_len < r_len : l < r);
means
return ((pL == pE) ? (l_len < r_len) : l < r);
and not
return (pL == (pE ? l_len < r_len : l < r));
?

I know that because I know that ?: is an operator that binds more
loosely than "==".

In any case, however you think about ?:, it's clear that
"pL == pE ? l_len < r_len : l < r" is an expression, and "return"
*is* outside of the precedence scheme. The outer parentheses are
superfluous but harmless. (Personally, I dislike parenthesizing
the expression in a return statement.)
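The two groupings can be compared directly. A small sketch with plain ints standing in for the pointers and lengths (the function names are made up for the example):

```c
#include <assert.h>

/* Parsed as (a == b) ? x : y, because == binds more tightly than ?:. */
int pick_unparenthesised(int a, int b, int x, int y)
{
    return a == b ? x : y;
}

/* The other conceivable grouping, which needs explicit parentheses. */
int pick_other_grouping(int a, int b, int x, int y)
{
    return a == (b ? x : y);
}
```

With a = 1, b = 1, x = 5, y = 7 the first returns 5, while the second compares 1 with 5 and returns 0.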

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 14:39:04

Bart <bc@freeuk.com> writes:

On 01/06/2026 08:52, David Brown wrote:

[...]

That still leaves extra parentheses around the equality operators,
but the decision to keep or remove them is subjective (as is the
choice of "pData1 == NULL" vs. "!pData1").

Maybe it's due to || being a symbol; compare:

if (pData1 == NULL || pData2 == NULL || Length == 0U)

if (pData1 == NULL or pData2 == NULL or Length == 0U)

To me, || seems to draw in the terms on either side as strongly as
==. That happens less using 'or'.

(Both are valid C if using iso646.h.)

The "and" macro in <iso646.h> is exactly equivalent to "||".
If your intuition tells you they have different precedences, that
could be a problem. On the other hand, if you choose to use them
differently in ways that don't break anything, that's fine.

Digression: Perl borrows most or all of C's operators, and keeps
the same precedences. "Operators borrowed from C keep the same
precedence relationship with each other, even where C's precedence
is slightly screwy." But Perl has "and" and "or" operators that
work like "&&" and "||" but have lower precedence (that turns out
to be convenient in some contexts).

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or associativity than C. I can't remember which language it is.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 15:11:21

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or associativity than C. I can't remember which language it is.

The language I was thinking of is PHP. C's ?: operator associates right-to-left, which makes it possible to write chained conditional
expressions like:

cond1 ? expr1 :
cond2 ? expr2 :
cond3 ? expr3 :
default_expr

PHP's ?: operator originally associated left-to-right.
Newer versions of PHP require parentheses.
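A concrete C instance of that chained form, as a sketch (sign_name is a made-up example function):

```c
#include <assert.h>
#include <string.h>

/* Right-to-left associativity means this groups as
   n < 0 ? "negative" : (n == 0 ? "zero" : "positive"). */
const char *sign_name(int n)
{
    return n < 0  ? "negative" :
           n == 0 ? "zero"     :
                    "positive";
}
```

With left-to-right grouping the same text would mean (n < 0 ? "negative" : n == 0) ? "zero" : "positive", which is not what anyone writing the chain intends.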

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Tim Rentsch@3:633/10 to All on Monday, June 01, 2026 15:23:09

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

The "and" macro in <iso646.h> is exactly equivalent to "||".

I don't think so.


From Bart@3:633/10 to All on Monday, June 01, 2026 23:24:10

On 01/06/2026 22:39, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 01/06/2026 08:52, David Brown wrote:

[...]

That still leaves extra parentheses around the equality operators,
but the decision to keep or remove them is subjective (as is the
choice of "pData1 == NULL" vs. "!pData1").

Maybe it's due to || being a symbol; compare:

if (pData1 == NULL || pData2 == NULL || Length == 0U)

if (pData1 == NULL or pData2 == NULL or Length == 0U)

To me, || seems to draw in the terms on either side as strongly as
==. That happens less using 'or'.

(Both are valid C if using iso646.h.)

The "and" macro in <iso646.h> is exactly equivalent to "||".

I don't think so.

If your intuition tells you they have different precedences, that
could be a problem.

I'm not saying that, just that having named operators helps to
separate that expression into three groups better than symbolic operators.

At least for me.


From Keith Thompson@3:633/10 to All on Monday, June 01, 2026 16:06:52

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

The "and" macro in <iso646.h> is exactly equivalent to "||".

I don't think so.

Right, that was a typo/thinko.

The "and" macro is (almost) exactly equivalent to "&&".
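A short sketch of the <iso646.h> spellings in use. The function names and the STR/XSTR helpers are hypothetical. The stringification difference is one way I know of in which a macro is not quite the same as the operator token in C; it may or may not be what "almost" refers to here.

```c
#include <assert.h>
#include <iso646.h>
#include <stddef.h>
#include <string.h>

/* "or" expands to ||, so this is the same test as in the SDK example. */
int args_invalid(const void *pData1, const void *pData2, unsigned Length)
{
    return pData1 == NULL or pData2 == NULL or Length == 0U;
}

/* "and" expands to &&. */
int both_set(int a, int b)
{
    return a and b;
}

/* In C, # does not macro-expand its operand, so STR(and) gives "and",
   while the extra level of expansion in XSTR(and) gives "&&". */
#define STR(x)  #x
#define XSTR(x) STR(x)
```

Because the macros expand to the operator tokens, precedence is identical: "a == b or c == d" groups exactly like "a == b || c == d".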

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From David Brown@3:633/10 to All on Tuesday, June 02, 2026 08:41:49

On 02/06/2026 00:11, Keith Thompson wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or
associativity than C. I can't remember which language it is.

The language I was thinking of is PHP. C's ?: operator associates right-to-left, which makes it possible to write chained conditional expressions like:

cond1 ? expr1 :
cond2 ? expr2 :
cond3 ? expr3 :
default_expr

PHP's ?: operator originally associated left-to-right.
Newer versions of PHP require parentheses.

I thought you were thinking of C++, where ? has the same precedence as assignment, while in C it has higher precedence. It does not make a lot
of difference, and if you are writing an expression where it matters,
then I think parentheses would be a good idea.
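The case where it matters is an assignment in the last operand. In C++ "c ? a : b = 1" groups as "c ? a : (b = 1)"; C's grammar does not allow that grouping, and compilers typically reject the unparenthesised form. A sketch of the parenthesised version, with made-up function names:

```c
#include <assert.h>

/* Explicit parentheses give the same meaning in C and in C++: when c is
   zero, store 1 in *b and yield that value; otherwise yield a. */
int select_or_assign(int c, int a, int *b)
{
    return c ? a : (*b = 1);
}

/* Returns 1 if both branches behave as described. */
int demo_select_or_assign(void)
{
    int b = 5;
    const int taken = select_or_assign(1, 7, &b);   /* yields a, b untouched */
    const int b_after_taken = b;
    const int other = select_or_assign(0, 7, &b);   /* assigns, yields 1 */

    return taken == 7 && b_after_taken == 5 && other == 1 && b == 1;
}
```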

<https://cppreference.com/c/language/operator_precedence> <https://cppreference.com/cpp/language/operator_precedence>


From David Brown@3:633/10 to All on Tuesday, June 02, 2026 09:09:02

On 01/06/2026 20:48, Dan Cross wrote:

In article <10vjsg2$259m3$3@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 13:04, Dan Cross wrote:

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

[snip]
Actual examples of too many parentheses?

[Snipping the LISP stuff - fun, but OT and not really relevant to the
thread branch. And I have never used the language.]

I see code like this all the time; usually it comes from
hardware vendors (I take it this was from a BSP or something
similar?). I often wonder about vendor programming standards
when I run across things like it.

Yes, this was from a hardware vendor (who shall remain nameless to
protect the guilty - not that I have found other vendors to be much
better). They have a tendency to be obsessed with MISRA, with sticking
to C90, and with filling headers with huge Doxygen templates giving no
information and obscuring the code. (I'm fine with Doxygen comments
that actually add useful information, but not a dozen lines repeating
the names and types from a function signature.)

Yes. I see all of this, and it mystifies me; I have seen how
excessive abstraction can lead to opaque code, but many times
hardware people go in the opposite direction, and one hardly
ever sees useful abstraction; for example, often the same code
sequence could be trivially extracted into a function, but it is
instead repeated multiple times, inline.

Indeed. There is just /so/ much that is done badly in these SDKs - I
am not going to go into details as it would take all day. I get the impression that software libraries are very much an afterthought for
most microcontroller design groups - I don't think they ever bother
talking to developers who will use them. In fact, I don't think they
talk much to the software folks when designing the microcontrollers either.

Sometimes, however, they do have abstractions - sometimes multiple
layers of HALs ("Hardware Abstraction Layer"), drivers, interfaces, etc.
Each layer has a completely different way of viewing things - one will
use #define'd constants for everything, another will use a struct with
30 fields passed as a pointer in order to turn a GPIO pin on or off, and
the next layer will use a macro TURN_GPIO_PIN_A14_ON. When you have
figured out which API you are expected to use, toggling a GPIO leads to
a half-dozen nested calls (not including macros) up and down these
stacks when all the hardware needs is a single write to a particular
register. And if you are really lucky, a global HAL_LOCK_MUTEX is
acquired and released along the way.
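For contrast, the "single write" might look roughly like this. It is a sketch under assumptions: the register model (a "bit set" register where writing a 1 drives that pin high) and all names are hypothetical, and the register is passed as a pointer so the code can run on a host instead of using a fixed address from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "bit set" register: writing a 1 to bit n drives pin n high. */
static inline void gpio_pin_on(volatile uint32_t *set_reg, unsigned pin)
{
    *set_reg = (uint32_t)1 << pin;   /* the one write the hardware needs */
}

/* Host-side check using an ordinary variable in place of the register. */
int demo_gpio(void)
{
    volatile uint32_t fake_set_reg = 0;

    gpio_pin_on(&fake_set_reg, 14);
    return fake_set_reg == 0x4000u;
}
```

On a real part the pointer would be a constant, and after inlining the call usually reduces to a single store, but that depends on the compiler and the device.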

The most extreme example I saw was working on a very small 8-bit microcontroller - 2 KB of code flash. I wanted to use the ADC, and
thought I'd save reading the datasheet and reference manual by using the "wizard" and SDK. The result needed 4 KB of flash - twice what the chip
had - and half of its RAM. I looked in the manual and got the same
results I needed with a single line of C code that compiled to just one assembly instruction.

(Time to cut the rant short.)

Not to mention symbolic names for the magic constants. :-/

Names for magic constants can be good, but they are not always helpful -
if the magic number is only used once, its definition is far from its
use, and it is polluting the global name space, then it can be a lot
better to simply use the number directly and add a comment at the point
of use. But the shift-and-mask constants could be replaced by either a
struct with bit-fields, or inline functions for field extractions, or at
least separate local variables for the extracted fields.

I don't mind some magic: the shift constants and the masks, for
instance, are fine. But the magic 527, 259, 23, and 33, and why
the subsequent values are shifted right by 6, could be better
explained by naming those constants.

Agreed - those are the "magic" ones, and need explaining (or perhaps calculating, at compile time, from something that makes sense to the
reader and maintainer).
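One possible reading of "calculating from something that makes sense", as a sketch: name the channel maxima and write the rounded rescaling directly, leaving the division by a constant to the compiler. The names CHANNEL8_MAX, RED5_MAX, GREEN6_MAX and rescale_to_8bit are my own.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CHANNEL8_MAX = 255,   /* largest 8-bit channel value */
    RED5_MAX     = 31,    /* largest 5-bit channel value */
    GREEN6_MAX   = 63     /* largest 6-bit channel value */
};

/* Round-to-nearest rescaling from [0, in_max] to [0, CHANNEL8_MAX]. */
static inline uint32_t rescale_to_8bit(uint32_t v, uint32_t in_max)
{
    return (v * CHANNEL8_MAX + in_max / 2u) / in_max;
}
```

For example, rescale_to_8bit(16, RED5_MAX) is 132 and rescale_to_8bit(63, GREEN6_MAX) is 255. When in_max is a compile-time constant, an optimising compiler will often replace the division with a multiply and shift of its own, though that is worth checking on the target in question.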

Btw, with respect to this specific algorithm, I looked them up,
and they seem to be empirically discovered lore, though derived
from a relatively standard algorithm for projection of a
discrete value into a larger space. This stack overflow page
has some details: https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888

A URL in comments in the code would be a lot better than just the numbers.

Anyway, I don't think the constants have to be defined far away
from the code; I'd be happy with a local `const uint32_t FOO`,
though in this case it should probably just be a comment.
Here's my offering:

#include <stdint.h>

// Converts a 16-bit RGB16 (5-6-5) value to an ARGB32
// ("ARGB8888") value.
static inline uint32_t
rgb16_to_argb(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    // Map from a 5- or 6-bit space into an 8-bit space.  A
    // 5-bit number has 32 possibilities; a 6-bit number
    // has 64.  To calculate the projected 8-bit value for
    // a k-bit number v, we can use the formula
    // v_8 = (v*(2^8-1) + (2^k-1)/2)/(2^k-1), that is,
    // (v*255 + 15)/31 (for k=5) or (v*255 + 31)/63 (for
    // k=6), with integer division.
    //
    // To replace the division by 31 or 63 with a
    // shift, the constants below were empirically
    // discovered to generate good results.  See
    // https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888
    // for details.
    const uint32_t blue  = (blue5 * 527 + 23) >> 6;
    const uint32_t green = (green6 * 259 + 33) >> 6;
    const uint32_t red   = (red5 * 527 + 23) >> 6;
    const uint32_t alpha = 0xFF000000;

    return blue | (green << 8) | (red << 16) | alpha;
}

It's longer, yes, but I'd argue it's much easier to understand.
On my compiler, it generates almost identical code, except that
some instructions are in a different order.

Yes, that would be vastly better. (I would still prefer to have
different named types for colours in the different encoding schemes.)

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.

Indeed. But even if they don't want to use "inline", a static function
is better - the compiler will do the inlining anyway (if it makes sense
according to its heuristics).

Assuming the compiler they're working with is known to do so,
then I agree.

If a compiler is not capable of inlining static functions without them
being labelled "inline", then you are unlikely to get efficient results anyway. (Or the user has not enabled optimisation, and again cannot
expect efficient results.) I don't see the point in pandering to poorly optimising compilers (including good compilers with optimisation
disabled) in order to produce marginally less big and slow code. There
was a time when a good optimising compiler was a significant investment
and not always within the budget for a project, but such times are far
in the past. I can understand that some developers are hamstrung by
daft C90 restrictions, but I have little sympathy for people wanting
good results from poor tools.

(The exception, perhaps, is people who have to use Microchip development tools.)


From David Brown@3:633/10 to All on Tuesday, June 02, 2026 09:17:01

On 01/06/2026 22:04, Bart wrote:

On 01/06/2026 19:48, Dan Cross wrote:

In article <10vjsg2$259m3$3@dont-email.me>,

Names for magic constants can be good, but they are not always helpful -
if the magic number is only used once, its definition is far from its
use, and it is polluting the global name space, then it can be a lot
better to simply use the number directly and add a comment at the point
of use. But the shift-and-mask constants could be replaced by either a
struct with bit-fields, or inline functions for field extractions, or at
least separate local variables for the extracted fields.

I don't mind some magic: the shift constants and the masks, for
instance, are fine.  But the magic 527, 259, 23, and 33, and why
the subsequent values are shifted right by 6, could be better
explained by naming those constants.

Btw, with respect to this specific algorithm, I looked them up,
and they seem to be empirically discovered lore, though derived
from a relatively standard algorithm for projection of a
discrete value into a larger space.  This stack overflow page
has some details:
https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888

Anyway, I don't think the constants have to be defined far away
from the code; I'd be happy with a local `const uint32_t FOO`,
though in this case it should probably just be a comment.
Here's my offering:

#include <stdint.h>

// Converts a 16-bit RGB16 (5-6-5) value to an ARGB32
// ("ARGB8888") value.
static inline uint32_t
rgb16_to_argb(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    // Map from a 5- or 6-bit space into an 8-bit space.  A
    // 5-bit number has 32 possibilities; a 6-bit number
    // has 64.  To calculate the projected 8-bit value for
    // a k-bit number v, we can use the formula
    // v_8 = (v*(2^8-1) + (2^k-1)/2)/(2^k-1), that is,
    // (v*255 + 15)/31 (for k=5) or (v*255 + 31)/63 (for
    // k=6), with integer division.
    //
    // To replace the division by 31 or 63 with a
    // shift, the constants below were empirically
    // discovered to generate good results.  See
    // https://stackoverflow.com/questions/2442576/how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888
    // for details.
    const uint32_t blue  = (blue5 * 527 + 23) >> 6;
    const uint32_t green = (green6 * 259 + 33) >> 6;
    const uint32_t red   = (red5 * 527 + 23) >> 6;
    const uint32_t alpha = 0xFF000000;

    return blue | (green << 8) | (red << 16) | alpha;
}

It's longer, yes, but I'd argue it's much easier to understand.
On my compiler, it generates almost identical code, except that
some instructions are in a different order.

The speed probably isn't that important. This can be table-driven: you
use those formulae once to populate some tables (with the shifts
built in). Then the routine can be simplified to this:

  uint32_t rgb16_to_argb_bc(uint16_t color) {
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    return bluetab[blue5] | greentab[green6] | redtab[red5] | 0xFF000000;
  }

On a test I did (one billion conversions cycling over 1M precalculated random 16-bit numbers), the table version was twice as fast. Maybe a bit faster if the Alpha value is pre-added to the red-table.

(Results were merely summed, but if writing into a new buffer, then
memory access is probably more dominant.)

Such timing results are, as they stand, totally useless - the best
choice of algorithm is entirely dependent on the target device,
tradeoffs for speed and code/table space, and on how the code is used in practice.

It is absolutely true that a table-based approach can give faster
results. (And there's no doubt that your table code here is vastly
clearer than the original macro.) On some microcontrollers, avoiding
the multiplications would give code that is an order of magnitude
faster. On others, table lookup would be a lot slower.

So it is definitely worth thinking about alternative approaches such as
this, but testing on a PC gives very little information about real-world speed.
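
For reference, here is a minimal sketch of how the tables in the table-driven variant might be populated. The names bluetab, greentab and redtab come from the posted routine; init_rgb16_tables and rgb16_to_argb_table are hypothetical names chosen for this illustration, and the constants are the ones from the multiply-and-shift version earlier in the thread. Whether this beats the arithmetic version depends on the target, as noted above.

```c
#include <stdint.h>

/* Lookup tables named as in the posted routine; each entry holds the
   8-bit channel value already shifted into its final position. */
static uint32_t bluetab[32];
static uint32_t greentab[64];
static uint32_t redtab[32];

/* Hypothetical one-time initialiser: applies the multiply-and-shift
   formulae once per possible field value. */
static void init_rgb16_tables(void)
{
    for (uint32_t v = 0; v < 32; v++) {
        const uint32_t c8 = (v * 527 + 23) >> 6;   /* 5-bit -> 8-bit */
        bluetab[v] = c8;
        redtab[v]  = c8 << 16;
    }
    for (uint32_t v = 0; v < 64; v++) {
        greentab[v] = ((v * 259 + 33) >> 6) << 8;  /* 6-bit -> 8-bit */
    }
}

/* Table-driven conversion; init_rgb16_tables() must be called first. */
static uint32_t rgb16_to_argb_table(uint16_t color)
{
    const uint32_t blue5  = (color >>  0) & 0x1F;
    const uint32_t green6 = (color >>  5) & 0x3F;
    const uint32_t red5   = (color >> 11) & 0x1F;

    return bluetab[blue5] | greentab[green6] | redtab[red5] | 0xFF000000u;
}
```

Folding the alpha value into redtab, as suggested above, would remove the final OR at the cost of making that table less reusable.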

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 02:07:48

David Brown <david.brown@hesbynett.no> writes:

On 02/06/2026 00:11, Keith Thompson wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or
associativity than C. I can't remember which language it is.

The language I was thinking of is PHP. C's ?: operator associates
right-to-left, which makes it possible to write chained conditional
expressions like:
cond1 ? expr1 :
cond2 ? expr2 :
cond3 ? expr3 :
default_expr
PHP's ?: operator originally associated left-to-right.
Newer versions of PHP require parentheses.
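
A small sketch of the chained form in C, for illustration only; the function name and the thresholds are made up. Because ?: associates right-to-left, the chain reads top to bottom like an if/else-if ladder.

```c
/* Hypothetical example: map a score to a letter grade.
   The expression parses as
   c1 ? e1 : (c2 ? e2 : (c3 ? e3 : default)). */
static int letter_grade(int score)
{
    return score >= 90 ? 'A' :
           score >= 80 ? 'B' :
           score >= 70 ? 'C' :
                         'F';
}
```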

I thought you were thinking of C++, where ? has the same precedence as assignment, while in C it has higher precedence. It does not make a
lot of difference, and if you are writing an expression where it
matters, then I think parentheses would be a good idea.

<https://cppreference.com/c/language/operator_precedence> <https://cppreference.com/cpp/language/operator_precedence>

Hmm. I'm not sure I either follow or trust those tables.

Looking at the grammar in the C++ standard, there is a difference.
C has:

conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression

while C++ has:

conditional-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression

But the difference isn't mentioned in the Compatibility annex of the C++ standard.

I'd be interested in seeing a conditional expression whose legality or semantics differs between C and C++.

(Digression: I hate the fact that such a long and sometimes
informative thread has such a stupid subject header.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 02, 2026 11:35:57

On 2026-06-02 00:24, Bart wrote:

On 01/06/2026 22:39, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 01/06/2026 08:52, David Brown wrote:

[...]

That still leaves extra parentheses around the equality operators,
but the decision to keep or remove them is subjective (as is the
choice of "pData1 == NULL" vs. "!pData1").

Maybe it's due to || being a symbol; compare:

�� if (pData1 == NULL || pData2 == NULL || Length == 0U)

�� if (pData1 == NULL or pData2 == NULL or Length == 0U)

To me, || seems to draw in the terms on either side as strongly as
==. That happens less using 'or'.

(Both are valid C if using iso646.h.)
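
As a minimal sketch of the second spelling: <iso646.h> defines 'or' as a macro for ||, so precedence and short-circuit behaviour are unchanged. The function name args_invalid is hypothetical; the parameter names follow the quoted example.

```c
#include <iso646.h>   /* defines and, or, not, ... as macros in C */
#include <stddef.h>

/* Hypothetical argument check using the word form of ||. */
static int args_invalid(const void *pData1, const void *pData2,
                        unsigned Length)
{
    return pData1 == NULL or pData2 == NULL or Length == 0U;
}
```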

[...]

[...]

I'm not saying that, just that having named operators helps to
separate that expression into three groups better than a symbolic operator.

At least for me.

I suppose because, as words, they stand out, are easier to distinguish,
especially among the mass of punctuation characters that exist in "C",
and psychologically suggest their dominance? - Yes, maybe.[*] - But don't
count on such "perception logic"; it generally won't make you happy.[**]

Janis

[*] In the above quoted example where there's identifiers around the
operators it appears to me that the 'and'/'or' variants would be worse,
though, concerning visual perceivability.
It would certainly be different if the example had used numbers, like
if ( pData1 == 0 || pData2 == 0 || Length == 0 )
or the '!var' variant (for those who prefer that)
if ( !pData1 || !pData2 || !Length )

[**] For example, with Pascal. While that language has only very few
precedence groups - as you said you'd prefer fewer to many groups! -
they put the 'and' together with arithmetic * and / , and the 'or'
together with + and - . Given that all the comparisons are in the
lowest precedence group you will have to use parentheses; say,
'a < b && c == d' (in "C") would be '(a < b) and (c = d)' (in Pascal).
The keyword didn't help given the precedence groups' design with only
few precedence groups [in Pascal].
Back in those days, that demand for parentheses slightly annoyed me, since I
thought (and still think) that the boolean keywords would sufficiently
hint at the semantic intention, and with more precedence levels in the
language design they could easily have simplified those common cases.
To each [language] its own [flaw].

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 02, 2026 11:38:38

On 2026-06-02 11:07, Keith Thompson wrote:

[...]

(Digression: I hate the fact that such a long and sometimes
informative thread has such a stupid subject header.)

And what did prevent you from changing it? :-}

Janis

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Tuesday, June 02, 2026 11:46:36

On 02/06/2026 11:07, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 02/06/2026 00:11, Keith Thompson wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or
associativity than C. I can't remember which language it is.

The language I was thinking of is PHP. C's ?: operator associates
right-to-left, which makes it possible to write chained conditional
expressions like:
cond1 ? expr1 :
cond2 ? expr2 :
cond3 ? expr3 :
default_expr
PHP's ?: operator originally associated left-to-right.
Newer versions of PHP require parentheses.

I thought you were thinking of C++, where ? has the same precedence as
assignment, while in C it has higher precedence. It does not make a
lot of difference, and if you are writing an expression where it
matters, then I think parentheses would be a good idea.

<https://cppreference.com/c/language/operator_precedence>
<https://cppreference.com/cpp/language/operator_precedence>

Hmm. I'm not sure I either follow or trust those tables.

cppreference.com is normally very accurate - it is linked from the
isocpp.org website and AFAIUI maintained or checked by people involved
in the C++ standards. Mistakes here are definitely something that
should be taken seriously.

Looking at the grammar in the C++ standard, there is a difference.
C has:

conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression

while C++ has:

conditional-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression

But the difference isn't mentioned in the Compatibility annex of the C++ standard.

I'd be interested in seeing a conditional expression whose legality or semantics differs between C and C++.

There is a little information in the "discussion" page of the C++ side
linked above. An example is

true ? a : b = 7;

In C, the ternary operator has higher precedence than assignment and
this therefore parses as :

(true ? a : b) = 7;

In C, the ternary operator does not return an lvalue, so this is a
constraint error.

In C++, the precedence of ternary and assignment are the same, with right-to-left associativity, so this is parsed as :

true ? a : (b = 7)

and evaluates as the value of "a", leaving "b" untouched.

I am not confident enough in my standardese, especially for C++, to
judge if the above explanation is correct according to the standards.
But a quick test on godbolt shows that both gcc and clang follow that
line of reasoning. (It is possible that they are both wrong, but that
would be surprising.)

The difference in precedences here is, I think, related to the ternary operator being able to evaluate to an lvalue in C++ but not in C - and
that /is/ mentioned in the C++ compatibility annex.
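
A short sketch of how the two intents can be written so that they are unambiguous in C; the function names are hypothetical. In C the conditional expression is not an lvalue, so to assign to whichever object is selected one picks a pointer and dereferences it; to assign only in the "else" branch, explicit parentheses remove any dependence on the grammar difference.

```c
/* Hypothetical helper: store value into *a if cond is nonzero,
   otherwise into *b. */
static void assign_selected(int cond, int *a, int *b, int value)
{
    /* (cond ? *a : *b) = value;  is rejected by C compilers */
    *(cond ? a : b) = value;   /* select the pointer, then dereference */
}

/* Hypothetical helper: yields a if cond is nonzero; otherwise assigns
   7 to *b and yields 7.  The parentheses make the parse explicit. */
static int select_or_assign(int cond, int a, int *b)
{
    return cond ? a : (*b = 7);
}
```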

(Digression: I hate the fact that such a long and sometimes
informative thread has such a stupid subject header.)

Agreed.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 02, 2026 11:48:35

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? And despite all
that the latter might not even get triggered because it's probably
optimized away? - I can't help, this sounds really crude.

Is there any rationale from the _software designer_'s perspective?

Janis

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Tuesday, June 02, 2026 11:09:12

On 02/06/2026 10:46, David Brown wrote:

On 02/06/2026 11:07, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 02/06/2026 00:11, Keith Thompson wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or
associativity than C. I can't remember which language it is.

The language I was thinking of is PHP. C's ?: operator associates
right-to-left, which makes it possible to write chained conditional
expressions like:
   cond1 ? expr1 :
   cond2 ? expr2 :
   cond3 ? expr3 :
   default_expr
PHP's ?: operator originally associated left-to-right.
Newer versions of PHP require parentheses.

I thought you were thinking of C++, where ? has the same precedence as
assignment, while in C it has higher precedence. It does not make a
lot of difference, and if you are writing an expression where it
matters, then I think parentheses would be a good idea.

<https://cppreference.com/c/language/operator_precedence>
<https://cppreference.com/cpp/language/operator_precedence>

Hmm. I'm not sure I either follow or trust those tables.

cppreference.com is normally very accurate - it is linked from the
isocpp.org website and AFAIUI maintained or checked by people involved
in the C++ standards. Mistakes here are definitely something that
should be taken seriously.

Looking at the grammar in the C++ standard, there is a difference.
C has:

   conditional-expression:
      logical-OR-expression
      logical-OR-expression ? expression : conditional-expression

while C++ has:

   conditional-expression:
      logical-or-expression
      logical-or-expression ? expression : assignment-expression

But the difference isn't mentioned in the Compatibility annex of the C++
standard.

I'd be interested in seeing a conditional expression whose legality or
semantics differs between C and C++.

There is a little information in the "discussion" page of the C++ side
linked above. An example is

  true ? a : b = 7;

In C, the ternary operator has higher precedence than assignment and
this therefore parses as :

  (true ? a : b) = 7;

In C, the ternary operator does not return an lvalue, so this is a
constraint error.

In C++, the precedence of ternary and assignment are the same, with
right-to-left associativity, so this is parsed as :

  true ? a : (b = 7)

and evaluates as the value of "a", leaving "b" untouched.

I am not confident enough in my standardese, especially for C++, to
judge if the above explanation is correct according to the standards.
But a quick test on godbolt shows that both gcc and clang follow that
line of reasoning. (It is possible that they are both wrong, but that
would be surprising.)

The difference in precedences here is, I think, related to the ternary
operator being able to evaluate to an lvalue in C++ but not in C - and
that /is/ mentioned in the C++ compatibility annex.

I was surprised that there would be such a subtle difference given that
the languages are that close; there are totally unrelated languages that slavishly follow C precedence rules more closely!

But the behaviour only seems to vary in code that would be invalid in C
anyway (unbracketed ?: term on LHS of '=' operator).

Your table however also shows || had same precedence as both ?: and =.
There, I couldn't find an example that made a difference.

Still, I'd find that unsettling; I would rather that ?: was distinct
from both, either with its own level, or via other language rules. (In
my stuff it is always written with parentheses.)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 02, 2026 12:16:09

On 2026-05-31 04:53, Keith Thompson wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-05-31 01:43, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[...]

[...]
C's operator precedence rules are complicated and arguably flawed.

I'd say that just the (known) flaw makes them (slightly) complicated;
so you need to remember that "flaw" (or "inconsistency") to be safe.
The rest is completely sensible. And even if one doesn't have a table
to look up the precedences they mostly can be derived (presuming one
has a feeling for the underlying logic of these things or experiences
from other related areas).

Reasonable, but I feel the need to say that that's your personal
opinion. You seem to think that C's precedence rules have one and
only one flaw, and a set of rules with that flaw corrected would
be ideal.

Erm, no. Not "ideal". (This is just another formulation for what
Bart expressed with the word "perfect" that he'd put in my mouth.)

What I'm saying is that with the knowledge of the contexts of the
underlying models (mathematical and logical calculi, based on a
sensible definition) the possible (sensible) options are sparse.

We should always keep in mind that there's an inherent difference
of subjective opinions and knowledge based sensible conventions.
Such conventions are not as universal as natural laws, they can't
be because they are human-made, but they can be sensibly defined
(often due to practical reasoning). Arithmetic is such a case, and
the hierarchy of lower-level operations to higher-level (+, *, **) straightforwardly defined. Practical consideration, like usage in
positional notation systems add to the form of such conventions.
That's certainly far from an unfounded just "subjective opinion".

I don't even necessarily disagree, but others are likely to have
different opinions, and those opinions might be perfectly valid.

It's not about opinions. (See above.)

I don't want to make a huge deal out of this. I honestly don't have
a strong opinion myself. I usually find dealing with the rules
as they exist to be a much better use of my time and attention --
and I don't mean that as a criticism of anyone who chooses to think
about alternatives.

Oh, I'm not handling that differently in practice. When I had read
my K&R translation (from 1983) to learn the C-language I just made
a small note in my book (http://volatile.gridbug.de/C-op-prec.png,
where the faint comment "sinnvoll" means "sensible") and carried on.

That comment was actually useful to immediately see that flaw when
looking up the precedences while programming in "C" to not make a
programming mistake.
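
To illustrate the kind of mistake meant here, assuming the flaw in question is the placement of the bitwise operators below the comparison operators: in C, == binds tighter than &, so an unparenthesised mask test does not test the masked value. The function names below are hypothetical, and the first function spells out the parse explicitly rather than relying on it.

```c
/* How "x & mask == 0" is actually parsed in C: the comparison
   happens first, and its 0-or-1 result is ANDed with x. */
static int flag_clear_as_parsed(unsigned x, unsigned mask)
{
    return (int)(x & (unsigned)(mask == 0));
}

/* What was almost certainly intended: test the masked bits. */
static int flag_clear_intended(unsigned x, unsigned mask)
{
    return (x & mask) == 0;
}
```

For x = 1 and mask = 2 the two functions disagree, which is exactly the case a note in the precedence table helps to catch.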

[...]

[...]

When designing a new language, there are real advantages in strictly
imitating C's rules, just because so many programmers are familiar
with them.

Huh? - How that? - Are you saying here that practically only C-like
languages are in common use?

Huh? No, I didn't say that at all.

I suggest that if you're designing a somewhat C-like language,
sticking to C's precedence rules has advantages due to programmer familiarity. Even for a language that's not particularly C-like,
but that has C-like expressions, the designer might consider
following C's rules.

Oh, I see.

Or not.

[...]

(It would have been silly for C++ or Objective-C to
change the precedence rules, even to improve them.) But there
are also real advantages in using precedence rules that are better
(e.g., simpler) than C's.

Or - with reference to that flaw - just more consistent.

Consistent systems are inherently simpler, in the sense of easier to
understand and thus more straightforward to use. A precondition for
that is, as said, at least a basic understanding of such things.

Ah, but consistent with what? Internal consistency and consistency
with existing practice are not necessarily the same thing.

Right. And both should be considered when designing such things.

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Tuesday, June 02, 2026 06:37:17

On 2026-06-02 05:48, Janis Papanagnou wrote:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? And despite all
that the latter might not even get triggered because it's probably
optimized away? - I can't help, this sounds really crude.

Is there any rationale from the _software designer_'s perspective?

Yes - the rationale is to keep things simple. The abstract machine has
exactly the semantics specified in the standard. Whether or not a given expression has undefined behavior depends only upon the operator and
its operands, and not on the context in which it is invoked. It's only
when the abstract machine has defined observable behavior when executing
a program that it becomes meaningful to allow optimizations that
preserve that behavior.
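
Since the overflow is undefined regardless of what is later done with the result, portable code has to avoid evaluating the overflowing addition at all. A minimal sketch, with checked_add as a hypothetical helper name:

```c
#include <limits.h>

/* Hypothetical helper: returns 1 and stores a + b in *out if the sum
   is representable as an int; otherwise returns 0 and never evaluates
   the overflowing addition. */
static int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```

With this, something like (a + b) * 0 can be guarded by the check, so the behaviour is defined for all inputs instead of relying on an optimiser to discard the addition.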

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 02, 2026 12:55:05

On 2026-05-31 11:47, David Brown wrote:

On 31/05/2026 03:37, Janis Papanagnou wrote:

On 2026-05-31 01:43, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[...]

If not, people can choose to ignore them when writing C code,
for example like this where all () are technically superfluous:

  crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Yes, they can, and I personally tend to agree that they should.

The more complex the expressions are the more structure they need.

IMO, the parentheses above make precedence clear (if unknown!), but
are not contributing to readability. It would have made more sense
to separate the sub-expression within the [...] in an own object to
enhance readability and to more easily understand what's going on.

To emphasize; not the precedences are the problem above, but the
complexity of the expression in connexion with lack of structuring.

This is an example of how readability depends on the reader. To me,
there is no benefit in having a sub-expression here because the
structure is clear - this is how you do table-based crc's with 4-bit
chunks.

To me, the precedence is as clear as the structure. That's not the
issue I see with that expression.

It's the overloaded expression that is what makes it "unreadable".
(It's actually similar to those overloaded expressions that we saw
in another recent sub-/thread about color-conversions.)

But to someone unfamiliar with CRC calculations, splitting the
expression up might make it clearer. (Alternatively, a comment block
with an explanation could help.)

And that also has nothing to do with either table-based algorithms
(which are a triviality) or the CRC (or other) coding-programs.
(Note that I'm saying that as someone who has implemented a lot of
such things, various CRCs, directly calculated and table-based, and
much more demanding coding software than these simple CRCs.)

I /do/ think the parentheses here are helpful for readability, precisely
because they emphasise the structure of the expression. You could write:

  crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

but that needs significantly more cognitive effort to parse when reading
it, could be misinterpreted, and has lost all the structure that makes
it easy to see what is going on.
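
For readers who have not met this idiom, here is a minimal sketch of a nibble-at-a-time CRC built around the parenthesised expression. It assumes the usual reflected CRC-32 polynomial 0xEDB88320 with an all-ones initial value and final inversion; the thread does not state which CRC the original code computes, and crc32_init and crc32_nibbles are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t s_crc32[16];

/* Build the 16-entry table: entry i is the result of clocking the
   4-bit value i through four steps of the bitwise algorithm. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 16; i++) {
        uint32_t c = i;
        for (int k = 0; k < 4; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        s_crc32[i] = c;
    }
}

/* Process each byte as two 4-bit chunks, low nibble first.
   crc32_init() must be called before this function. */
static uint32_t crc32_nibbles(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        const uint8_t b = *p++;
        crc = (crc >> 4) ^ s_crc32[(crc & 0xF) ^ (b & 0xF)];
        crc = (crc >> 4) ^ s_crc32[(crc & 0xF) ^ (b >> 4)];
    }
    return ~crc;
}
```

Each update line has the same three-part shape (shift out a nibble, form a table index, fold in the table entry), which is the structure the parentheses make visible.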

Yes, I recognize that in that example the parentheses help combining
parts. (But as said, I see the primary problem in the complexity of
the expression.)

(I regularly use bit-manipulation and shift instructions in my code -
but I still felt it best to check the details in a precedence table
before writing that.)

Agreed.

The expression as originally parenthesised is thus definitely easier
for /me/ to read, and is almost exactly the way I would write it myself:

  crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

Acknowledged.

The only differences I would have are the names (why would anyone put variable types into the names like "crcu32" ?

Given the more obvious problem I see with that expression I hadn't
commented on that; but you are right and I certainly agree.

The coding algorithms that I implemented had always been just plain
straight ("unsigned") registers. (So there's no need to reflect that
property in the names of such variable.)

We are not writing
BASIC), and I'd use a lower-case "0xf". Unlike almost every example
Bart has shown before, it even has nice spacing!

I'm not that picky with the hexadecimal constants it seems; I seem to
use both forms, depending on subjective readability in the respective
context. A quick glimpse into my table-driven CRC C-code shows that I
used lower-case, and that seems generally to be the prevalent form.[*]

Janis

[*] Anecdotally: lowercase tables may have their issues though; once
I had implemented a DES-3 algorithm, which was full of tabular data.
(That was at times when we could obtain standards only as paper.)
My code failed against my tests in about 30% of the test cases. The
reason of the problem was a single table entry 'db' vs. 'bd' (or vv.)
It was a horror to find that typo in that huge stack of table data;
somehow it was difficult to find that even after having read several
times across all that constant data. I'm not sure the same could have
happened with uppercase, though...

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Tuesday, June 02, 2026 04:16:48

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[syntax for conditional expressions]

Looking at the grammar in the C++ standard, there is a difference.
C has:

conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression

while C++ has:

conditional-expression:
logical-or-expression
logical-or-expression ? expression : assignment-expression

But the difference isn't mentioned in the Compatibility annex of the
C++ standard.

Like I have said before, there are lots of differences between C
and C++ that aren't mentioned in the Compatibility annex of the
C++ standard. It isn't surprising to find another one.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 05:01:38

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-02 11:07, Keith Thompson wrote:

[...]
(Digression: I hate the fact that such a long and sometimes
informative thread has such a stupid subject header.)

And what did prevent you from changing it? :-}

Futility. At best, I could start a new subthread. The existing
subject line would live on.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Tuesday, June 02, 2026 05:06:18

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

    #include <limits.h>

    int
    foo(){
        int zero = (INT_MAX+1)*0;
        return zero;
    }

    int
    main(){
        return 0;
    }

This program does not transgress the bounds of undefined behavior.
Even more than that, the program is strictly conforming, and must be
accepted by a conforming implementation.

Now let's change the program slightly:

    #include <limits.h>

    int
    foo(){
        static int zero = (INT_MAX+1)*0;
        return zero;
    }

    int
    main(){
        return 0;
    }

This program does transgress the bounds of undefined behavior. The
reason for the difference is that in the first program the semantics
of foo() is to evaluate the expression to be stored in 'zero' only
at runtime, whereas in the second program the semantics of foo() is
to evaluate the expression to be stored in 'zero' before program
startup (informally, "at compile time"). What matters is not
whether the offending expression /might/ be evaluated "at compile
time", but whether the offending expression /must/ be evaluated "at
compile time". Only in the second case is undefined behavior
inevitable (and thus it does not occur in the first program).

Fine point: strictly speaking, I believe the C standard allows even
the second program to complete translation phase 8 successfully, and
for any offending behavior to occur only when we actually try to run
the program. To say that another way, there is no requirement that
possible nasal demons be made manifest at any point before an actual
attempted execution. On the other hand, because that possibility is
there lurking in the background, there is no requirement that the
program be accepted, and it could be rejected by a conforming compiler.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Tuesday, June 02, 2026 12:07:42

In article <10vlvie$2ne3j$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 20:48, Dan Cross wrote:

In article <10vjsg2$259m3$3@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 13:04, Dan Cross wrote:

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

[snip]
Actual examples of too many parentheses?

[Snipping the LISP stuff - fun, but OT and not really relevant to the
thread branch. And I have never used the language.]

Fair!

[snip]
I see code like this all the time; usually it comes from
hardware vendors (I take it this was from a BSP or something
similar?). I often wonder about vendor programming standards
when I run across things like it.

Yes, this was from a hardware vendor (who shall remain nameless to
protect the guilty - not that I have found other vendors to be much
better). They have a tendency to be obsessed with MISRA, with sticking
to C90, and with filling headers with huge Doxygen templates giving no
information and obscuring the code. (I'm fine with Doxygen comments
that actually add useful information, but not a dozen lines repeating
the names and types from a function signature.)

Yes. I see all of this, and it mystifies me; I have seen how
excessive abstraction can lead to opaque code, but many times
hardware people go in the opposite direction, and one hardly
ever sees useful abstraction; for example, often the same code
sequence could be trivially extracted into a function, but it is
instead repeated multiple times, inline.

Indeed. There is just /so/ much that is done badly in these SDK's - I
am not going to go into details as it would take all day. I get the
impression that software libraries are very much an afterthought for
most microcontroller design groups - I don't think they ever bother
talking to developers who will use them. In fact, I don't think they
talk much to the software folks when designing the microcontrollers either.

I think that's exactly what happens: the uCtlr companies don't
have robust software development organizations, and it's seen as
a side-bag to their core business. The same is true for the
bigger hardware vendors, as well (lookin' at you, Intel). Cue
the famous story about Fred Brooks throwing Gene Amdahl out of
his office until the latter came back with a hardware design for
the IBM 360 with byte addressing and power of two widths for
primitive data types.

Sometimes, however, they do have abstractions - sometimes multiple
layers of HALs ("Hardware Abstraction Layer"), drivers, interfaces, etc.
Each layer has a completely different way of viewing things - one will
use #define'd constants for everything, another will use a struct with
30 fields passed as a pointer in order to turn a GPIO pin on or off, and
the next layer will use a macro TURN_GPIO_PIN_A14_ON. When you have
figured out which API you are expected to use, toggling a GPIO leads to
a half-dozen nested calls (not including macros) up and down these
stacks when all the hardware needs is a single write to a particular
register. And if you are really lucky, a global HAL_LOCK_MUTEX is
acquired and released along the way.
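
To make the "single write" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the register is simulated by an ordinary variable so the fragment compiles on a host, and the names `fake_gpioa_set`, `gpio_a_set_pin` and `gpio_a_last_write` are hypothetical, not taken from any vendor SDK.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped "set output bits" register.
 * On real hardware this would be a fixed address from the data sheet. */
static volatile uint32_t fake_gpioa_set;

/* Turning a pin on is one store. A small static function like this is
 * normally inlined by an optimising compiler. Requires pin < 32. */
static inline void gpio_a_set_pin(unsigned pin)
{
    fake_gpioa_set = UINT32_C(1) << pin;
}

/* Hypothetical accessor so the effect can be observed on a host. */
static inline uint32_t gpio_a_last_write(void)
{
    return fake_gpioa_set;
}
```

Calling `gpio_a_set_pin(14)` stores the mask for bit 14 and nothing else; no struct of 30 fields and no mutex is involved.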

Yes. The way one boots an AMD server SoC, for instance,
requires shipping a bunch of binary data structures around to
little microcontrollers spread across a bunch of AXI buses, that
are then responsible for things like configuring PCIe links and
enumerating IO buses and so on. The vendor code for doing this
is opaque, at best. For example, https://github.com/openSIL/openSIL/blob/turin_poc/xUSL/Nbio/Brh/NbioPcieComplexDataBrh.c
(and that's a cleaned-up version).

[snip color mapping code]

Yes, that would be vastly better. (I would still prefer to have
different named types for colours in the different encoding schemes.)

I'll see your named types and raise you a bitfield struct. The
shifting and masking is superfluous.

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.

Indeed. But even if they don't want to use "inline", a static function
is better - the compiler will do the inlining anyway (if it makes sense
according to its heuristics).

Assuming the compiler they're working with is known to do so,
then I agree.

If a compiler is not capable of inlining static functions without them
being labelled "inline", then you are unlikely to get efficient results
anyway. (Or the user has not enabled optimisation, and again cannot
expect efficient results.) I don't see the point in pandering to poorly
optimising compilers (including good compilers with optimisation
disabled) in order to produce marginally less big and slow code. There
was a time when a good optimising compiler was a significant investment
and not always within the budget for a project, but such times are far
in the past. I can understand that some developers are hamstrung by
daft C90 restrictions, but I have little sympathy for people wanting
good results from poor tools.

(The exception, perhaps, is people who have to use Microchip development
tools.)

It's not just because the optimizer is bad or the developers are
obtuse. Sometimes it's a deliberate decision to support
external tooling, like a debugger or tracing program or similar.
Some projects deliberately tolerate slower code for that.

Moreover, on large code bases, with long life spans, upgrading a
compiler is a significant investment. Almost invariably the
code has UB somewhere (I work on a code base that has been
evolving since before ANSI C; out of about 11 million lines,
there's lots of code that can be considered "legacy" in it).
From a business standpoint, it's not worth the time or
engineering resources required to go find all of it and make it
strictly conforming; from a technical standpoint, it may not
always be possible to do so anyway (though other superset
standards, like POSIX, are another matter), and in other cases
the resulting obfuscation to meet much stricter demands of ISO C
has been deemed, rightly or wrongly, as simply not worth it. It
may not be ideal, but them's the breaks.

- Dan C.

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 05:25:53

Bart <bc@freeuk.com> writes:
[...]

David Brown <david.brown@hesbynett.no> writes:

[...]

<https://cppreference.com/c/language/operator_precedence>
<https://cppreference.com/cpp/language/operator_precedence>

[...]

Your table however also shows || had same precedence as both ?: and
=. There, I couldn't find an example that made a difference.

Still, I'd find that unsettling; I would rather that ?: was distinct
from both, either with its own level, or via other language
rules. (In my stuff it is always written with parentheses.)

I think you're misreading the table due to its poor formatting.

In the C++ table (second URL above), the precedence levels are
numbered from 1 to 17, but the number in the first column is aligned
to the *middle* of the list of operators in the second column.
So level 15 is just "a || b", and level 16 goes from "a ? b : c" to
"a &= b a ^= b a |= b". You can tell where the level 16 section
starts by the "Right-to-left" associativity in the last column,
which is aligned with the *first* item in the list. I've submitted
a suggestion to fix it (and then saw that someone else had already
done so), but apparently cppreference.com is being hit by vandalism,
so it might take a while before it's corrected.
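
A quick way to see that "||" really does bind tighter than "?:" in C is to compare the expression as written with the grouping it would have if the levels were reversed. This is only a sketch; the function names are made up for the illustration.

```c
/* In C, || binds tighter than ?:, so the condition is (a || b). */
static int cond_as_parsed(int a, int b, int c, int d)
{
    return a || b ? c : d;          /* parsed as (a || b) ? c : d */
}

/* What the same tokens would mean if ?: bound tighter than ||. */
static int cond_if_reversed(int a, int b, int c, int d)
{
    return a || (b ? c : d);
}
```

With a = 0, b = 1, c = 10, d = 20 the first yields 10 and the second yields 1, so the two groupings are distinguishable.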

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 05:35:43

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? And despite all
that the latter might not even get triggered because it's probably
optimized away? - I can't help, this sounds really crude.

Is there any rationale from the _software designer_'s perspective?

In the abstract machine, every operator and subexpression is
evaluated (barring things like "||", "&&", and "?:"). (INT_MAX + 1)
has undefined behavior due to overflow, therefore any expression
that has (INT_MAX + 1) as a subexpression has undefined behavior.

Replacing (expr * 0) by 0 is an optimization, and optimizations
are *optional*. A naive implementation could generate code that
performs the addition and the multiplication by 0; if the addition
traps, it traps.

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.
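
For the run-time case with variables, the usual way to stay inside defined behaviour is to test for overflow before doing the addition, so the overflowing expression is never evaluated. A minimal sketch follows; the helper name `add_checked` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b in *sum when the addition cannot
 * overflow; returns false and leaves *sum untouched otherwise.
 * The range test uses only subtractions that cannot overflow. */
static bool add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

With this, `add_checked(INT_MAX, 1, &s)` reports failure instead of evaluating `INT_MAX + 1`, whatever the result is later multiplied by.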

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Kenny McCormack@3:633/10 to All on Tuesday, June 02, 2026 12:36:25

Subject: Operator precedence in other (non-C, but "C-like") languages (Was: something about a girl)

In article <10vku5o$2glfs$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
...

Digression: Perl borrows most or all of C's operators, and keeps
the same precedences. "Operators borrowed from C keep the same
precedence relationship with each other, even where C's precedence
is slightly screwy." But Perl has "and" and "or" operators that
work like "&&" and "||" but have lower precedence (that turns out
to be convenient in some contexts).

I vaguely recall that there's some language that uses the ?: syntax
for the conditional operator, but with a different precedence and/or
associativity than C. I can't remember which language it is.

(It turns out it was PHP that you were thinking of)

There is another language that claims to be C-like in terms of its
operators and functions (although its overall syntax and reason for
existence are completely not like C), but which has the quirk that || and
&& work like in C, except that they don't do "short circuit" evaluation.
Both sides of the operator are always evaluated. Working in this language,
I found this lack of short-circuit jarring, but when I mentioned it on the support board (for this particular language), they had no idea what I was talking about...

And that language is: WinBatch.
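
For comparison, here is a small sketch of what short-circuit evaluation guarantees in C: the right operand of && or || is evaluated only when the left operand does not already decide the result. The counter and function names are invented for the example.

```c
#include <stddef.h>

static int calls;                    /* evaluations of the right operand */

static int right_operand(void)
{
    ++calls;
    return 1;
}

/* How many times the right operand ran for `left && right_operand()`. */
static int and_rhs_evaluations(int left)
{
    calls = 0;
    (void)(left && right_operand());
    return calls;
}

/* The same for `left || right_operand()`. */
static int or_rhs_evaluations(int left)
{
    calls = 0;
    (void)(left || right_operand());
    return calls;
}

/* The idiom that depends on this: *p is read only if p is non-null. */
static int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}
```

In a language that always evaluates both sides, `is_positive` would dereference a null pointer, which is why the lack of short-circuiting is jarring to a C programmer.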

--
Men rarely (if ever) manage to dream up a God superior to themselves.
Most Gods have the manners and morals of a spoiled child.

From David Brown@3:633/10 to All on Tuesday, June 02, 2026 14:37:11

On 02/06/2026 14:07, Dan Cross wrote:

In article <10vlvie$2ne3j$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 20:48, Dan Cross wrote:

In article <10vjsg2$259m3$3@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 01/06/2026 13:04, Dan Cross wrote:

In article <10vjdn8$22tgu$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

[snip]

[snip color mapping code]

Yes, that would be vastly better. (I would still prefer to have
different named types for colours in the different encoding schemes.)

I'll see your named types and raise you a bitfield struct. The
shifting and masking is superfluous.

Sure. "Named types" does not preclude bit-fields. I'd prefer some kind
of struct, for type safety, but even a typedef is better than nothing.
And when you have a struct for something like this, bit-fields are an
obvious choice (at least for code that doesn't have to be portable to different endian systems).
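
Since the colour code itself was snipped, here is a sketch of the two styles under an assumed encoding: 16-bit RGB565 is chosen purely for illustration, and all names are hypothetical. The first part is a named struct type with explicit shifting and masking; the second is the bit-field form.

```c
#include <stdint.h>

/* Named type wrapping the raw value: cannot be mixed up with a plain
 * integer or with a colour in another encoding. */
typedef struct { uint16_t raw; } rgb565_t;

static inline rgb565_t rgb565_make(unsigned r, unsigned g, unsigned b)
{
    rgb565_t c = {
        (uint16_t)(((r & 0x1Fu) << 11) | ((g & 0x3Fu) << 5) | (b & 0x1Fu))
    };
    return c;
}

static inline unsigned rgb565_green(rgb565_t c)
{
    return (c.raw >> 5) & 0x3Fu;
}

/* Bit-field version: no explicit shifts or masks in user code, but the
 * placement of fields within the storage unit is implementation-defined,
 * which is the portability caveat mentioned above. */
struct rgb565_fields {
    unsigned blue  : 5;
    unsigned green : 6;
    unsigned red   : 5;
};
```

The shift/mask version pins down the bit layout exactly; the bit-field version reads more cleanly but should only be relied on for a known compiler and target.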

This is exactly the sort of thing that, as you point out, a
`static inline` function is far better suited for. Some code
bases don't want to use them for a variety of reasons, usually
compatibility concerns with older code, compilers, or language
standards. Some variants of Unix, for instance, worry about
header compatibility with C90 [and in some cases K&R C] code.

Indeed. But even if they don't want to use "inline", a static function
is better - the compiler will do the inlining anyway (if it makes sense
according to its heuristics).

Assuming the compiler they're working with is known to do so,
then I agree.

If a compiler is not capable of inlining static functions without them
being labelled "inline", then you are unlikely to get efficient results
anyway. (Or the user has not enabled optimisation, and again cannot
expect efficient results.) I don't see the point in pandering to poorly
optimising compilers (including good compilers with optimisation
disabled) in order to produce marginally less big and slow code. There
was a time when a good optimising compiler was a significant investment
and not always within the budget for a project, but such times are far
in the past. I can understand that some developers are hamstrung by
daft C90 restrictions, but I have little sympathy for people wanting
good results from poor tools.

(The exception, perhaps, is people who have to use Microchip development
tools.)

It's not just because the optimizer is bad or the developers are
obtuse. Sometimes it's a deliberate decision to support
external tooling, like a debugger or tracing program or similar.
Some projects deliberately tolerate slower code for that.

That's fine - you are knowingly picking a different tradeoff.

Moreover, on large code bases, with long life spans, upgrading a
compiler is a significant investment.

In any given project, I consider the toolchain as part of the project.
I don't upgrade or replace it without very good reason. If I pull an
old project out of its mothballs to make a change, the first thing I do
is a clean rebuild and compare the binary to the one stored with the
project, to be sure that everything builds cleanly and identically. (My record for doing this was almost exactly 20 years between code changes -
and that code was in C90. But the compiler was happy to do inlining optimisations without the "inline" keyword.)

Almost invariably the
code has UB somewhere (I work on a code base that has been
evolving since before ANSI C; out of about 11 million lines,
there's lots of code that can be considered "legacy" in it).
From a business standpoint, it's not worth the time or
engineering resources required to go find all of it and make it
strictly conforming; from a technical standpoint, it may not
always be possible to do so anyway (though other superset
standards, like POSIX, are another matter), and in other cases
the resulting obfuscation to meet much stricter demands of ISO C
has been deemed, rightly or wrongly, as simply not worth it. It
may not be ideal, but them's the breaks.

Fair enough.

I am a little spoiled in that most of the code I work with, I wrote.
But that is less true now than it used to be. In the old days, I would
rarely need anything from the standard library and nothing from microcontroller vendors or third parties. (Before that, I would
typically use assembly for microcontrollers, and then everything was my
own work.) I have sometimes had to add "-fno-strict-aliasing -fwrapv"
to deal with UB in other people's code, which always feels uncomfortable.
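
As an illustration of what those two flags paper over, here is a sketch of the conforming alternatives to the two usual culprits: type punning through a pointer cast, and reliance on signed wraparound. The function names are hypothetical, and the example assumes `float` is 32 bits wide (checked at compile time).

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");

/* Instead of *(uint32_t *)&f, which breaks the aliasing rules, copy the
 * bytes. Optimising compilers typically reduce this to a single move. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Instead of relying on signed overflow wrapping (what -fwrapv makes
 * defined), do the arithmetic in an unsigned type, where wraparound is
 * defined by the language itself. */
static uint32_t next_sequence(uint32_t n)
{
    return n + 1u;
}
```

Code written this way needs neither flag, so it keeps working if someone later builds it with different options.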

And of course sometimes you get handed code written by a muppet with no
clue what they are doing. I once had to debug code written by someone
who did not see the point in keeping the number and type of parameters
in sync between function definitions and function calls. The parts of
the program that worked did so by sheer chance.

From Kenny McCormack@3:633/10 to All on Tuesday, June 02, 2026 12:39:08

Subject: It is not futile to change the subject line (Was: this girl calls c ugly)

In article <10vmgn2$2tjoi$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-02 11:07, Keith Thompson wrote:

[...]
(Digression: I hate the fact that such a long and sometimes
informative thread has such a stupid subject header.)

And what did prevent you from changing it? :-}

Futility. At best, I could start a new subthread. The existing
subject line would live on.

See. That wasn't so hard, was it?

I maintain that there are several good reasons why changing the Subject
line is a good thing. Many other people disagree with me, but I don't care about that.

It is, as you imply, especially a good idea where, as here, the original
(i.e., carried) Subject line is dumb.

--
I shot a man on Fifth Avenue, just to see him die.

From Kenny McCormack@3:633/10 to All on Tuesday, June 02, 2026 12:42:09

Subject: Re: It is not futile to change the subject line (Was: this girl calls c ugly)

In article <10vmitc$o9ge$2@news.xmission.com>,
Kenny McCormack <gazelle@shell.xmission.com> wrote:
...

I maintain that there are several good reasons why changing the Subject
line is a good thing. Many other people disagree with me, but I don't care
about that.

It is, as you imply, especially a good idea where, as here, the original
(i.e., carried) Subject line is dumb.

Oh, and please read this, which I recently composed on the subject:

3 Reasons Why People Don't Change Subject Lines

1) Because they don't know how (and can't be bothered to learn). Or,
eqv, that whatever crappy newsreader they are using makes it difficult
or impossible to do so.

2) Because they think it is a violation of netiquette to do so. I.e.,
they think it "breaks the thread". The theory is that doing so creates problems for people who use poor newsreaders.

3) Because they get a perverse thrill out of keeping an old, totally inappropriate thread title, when they clearly know better.

--
Many people in the American South think that DJT is, and will be remembered
as, one of the best presidents in US history. They are absolutely correct.

He is currently at number 46 on the list. High praise, indeed!

From Dan Cross@3:633/10 to All on Tuesday, June 02, 2026 13:05:08

In article <10vh1eo$1ei50$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently.  A simpler set of rules,
with fewer levels, *might* have been better.  I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are.  I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

Of course.  Parentheses do not affect the generated code unless they
affect the semantics of the expression.  (Some people think parentheses
affect the order of evaluation,

They can do if they make an expression be parsed differently. Do you have
an example where they make no difference but people might think they do?

This is all a bit of a distraction from the original point that
David and Richard Harnden were trying to make, which seemed
clear enough to me, but perhaps should have been given with a
better example. Maybe something like:

d = a*b + c;

Is equivalent to,

d = (a*b) + c;

And in this case, the parentheses are superfluous and don't
change the order of evaluation of the expression as far as the
language is concerned. Whether a compiler rearranges it in
generated code in a way that is more convenient or faster or
whatever is another matter.

I would quibble with this idea that the compiler "removes"
parentheses. I get the intuition, but C is not Go where the
compiler "inserts" semi-colons for you, and has no analogous
concept. Rather, as I think Keith said, expressions are parsed
into some internal representation, and then transformed into
something like an abstract syntax tree, where syntactic
notations like parentheses are lost.

Both expressions above correspond to an AST like:

              +-------+
              |BinOp +|
              +-------+
                /   \
             /         \
      +-------+       +-------+
      |BinOp *|       |Sym `c`|
      +-------+       +-------+
        /   \
     /         \
+-------+   +-------+
|Sym `a`|   |Sym `b`|
+-------+   +-------+

But to get to that, it may be that the compiler uses a
different initial representation, like a parse tree that more
closely resembles the source language grammar. Here, the
two expressions might have different parsed representations.
E.g., the first, simplifying heavily, may look something
like this:

              +------+
              | expr |
              +------+
            /     |     \
      +-----+    (+)    +-----+
      |term |           |term |
      +-----+           +-----+
     /   |   \             |
+-----+ (*) +-----+     +-----+
|ident|     |ident|     |ident|
+-----+     +-----+     +-----+
   |           |           |
 (`a`)       (`b`)       (`c`)

While the second might add an extra `expr` node, as in:

              +------+
              | expr |
              +------+
            /     |     \
     +------+    (+)    +-----+
     | expr |           |term |
     +------+           +-----+
         |                 |
      +-----+           +-----+
      |term |           |ident|
      +-----+           +-----+
     /   |   \             |
+-----+ (*) +-----+      (`c`)
|ident|     |ident|
+-----+     +-----+
   |           |
 (`a`)       (`b`)

I believe that, for most compilers that parse and then convert
to an AST, the second is more likely to be created than the
first. However, given that the same AST is created
from both parse trees, this is unlikely to have an effect on the
object code ultimately output from the compiler.
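
To show the "same AST" claim in something executable, here is a toy recursive-descent parser. It is only a sketch under a made-up grammar (single-letter identifiers, `+` and `*` only, no error reporting), and instead of building node objects it writes the prefix form of the tree it would build. None of this is meant to describe how any real C compiler is structured.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { CAP = 128 };                  /* ample for the short inputs used here */

static const char *src;              /* cursor into the text being parsed */

static void skip_spaces(void) { while (*src == ' ') ++src; }

static void parse_expr(char *out);

/* factor := letter | '(' expr ')' */
static void parse_factor(char *out)
{
    out[0] = '\0';
    skip_spaces();
    if (*src == '(') {
        ++src;
        parse_expr(out);             /* the parentheses leave no trace in out */
        skip_spaces();
        if (*src == ')') ++src;
    } else if (isalpha((unsigned char)*src)) {
        out[0] = *src++;
        out[1] = '\0';
    }
}

/* Parses: operand { op operand }, building a left-associative chain. */
static void parse_chain(char *out, char op, void (*operand)(char *))
{
    char left[CAP], right[CAP], node[CAP];
    operand(left);
    for (skip_spaces(); *src == op; skip_spaces()) {
        ++src;
        operand(right);
        snprintf(node, sizeof node, "(%c %s %s)", op, left, right);
        strcpy(left, node);
    }
    strcpy(out, left);
}

/* term := factor { '*' factor } */
static void parse_term(char *out) { parse_chain(out, '*', parse_factor); }

/* expr := term { '+' term } */
static void parse_expr(char *out) { parse_chain(out, '+', parse_term); }

/* Writes the prefix form of the AST for text into out (CAP bytes). */
static void ast_of(const char *text, char *out)
{
    src = text;
    parse_expr(out);
}
```

Feeding it `a*b + c` and `(a*b) + c` gives the same prefix form for both; the parentheses only steer which grammar rule is taken and then disappear.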

- Dan C.

From Bart@3:633/10 to All on Tuesday, June 02, 2026 14:20:15

On 02/06/2026 13:25, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:
[...]

David Brown <david.brown@hesbynett.no> writes:

[...]

<https://cppreference.com/c/language/operator_precedence>
<https://cppreference.com/cpp/language/operator_precedence>

[...]

Your table however also shows || had same precedence as both ?: and
=. There, I couldn't find an example that made a difference.

Still, I'd find that unsettling; I would rather that ?: was distinct
from both, either with its own level, or via other language
rules. (In my stuff it is always written with parentheses.)

I think you're misreading the table due to its poor formatting.

In the C++ table (second URL above), the precedence levels are
numbered from 1 to 17, but the number in the first column is aligned
to the *middle* of the list of operators in the second column.
So level 15 is just "a || b", and level 16 goes from "a ? b : c" to
"a &= b a ^= b a |= b". You can tell where the level 16 section
starts by the "Right-to-left" associativity in the last column,
which is aligned with the *first* item in the list. I've submitted
a suggestion to fix it (and then saw that someone else had already
done so), but apparently cppreference.com is being hit by vandalism,
so it might take a while before it's corrected.

You're right, it is a confusing layout. But it might explain why I
couldn't find different behaviours between C/C++ in my examples
involving || and ?:.

From Tim Rentsch@3:633/10 to All on Tuesday, June 02, 2026 06:29:01

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

From Bart@3:633/10 to All on Tuesday, June 02, 2026 14:38:10

On 02/06/2026 14:05, Dan Cross wrote:

In article <10vh1eo$1ei50$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 31/05/2026 10:49, David Brown wrote:

On 31/05/2026 10:12, Richard Harnden wrote:

On 31/05/2026 00:43, Keith Thompson wrote:

C's operator precedence rules are complicated and arguably flawed.
They could have been defined differently.  A simpler set of rules,
with fewer levels, *might* have been better.  I don't have any
concrete suggestions -- nor do I have any strong preferences.
I accept C's rules as they are.  I would accept them if they had
been defined differently.

Can't the compiler easily remove any parens that aren't necessary?
So - just write complex expressions in a way that a human can most
easily understand, it makes your intention clear and probably doesn't
increase the size of the executable.

Of course.  Parentheses do not affect the generated code unless they
affect the semantics of the expression.  (Some people think parentheses
affect the order of evaluation,

They can do if they make an expression be parsed differently. Do you have
an example where they make no difference but people might think they do?

This is all a bit of a distraction from the original point that
David and Richard Harnden were trying to make, which seemed
clear enough to me, but perhaps should have been given with a
better example. Maybe something like:

d = a*b + c;

Is equivalent to,

d = (a*b) + c;

And in this case, the parentheses are superfluous and don't
change the order of evaluation of the expression as far as the
language is concerned. Whether a compiler rearranges it in
generated code in a way that is more convenient or faster or
whatever is another matter.

I would quibble with this idea that the compiler "removes"
parentheses. I get the intuition, but C is not Go where the
compiler "inserts" semi-colons for you, and has no analogous
concept. Rather, as I think Keith said, expressions are parsed
into some internal representation, and then transformed into
something like an abstract syntax tree, where syntactic
notations like parentheses are lost.

Both expressions above correspond to an AST like:

              +-------+
              |BinOp +|
              +-------+
                /   \
             /         \
      +-------+       +-------+
      |BinOp *|       |Sym `c`|
      +-------+       +-------+
        /   \
     /         \
+-------+   +-------+
|Sym `a`|   |Sym `b`|
+-------+   +-------+

But to get to that, it may be that the compiler uses a
different initial representation, like a parse tree that more
closely resembles the source language grammar. Here, the
two expressions might have different parsed representations.
E.g., the first, simplifying heavily, may look something
like this:

              +------+
              | expr |
              +------+
            /     |     \
      +-----+    (+)    +-----+
      |term |           |term |
      +-----+           +-----+
     /   |   \             |
+-----+ (*) +-----+     +-----+
|ident|     |ident|     |ident|
+-----+     +-----+     +-----+
   |           |           |
 (`a`)       (`b`)       (`c`)

While the second might add an extra `expr` node, as in:

              +------+
              | expr |
              +------+
            /     |     \
     +------+    (+)    +-----+
     | expr |           |term |
     +------+           +-----+
         |                 |
      +-----+           +-----+
      |term |           |ident|
      +-----+           +-----+
     /   |   \             |
+-----+ (*) +-----+      (`c`)
|ident|     |ident|
+-----+     +-----+
   |           |
 (`a`)       (`b`)

I believe that, for most compilers that parse and then convert
to an AST, the second is more likely to be created than the
first. However, given that the same AST is created
from both parse trees, this is unlikely to have an effect on the
object code ultimately output from the compiler.

You're describing a 'Concrete Syntax Tree' or CST, versus AST.

Although in that case, I expect to see a discrete node for bracketed expressions (ie. parenthesised), as those would also have a distinct production in any formal grammar.

Personally I don't have much use for CSTs for a normal compiler, but
they might be useful for source-to-source translators, or programs that
do source refactoring, where you want to preserve extras such as
parentheses even if they're not strictly needed.

(Injecting the right parentheses for examples like `(a + b) * c' which
would have an AST like '(* (+ a b) c)' is surprisingly tricky. Easier to
just follow the original source!

In any case, that still wouldn't turn ((a+b)) back into the original;
you'd need a suitable CST.)
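
For the record, one common way to do the parenthesis injection is to pass the minimum precedence the context will tolerate down the tree and wrap any operator node that falls below it. The sketch below assumes a hypothetical two-operator AST (`struct node`, `prec` and `unparse` are invented names) where both operators are left-associative.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical AST: a leaf has op == 0 and a one-letter name. */
struct node {
    char op;                         /* '+', '*', or 0 for a leaf */
    char name;                       /* used only when op == 0 */
    const struct node *lhs, *rhs;
};

/* Precedence of a binary operator: '*' binds tighter than '+'. */
static int prec(char op) { return op == '*' ? 2 : 1; }

/* Appends s to the NUL-terminated string in out (capacity cap). */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);
    if (len < cap)
        snprintf(out + len, cap - len, "%s", s);
}

/* Appends the infix form of n, adding parentheses only where the node
 * binds less tightly than its context requires. The right operand gets
 * a stricter limit (p + 1) because the operators are left-associative. */
static void unparse(const struct node *n, int min_prec, char *out, size_t cap)
{
    if (n->op == 0) {
        const char leaf[2] = { n->name, '\0' };
        append(out, cap, leaf);
        return;
    }
    const int p = prec(n->op);
    const int wrap = p < min_prec;
    const char infix[4] = { ' ', n->op, ' ', '\0' };
    if (wrap) append(out, cap, "(");
    unparse(n->lhs, p, out, cap);
    append(out, cap, infix);
    unparse(n->rhs, p + 1, out, cap);
    if (wrap) append(out, cap, ")");
}
```

This regenerates `(a + b) * c` from the tree '(* (+ a b) c)', but as noted it emits only the parentheses that are needed, so `((a+b))` in the source would still come back with no parentheses at all.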

From David Brown@3:633/10 to All on Tuesday, June 02, 2026 16:10:31

On 02/06/2026 15:29, Tim Rentsch wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

My testing shows all versions of gcc that I tested on godbolt gave a
warning, even without any options. I don't believe that INT_MAX can
have any type suffixes that would avoid the overflow.

What version of gcc and/or flags let that case label pass without a diagnostic?

(I don't know if Keith is correct about it being a constraint violation
- I have not looked at the details there.)

From Scott Lurndal@3:633/10 to All on Tuesday, June 02, 2026 15:06:51

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vlvie$2ne3j$2@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

<snip>

Yes. The way one boots an AMD server SoC, for instance,
requires shipping a bunch of binary data structures around to
little microcontrollers spread across a bunch of AXI buses, that
are then responsible for things like configuring PCIe links and
enumerating IO buses and so on. The vendor code for doing this
is opaque, at best. For example, https://github.com/openSIL/openSIL/blob/turin_poc/xUSL/Nbio/Brh/NbioPcieComplexDataBrh.c
(and that's a cleaned-up version).

Indeed, even the high-speed SERDES now have small microprocessors
that need proprietary firmware loaded at power-on.

Most vendors prefer to keep such details proprietary, for various
reasons both good and bad.

I'll agree that software (firmware) development at chip vendors
has been, in the past, an afterthought with the primary emphasis
on the hardware side. In modern chip design, software has taken
a larger role in both hardware definition, and the software
quality has improved somewhat.

[snip color mapping code]

Yes, that would be vastly better. (I would still prefer to have
different named types for colours in the different encoding schemes.)

I'll see your named types and raise you a bitfield struct. The
shifting and masking is superfluous.

Or useful helpers:

a = bit::extract(value, 12, 0); /* Extract bits 12:0 from value */

b = bit::insert(b, 0x10, 5, 5); /* Insert 0x10 into b starting at bit 5 for 5 bits */
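
In plain C, helpers along those lines might look like the sketch below. The names `bit_extract` and `bit_insert` and the exact parameter meanings are assumptions inferred from the comments on the two calls above (high bit and low bit for the extract; start bit and width for the insert), not the actual library being quoted.

```c
#include <stdint.h>

/* Returns bits hi..lo (inclusive) of value, right-justified.
 * Requires lo <= hi <= 63. */
static inline uint64_t bit_extract(uint64_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
    return (value >> lo) & mask;
}

/* Returns target with `width` bits starting at bit `start` replaced by
 * the low bits of field. Requires width >= 1 and start + width <= 64. */
static inline uint64_t bit_insert(uint64_t target, uint64_t field,
                                  unsigned start, unsigned width)
{
    uint64_t mask = width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
    return (target & ~(mask << start)) | ((field & mask) << start);
}
```

The special case for a width of 64 is there because shifting a 64-bit value by 64 is itself undefined behaviour.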

One might also define data structures for control and status registers using bitfield structs.

e.g. for the SATA UAHC_GBL_OOBR register:

union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
        uint32_t we    : 1; /**< R/W/H - Write enable. */
        uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value. Writable only if WE is set. */
        uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value. Writable only if WE is set. */
        uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value. Writable only if WE is set. */
        uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value. Writable only if WE is set. */
#else
        uint32_t cimax : 8;
        uint32_t cimin : 8;
        uint32_t cwmax : 8;
        uint32_t cwmin : 7;
        uint32_t we    : 1;
#endif
    } s;
};

From Scott Lurndal@3:633/10 to All on Tuesday, June 02, 2026 15:10:10

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vh1eo$1ei50$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

Both expressions above correspond to an AST like:

��Ŀ
�BinOp +�
��
? ?
? ?
��Ŀ ��Ŀ
�BinOp *� �Sym `c`�
��
? ?
? ?
��Ŀ ��Ŀ
�Sym `a`� �Sym `b`�
��

Ah, the dangers of assuming everyone uses UTF-8.

From Dan Cross@3:633/10 to All on Tuesday, June 02, 2026 15:19:04

In article <10vmmc2$2utb2$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 02/06/2026 14:05, Dan Cross wrote:
You're describing a 'Concrete Syntax Tree' or CST, versus AST.

Yes. "Concrete Syntax Tree" is another name for "Parse Tree".

Although in that case, I expect to see a discrete node for bracketed
expressions (ie. parenthesised), as those would also have a distinct
production in any formal grammar.

Was that not in the second parse tree diagram I presented?
Granted, I called it "expr", but as I noted, I was simplifying
heavily, mostly for space.

Personally I don't have much use for CSTs for a normal compiler, but
they might be useful for source-to-source translators, or programs that
do source refactoring, where you want to preserve extras such as
parentheses even if they're not strictly needed.

I think you're missing the point, here.

The question was whether, given some compiler, `a*b + c`
generates different code from `(a*b) + c`, and what it means for
the compiler to "remove the parentheses." I submit that, with
respect to the former, the answer is "very very unlikely" and
with respect to the latter, the question is a category error.

(Injecting the right parentheses for examples like `(a + b) * c' which
would have an AST like '(* (+ a b) c)' is surprisingly tricky. Easier to
just follow the original source!

In any case, that still wouldn't turn ((a+b)) back into the original;
you'd need a suitable CST.)

That's not related to what I was trying to convey.

- Dan C.

From Dan Cross@3:633/10 to All on Tuesday, June 02, 2026 15:31:45

In article <mnCTR.17470$_BG8.10863@fx24.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vh1eo$1ei50$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

Both expressions above correspond to an AST like:

��Ŀ
�BinOp +�
��
? ?
? ?
��Ŀ ��Ŀ
�BinOp *� �Sym `c`�
��
? ?
? ?
��Ŀ ��Ŀ
�Sym `a`� �Sym `b`�
��

Ah, the dangers of assuming everyone uses UTF-8.

Yeah, my bad. Here:

              +-------+
              |BinOp +|
              +-------+
                /   \
             /         \
      +-------+       +-------+
      |BinOp *|       |Sym `c`|
      +-------+       +-------+
        /   \
     /         \
+-------+   +-------+
|Sym `a`|   |Sym `b`|
+-------+   +-------+

(The original looks bad in my newsreader, too.)

- Dan C.

From Dan Cross@3:633/10 to All on Tuesday, June 02, 2026 16:28:36

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

Given that `foo` has external linkage, I find this hard to
believe, and `clang -fsanitize=undefined` agrees with me,
both emitting a diagnostic about the overflow and generating
code in `foo` to call into the sanitizer machinery.

Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment. In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

But I'm not sure what _you_ mean by "transgress the bounds of
undefined behavior" here.

Even more than that, the program is strictly conforming, and must be
accepted by a conforming implementation.

See above.

Now let's change the program slightly:

#include <limits.h>

int
foo(){
    static int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does transgress the bounds of undefined behavior. The
reason for the difference is that in the first program the semantics
of foo() is to evaluate the expression to be stored in 'zero' only
at runtime, whereas in the second program the semantics of foo() is
to evaluate the expression to be stored in 'zero' before program
startup (informally, "at compile time"). What matters is not
whether the offending expression /might/ be evaluated "at compile
time", but whether the offending expression /must/ be evaluated "at
compile time". Only in the second case is undefined behavior
inevitable (and thus it does not occur in the first program).

Fine point: strictly speaking, I believe the C standard allows even
the second program to complete translation phase 8 successfully, and
for any offending behavior to occur only when we actually try to run
the program. To say that another way, there is no requirement that
possible nasal demons be made manifest at any point before an actual
attempted execution. On the other hand, because that possibility is
there lurking in the background, there is no requirement that the
program be accepted, and could be rejected by a conforming compiler.

Indeed. Further, I believe that the same is true for the first
program, as well.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris M. Thomasson@3:633/10 to All on Tuesday, June 02, 2026 13:59:24

On 5/31/2026 3:54 PM, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 31/05/2026 16:24, James Kuyper wrote:

On 2026-05-31 07:18, David Brown wrote:

[...]

People might think they affect the order of evaluation, such as when you
have function calls:

u = foo(x) + (foo(y) + foo(z));

Some people might think the use of parentheses means that "foo(y)" and
"foo(z)" are called before "foo(x)", when the order of all these calls
(and the additions) is unspecified. (Again, a given compiler might be
influenced by the parentheses, but the language does not require it.)

You're correct with regard to the function calls, but the
parenthesized addition must be performed first, and the other one
second, which may make a difference, for the same reasons given in my
previous paragraph.

The parentheses do not dictate the order of evaluation. But you are
correct - and it's worth pointing out, so thank you for doing that -
that for floating point operations, the grouping of operations can
affect the result.

The parentheses do not dictate the order of evaluation *of the
operands*. Each "+" can be evaluated (the addition performed)
only after the values of its operands are known. But regardless
of parentheses or operator precedence, the three operands foo(x),
foo(y), and foo(z) can be evaluated in any of 6 possible orders.
(It's different when you have operations like "&&", "||", and ",",
which impose additional sequence points.)
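
A small sketch of the point about operand order, for experimenting. The two-argument `foo` and the `call_log` buffer are made up for this example. The sum is always the same, but the log may come out as any permutation of "xyz" depending on the compiler, so no particular order is shown here.

```c
#include <assert.h>
#include <string.h>

static char call_log[8];   /* records the order in which foo() was called */

/* Hypothetical stand-in for foo(): logs a tag, then returns its value. */
static int foo(char tag, int value)
{
    size_t n = strlen(call_log);
    call_log[n] = tag;
    call_log[n + 1] = '\0';
    return value;
}

/* The parentheses group the additions; they do not order the calls. */
static int sum_with_parens(void)
{
    call_log[0] = '\0';
    return foo('x', 1) + (foo('y', 2) + foo('z', 3));
}

static void call_order_demo(void)
{
    assert(sum_with_parens() == 6);   /* the value is fixed ...         */
    assert(strlen(call_log) == 3);    /* ... but the call order is not  */
}
```

The calls themselves are indeterminately sequenced, not interleaved, so the logging is well defined; only its order is unspecified.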

If you are talking about floating point arithmetic (I was thinking of
integer arithmetic, but did not specify), then the operations are not
necessarily commutative or associative, and the compiler cannot then
re-arrange the operations unless it knows that doing so does not
affect the result.
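
To make the floating-point case concrete, here is a minimal sketch, assuming IEEE 754 style doubles and no value-changing options such as -ffast-math. The function names are invented. With one huge operand, the small one is absorbed when it is added to the huge one first, so the two groupings give different answers.

```c
#include <assert.h>

static double left_grouped(double a, double b, double c)
{
    return (a + b) + c;
}

static double right_grouped(double a, double b, double c)
{
    return a + (b + c);
}

static void fp_grouping_demo(void)
{
    const double big = 1e30;

    /* (big + -big) is exactly 0, so adding 1.0 gives 1.0. */
    assert(left_grouped(big, -big, 1.0) == 1.0);

    /* (-big + 1.0) rounds back to -big, so the total is 0. */
    assert(right_grouped(big, -big, 1.0) == 0.0);
}
```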

It's not just floating-point. Signed integer overflow is also relevant.

INT_MIN + INT_MAX + 1 is well defined. (INT_MIN + INT_MAX) + 1
is equivalent, and is also well defined. INT_MIN + (INT_MAX + 1)
has undefined behavior.
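
One way to see the difference without actually triggering the overflow is to do each addition through a checked helper. This is only a sketch; `checked_add` is a made-up name, not a standard function. It tests whether the sum fits before performing it, so the check itself never overflows.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 and stores a + b in *out if the sum is
   representable as an int; otherwise returns 0 and leaves *out alone. */
static int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

static void grouping_demo(void)
{
    int t, r;

    /* (INT_MIN + INT_MAX) + 1: every intermediate result fits. */
    assert(checked_add(INT_MIN, INT_MAX, &t));
    assert(checked_add(t, 1, &r));

    /* INT_MIN + (INT_MAX + 1): the inner addition already fails. */
    assert(!checked_add(INT_MAX, 1, &t));
}
```

On the usual two's complement implementations the first grouping yields -1 and then 0.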

But except for specific cases, the order of evaluation - both for the
values and side-effects - of sub-expressions is unspecified. Indeed,
they are unsequenced - the evaluations can interleave.

Usually, both sub-expressions of a binary operator will be evaluated
before the operator itself, simply because usually the results of the
operator cannot be calculated until the sub-expression's values are
known. But this is not a requirement of the language - if the
compiler can get the same results without doing so, it is free to pick
a different order. "(a + b) * 0" does not need to evaluate "a", "b",
or "a + b" at all unless there is a possibility of a side-effect - and
it can perform the side-effects in any order. "a + (b + c)" can check
"a" for a trap representation and deal with that before looking at "b"
and "c" or the results of "b + c", even though it cannot (for floating
point operations) re-arrange the code to do "a + b" first.

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

[...]

10 + 5 - 7 + 3

Oh my, this is an error in the programmer's logic! They forgot to do:

10 + 5 - (7 + 3)

?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 15:12:41

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

David Brown <david.brown@hesbynett.no> writes:

[...]

<https://cppreference.com/c/language/operator_precedence>
<https://cppreference.com/cpp/language/operator_precedence>

[...]

Both tables are now much clearer. Someone added dividing lines
between the precedence levels.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 02, 2026 15:29:50

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

$ cat c.c
#include <limits.h>
int main(void) {
    switch (0) {
        case (INT_MAX + 1) * 0:
            break;
    }
}
$ gcc -std=c17 -pedantic-errors -c c.c
c.c: In function 'main':
c.c:4:23: warning: integer overflow in expression of type 'int' results in '-2147483648' [-Woverflow]
    4 |         case (INT_MAX + 1) * 0:
      |                       ^
c.c:4:9: error: overflow in constant expression [-Woverflow]
    4 |         case (INT_MAX + 1) * 0:
      |         ^~~~
$

But taking a closer look at the standard, I'm not 100% sure that the
language requires a diagnostic, though I think that's the intent.
The relevant constraint is:

Each constant expression shall evaluate to a constant that is
in the range of representable values for its type.

If I squint really hard, I can argue that the entire expression
has to be a constant expression, but it doesn't say that its
subexpressions are constant expressions -- and *if* INT_MAX +
1 evaluates to INT_MIN in the current implementation, then
(INT_MAX + 1) * 0 evaluates to 0 and therefore satisfies the
constraint.

But INT_MAX + 1 could legally trap, for example, and I don't believe
it was intended that a given expression can be a constant expression
or not depending on the vagaries of the behavior of an instance
of UB.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Thursday, June 04, 2026 02:34:59

Bart <bc@freeuk.com> writes:

On 01/06/2026 03:10, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

On 31/05/2026 17:04, Tim Rentsch wrote:

Richard Harnden <richard.nospam@gmail.invalid> writes:

just write complex expressions in a way that a human can most
easily understand,

Unfortunately, (1) different people have different ideas of what
writing is most easily understood, and (2) different readers have
different notions of which writings are easily understood, and
which writings are not so easily understood. To make things
worse "easily understood" is not a boolean condition, nor is it
necessarily well-ordered -- "most easily understood" isn't always
a well-defined quality, even for a given audience.

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

Actual examples of too many parentheses?

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

My point was that it could be objective, at least for too many. So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment. Reaching consensus
on a question doesn't make the common view an objective one -- just
a commonly held one. Saying the Sun rises in the East is an
objective statement. Saying the temperature is too hot in the month
of September is not an objective statement, even if most people
think so.

And then there is ?: :

a > b ? c : d # (a>b)?c:d
a + b ? c : d # (a+b)?c:d

The grouping of the first is probably what is intended. But in the
second, the intent might have been (a+b)?c:d, or a+(b?c:d); we don't
know for sure that the author didn't make a mistake, or we don't know
ourselves.

This example is so addlebrained that it's hard to imagine anyone
being confused about it. Or that it's worth any expenditure of
thought wondering what to do about people who are.
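
For reference, the grouping in question can be checked mechanically. A small sketch with made-up function names: the unparenthesized form behaves like `(a+b)?c:d`, and differs from `a+(b?c:d)` for suitably chosen inputs.

```c
#include <assert.h>

static int cond_plain(int a, int b, int c, int d)
{
    return a + b ? c : d;
}

static int cond_grouped(int a, int b, int c, int d)
{
    return (a + b) ? c : d;
}

static int cond_other(int a, int b, int c, int d)
{
    return a + (b ? c : d);
}

static void conditional_demo(void)
{
    /* a + b is 0 here, so the first two forms select d. */
    assert(cond_plain(1, -1, 10, 20) == cond_grouped(1, -1, 10, 20));
    /* The third form tests b alone and then adds a. */
    assert(cond_plain(1, -1, 10, 20) != cond_other(1, -1, 10, 20));
}
```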

I don't understand what the problem is with my examples.

Here is a story from the earliest weeks of all of the time I have
been programming. In one of the first few programs I ever wrote
(and perhaps even the very first one), I had a statement like so:

x = alpha/beta*gamma

Of course the names here are made up, I don't remember the actual
names used. When x was printed out, it gave a value that was
much different from what I expected. What had happened was I had
unconsciously assumed, reasoning by analogy with written
mathematics, that the statement would be interpreted as

             alpha
   x  =  ------------
          beta*gamma

After getting the program output back, and seeing the unexpected
result, someone explained to me that the statement was interpreted
as

x = (alpha/beta)*gamma

because that was how the language worked. Of course I was surprised
but I learned the rule and after that had no further problems with
how to read such expressions.
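
C follows the same rule: `/` and `*` have equal precedence and group left to right. A small sketch, with function names invented for the example and values chosen so that every result is exact:

```c
#include <assert.h>

/* Parsed as (alpha / beta) * gamma. */
static double as_written(double alpha, double beta, double gamma)
{
    return alpha / beta * gamma;
}

/* The reading borrowed from written mathematics: alpha over (beta*gamma). */
static double as_fraction(double alpha, double beta, double gamma)
{
    return alpha / (beta * gamma);
}

static void division_demo(void)
{
    assert(as_written(12.0, 3.0, 2.0) == 8.0);   /* (12/3)*2 */
    assert(as_fraction(12.0, 3.0, 2.0) == 2.0);  /* 12/(3*2) */
}
```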

There can be ambiguity in the mind of the person looking at such
code as to how the first terms are grouped.

This statement illustrates the problem with examples that you give.
Not only is the presumed reader sort of arbitrarily naive, he or she
is apparently incapable of learning. Everyone who has ever learned
to program has had an experience of a program doing something other
than what was expected, because of a misunderstanding about how the
language works. When that happens, most people simply learn about
their misunderstanding and correct it. The readers in your examples
are like people who started programming after developing Alzheimer's
disease (and no offense meant to anyone afflicted with Alzheimer's).
Maybe there are such people, whether or not caused by a medical
condition, but it doesn't match most programmers' experience, and in
any case is not worth worrying about. If someone can't understand
the rules of the road they shouldn't be behind the wheel of a car.
If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether. It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty. Yet the examples you give insist on focusing on the few
hopeless individuals. It shouldn't be a surprise that most people
don't share your concerns.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Thursday, June 04, 2026 03:37:24

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted implementations.)

Given that `foo` has external linkage, I find this hard to
believe, and `clang -fsanitize=undefined` agrees with me,
both emitting a diagnostic about the overflow and generating
code in `foo` to call into the sanitizer machinery.

A conforming implementation is free to emit a diagnostic whenever it
chooses, for any reason at all, regardless of whether the program
source is legal C or not. (I feel obliged to point out that, if a preprocessing #error directive is encountered, then there may be an
exception to that statement; however, there is no such #error in
the program shown above.)

Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility. Whether foo() has external linkage or internal
linkage doesn't change that. Only those actions initiated by
statements in main() are ever elaborated.

But I'm not sure what _you_ mean by "transgress the bounds of
undefined behavior" here.

It's a grammatical fine point. I think for present purposes it's
okay to gloss over the distinction, and say this statement may be
read as saying "the program does not have undefined behavior".

Even more than that, the program is strictly conforming, and must be
accepted by a conforming implementation.

See above.

Now let's change the program slightly:

#include <limits.h>

int
foo(){
    static int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does transgress the bounds of undefined behavior. The
reason for the difference is that in the first program the semantics
of foo() is to evaluate the expression to be stored in 'zero' only
at runtime, whereas in the second program the semantics of foo() is
to evaluate the expression to be stored in 'zero' before program
startup (informally, "at compile time"). What matters is not
whether the offending expression /might/ be evaluated "at compile
time", but whether the offending expression /must/ be evaluated "at
compile time". Only in the second case is undefined behavior
inevitable (and thus it does not occur in the first program).

Fine point: strictly speaking, I believe the C standard allows even
the second program to complete translation phase 8 successfully, and
for any offending behavior to occur only when we actually try to run
the program. To say that another way, there is no requirement that
possible nasal demons be made manifest at any point before an actual
attempted execution. On the other hand, because that possibility is
there lurking in the background, there is no requirement that the
program be accepted, and could be rejected by a conforming compiler.

Indeed. Further, I believe that the same is true for the first
program, as well.

It isn't. In the first program the offending expression is never
evaluated, because foo() is never called.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Thursday, June 04, 2026 03:58:35

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[discussing how to produce a 32-bit color value from rgb16]

Here's my offering:

// Converts a 16-bit RGB16 (5-6-5) value to an ARGB32
// ("RGBA8888") value.
static inline uint32_t
rgb16_to_argb(uint16_t color)
{
    const uint32_t blue5 = (color >> 0) & 0x1F;
    const uint32_t green6 = (color >> 5) & 0x3F;
    const uint32_t red5 = (color >> 11) & 0x1F;

    // Map from a 5- or 6-bit space into an 8-bit space. A
    // 5-bit number has 32 possibilities; a 6-bit number
    // has 64. To calculate the projected 8-bit value for
    // a k-bit number v, we can use the formula
    // v_8 = (v*(2^8-1) + (2^k-1)/2)/(2^k-1), i.e.
    // (v*255 + 15)/31 (for k=5) or (v*255 + 31)/63 (for
    // k=6).
    //
    // To remove the division and turn it into a
    // shift, the constants below were empirically
    // discovered to generate good results. See
    // https://stackoverflow.com/questions/2442576/
    // how-does-one-convert-16-bit-rgb565-to-24-bit-rgb888
    // for details.
    const uint32_t blue = (blue5 * 527 + 23) >> 6;
    const uint32_t green = (green6 * 259 + 33) >> 6;
    const uint32_t red = (red5 * 527 + 23) >> 6;
    const uint32_t alpha = 0xFF000000;

    return blue | (green << 8) | (red << 16) | alpha;
}

It's longer, yes, but I'd argue it's much easier to understand.
On my compiler, it generates almost identical code, except that
some instructions are in a different order.
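
Since the constants are described as empirically discovered, a brute-force comparison against the division formula is cheap: there are only 32 + 64 inputs. The sketch below is not part of the quoted code, and the helper names are invented. It counts the inputs where the multiply-and-shift result differs from the rounded division.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: scale v from the range 0..max to 0..255, rounding to
   nearest, using the integer formula (v*255 + max/2) / max. */
static uint32_t scale_exact(uint32_t v, uint32_t max)
{
    return (v * 255u + max / 2u) / max;
}

/* Multiply-and-shift versions using the constants from the post. */
static uint32_t scale5_shift(uint32_t v) { return (v * 527u + 23u) >> 6; }
static uint32_t scale6_shift(uint32_t v) { return (v * 259u + 33u) >> 6; }

/* Returns the number of inputs on which the two methods disagree. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    uint32_t v;

    for (v = 0; v < 32; v++)
        bad += scale5_shift(v) != scale_exact(v, 31);
    for (v = 0; v < 64; v++)
        bad += scale6_shift(v) != scale_exact(v, 63);
    return bad;
}

static void scaling_demo(void)
{
    assert(scale5_shift(0) == 0 && scale5_shift(31) == 255);
    assert(scale6_shift(0) == 0 && scale6_shift(63) == 255);
}
```

A result of zero from `count_mismatches` means the shift form can stand in for the division over the whole input range.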

I would choose a different approach, for two reasons. One is that,
for code that is likely to be in a header file, my preference is
that it be compilable under C90 rules if possible. The other is
that, given the simple nature of the transformation, it should be
able to produce a constant expression if given a constant input
value. Here is a possible implementation:

#define SOLID_RGB24_of_RGB16( rgb16 ) \
ARGB32_( 255ul, \
SCALE_5_to_8_( BITS_AT_OF_( 5, 11, (rgb16) ) ), \
SCALE_6_to_8_( BITS_AT_OF_( 6, 5, (rgb16) ) ), \
SCALE_5_to_8_( BITS_AT_OF_( 5, 0, (rgb16) ) ) \
)

#define ARGB32_( alpha, red, green, blue ) ( \
alpha << 24 | red << 16 | green << 8 | blue \
)

#define SCALE_5_to_8_( u ) ( u *527ul +23 >>6 )
#define SCALE_6_to_8_( u ) ( u *259ul +33 >>6 )

#define BITS_AT_OF_(width,where,u) ( u >> where & (1ul << width)-1 )

And here is a simple test driver:

const unsigned long some_red = SOLID_RGB24_of_RGB16( 29u << 11 );
const unsigned long some_green = SOLID_RGB24_of_RGB16( 59u << 5 );
const unsigned long some_blue = SOLID_RGB24_of_RGB16( 29u << 0 );

#include <stdio.h>

int
main(){
    printf( " red: %#8lx\n", some_red );
    printf( " green: %#8lx\n", some_green );
    printf( " blue: %#8lx\n", some_blue );
}

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 12:40:45

On 04/06/2026 10:34, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

My point was that it could be objective, at least for too many. So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment.

I don't know of any infix PL syntax where 'a*a + b*b', as a standalone expression, doesn't mean '(a*a) + (b*b)'.

Google agrees with me (in that 2*2+3*3 shows 13), and so does my Casio calculator.

It's not my personal opinion!

I'm sure you can trawl for some obscure languages where that expression
works differently, or where you can reassign priority or meaning to
those operators, but that is just being contrary for the sake of it.

Reaching consensus
on a question doesn't make the common view an objective one -- just
a commonly held one.

So, the number of times in this group where I've been told that everyone
else disagrees with me about something so I must be wrong - this was
just your (pl) subjective opinion all along?

In the PL world then it is going to be mainly about subjective opinions!
There are few absolute truths.

But what about this example:

((((((a))))))

'Too many parentheses' is still subjective?

How about '((((a)))) using more parentheses than (a)'; that surely must
be objective?

Here is a story from the earliest weeks of all of the time I have
been programming. In one of the first few programs I ever wrote
(and perhaps even the very first one), I had a statement like so:

x = alpha/beta*gamma

Of course the names here are made up, I don't remember the actual
names used. When x was printed out, it gave a value that was
much different from what I expected. What had happened was I had unconsciously assumed, reasoning by analogy with written
mathematics, that the statement would be interpreted as

             alpha
   x  =  ------------
          beta*gamma

You will have quickly found out that PL syntax is not mathematics. For a start, mathematics doesn't normally use '*', nor '/' for that matter.

Yes, there is a discrepancy with the precedences of divide and (implied) multiply. However, the a*a + b*b example didn't use divide.

(Note that C has its own problems in this area:

a = b/*p; // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether.

It can be multiple languages, and they might want to write the same
expression the same way in each.

It could be no language: maybe it's pseudo-code, or some unspecified
language in a forum which is not language-specific. They want anybody to
just understand it.

This is the scenario I mentioned where you can risk not using
parentheses when expressions involve "+ - * /", comparisons, and AND/OR
since generally these are treated sensibly by infix languages (even in
C, almost).

But operators such as '<< >> & ^ |' are treated more diversely. Here you
would be taking a bigger risk. You could label such code as 'C Syntax'
(if posting for example) but that is just being lazy.

It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty. Yet the examples you give insist on focusing on the few
hopeless individuals.

Are you saying that whoever wrote code like this:

crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

is needlessly worrying about the 99.9+% of the readership who you claim
will know C syntax rules precisely? That is, they would find this
version just as clear without any extra cognitive effort:

crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

?

If so then you are hopelessly wrong.
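
Whatever one thinks of the readability, the two spellings are the same expression as far as a C compiler is concerned. A quick sketch to confirm it: the table contents below are placeholders invented for this test, NOT real CRC-32 constants, and the helper names are made up too. Some compilers will warn about the bare version when -Wparentheses or similar is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder 16-entry table, for illustration only (not CRC constants). */
static const uint32_t s_crc32[16] = {
    0x00000000u, 0x11111111u, 0x22222222u, 0x33333333u,
    0x44444444u, 0x55555555u, 0x66666666u, 0x77777777u,
    0x88888888u, 0x99999999u, 0xAAAAAAAAu, 0xBBBBBBBBu,
    0xCCCCCCCCu, 0xDDDDDDDDu, 0xEEEEEEEEu, 0xFFFFFFFFu
};

static uint32_t step_parens(uint32_t crcu32, uint8_t b)
{
    return (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
}

static uint32_t step_bare(uint32_t crcu32, uint8_t b)
{
    return crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];
}

/* Compare both spellings over many inputs; returns the mismatch count. */
static unsigned count_differences(void)
{
    unsigned diffs = 0;
    uint32_t crc = 0xFFFFFFFFu;
    int i, b;

    for (i = 0; i < 1000; i++) {
        for (b = 0; b < 256; b++)
            diffs += step_parens(crc, (uint8_t)b) != step_bare(crc, (uint8_t)b);
        crc = crc * 1664525u + 1013904223u;  /* arbitrary step for variety */
    }
    return diffs;
}

static void crc_precedence_demo(void)
{
    assert(count_differences() == 0);
}
```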

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Thursday, June 04, 2026 14:35:25

On 04/06/2026 13:40, Bart wrote:

On 04/06/2026 10:34, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

My point was that it could be objective, at least for too many. So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment.

I don't know of any infix PL syntax where 'a*a + b*b', as a standalone expression, doesn't mean '(a*a) + (b*b)'.

Google agrees with me (in that 2*2+3*3 shows 13), and so does my Casio calculator.

It's not my personal opinion!

You are - again - moving the goalposts.

It is an objective fact that "a * a + b * b" means "(a * a) + (b * b)"
in normal mathematics (at least in the countries I am familiar with),
and also in most mainstream programming languages.

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a subjective opinion - no matter how common that opinion is.

Does that clear up your misunderstanding about "objective" and
"subjective" ?

Reaching consensus
on a question doesn't make the common view an objective one -- just
a commonly held one.

So, the number of times in this group where I've been told that everyone else disagrees with me about something so I must be wrong - this was
just your (pl) subjective opinion all along?

Facts and opinions are different. You regularly get facts about C
wrong, and you are told you are wrong - that is objective. You
regularly give opinions that people disagree with, and are told they
disagree - that is subjective.

If you wrote, for example, that "a << b + c" is ambiguous in C, then you
would be factually and objectively wrong. If you wrote that it is
unclear, then you would be expressing a subjective opinion, and people
may or may not agree with you.
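
To spell out the factual part: `+` binds tighter than `<<` in C, so `a << b + c` is `a << (b + c)`. A minimal sketch, with function names invented for the example:

```c
#include <assert.h>

static unsigned shift_plain(unsigned a, unsigned b, unsigned c)
{
    return a << b + c;
}

static unsigned shift_inner(unsigned a, unsigned b, unsigned c)
{
    return a << (b + c);
}

static unsigned shift_outer(unsigned a, unsigned b, unsigned c)
{
    return (a << b) + c;
}

static void shift_demo(void)
{
    /* 1 << (2 + 3) is 32, while (1 << 2) + 3 is 7. */
    assert(shift_plain(1, 2, 3) == shift_inner(1, 2, 3));
    assert(shift_plain(1, 2, 3) != shift_outer(1, 2, 3));
}
```

Whether the unparenthesized form is clear enough to write is the subjective part; a compiler with -Wparentheses enabled will typically flag it.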

Sometimes you might voice an opinion that is so extreme or uncommon that people might tell you you are wrong, when saying they disagree would be
more appropriate - discussions here are not formal.

In the PL world then it is going to be mainly about subjective opinions! There are few absolute truths.

The programming language world is full of absolute truths. The C
standards, for example, are full of facts about the C language. It is
not just a collection of guidelines or ideas for people to like or dislike.

But what about this example:

   ((((((a))))))

'Too many parentheses' is still subjective?

Yes, obviously. "More parentheses than necessary" is objective, "too
many parentheses" is subjective. I expect most people will share the
same opinion, but it is still an opinion.

How about '((((a)))) using more parentheses than (a)'; that surely must
be objective?

Yes.

Here is a story from the earliest weeks of all of the time I have
been programming. In one of the first few programs I ever wrote
(and perhaps even the very first one), I had a statement like so:

   x = alpha/beta*gamma

Of course the names here are made up, I don't remember the actual
names used. When x was printed out, it gave a value that was
much different from what I expected. What had happened was I had
unconsciously assumed, reasoning by analogy with written
mathematics, that the statement would be interpreted as

             alpha
   x  =  ------------
          beta*gamma

You will have quickly found out that PL syntax is not mathematics. For a start, mathematics doesn't normally use '*', nor '/' for that matter.

It's not so much the symbols, as the layout. A mathematician would not
write "a÷b×c" either. They would write it in a way that makes the
intended precedence obvious to other mathematicians reading it, taking
into account the exact symbols used ("a.b" or "ab" might be considered
to bind tighter than "a×b"), the spacing, the position of the symbols on
the page, and - importantly - the context.

Programming can definitely be viewed as a sort of mathematics, but
writing code is not the same as writing mathematics.

Yes, there is a discrepancy with the precedences of divide and (implied) multiply. However, the a*a + b*b example didn't use divide.

(Note that C has its own problems in this area:

   a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with this.
The parsing rules of the language are clear - often called "maximum
munch". The character sequence "/*" is the start of a comment, it is
not two separate operators.
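
In practice the intended division is written with a space or with parentheses, so that the slash and the star stay separate tokens. A short sketch (function names invented for the example):

```c
#include <assert.h>

/* A space after '/' keeps it from fusing with '*' into a comment opener. */
static int divide_spaced(int b, const int *p)
{
    return b / *p;
}

/* Parentheses around the dereference work just as well. */
static int divide_parens(int b, const int *p)
{
    return b/(*p);
}

static void divide_demo(void)
{
    int d = 4;

    assert(divide_spaced(12, &d) == 3);
    assert(divide_parens(12, &d) == 3);
}
```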

You might personally have a problem with this. Whether you do or do not
is also an objective fact, but one that only you can judge. And you can
have a subjective opinion as to whether or not you like the rules of C here.

If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether.

It can be multiple languages, and they might want to write the same expression the same way in each.

Sure.

I also don't think people should be required to learn all the details of
a language in order to use it. Indeed, for bigger languages (say, C++
or Python) it would be infeasible to learn everything. Exactly where
you draw the lines of what you need to know and what you can look up if necessary will vary by person, and by the type of tasks they are doing
in a language.

It could be no language: maybe it's pseudo-code, or some unspecified
language in a forum which is not language-specific. They want anybody to just understand it.

This is the scenario I mentioned where you can risk not using
parentheses when expressions involve "+ - * /", comparisons, and AND/OR since generally these are treated sensibly by infix languages (even in
C, almost).

But operators such as '<< >> & ^ |' are treated more diversely. Here you would be taking a bigger risk. You could label such code as 'C
Syntax' (if posting for example) but that is just being lazy.

It is correct that details here vary more. Whether you think extra parentheses should or should not be used is, however, a subjective
opinion. (My opinion is probably more in line with yours than Tim's
here - but it is still subjective.)

It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty. Yet the examples you give insist on focusing on the few
hopeless individuals.

Are you saying that whoever wrote code like this:

    crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

is needlessly worrying about the 99.9+% of the readership who you claim
will know C syntax rules precisely? That is, they would find this
version just as clear without any extra cognitive effort:

    crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

?
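
For anyone who wants to check that the two spellings above are the same expression to a C compiler (whatever one thinks of their readability), here is a minimal sketch. The table contents are placeholder values made up for this example, not the real table from miniz.c, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder table: arbitrary values, NOT a real CRC table. */
static const uint32_t s_crc32[16] = {
    0x00000000u, 0x11111111u, 0x22222222u, 0x33333333u,
    0x44444444u, 0x55555555u, 0x66666666u, 0x77777777u,
    0x88888888u, 0x99999999u, 0xAAAAAAAAu, 0xBBBBBBBBu,
    0xCCCCCCCCu, 0xDDDDDDDDu, 0xEEEEEEEEu, 0xFFFFFFFFu
};

/* The fully parenthesized form quoted in the thread. */
uint32_t step_parenthesized(uint32_t crcu32, uint8_t b)
{
    return (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];
}

/* The bare form: >> binds tighter than &, and & binds tighter than ^,
   so the grouping is the same as above. */
uint32_t step_bare(uint32_t crcu32, uint8_t b)
{
    return crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];
}

/* Returns 1 if both forms agree over a range of inputs, else 0. */
int crc_forms_agree(void)
{
    uint32_t crc;
    unsigned b;

    for (crc = 0; crc < 4096; crc++)
        for (b = 0; b < 256; b++)
            if (step_parenthesized(crc, (uint8_t)b) != step_bare(crc, (uint8_t)b))
                return 0;
    return 1;
}

/* Self-check that can be called from main(). */
void crc_self_check(void)
{
    assert(crc_forms_agree() == 1);
}
```

Compilers with precedence warnings enabled will typically flag `step_bare`, which is itself a hint about how often the bare form is misread.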

Tim did not write that. That example was not on the list of examples
you gave recently. The examples a couple of posts up in this branch
were a lot simpler. (That does not mean that Tim's "999 out of 1000"
figures are based on evidence.)

If so then you are hopelessly wrong.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 14:18:05

On 04/06/2026 13:35, David Brown wrote:

On 04/06/2026 13:40, Bart wrote:

On 04/06/2026 10:34, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

My point was that it could be objective, at least for too many. So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment.

I don't know of any infix PL syntax where 'a*a + b*b', as a standalone
expression, doesn't mean '(a*a) + (b*b)'.

Google agrees with me (in that 2*2+3*3 shows 13), and so does my Casio
calculator.

It's not my personal opinion!

You are - again - moving the goalposts.

It is an objective fact that "a * a + b * b" means "(a * a) + (b * b)"
in normal mathematics (at least in the countries I am familiar with),
and also in most mainstream programming languages.

It is an objective fact, therefore, that "(a*a) + (b*b)" has more parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Sigh...

If you wrote, for example, that "a << b + c" is ambiguous in C, then you

It is technically unambiguous in C. It can be ambiguous in the mind of somebody who would have to double-check the precedence levels, or where
the C context is missing.

The discussion seems to be about what exactly is 'too many'.

Apparently you can construct a valid C source file where 99.9% of the
text consists of () characters, but if someone - or even a million
people - say that it is too many, then that is just their subjective
opinion.

I don't have the patience for such nonsense any more:

* The () in '(a * b) + c' are generally unnecessary

* The () in 'a << (b + c)' are advisable

* The () in '(a << b) + c' are necessary if the intent is to have
what might be the more intuitive meaning.

If this is not 100% C-specific, then () are needed for both the last two examples, but not the first.

You all know this.

(Note that C has its own problems in this area:

    a = b/*p;    // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with this.
The parsing rules of the language are clear - often called "maximum
munch". The character sequence "/*" is the start of a comment, it is
not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

That the behaviour is deterministic doesn't change that.

It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty. Yet the examples you give insist on focusing on the few
hopeless individuals.

Are you saying that whoever wrote code like this:

    crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

is needlessly worrying about the 99.9+% of the readership who you
claim will know C syntax rules precisely? That is, they would find
this version just as clear without any extra cognitive effort:

    crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

?

Tim did not write that.

What was the 'something' in "It's silly to worry about something that ..."?

I assume it's people being unable to understand that second example.

Yet I see parentheses being used in such cases a LOT more than 0.1% of
the time. 50% or more would be my guess.

That example was not on the list of examples
you gave recently.

It was posted several times.

(https://github.com/richgel999/miniz/blob/master/miniz.c line 81, second
hit for '>>')

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 04, 2026 15:21:23

In your post you already addressed a lot that I'd have written as well
(more or less).

On 2026-06-04 14:35, David Brown wrote:

On 04/06/2026 13:40, Bart wrote:

[...]

[...]

It is an objective fact that "a * a + b * b" means "(a * a) + (b * b)"
in normal mathematics (at least in the countries I am familiar with),
and also in most mainstream programming languages.

It is an objective fact, therefore, that "(a*a) + (b*b)" has more parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a subjective opinion - no matter how common that opinion is.

Does that clear up your misunderstanding about "objective" and
"subjective" ?

[...]

Sometimes you might voice an opinion that is so extreme or uncommon that people might tell you you are wrong, when saying they disagree would be
more appropriate - discussions here are not formal.

Right. - And thus I've considered Bart's informal "too many parentheses"
as just a sloppy formulation of "more than necessary" or "some spurious" parentheses. - We should grant him at least the same inaccuracies in his
heated posts as he receives in any heated replies.

Of course in the sense of clearness of communication it's not wrong to
point out inaccurate statements. - Especially if such inaccuracies are deliberately used to rhetorically obfuscate previous wrong statements!

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Thursday, June 04, 2026 06:38:19

Bart <bc@freeuk.com> writes:

[...]

Thank you for your response. I'm sorry my comments weren't
more helpful to you.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 04, 2026 15:47:38

On 2026-06-04 15:18, Bart wrote:

On 04/06/2026 13:35, David Brown wrote:

[...]

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

It's a different thing, indeed. The suspicious keyword is "too many";
a valuation, and subjective. - It's no biggie to me, and in my other
post I said that I'd just read it as a sloppy formulated variant of
"more than necessary" or some such. So while inaccurately formulated
I'm fine with that; I understood what you had intended to express.

But the "completely", BTW, in your "is a completely different thing"
is a cheap rhetorical exaggeration to obfuscate or diminish the issue
with your valuating statement. (I don't like such primitive rhetoric
moves.)

[...]

I don't have the patience for such nonsense any more:

* The () in '(a * b) + c' are generally unnecessary

Right.

* The () in 'a << (b + c)' are advisable

Maybe, maybe not. (Depending on the involved persons, and on how they
handle the cases shown below; whether they mix types in subexpressions
or not.)

* The () in '(a << b) + c' are necessary if the intent is to have
  what might be the more intuitive meaning.

I've already written in some former post about _unnecessarily_ mixing
different types in expressions.

If you stay in such subexpressions with the same types you'll notice
that the parentheses are unnecessary; the C-language's precedences
have been sensibly chosen (in this case[*]).

[*] And even if you add some of ^ | & it's still no problem, unless
you have also any of the comparison operators in your expressions.
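
A short sketch of the caveat in the footnote, assuming the usual C precedence table in which `==` binds tighter than `&`, and `&` tighter than `^`, which in turn binds tighter than `|`. The function names are invented for this example.

```c
#include <assert.h>

/* Intended test: "is the masked bit clear?" */
int bit_is_clear(unsigned flags, unsigned mask)
{
    return (flags & mask) == 0;
}

/* Without parentheses, == binds tighter than &, so this computes
   flags & (mask == 0), which is a different question. */
int bit_is_clear_bare(unsigned flags, unsigned mask)
{
    return flags & mask == 0;
}

/* Pure bit operations need no parentheses to get the grouping
   a | (b ^ (c & d)). */
unsigned mix_bare(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a | b ^ c & d;
}

unsigned mix_parenthesized(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return a | (b ^ (c & d));
}

/* Self-check that can be called from main(). */
void bitop_self_check(void)
{
    /* flags = 4, mask = 1: bit 0 is clear, so the intended test is true,
       but the bare form evaluates 4 & (1 == 0) = 4 & 0 = 0. */
    assert(bit_is_clear(4u, 1u) == 1);
    assert(bit_is_clear_bare(4u, 1u) == 0);

    /* 6 & 4 = 4, 2 ^ 4 = 6, 1 | 6 = 7 in both spellings. */
    assert(mix_bare(1u, 2u, 6u, 4u) == 7u);
    assert(mix_parenthesized(1u, 2u, 6u, 4u) == 7u);
}
```

The first pair of functions disagrees for the same inputs, which is the case where leaving out the parentheses changes the result rather than just the appearance.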

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 04, 2026 15:57:39

On 2026-06-04 15:47, Janis Papanagnou wrote:

On 2026-06-04 15:18, Bart wrote:

[...]

* The () in '(a << b) + c' are necessary if the intent is to have
  what might be the more intuitive meaning.

I've already written in some former post about _unnecessarily_ mixing different types in expressions.

To not cause misunderstandings here; by "different types" I meant the
bit-logic and int-arithmetic, as explained in my mentioned former post.
(The technical data types are of course both just some sort of 'int'.)

If you stay in such subexpressions with the same types you'll notice
that the parentheses are unnecessary; the C-language's precedences
have been sensibly chosen (in this case[*]).

[*] And even if you add some of ^ | & it's still no problem, unless
you have also any of the comparison operators in your expressions.

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Thursday, June 04, 2026 16:27:28

On 04/06/2026 15:18, Bart wrote:

On 04/06/2026 13:35, David Brown wrote:

On 04/06/2026 13:40, Bart wrote:

On 04/06/2026 10:34, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

My point was that it could be objective, at least for too many. So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment.

I don't know of any infix PL syntax where 'a*a + b*b', as a
standalone expression, doesn't mean '(a*a) + (b*b)'.

Google agrees with me (in that 2*2+3*3 shows 13), and so does my
Casio calculator.

It's not my personal opinion!

You are - again - moving the goalposts.

It is an objective fact that "a * a + b * b" means "(a * a) + (b * b)"
in normal mathematics (at least in the countries I am familiar with),
and also in most mainstream programming languages.

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Of course they are different things - albeit related things, rather than /completely/ different. One is a question of fact, the other a question
of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed. But
I think we are both of the opinion that it does not have "too many" parentheses - it has an appropriate number of parentheses.

Sigh...

If you wrote, for example, that "a << b + c" is ambiguous in C, then you

It is technically unambiguous in C.

There is no "technically" about it. It is unambiguous in C.

It can be ambiguous in the mind of
somebody who would have to double-check the precedence levels, or where
the C context is missing.

I would not use the word "ambiguous" there - "unclear" would be more appropriate in the situation when someone does not know the C precedence levels.

If you are given the expression and don't know it is in C, it's a very different matter - there are all kinds of things it could mean. In C++,
it could mean concatenating two strings and passing the result to a
output stream. In Forth, it could mean anything you like. With no
context, the expression is not "ambiguous" because that implies that
there is a number of reasonable interpretations - and without context,
there is no limit to the interpretations.

So while I entirely agree that "a << b + c" may not be clear, and may
easily be misinterpreted, "ambiguous" is the wrong word to use.

The discussion seems to be about what exactly is 'too many'.

No, it's an attempt to get you to understand the difference between "objective" and "subjective" - fact and opinion. I don't understand why
you are having such a problem here.

Apparently you can construct a valid C source file where 99.9% of the
text consists of () characters, but if someone - or even a million
people - say that it is too many, then that is just their subjective opinion.

64 levels of nested parentheses is /factually/ and /objectively/ too
many to be guaranteed supported by a conforming C compiler. It takes a
far smaller number to be viewed as too many in the subjective opinion of
a large proportion of people.

I don't have the patience for such nonsense any more:

* The () in '(a * b) + c' are generally unnecessary

Yes. They are unnecessary in C (that is a fact), and most people would
not find them helpful in understanding the expression (that is a claimed
fact, given without evidence, about people's opinions. It is my opinion
that this claimed fact is true).

* The () in 'a << (b + c)' are advisable

That is a subjective opinion. /I/ would generally advise including the parentheses here. Other people might have a different opinion. And
people can have different opinions depending on the target audience.

* The () in '(a << b) + c' are necessary if the intent is to have
  what might be the more intuitive meaning.

The parentheses in "(a << b) + c" are necessary if the intent is to
shift "a" by "b", and then add "c" to the result. That is fact, not
opinion. Any discussion of "intuitive" is necessarily subjective.

If this is not 100% C-specific, then () are needed for both the last two examples, but not the first.

You all know this.

Do /you/ know what is fact and what is opinion here? Do you understand
the difference, after spoon-feeding you these examples?

And do you understand why it is important in a discussion to be able to
make these distinctions? It matters, even if you and I would likely
both want the parentheses mostly in the same places.

(Note that C has its own problems in this area:

    a = b/*p;    // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch". The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design. It does not
"fall down". It is simply a minor consequence of the choice of operator syntax. Such an expression would occur rarely in code, and to be a
"gotcha" it would need to be realistic for someone to write it, without spaces, and for their code to compile and be used without the mistake
being noticed. Do you think that is in any way realistic? I do not.

And to be "poor design", it needs to be something that is likely to
cause problems (which it is not), or which requires significant effort
to work around. Writing "a = b / *p;" is not challenging, and a lot of
people prefer spaces around binary operators anyway.

I'd say you were making a mountain out of a molehill, but I don't think
it's as big as a molehill.

That the behaviour is deterministic doesn't change that.

Of course it does. If some compilers treated it differently, then there
might be a chance that someone wrote such code and got the expected
results from the tool they were using, even though it was treated
differently by other tools.

It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty. Yet the examples you give insist on focusing on the few
hopeless individuals.

Are you saying that whoever wrote code like this:

    crcu32 = (crcu32 >> 4) ^ s_crc32[(crcu32 & 0xF) ^ (b & 0xF)];

is needlessly worrying about the 99.9+% of the readership who you
claim will know C syntax rules precisely? That is, they would find
this version just as clear without any extra cognitive effort:

    crcu32 = crcu32 >> 4 ^ s_crc32[crcu32 & 0xF ^ b & 0xF];

?

Tim did not write that.

What was the 'something' in "It's silly to worry about something that ..."?

My mind-reading skills are not that well developed.

I assume it's people being unable to understand that second example.

He did not say he was talking about those examples. Given that the
"crc" examples are more distant in the Usenet thread, it seems a stretch
to assume he was referring to them, rather than to the code examples you
had just given. (It would, perhaps, have been helpful if Tim had not
snipped those examples.)

Yet I see parentheses being used in such cases a LOT more than 0.1% of
the time. 50% or more would be my guess.

That example was not on the list of examples you gave recently.

It was posted several times.

(https://github.com/richgel999/miniz/blob/master/miniz.c line 81, second
hit for '>>')

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 16:46:21

On 04/06/2026 15:27, David Brown wrote:

On 04/06/2026 15:18, Bart wrote:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Of course they are different things - albeit related things, rather
than /completely/ different. One is a question of fact, the other a question of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed. But
I think we are both of the opinion that it does not have "too many" parentheses - it has an appropriate number of parentheses.

So saying 'too many' of something will be a subjective opinion? OK, so
let's try compiling this bit of C:

void F(int, int);

int main() {
    F(1, 2, 3);
}

8 out of 9 compilers reported 'Too many arguments'.

According to you, that's only their subjective opinion, not an objective
fact?

I tried a version in Go for good measure; it also used 'Too many'.

I think we'll leave it here.

Sigh...

If you wrote, for example, that "a << b + c" is ambiguous in C, then you

It is technically unambiguous in C.

There is no "technically" about it. It is unambiguous in C.

It can be ambiguous in the mind of somebody who would have to double-
check the precedence levels, or where the C context is missing.

I would not use the word "ambiguous" there - "unclear" would be more appropriate in the situation when someone does not know the C precedence levels.

What would you think if you saw this:

r << 16 + g << 8 + b

Did they really mean 'r << (16 + g) << (8 + b)' ?
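
To make the question concrete, here is a sketch comparing the bare expression with its implicit grouping and with the packing that was presumably intended. The function names are made up for this example. Note the assumption that the shift counts stay below the width of `unsigned long`; with typical 8-bit channel values the bare form would shift by far too much, which is undefined behaviour, so the check below only uses tiny inputs.

```c
#include <assert.h>

/* What was probably intended: pack three channels into one value. */
unsigned long pack_intended(unsigned long r, unsigned long g, unsigned long b)
{
    return (r << 16) + (g << 8) + b;
}

/* The bare form: + binds tighter than <<, and << groups left to right.
   Only call this with very small g and b (see the note above). */
unsigned long pack_bare(unsigned long r, unsigned long g, unsigned long b)
{
    return r << 16 + g << 8 + b;
}

/* The same thing with the implicit grouping written out. */
unsigned long pack_as_parsed(unsigned long r, unsigned long g, unsigned long b)
{
    return (r << (16 + g)) << (8 + b);
}

/* Self-check that can be called from main(). */
void pack_self_check(void)
{
    /* r=1, g=2, b=3: shifts of 18 and then 11 bits, i.e. 1 << 29. */
    assert(pack_bare(1UL, 2UL, 3UL) == pack_as_parsed(1UL, 2UL, 3UL));
    assert(pack_as_parsed(1UL, 2UL, 3UL) == 536870912UL);

    /* Intended packing: 65536 + 512 + 3. */
    assert(pack_intended(1UL, 2UL, 3UL) == 66051UL);
}
```

So the answer to "did they really mean that?" is almost certainly no, even though the compiler has no doubt about what was written.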

No, it's an attempt to get you to understand the difference between "objective" and "subjective" - fact and opinion. I don't understand why
you are having such a problem here.

See my example above with compilers. Maybe you can give all their
authors the same patronising talk.

* The () in '(a << b) + c' are necessary if the intent is to have
  what might be the more intuitive meaning.

The parentheses in "(a << b) + c" are necessary if the intent is to
shift "a" by "b", and then add "c" to the result. That is fact, not opinion. Any discussion of "intuitive" is necessarily subjective.

Intuitive because here << performs the same scaling function as multiply:

a << b is the same as a * 2**b

a * b is the same as a << log2(b) when b is a power of two
(or thereabouts!)

The point is: they naturally belong together.

Given 'a * 8 + b' or 'a << 3 + b', it is desirable to freely convert one
to the other without having to restructure the parentheses.

    a = b/*p;    // divide b by dereferenced pointer p

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design. It does not "fall down". It is simply a minor consequence of the choice of operator syntax. Such an expression would occur rarely in code, and to be a
"gotcha" it would need to be realistic for someone to write it, without spaces, and for their code to compile and be used without the mistake
being noticed. Do you think that is in any way realistic? I do not.

It's a poor show. This program:

#include <stdio.h>
int main() {
    int a=1, b=200, c=3, d=77;
    int *p = &d;

    a = b / *p;
    c = d /* comment*/ + 5;

    printf("%d\n", a);
    printf("%d\n", c);
}

displays 2 82. If that space between / and * is lost, it still compiles,
but displays 205 3.

Yes, it's unlikely, but so what? You don't dismiss such issues in a PL
by crossing your fingers and suggesting it's unlikely to come up.

There are actually other issues associated with /**/ comments; here
someone forgot to terminate the first comment:

puts("one"); /* comment 1
puts("two"); /* comment 2 */
puts("three"); /* comment 3 */

The middle line is silently elided. This is one with // comments:

puts("one"); // file c:\cx\
puts("two");
puts("three");

Again, the middle line is commented out.

I'd say C comments have a few issues. That the standard explains exactly
how they work doesn't help.
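
The two comment fragments above can be turned into something checkable. This is only a sketch: it swaps the `puts` calls for a counter so the result can be tested, and the function names are invented. It assumes a C99 or later compiler (for `//` comments) and that no whitespace follows the trailing backslash.

```c
#include <assert.h>

/* The first comment is left unterminated, so it runs on until the
   terminator on the following line and swallows the second increment. */
int count_with_unterminated_comment(void)
{
    int n = 0;
    n++; /* comment 1
    n++; /* comment 2 */
    n++; /* comment 3 */
    return n;
}

/* A line comment that ends in a backslash is spliced with the next
   line before comments are removed, so the second increment becomes
   part of the comment. */
int count_with_backslash_comment(void)
{
    int n = 0;
    n++; // file c:\cx\
    n++;
    n++;
    return n;
}

/* Self-check that can be called from main(). */
void comment_self_check(void)
{
    /* In both functions only two of the three increments survive. */
    assert(count_with_unterminated_comment() == 2);
    assert(count_with_backslash_comment() == 2);
}
```

Many compilers warn about both constructs when warnings are enabled (for example about a comment opener inside a comment, or a multi-line `//` comment), which is relevant to how likely either is to go unnoticed.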

And to be "poor design", it needs to be something that is likely to
cause problems

But you would choose not to have these issues in a new language.

What was the 'something' in "It's silly to worry about something
that ..."?

My mind-reading skills are not that well developed.

It didn't stop you giving an opinion about what you thought he meant!

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 16:18:07

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;    // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch". The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 17:23:16

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;    // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch". The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

If lucky, what's left will be an error, but not always.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Thursday, June 04, 2026 16:31:35

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

In your example, `foo` clearly exhibits UB; I think your
argument is whether that has a realized effect or not, since the
UB is not invoked. I'm saying that in general a compiler cannot
possibly know that when it compiles `foo`, and is free to assume
the worst.

Only those actions initiated by
statements in main() are ever elaborated.

This is not true: code can obviously run outside of the bounds
of `main`, for several reasons.

First, there is the issue of static initializers, which you had
mentioned earlier. Not at play here, but it does invalidate
your statement above, as these run "before" main is invoked.

Second, we know that code can run after because `atexit` can be
used to register handlers that will run after it terminates: as
section 5.1.2.3.4 of n3220 says, "a return from the initial call
to the main function is equivalent to calling the exit function
with the value returned by the main function as its argument",
which means that it will run `atexit` handlers. (But, as the
footnote warns, lifetimes of variables with automatic storage
duration have ended in this case in accordance with sec 6.2.4,
since `main` has terminated.)

Third, it is possible to invoke code that may conditionally be
executed, such as signal handlers, in response to external
events. Certainly, `signal(SIGINT, some_handler);` does not
immediately guarantee that `some_handler` is run, but it does
not prevent it from running, either.

Of course, for the second and third points, we must acknowledge
that one might quibble about what it means to say that a program
invokes "actions initiated by statements in main()".
Registering signal and exit handlers is (generally) going to be
something done as the consequence of an "action initiated by
statements in main()". And subsequent invocation of (say)
`some_handler` in the example above in response to receipt of a
`SIGINT` signal is arguably a consequence of that.
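
A minimal sketch of the second and third points, using only the standard `atexit`, `signal` and `raise` functions. The handler and helper names are invented for this illustration (apart from `some_handler`, which follows the name used above). It shows registration and delivery as separate steps; it does not settle the question of what counts as an "action initiated by statements in main()".

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static volatile sig_atomic_t got_sigint = 0;

/* Runs during exit processing, i.e. after main() has returned
   or exit() has been called. */
static void say_goodbye(void)
{
    puts("atexit handler: running after main returned");
}

/* Only records that the signal arrived. */
static void some_handler(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Registers the exit handler; returns 0 on success, like atexit(). */
int register_exit_handler(void)
{
    return atexit(say_goodbye);
}

/* Installing a handler does not run it. Returns the flag
   (expected 0), or -1 if signal() failed. */
int install_handler_only(void)
{
    got_sigint = 0;
    if (signal(SIGINT, some_handler) == SIG_ERR)
        return -1;
    return got_sigint;
}

/* raise() does not return until the handler has run, so the flag
   is expected to be 1 afterwards. Returns -1 on failure. */
int install_and_raise(void)
{
    got_sigint = 0;
    if (signal(SIGINT, some_handler) == SIG_ERR)
        return -1;
    if (raise(SIGINT) != 0)
        return -1;
    return got_sigint;
}

/* Self-check that can be called from main(). */
void handlers_self_check(void)
{
    assert(register_exit_handler() == 0);
    assert(install_handler_only() == 0);
    assert(install_and_raise() == 1);
}
```

If `handlers_self_check()` is called from `main()`, the message from `say_goodbye` should appear only once the program is exiting, after `main` itself has finished.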

But I'm not sure what _you_ mean by "transgress the bounds of
undefined behavior" here.

It's a grammatical fine point. I think for present purposes it's
okay to gloss over the distinction, and say this statement may be
read as saying "the program does not have undefined behavior".

Except it does. `foo` is an example of what Regehr calls a
"Type 3" function in https://blog.regehr.org/archives/213.

Also you are discounting time-travel; code need not actually
invoke UB to suffer from it. The mere existence of it can be
enough.
https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

Moreover, undefined behavior simply is; the definition from the
C standard does not say that time-traveling "post-modern"
compilers are free to assume and do anything if they observe UB
anywhere in a program.

Here, because `foo` has external linkage, the compiler cannot
know whether `foo` is invoked or not, and I see nothing
preventing it from assuming that the entire program is in error.
In particular, I don't think there is anything that prevents a
compiler from simply emitting `int main(void) { abort(); }`.

Even more than that, the program is strictly conforming, and must be
accepted by a conforming implementation.

See above.

Now let's change the program slightly:

#include <limits.h>

int
foo(){
    static int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does transgress the bounds of undefined behavior. The
reason for the difference is that in the first program the semantics
of foo() is to evaluate the expression to be stored in 'zero' only
at runtime, whereas in the second program the semantics of foo() is
to evaluate the expression to be stored in 'zero' before program
startup (informally, "at compile time"). What matters is not
whether the offending expression /might/ be evaluated "at compile
time", but whether the offending expression /must/ be evaluated "at
compile time". Only in the second case is undefined behavior
inevitable (and thus it does not occur in the first program).

Fine point: strictly speaking, I believe the C standard allows even
the second program to complete translation phase 8 successfully, and
for any offending behavior to occur only when we actually try to run
the program. To say that another way, there is no requirement that
possible nasal demons be made manifest at any point before an actual
attempted execution. On the other hand, because that possibility is
there lurking in the background, there is no requirement that the
program be accepted, and could be rejected by a conforming compiler.

Indeed. Further, I believe that the same is true for the first
program, as well.

It isn't. In the first program the offending expression is never
evaluated, because foo() is never called.

See above.

Of course, I don't think any of this would _actually_ happen,
and if it did, one should take the compiler that does it and
toss it in the trash. But I don't think it's prohibited,
either; such is one of the consequences of an informal
specification like the C standard. Time-travel is especially
pernicious in post-modern compilers.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 16:47:50

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong.  C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch".  The character sequence "/*" is the start of a
comment, it is not two separate operators.
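For anyone who wants to see the tokenization in action, here is a small sketch. The function names are invented for the illustration; the only point is how the lexer splits the two spellings.

```c
/* With a space, the lexer sees the division operator followed by a
   unary dereference, so this divides b by the pointed-to value. */
int divide_spaced(int b, const int *p)
{
    return b / *p;
}

/* Without the space, the slash and star are taken together as a
   comment opener, so the rest of the expression is swallowed up to
   the next comment terminator and only "b" is left. */
int divide_munched(int b, const int *p)
{
    (void)p;   /* p is never used: the dereference sits inside the comment */
    return b/*p; this text is inside the comment */ ;
}
```

With `int d = 4;`, `divide_spaced(12, &d)` gives 3 while `divide_munched(12, &d)` gives 12. Writing `b / *p` or `b/(*p)` avoids the issue.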

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 04, 2026 19:47:25

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

Janis

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 04, 2026 20:15:35

On 2026-06-04 17:46, Bart wrote:

On 04/06/2026 15:27, David Brown wrote:

On 04/06/2026 15:18, Bart wrote:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Of course they are different things - albeit related things, rather
than /completely/ different.  One is a question of fact, the other a
question of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed.
But I think we are both of the opinion that it does not have "too
many" parentheses - it has an appropriate number of parentheses.

So saying 'too many' of something will be a subjective opinion?

Oh, I feel guilty! - My hint on the _suspicious_ "too many" keyword
got (wrongly!) *generalized* as always being a subjective valuation
in all cases. - I apologize to have contributed to your confusion.

OK, so let's try compiling this bit of C:

    void F(int, int);

    int main() {
        F(1, 2, 3);
    }

8 out of 9 compilers reported 'Too many arguments'.

And that is correct in _this semantic context_. - The rules require
two integers and providing three is too many of course; that's a fact.

In x=(((((((a))))))); the rules allow the parentheses, but they are
neither required nor do they seem to serve any sensible purpose. Here
the "too many" (w.r.t. clearness of the expression) is a subjectively
common and sensible valuation.

[...]

I think we'll leave it here.

(I'd have hoped we could leave all that from the beginning!)

[...]

* The () in '(a << b) + c)' are necessary if the intent is to have
  what might be the more intuitive meaning.

The parentheses in "(a << b) + c" are necessary if the intent is to
shift "a" by "b", and then add "c" to the result.  That is fact, not
opinion.  Any discussion of "intuitive" is necessarily subjective.

Intuitive because here << performs the same scaling function as multiply:

    a << b   is the same as a * 2**b

    a * b    is the same as a << log2(b) when b is a power of two
    (or thereabouts!)

("or thereabouts"? - You are squirming and lacking the precision that
would be necessary here for a sensible consideration of the concepts.)

The point is: they naturally belong together.

You can express the shift by arithmetic, yes. And you can express some
*special cases* of arithmetic by the shifts. - That doesn't imply that
you should thus mix types. Rather the opposite; if you stay within the
respective operation class the precedences of the C-languages support
you with no parentheses necessary while staying within the respective
operation classes.

The point is, if you operate on bits you should best use bit-operations
and if you do arithmetic you should best use arithmetic operations.
(The word "best" expresses my personal valuation based on explanations
I already gave before and repeat below - it may be worth to think about
that for a moment before continuing.)

Given 'a * 8 + b' or 'a << 3 + b', it is desirable to freely convert one
to the other without having to restructure the parentheses.

No, it's undesirable if you want to express a cleanly typed expression.

If (for some reason) you don't care about a clear separation *then* you
might *need*, and probably *should* use parentheses to reestablish the
clearness that you gave up in the first place by mixing the arithmetic
and bit type operations.

I suggest if you intend arithmetic write a * 8 + b (not a * 8 | b ),
if you intend bit operations write u << 3 | v (unless there's reason)
and you need no parentheses. So that then, in the cases where the shift
value is to be calculated, you may write u << a + b
and also need no parentheses (but you can of course use them if you're
unsure about readers' understanding or if you as programmer are unsure
about it despite existing precedence tables and given explanations).[*]

The precedences in "C" are sensibly defined in all those cases.
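To make the parses concrete, here is a short sketch of the four spellings discussed above. The function names are hypothetical; the expressions are the ones from the thread.

```c
/* Arithmetic form: parsed as (a * 8) + b. */
unsigned scale_add(unsigned a, unsigned b)
{
    return a * 8 + b;
}

/* Same text with "* 8" swapped for "<< 3": parsed as a << (3 + b),
   because + binds more tightly than <<.  Many compilers warn here. */
unsigned shift_unparenthesized(unsigned a, unsigned b)
{
    return a << 3 + b;
}

/* Parentheses restore the arithmetic meaning. */
unsigned shift_parenthesized(unsigned a, unsigned b)
{
    return (a << 3) + b;
}

/* Bit-operation form: parsed as (u << 3) | v, no parentheses needed. */
unsigned shift_or(unsigned u, unsigned v)
{
    return u << 3 | v;
}
```

For a = 1 and b = 2, `scale_add` and `shift_parenthesized` both give 10, `shift_unparenthesized` gives 32 (that is, 1 << 5), and `shift_or(1, 2)` gives 10. So the textual swap of `* 8` for `<< 3` does change the result unless parentheses are added, while the pure bit-operation spelling needs none.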

Janis

[*] Note that I used different letters to enhance comprehensibility for
you. (Where you used the same names because you haven't been capable of
recognizing the point and throw all in one bag thus missing the point
of differentiating the two operation classes.)

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lew Pitcher@3:633/10 to All on Thursday, June 04, 2026 18:45:23

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

In "A Tour Through the Portable C Compiler", Mr. Johnson explicitly
states that the 1st compiler pass (which follows the preprocessor pass)
takes care of the comments. Specifically, Mr. Johnson says
"Pass 1
The first pass does lexical analysis, parsing, symbol table
maintenance, tree building, optimization, and a number of
machine dependant things. ...

Lexical Analysis
The lexical analyzer is a conceptually simple routine that reads
the input and returns the tokens of the C language as it encounters
them ... The conceptual simplicity of this job is confounded a bit
by several other simple jobs that unfortunately must go on
simultaneously. These include
...
* Skipping comments
"
It appears that, in at least one seminal C compiler, the job of reducing
comments to whitespace was not part of the preprocessor's responsibility
but was instead implemented as part of the first (lexical) pass of the
compiler proper.

--
Lew Pitcher
"In Skills We Trust"
Not LLM output - I'm just like this.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Thursday, June 04, 2026 20:54:42

On 04/06/2026 17:46, Bart wrote:

On 04/06/2026 15:27, David Brown wrote:

On 04/06/2026 15:18, Bart wrote:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Of course they are different things - albeit related things, rather
than /completely/ different.  One is a question of fact, the other a
question of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed.
But I think we are both of the opinion that it does not have "too
many" parentheses - it has an appropriate number of parentheses.

So saying 'too many' of something will be a subjective opinion? OK, so
let's try compiling this bit of C:

    void F(int, int);

    int main() {
        F(1, 2, 3);
    }

8 out of 9 compilers reported 'Too many arguments'.

According to you, that's only their subjective opinion, not an objective fact?

Again - /please/ stop trying to guess what people say or put words in
their mouths. I can't remember ever seeing you do so accurately.

"Too many parentheses" is subjective, because they affect the ease of
reading the code as a human reader. "Too many arguments in a function
call" affects the semantics of the code - it is objective fact. It is
not something that involves human opinions.

I think it would be easier to explain this to my cat than to you.
Simple logic seems to be completely beyond your grasp.

My mind-reading skills are not that well developed.

It didn't stop you giving an opinion about what you thought he meant!

I did not claim to know, or even assume, what he /meant/ - I commented
on what he /said/. That was factual. And I made a comment on what I
/thought/ it was likely that he meant (or did not mean). That was
opinion, and clearly so. The words are in the post for all to see, the
thoughts behind those words are not.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 19:57:58

On 04/06/2026 17:47, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong.  C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch".  The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

What were /you/ talking about? What was your point?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 11:59:49

Bart <bc@freeuk.com> writes:

On 04/06/2026 13:35, David Brown wrote:

On 04/06/2026 13:40, Bart wrote:

On 04/06/2026 10:34, Tim Rentsch wrote:

Bart <bc@freeuk.com> writes:

My point was that it could be objective, at least for too many.� So
(a*a) + (b*b) would be commonly agreed to have too many, [...]

Apparently you misunderstand what is meant by the word objective.
An objective statement is one that is independent of personal
assessment, even collective personal assessment.

I don't know of any infix PL syntax where 'a*a + b*b', as a
standalone expression, doesn't mean '(a*a) + (b*b)'.

Google agrees with me (in that 2*2+3*3 shows 13), and so does my
Casio calculator.

It's not my personal opinion!

You are - again - moving the goalposts.
It is an objective fact that "a * a + b * b" means "(a * a) + (b *
b)" in normal mathematics (at least in the countries I am familiar
with), and also in most mainstream programming languages.
It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming
languages.
"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different thing
from 'too many'.

Sigh...

Yes, it's a different thing, assuming at least one reasonable
interpretation of "more than needed". But if you use the phrase
"more than needed" without specifying *for what purpose*, you have
ammunition for a long pointless argument.

`(a*a) + (b*b)` objectively has more parentheses than are needed
*for the purpose of telling the compiler which operations go with
which operands*. Assuming it's a full expression, it's exactly
equivalent to `a*a + b*b`.

My subjective opinion is that `(a*a) + (b*b)` has "too many"
parentheses. The relative precedences of "*" and "+" are
sufficiently well known that I find the parentheses distracting.

A subjective opinion doesn't become objective just because almost
everyone agrees with it.

Even the idea that "*" should bind more tightly than "+" is
subjective. It's a rule that only goes back to the 1600s or so.
Mathematicians *invented* it. There are real advantages to that
choice, and *tremendous* advantages to having a near-universal
convention, but for example strict left-to-right association would
also have been a valid choice. (And implementing an expression
parser that binds "+" more tightly than "*" could be an interesting
exercise, though few would want to use it in practice.) Again, a
subjective preference doesn't become objective just because nearly
everyone agrees with it.
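The exercise mentioned above can be sketched in a few dozen lines with precedence climbing. This is only an illustration under stated assumptions: it handles non-negative integers, `+`, `*` and parentheses, does no error checking, and the names (`evaluate`, `prec_of`, and so on) are invented here. The binding strengths of `+` and `*` are passed in, so the conventional rule, the swapped rule, and strict left-to-right evaluation can be compared on the same text.

```c
#include <ctype.h>

static const char *cur;                /* cursor into the expression text */
static int plus_strength;              /* higher number binds more tightly */
static int times_strength;

static long parse_expr(int min_prec);

static void skip_spaces(void)
{
    while (*cur == ' ')
        cur++;
}

/* A primary is a parenthesized expression or an unsigned integer. */
static long parse_primary(void)
{
    long value = 0;

    skip_spaces();
    if (*cur == '(') {
        cur++;
        value = parse_expr(1);
        skip_spaces();
        if (*cur == ')')
            cur++;
        return value;
    }
    while (isdigit((unsigned char)*cur))
        value = value * 10 + (*cur++ - '0');
    return value;
}

static int prec_of(char op)
{
    if (op == '+') return plus_strength;
    if (op == '*') return times_strength;
    return 0;                          /* not a binary operator */
}

/* Precedence climbing: keep consuming operators whose strength is at
   least min_prec; the right operand is parsed one level higher, which
   makes every operator left-associative. */
static long parse_expr(int min_prec)
{
    long lhs = parse_primary();

    for (;;) {
        char op;
        int p;
        long rhs;

        skip_spaces();
        op = *cur;
        p = prec_of(op);
        if (p == 0 || p < min_prec)
            return lhs;
        cur++;
        rhs = parse_expr(p + 1);
        lhs = (op == '+') ? lhs + rhs : lhs * rhs;
    }
}

/* Evaluate text with the given binding strengths (both must be >= 1). */
long evaluate(const char *text, int plus_prec, int times_prec)
{
    cur = text;
    plus_strength = plus_prec;
    times_strength = times_prec;
    return parse_expr(1);
}
```

For the text `2*2+3*3`, `evaluate(text, 1, 2)` gives the conventional 13, `evaluate(text, 2, 1)` gives 30 (read as 2*(2+3)*3), and `evaluate(text, 1, 1)` gives 21 (strict left to right). All three are internally consistent rules; only the first matches the convention everyone expects.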

On the other hand, if I'm explaining the precedence rules, I might
say (as I did above) that `a*a + b*b` is equivalent to `(a*a) + (b*b)`.
In that context the parentheses are not "too many"; they're a
necessary part of the explanation. I find them to be "too many"
for the purpose of writing clear code, but not for some more
specialized purposes.

If you wrote, for example, that "a << b + c" is ambiguous in C, then
you

It is technically unambiguous in C. It can be ambiguous in the mind of
somebody who would have to double-check the precedence levels, or
where the C context is missing.

Agreed.

The discussion seems to be about what exactly is 'too many'.

If so, then we need to be clear what "too many" means. Too many
for what purpose?

Apparently you can construct a valid C source file where 99.9% of the
text consists of () characters, but if someone - or even a million
people - say that it is too many, then that is just their subjective
opinion.

(((((((((((((((a))))))))))))))), without any more context, is likely
to be "too many" parentheses. If it's the result of a complicated
macro expansion or machine-generated code, it might not matter.
If the purpose is to test what depth of parentheses a compiler
supports without crashing, it's probably not nearly enough.

I don't have the patience for such nonsense any more:

* The () in '(a * b) + c' are generally unnecessary

* The () in 'a << (b + c)' are advisable

* The () in '(a << b) + c)' are necessary if the intent is to have
what might be the more intuitive meaning.

I agree on all three points (apart from the mismatched ")" in the
last one).

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Thursday, June 04, 2026 21:04:50

On 04/06/2026 19:47, Janis Papanagnou wrote:

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

No, at least since the standardisation of the C language (including K&R
"standard"), "preprocessing" has been an integral part of the C language
and conversion of comments to space characters is done in phase 3 of the
translation. But the C standards do not give an explicit distinction
between "preprocessing" and "compiling" - just different translation
phases. (They do not define a "compiler" at all.) It is not uncommon
for implementations to separate translation into two or more programs,
especially in the good old days when hosts had much less memory, but
logically they are all one implementation. Distinguishing "the compiler
itself" is somewhat artificial.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 12:11:45

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

According to the standard, each comment is replaced by one space
character in translation phase 3. For implementations where the
preprocessor is a separate program, it typically handles translation
phases 1-6 or 1-7. ("gcc -E" doesn't splice string literals.)

The semantics may have been different in some ancient
implementations. For example, I vaguely recall that it was common
for ABC/**/DEF to be equivalent to ABCDEF. K&R1 says that comments
are treated as whitespace.
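A small sketch of how this looks under the standard rules, assuming a conforming modern preprocessor. The macro and function names (`STR`, `PASTE`, `comment_form`, and so on) are made up for the illustration. Stringizing is used so that the tokenization can be inspected at run time.

```c
#include <string.h>

#define STR2(x) #x
#define STR(x)  STR2(x)            /* expand the argument, then stringize */
#define PASTE(a, b) a##b           /* the standard way to join two tokens */

/* The empty comment becomes one space in phase 3, so ABC and DEF stay
   two separate tokens and stringize with a space between them. */
const char *comment_form(void)
{
    return STR(ABC/**/DEF);
}

/* The ## operator joins the tokens into a single identifier. */
const char *paste_form(void)
{
    return STR(PASTE(ABC, DEF));
}

/* Nonzero only if the empty comment had glued the two names together. */
int comment_pastes_tokens(void)
{
    return strcmp(comment_form(), paste_form()) == 0;
}
```

Under the standard rules `comment_form()` yields "ABC DEF" and `paste_form()` yields "ABCDEF", so `comment_pastes_tokens()` returns 0. An implementation with the old behaviour recalled above would give a different answer.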

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Lew Pitcher@3:633/10 to All on Thursday, June 04, 2026 19:13:36

On Thu, 04 Jun 2026 21:04:50 +0200, David Brown wrote:

On 04/06/2026 19:47, Janis Papanagnou wrote:

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

No, at least since the standardisation of the C language (including K&R
"standard"), "preprocessing" has been an integral part of the C language
and conversion of comments to space characters is done in phase 3 of the
translation. But the C standards do not give an explicit distinction
between "preprocessing" and "compiling" - just different translation
phases. (They do not define a "compiler" at all.) It is not uncommon
for implementations to separate translation into two or more programs,
especially in the good old days when hosts had much less memory, but
logically they are all one implementation. Distinguishing "the compiler
itself" is somewhat artificial.

In historic Unix (Version 7 and before), the preprocessor was implemented
as a separate program ("cpp") from the compiler ("cc"). The compiler itself
had no facility to handle preprocessor directives, and was, itself, often
divided into two separate programs ("cc0" and "cc1"). All three phases
("cpp", "cc0" and "cc1") were managed by a program ("cc"), although the
program for each phase could be invoked independently through manual
execution.

What differs from today is that the preprocessor was an optional component,
made available for a programmer's convenience.

--
Lew Pitcher
"In Skills We Trust"
Not LLM output - I'm just like this.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Thursday, June 04, 2026 20:29:39

On 04/06/2026 19:54, David Brown wrote:

On 04/06/2026 17:46, Bart wrote:

On 04/06/2026 15:27, David Brown wrote:

On 04/06/2026 15:18, Bart wrote:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.
"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different
thing from 'too many'.

Of course they are different things - albeit related things, rather
than /completely/ different.  One is a question of fact, the other a
question of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed.
But I think we are both of the opinion that it does not have "too
many" parentheses - it has an appropriate number of parentheses.

So saying 'too many' of something will be a subjective opinion? OK, so
let's try compiling this bit of C:

    void F(int, int);

    int main() {
        F(1, 2, 3);
    }

8 out of 9 compilers reported 'Too many arguments'.

According to you, that's only their subjective opinion, not an
objective fact?

Again - /please/ stop trying to guess what people say or put words in
their mouths.  I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely
subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a
subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Even though both are about exactly the same thing: superfluous but
harmless parentheses in an expression.

So you are picking on my choice of words, apparently in order to win
some stupid argument on the internet. Even though the same "too many"
phrase used elsewhere can be objective, according to you.

This looks like a pattern: people here seem to have remarkable trouble
debating with me on actual ideas and resort instead to finding hidden
significance in whatever choice of words I happen to use.

"Too many parentheses" is subjective, because they affect the ease of
reading the code as a human reader.

And 'more than needed' isn't that?!

Why don't you write a bunch of expressions with variable numbers of
parentheses, and against each tick off whether 'more than needed' and
'too many' is true.

I'd be interested in whether there would be any difference in the two
columns, and if there is one, at what point they would diverge.

No, this is just getting ludicrous and suggests not wanting to tackle
the real subject: should people write '(a << b) & c' or 'a << b & c'?

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Thursday, June 04, 2026 20:19:58

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor: https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 20:31:20

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor: https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

The v6 cpp.c processes the comments
and deletes them if the 'passcom' (-C) flag is not set.

case '/': for (;;) {
    if (*p++=='*') {/* comment */
        if (!passcom) {inp=p-2; dump(); ++flslvl;}
        for (;;) {
            while (!iscom(*p++));
            if (p[-1]=='*') for (;;) {
                if (*p++=='/') goto endcom;
                if (eob(--p)) {
                    if (!passcom) {inp=p; p=refill(p);}
                    else if ((p-inp)>=BUFSIZ) {/* split long comment */
                        inp=p; p=refill(p); /* last char written is '*' */
                        putc('/',fout); /* terminate first part */
                        /* and fake start of 2nd */
                        outp=inp=p-=3; *p++='/'; *p++='*'; *p++='*';
                    } else p=refill(p);
                } else break;
            } else if (p[-1]=='\n') {
                ++lineno[ifno]; if (!passcom) putc('\n',fout);
            } else if (eob(--p)) {
                if (!passcom) {inp=p; p=refill(p);}
                else if ((p-inp)>=BUFSIZ) {/* split long comment */
                    inp=p; p=refill(p);
                    putc('*',fout); putc('/',fout);
                    outp=inp=p-=2; *p++='/'; *p++='*';
                } else p=refill(p);
            } else ++p; /* ignore null byte */
        }
    endcom:
        if (!passcom) {outp=inp=p; --flslvl; goto again;}
        break;
    }
    if (eob(--p)) p=refill(p);
    else break;
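For readers who find the buffered original hard to follow, here is a much-simplified sketch of the same job. It is not the historical code and makes several assumptions: the whole input is one NUL-terminated string, only block comments are recognised, and each comment is replaced by a single space as the standard's phase 3 describes. The names `strip_comments` and `strips_to` are invented for the illustration.

```c
#include <string.h>

/* Copy src to dst, replacing each block comment with one space.
   String and character literals are copied verbatim so that a comment
   opener inside them is left alone.  dst must be at least as large as
   src (including the terminating NUL). */
void strip_comments(const char *src, char *dst)
{
    while (*src) {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;
            while (*src && !(src[0] == '*' && src[1] == '/'))
                src++;
            if (*src)
                src += 2;            /* skip the terminator if present */
            *dst++ = ' ';
        } else if (*src == '"' || *src == '\'') {
            char quote = *src;
            *dst++ = *src++;
            while (*src && *src != quote) {
                if (*src == '\\' && src[1])
                    *dst++ = *src++; /* keep the backslash of an escape */
                *dst++ = *src++;
            }
            if (*src)
                *dst++ = *src++;     /* closing quote */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}

/* Convenience wrapper for short inputs: strips src into a local buffer
   and reports whether the result equals expected. */
int strips_to(const char *src, const char *expected)
{
    char buf[256];

    if (strlen(src) >= sizeof buf)
        return 0;
    strip_comments(src, buf);
    return strcmp(buf, expected) == 0;
}
```

Feeding it the line from earlier in the thread, extended a little, shows both sides of the argument: `a = b/*p; x = 1; /* c */ y = 2;` comes out as `a = b  y = 2;`. Everything from the accidental comment opener to the next terminator is gone, including the assignment to `x`, and the result is still a syntactically valid statement sequence.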

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Thursday, June 04, 2026 16:33:52

On 2026-06-04 13:47, Janis Papanagnou wrote:

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

That question can only be answered in the context of a particular
implementation of C. The C standard defines only what the entire
implementation must do when translating and executing a program. Whether
all of those tasks are performed by a single program, or whether
responsibility for different parts of the process is given to different
programs is an implementation detail outside the scope of the C standard.
cpp basically implemented translation phases 1-4. cc implemented phases
5-7. The linker implemented phase 8. But those statements are only
partially accurate, and other implementations divided the tasks
differently.

One advantage of having a single program do the whole thing, is that
error messages can mention the actual text of the line where a problem
was detected, without any pre-processing applied.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 20:34:23

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:47, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong.  C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch".  The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

What were /you/ talking about? What was your point?

Your inaccurate characterization that a chunk of the program
went "missing". Nothing meaningful is missing (and the comment
remains in the original source file).

So what do you mean, exactly, when you claim that the output of
the preprocessor causes a chunk of the program (which doesn't
include whitespace or comments) to go missing?

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 13:36:52

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of interactive devices.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

In your example, `foo` clearly exhibits UB; I think your
argument is whether that has a realized effect or not, since the
UB is not invoked. I'm saying that in general a compiler cannot
possibly know that when it compiles `foo`, and is free to assume
the worst.

foo() exhibits UB if and only if it's called during execution.

Yes, a compiler can't know whether foo() will be called.
An implementation, particularly a linker, might know, but is not
required to. No, it is not free to assume the worst.

I certainly wouldn't want a compiler to reject `1/time(NULL)`
because it can't prove that time(NULL) won't be zero, or reject
`argc+1` because it can't prove that argc < INT_MAX. Code whose
behavior would be undefined if it were executed has no behavior
(and therefore no UB) if it's not executed.
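
A minimal sketch of that last point, with hypothetical helper names that are not from the thread: `increment` would overflow only if it were actually called with INT_MAX, and the guarded caller never does that, so no undefined behavior occurs during execution.

```c
#include <limits.h>

/* Hypothetical helper: x + 1 overflows (undefined behavior) only when
   it is executed with x == INT_MAX. Merely compiling it is harmless. */
int increment(int x)
{
    return x + 1;
}

/* Hypothetical caller: checks the precondition first, so the
   overflowing case is never executed. */
int increment_or_saturate(int x)
{
    return x < INT_MAX ? increment(x) : INT_MAX;
}
```

Calling `increment_or_saturate(argc)` from main is well defined for every possible argc, even though a compiler looking at `increment` alone cannot prove it is never called with INT_MAX.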

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Dan Cross@3:633/10 to All on Thursday, June 04, 2026 20:41:28

In article <sglUR.17897$pxGb.10844@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor:
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

The v6 cpp.c processes the comments
and deletes them if the 'passcom' (-C) flag is not set.

[snip]

You sure? That looks like V7 code to me.

- Dan C.

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 20:49:08

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <sglUR.17897$pxGb.10844@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor:
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

The v6 cpp.c processes the comments
and deletes them if the 'passcom' (-C) flag is not set.

[snip]

You sure? That looks like V7 code to me.

Yes, it is. I didn't have a machine readable version of the
v6 compiler handy. Dug it out and here's the v6 version.

getch()
{
    register int c, lastst;

    while ((c=getc1())=='/' && !instring)
    {
        if ((c=getc1())!='*')
        {
            pushback(c);
            return('/');
        }
        if (!skipcom)
            {putc('/',fout); putc('*', fout);}
        lastst=0;
        while ( (c = getc1()) != '\0')
        {
            if (lastst && c=='/')
            {
                if (!skipcom)
                    putc('/', fout);
                break;
            }
            if (c=='\n' || !skipcom)
                putc(c, fout);
            lastst = (c=='*');
        }
        if (c=='\0')break;
    }
    return(c);
}
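
For comparison, a rough modern-C sketch of the same idea (this is not the V6 code, and `strip_block_comments` is a hypothetical name): drop block comments outside string literals, but keep any newlines found inside them so that later line numbers still match, as the `c=='\n'` test above appears to do. Character constants and `//` comments are deliberately ignored to keep it short.

```c
/* Hypothetical helper, not the V6 code: copies src to dst, dropping block
   comments but keeping newlines inside them. dst must have room for
   strlen(src) + 1 bytes. */
void strip_block_comments(const char *src, char *dst)
{
    int in_string = 0;                  /* inside a "..." literal? */

    while (*src != '\0') {
        if (in_string) {
            if (*src == '\\' && src[1] != '\0')
                *dst++ = *src++;        /* copy the backslash of an escape */
            else if (*src == '"')
                in_string = 0;          /* closing quote ends the literal */
            *dst++ = *src++;            /* copy the current character */
        } else if (src[0] == '/' && src[1] == '*') {
            src += 2;                   /* skip the comment opener */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/')) {
                if (*src == '\n')
                    *dst++ = '\n';      /* preserve the line count */
                src++;
            }
            if (*src != '\0')
                src += 2;               /* skip the comment closer */
        } else {
            if (*src == '"')
                in_string = 1;          /* opening quote starts a literal */
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Fed the text `a = b/*p;*/ + 1;`, this sketch leaves `a = b + 1;`, which is the behaviour the subthread is arguing about.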

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 14:06:23

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely
subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a
subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

Nobody said or implied that "too many" is always subjective.

"Too many parentheses" is subjective, because they affect the ease
of reading the code as a human reader.

And 'more than needed' isn't that?!

More than needed *for what*? Without that context, we can't tell
whether "more than needed" is subjective or objective.

You know all this.

[...]

No, this is just getting ludicrous and suggests not wanting to tackle
the real subject: should people write '(a << b) & c' or 'a << b & c'?

Oh, is that the real subject?

I presume you prefer `(a << b) & c` to `a << b & c`.

So do I.

Others might or might not have different opinions. If that was the
"real subject", we've wasted a lot of time debating the difference
between subjectivity and objectivity.

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all completely unambiguous according to the C standard!

Of course not, because 99.9% of C programmers are not idiots.
Your record of guessing incorrectly what other people think is
unbroken. I suggest you stop trying.

If people are having a debate about some controversial topic, have
you found that arguing against some unrealistic parody of the other
person's position is ever useful (unless your goal is to prolong
the debate)? Stop telling people what they think.

Tim probably prefers fewer parentheses than most C programmers do.
You probably prefer more. There *might* be an interesting discussion
to be had about that difference, but I doubt it.
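
For anyone checking the objective half of this, a small sketch (the function names are hypothetical): since `<<` binds tighter than `&` in C, the bare and parenthesized spellings compute the same value, and only the third function groups differently.

```c
/* Bare form: parsed as (a << b) & c because << has higher precedence. */
unsigned shift_then_mask_bare(unsigned a, unsigned b, unsigned c)
{
    return a << b & c;
}

/* Same computation with the grouping spelled out for human readers. */
unsigned shift_then_mask_paren(unsigned a, unsigned b, unsigned c)
{
    return (a << b) & c;
}

/* A different expression: here the parentheses are required. */
unsigned mask_then_shift(unsigned a, unsigned b, unsigned c)
{
    return a << (b & c);
}
```

With a = 1, b = 4, c = 0x10 the first two return 16 and the third returns 1. Whether the second spelling reads better than the first is the subjective part.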

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 14:16:14

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

One advantage of having a single program do the whole thing, is that
error messages can mention the actual text of the line where a problem
was detected, without any pre-processing applied.

Typical preprocessors emit directives that tell the compiler about
the current file name and line number, precisely so that diagnostic
messages can refer to the original text.

For example:

$ cat hello.c
#include <stdio.h>
int main(void) {
    printf("Hello world!\n");
}
$ gcc -E hello.c | tail
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 983 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2

# 2 "hello.c"
int main(void) {
    printf("Hello world!\n");
}
$

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior, but gcc apparently treats it
very much like a #line directive.

It doesn't really matter whether the preprocessor is a separate program
or not.
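
A minimal sketch of the standard `#line` directive doing the same job by hand (the file name and function names are made up for illustration): after the directive, `__LINE__` and `__FILE__` report the position it names, which is what lets diagnostics point at the original source.

```c
/* The directive says: treat the next source line as line 100 of
   "original.c" (a hypothetical file name). */
#line 100 "original.c"
int reported_line(void) { return __LINE__; }           /* line 100 */
const char *reported_file(void) { return __FILE__; }   /* line 101 */
```

Here `reported_line()` returns 100 and `reported_file()` returns "original.c", regardless of where this fragment actually sits in the file being compiled.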

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Bart@3:633/10 to All on Thursday, June 04, 2026 22:28:30

On 04/06/2026 21:34, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:47, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch". The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

What were /you/ talking about? What was your point?

Your inaccurate characterization that a chunk of the program
went "missing". Nothing meaningful is missing (and the comment
remains in the original source file).

So what do you mean, exactly, when you claim that the output of
the preprocessor causes a chunk of the program (which doesn't
include whitespace or comments) to go missing?

This is the example I gave elsewhere:

---------------------------
There are actually other issues associated with /**/ comments; here
someone forgot to terminate the first comment:

puts("one"); /* comment 1
puts("two"); /* commmet 2 */
puts("three"); /* comment 3 */
---------------------------

After preprocessing you're left with this:

puts("one");
puts("three");

That middle puts call is missing, and it's meant to be part of the program.

This can also be a consequence of an inadvertent /* sequence such as in
'a = b/*p;'.
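
A minimal sketch of how the division is usually written so that the two characters never meet (the function name is hypothetical; the comments use `//` on purpose, since putting the comment opener inside a block comment draws gcc's -Wcomment warning):

```c
// Hypothetical helper: divide b by the int that p points to.
// The space keeps '/' and '*' as two separate tokens; written
// without it, as b/*p, the lexer sees the start of a block comment.
int divide_by_pointee(int b, const int *p)
{
    return b / *p;      // equivalently: b/(*p)
}
```

With `int d = 4;`, the call `divide_by_pointee(12, &d)` yields 3.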

From Bart@3:633/10 to All on Thursday, June 04, 2026 22:47:36

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely
subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a
subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

That is EXACTLY what he's saying: "It is an OBJECTIVE fact .. has more
... than needed", and:

"has too many ... is ... purely subjective".

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

And this is exactly what other people are doing.

So I used TOO MANY instead of MORE THAN NEEDED to describe the exact
same phenomenon.

(1) Why are you all making such a big fucking deal of this?

(2) Why are you all sticking up for each other?

(3) Why don't you discuss the fucking subject instead of going down
these pointless rabbit holes?

Nobody said or implied that "too many" is always subjective.

"Too many parentheses" is subjective, because they affect the ease
of reading the code as a human reader.

And 'more than needed' isn't that?!

More than needed *for what*? Without that context, we can't tell
whether "more than needed" is subjective or objective.

Jesus, the subthread has been going long enough.

It is about how many brackets are too many, more than needed,
superfluous to requirements, etc etc etc.

Yes, I've finally broken and refuse to call round brackets 'parentheses' anymore.

Except that I really no longer care. Do whatever the hell you like with
your fucking language.

This is not a civil discussion forum, it is a bear-pit.

You know all this.

[...]

No, this is just getting ludicrous and suggests not wanting to tackle
the real subject: should people write '(a << b) & c' or 'a << b & c'?

Oh, is that the real subject?

I presume you prefer `(a << b) & c` to `a << b & c`.

So do I.

Others might or might not have different opinions. If that was the
"real subject", we've wasted a lot of time debating the difference
between subjectivity and objectivity.

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

So what was his 99.9% all about? Nobody has a clue, except they are
certain that what I think it is is wrong!

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

Of course not, because 99.9% of C programmers are not idiots.
Your record of guessing incorrectly what other people think is
unbroken. I suggest you stop trying.

This is what Tim said:

"If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether. It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty."

It sounds to me very much as though he expects 99.9% to know all C's
precedences by heart and to never need to use superfluous brackets (or
'more than needed' if 'superfluous' is still too subjective).

But of course, I am wrong and he is right, and you will defend his view
(a subjective one) to the death.

From Scott Lurndal@3:633/10 to All on Thursday, June 04, 2026 21:58:14

Bart <bc@freeuk.com> writes:

On 04/06/2026 21:34, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:47, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

    a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong. C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch". The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

What were /you/ talking about? What was your point?

Your inaccurate characterization that a chunk of the program
went "missing". Nothing meaningful is missing (and the comment
remains in the original source file).

So what do you mean, exactly, when you claim that the output of
the preprocessor causes a chunk of the program (which doesn't
include whitespace or comments) to go missing?

This is the example I gave elsewhere:

---------------------------
There are actually other issues associated with /**/ comments; here
someone forgot to terminate the first comment:

puts("one"); /* comment 1
puts("two"); /* commmet 2 */
puts("three"); /* comment 3 */
---------------------------

After preprocessing you're left with this:

puts("one");
puts("three");

That middle puts call is missing, and it's meant to be part of the program.

Of course. There is no functional difference between removing the commented text in cpp and leaving it in. In both cases, puts("two"); will be treated as a comment and will be ignored by the rest of the compiler.

From Chris M. Thomasson@3:633/10 to All on Thursday, June 04, 2026 15:25:08

On 6/4/2026 12:29 PM, Bart wrote:
[...]

And 'more than needed' isn't that?!

All hail extra ()'s! :^)

((branch) ? (cond0) : (cond1))

Well, I like to make my ? operators explicitly separated with extra
()'s... I basically never use (?:) anyway. Sometimes I did in a crazy
macro expression along the lines of the chaos PP lib... Oh my.

;^o

[...]

From Richard Harnden@3:633/10 to All on Thursday, June 04, 2026 23:25:53

On 04/06/2026 22:28, Bart wrote:

There are actually other issues associated with /**/ comments; here
someone forgot to terminate the first comment:

    puts("one");   /* comment 1
    puts("two");   /* commmet 2 */
    puts("three"); /* comment 3 */

I get ...

$ gcc x.c
x.c:6:21: warning: '/*' within block comment [-Wcomment]
6 | puts("two"); /* commmet 2 */
| ^
1 warning generated.

... so I don't see it as a big deal.

It's up there with typing 'foo():' when I meant 'foo();' - I'll get lots
of errors which all boil down to my inability to release the shift-key.
I don't blame the keyboard layout. Or the font. Or my eyesight.

I'm sure there are plenty of editors that have a nested-comments angry
colour.
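
A variation worth keeping in mind, sketched with a hypothetical counter instead of the puts() calls: as far as I know, -Wcomment is triggered by a comment opener appearing inside a block comment, so if the swallowed line carries no comment of its own I would not expect that warning to fire. Treat that as an assumption to check against your own compiler.

```c
/* Hypothetical stand-in for the three puts() calls: each statement
   adds a distinct amount so the result shows which ones ran. */
int count_statements(void)
{
    int n = 0;
    n += 1;     /* comment 1 (terminator forgotten)
    n += 10;       this whole line is swallowed by comment 1 */
    n += 100;   /* comment 3 */
    return n;
}
```

The function returns 101 rather than 111: the middle statement is part of the first comment. Syntax colouring in an editor is probably the more reliable defence in this variant.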

From Janis Papanagnou@3:633/10 to All on Friday, June 05, 2026 00:27:00

On 2026-06-04 23:47, Bart wrote:

Jesus, the subthread has been going long enough.

I'd dare say that there's an extremely high chance
that *everyone* in this group agrees with you on
this statement! - I suggest pinning it to the wall. :-)

Janis

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 16:09:28

Bart <bc@freeuk.com> writes:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely
subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a
subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

That is EXACTLY what he's saying: "It is an OBJECTIVE fact .. has more
... than needed", and:

"has too many ... is ... purely subjective".

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

And this is exactly what other people are doing.

Taken literally, your statement implies that you admit that that's
what you're doing. Is that what you meant? If so, I suggest you
*stop* making such false claims. If not, what did you actually mean?

So I used TOO MANY instead of MORE THAN NEEDED to describe the exact
same phenomenon.

That's not the problem. There is an actual meaningful distinction
here, between what's needed by the compiler and what's useful to
improve clarity for human readers. I have found some of what you've
written to be unclear about that distinction.

Can we agree that the question of whether parentheses in a C
expression are necessary to the compiler can be answered objectively?
Can we agree that the question of whether extra parentheses are
helpful to a human reader is at least partly subjective, and
varies from case to case? Is there really anything else that we
fundamentally disagree about?

(1) Why are you all making such a big fucking deal of this?

Why are you?

(2) Why are you all sticking up for each other?

Most of us happen to agree with each other on most of the points being discussed. I'm not "sticking up" for anyone. I have expressed
disagreement in this thread with people other than you.

(3) Why don't you discuss the fucking subject instead of going
down these pointless rabbit holes?

OK, what subject do you want to discuss? Please be clear and specific.

[...]

It is about how many brackets are too many, more than needed,
superfluous to requirements, etc etc etc.

There is of course no objective answer to that, only opinions.
A substantial percentage of this thread has been about exactly
what you now say you want it to be about. I've said myself that
I think the parentheses in `(a*a) + (b*b)` are excessive, but the
parentheses in `(a << b) & c` are appropriate.

That (pointing to the previous paragraph) was me talking about exactly
what you want us to be talking about. Consider acknowledging that.

[...]

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

Of course not, because 99.9% of C programmers are not idiots.
Your record of guessing incorrectly what other people think is
unbroken. I suggest you stop trying.

This is what Tim said:

"If someone really can't learn the rules of expression syntax for the language they are using, they should be advised to try a different
language, or perhaps give up programming altogether. It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without difficulty."

And you inferred from that that he opposes using indentation.

https://en.wikipedia.org/wiki/Straw_man

Or maybe you were being figurative, but I honestly can't tell.

It sounds to me very much as though he expects 99.9% to know all C's
precedences by heart and to never need to use superfluous brackets (or
'more than needed' if 'superfluous' is still too subjective).

But of course, I am wrong and he is right, and you will defend his
view (a subjective one) to the death.

Nope.

I don't know whether that's his opinion or not. Perhaps you haven't
noticed that I don't always agree with Tim. I don't know whether
he thinks that the parentheses in `(a << b) & c` are excessive, or
whether he finds `a << b & c` clearer. He can certainly express his
own opinion if he wants to. If he thinks (subjectively) that those
parentheses are excessive, then I (subjectively) disagree with him.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Bart@3:633/10 to All on Friday, June 05, 2026 00:44:45

On 05/06/2026 00:09, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely
subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a
subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

That is EXACTLY what he's saying: "It is an OBJECTIVE fact .. has more
... than needed", and:

"has too many ... is ... purely subjective".

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

And this is exactly what other people are doing.

Taken literally, your statement implies that you admit that that's
what you're doing. Is that what you meant? If so, I suggest you
*stop* making such false claims. If not, what did you actually mean?

So I used TOO MANY instead of MORE THAN NEEDED to describe the exact
same phenomenon.

That's not the problem. There is an actual meaningful distinction
here, between what's needed by the compiler and what's useful to
improve clarity for human readers. I have found some of what you've
written to be unclear about that distinction.

Can we agree that the question of whether parentheses in a C
expression are necessary to the compiler can be answered objectively?
Can we agree that the question of whether extra parentheses are
helpful to a human reader is at least partly subjective, and
varies from case to case? Is there really anything else that we fundamentally disagree about?

(1) Why are you all making such a big fucking deal of this?

Why are you?

I didn't start this business of something being subjective or objective,
or suggesting that one turn of phrase to discuss the same thing was
subjective and the other objective (implying that a subjective opinion
had less worth). TR started that and several people backed him up.

Myself I wouldn't even use those terms. My point was that some overuses
of () for commonly known precedences are more overkill than others.

If that's subjective then so be it; it is not some fundamental law of
the universe. I would just call it common sense.

Why are you?

Since you ask, I was defending my point of view then got sidetracked by
this subjective/objective nonsense. I notice that TR has disappeared
from this subthread.

From Dan Cross@3:633/10 to All on Thursday, June 04, 2026 23:49:43

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

So no, I do not see how execution according to the rules of the
abstract machine is not guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Whether `foo` could be invoked after, I think, is undefined.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised. I don't
see how it could possibly know that, given that `foo` has
external linkage.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

I could be wrong; this is all excessively pedantic. And of
course, if an implementation does something silly and emits
garbage for Tim's program, then I argue it should be chucked
onto the dustbin of excessive fawning over the standard. But
I'm not convinced that the standard _prohibits_ such an extreme
interpretation.

In your example, `foo` clearly exhibits UB; I think your
argument is whether that has a realized effect or not, since the
UB is not invoked. I'm saying that in general a compiler cannot
possibly know that when it compiles `foo`, and is free to assume
the worst.

foo() exhibits UB if and only if it's called during execution.

Yes, a compiler can't know whether foo() will be called.
An implementation, particularly a linker, might know, but is not
required to. No, it is not free to assume the worst.

See above.

I certainly wouldn't want a compiler to reject `1/time(NULL)`
because it can't prove that time(NULL) won't be zero, or reject
`argc+1` because it can't prove that argc < INT_MAX. Code whose
behavior would be undefined if it were executed has no behavior
(and therefore no UB) if it's not executed.

That's categorically different; what you are describing are what
Regehr calls, "Type-2" functions, and I agree with you for
those.

The program that Tim posted has a "Type-3" function, and
constraints dictate that the UB expression must be evaluated at
translation time, and a diagnostic emitted. In the most
charitable interpretation, it cannot be considered a strictly
conforming program, even if the implementation is smart enough
to avoid evaluating the constant expression, as it is
unspecified whether it's evaluated or not, and strictly
conforming programs shall not rely on unspecified behavior.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Friday, June 05, 2026 00:02:26

In article <10vspuu$lkmu$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

One advantage of having a single program do the whole thing, is that
error messages can mention the actual text of the line where a problem
was detected, without any pre-processing applied.

Typical preprocessors emit directives that tell the compiler about
the current file name and line number, precisely so that diagnostic
messages can refer to the original text.

For example:

$ cat hello.c
#include <stdio.h>
int main(void) {
printf("Hello world!\n");
}
$ gcc -E hello.c | tail
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 983 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2

# 2 "hello.c"
int main(void) {
printf("Hello world!\n");
}
$

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior, but gcc apparently treats it
very much like a #line directive.

It doesn't really matter whether the preprocessor is a separate program
or not.

In fairness to Kuyper, however, the *text* from the original
source file is lost. E.g.,

term% cat n.c
#include <stdio.h>
#define FOO "hi"; // Note trailing `;`
int
main(void)
{
printf("%s\n", FOO);
return 0;
}
term% clang -fkeep-system-includes -E n.c
# 1 "n.c"
# 1 "<built-in>" 1
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "n.c" 2
#include <stdio.h> /* clang -E -fkeep-system-includes */
# 1 "n.c"
# 2 "n.c" 2

int
main(void)
{
printf("%s\n", "hi";);
return 0;
}
term%

In this example, the preprocessor macro `FOO` has been lost, and
only its expansion remains. The compiler has no information to
give a useful diagnostic.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Friday, June 05, 2026 00:03:11

In article <8xlUR.17899$pxGb.16870@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <sglUR.17897$pxGb.10844@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor:
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

The v6 cpp.c processes the comments
and deletes them if the 'passcom' (-C) flag is not set.

[snip]

You sure? That looks like V7 code to me.

Yes, it is. I didn't have a machine readable version of the
v6 compiler handy. Dug it out and here's the v6 version.

getch()
{
register int c, lastst;

while ((c=getc1())=='/' && !instring)
{
if ((c=getc1())!='*')
{
pushback(c);
return('/');
}
if (!skipcom)
{putc('/',fout); putc('*', fout);}
lastst=0;
while ( (c = getc1()) != '\0')
{
if (lastst && c=='/')
{
if (!skipcom)
putc('/', fout);
break;
}
if (c=='\n' || !skipcom)
putc(c, fout);
lastst = (c=='*');
}
if (c=='\0')break;
}
return(c);
}

Yeah, that's from `cc.c`, right?

I think 7th Ed was the first where `cpp` was liberated from the
compiler proper (or the driver, anyway).

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Friday, June 05, 2026 00:18:05

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <8xlUR.17899$pxGb.16870@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <sglUR.17897$pxGb.10844@fx07.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsh43$b3is$1@dont-email.me>,
Lew Pitcher <lew.pitcher@digitalfreehold.ca> wrote:

On Thu, 04 Jun 2026 16:18:07 +0000, Scott Lurndal wrote:

[snip]

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

So, I've looked through "The C Programming Language" (the K&R C)
and the paper "A Tour Through the Portable C Compiler" (S. C.
Johnson, circa 1974), and neither document states that the
preprocessor strips comments. In fact, the mentions of the
preprocessor are exclusively about the #operation operators,
and not about C comments.

The PDP-11 compiler from 5th Edition research Unix removes
comments in `cc.c`. The 1972 compilers from Dennis Ritchie's
web page remove them in the compiler proper, as they predated
the preprocessor:
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/primevalC.html

The v6 cpp.c processes the comments
and deletes them if the 'passcom' (-C) flag is not set.

[snip]

You sure? That looks like V7 code to me.

Yes, it is. I didn't have a machine readable version of the
v6 compiler handy. Dug it out and here's the v6 version.

getch()
{
register int c, lastst;

while ((c=getc1())=='/' && !instring)
{
if ((c=getc1())!='*')
{
pushback(c);
return('/');
}
if (!skipcom)
{putc('/',fout); putc('*', fout);}
lastst=0;
while ( (c = getc1()) != '\0')
{
if (lastst && c=='/')
{
if (!skipcom)
putc('/', fout);
break;
}
if (c=='\n' || !skipcom)
putc(c, fout);
lastst = (c=='*');
}
if (c=='\0')break;
}
return(c);
}

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c
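
For anyone who finds the V6 routine hard to read, here is a rough modern C sketch of the same idea: drop `/* ... */` comments that appear outside string literals, but keep any newlines inside them so line numbers stay put. This is only an illustration, not the historical source. The name `strip_comments` and the buffer-to-buffer interface are assumptions made for the example, and character constants such as `'"'` are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src to dst, dropping C block comments but preserving the
 * newlines inside them. dst must have room for strlen(src) + 1 bytes.
 * Hypothetical helper for illustration; not taken from V6 or V7. */
void strip_comments(const char *src, char *dst)
{
    int in_string = 0;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (!in_string && src[0] == '/' && src[1] == '*') {
            src += 2;                       /* skip the opening delimiter */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/')) {
                if (*src == '\n')
                    *dst++ = '\n';          /* keep line numbers intact */
                src++;
            }
            if (*src != '\0')
                src += 2;                   /* skip the closing delimiter */
            continue;
        }
        if (in_string && src[0] == '\\' && src[1] != '\0') {
            *dst++ = *src++;                /* copy the backslash */
            *dst++ = *src++;                /* copy the escaped character */
            continue;
        }
        if (*src == '"')
            in_string = !in_string;         /* enter or leave a string literal */
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Like the old code, it is a small state machine over "inside a string" and "inside a comment"; the main difference is that it works on a buffer rather than on getc1()/putc().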

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Chris M. Thomasson@3:633/10 to All on Thursday, June 04, 2026 17:26:11

On 6/4/2026 4:44 PM, Bart wrote:

On 05/06/2026 00:09, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion. Even if it is true that this is "commonly
agreed to" (and AFAIK you have no basis for that claim), that would
still be a subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

That is EXACTLY what he's saying: "It is an OBJECTIVE fact .. has more
... than needed", and:

"has too many ... is ... purely subjective".

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

And this is exactly what other people are doing.

Taken literally, your statement implies that you admit that that's
what you're doing. Is that what you meant? If so, I suggest you
*stop* making such false claims. If not, what did you actually mean?

So I used TOO MANY instead of MORE THAN NEEDED to describe the exact
same phenomenon.

That's not the problem. There is an actual meaningful distinction
here, between what's needed by the compiler and what's useful to
improve clarity for human readers. I have found some of what you've
written to be unclear about that distinction.

Can we agree that the question of whether parentheses in a C
expression are necessary to the compiler can be answered objectively?
Can we agree that the question of whether extra parentheses are
helpful to a human reader is at least partly subjective, and
varies from case to case? Is there really anything else that we
fundamentally disagree about?
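
The "necessary to the compiler" half of that question can be checked mechanically. Here is a minimal sketch in C; the function names are made up for the example. The first pair differs only in redundant parentheses, while in the second pair the parentheses change the grouping.

```c
#include <assert.h>

/* Redundant parentheses: * already binds tighter than +. */
int sum_of_squares_plain(int a, int b) { return a*a + b*b; }
int sum_of_squares_paren(int a, int b) { return (a*a) + (b*b); }

/* Required parentheses: removing them changes the grouping. */
int scaled_sum(int a, int b, int c)       { return (a + b) * c; }
int sum_plus_product(int a, int b, int c) { return a + b * c; }

/* Compare the two spellings over a small range of inputs. */
void check_small_inputs(void)
{
    for (int a = -5; a <= 5; a++)
        for (int b = -5; b <= 5; b++)
            assert(sum_of_squares_plain(a, b) == sum_of_squares_paren(a, b));
}
```

Whether the second spelling of the sum of squares reads better to a human is the part no compiler can settle.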

(1) Why are you all making such a big fucking deal of this?

Why are you?

I didn't start this business of something being subjective or objective,
or suggesting that one turn of phrase to discuss the same thing was
subjective and the other objective (implying that a subjective opinion
had less worth). TR started that and several people backed him up.

Myself I wouldn't even use those terms. My point was that some overuses
of () for commonly known precedences are more overkill than others.

If that's subjective then so be it; it is not some fundamental law of
the universe. I would just call it common sense.

Why are you?

Since you ask, I was defending my point of view then got sidetracked by
this subjective/objective nonsense. I notice that TR has disappeared
from this subthread.

Wrt the number of ()'s? Might as well go to sleep with the following
song playing in the background:

(The Fate of Ophelia - Taylor Swift (Lyrics) Charlie Puth ft. Selena
Gomez, The Weeknd, Ariana Grande)

https://youtu.be/yleL-JbEHc8?list=RDyleL-JbEHc8

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 18:04:38

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}

int
main(){
return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

It does so for programs in general, not just strictly conforming
ones. If a program has undefined behavior, all bets are off,
but for example a program that evaluates `printf("%d\n", INT_MAX)`
is not strictly conforming, but it's fully subject to 5.1.2.4.

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

I don't think an integer constant expression can have undefined
behavior. INT_MAX+1 and 1/0 are not constant expressions, because
neither "evaluate(s) to a constant that is in the range of
representable values for its type".

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

The program in question, quoted above, has:

int zero = (INT_MAX+1)*0;

`(INT_MAX+1)*0` is not a constant expression, not because of the
overflow, but because a constant expression is not required in
that context. "constant-expression" is defined by a production in
the grammar (it reduces to "conditional-expression"). Even in

int n = 42;

42 is not a constant expression, because the grammar doesn't
call for a constant expression in that context -- even though it
looks like one. Similarly, in `a + b * c`, `a + b` looks like an
additive expression, but it isn't one. (Not a perfect analogy.)

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Yes, there are constructs that are explicitly UB at translation time.
(I think that's unfortunate, and there are efforts to clear up some
such cases in C2y.)

Signed integer overflow is not one of those constructs.
Any undefined behavior from evaluating INT_MAX+1 happens during
execution (barring constraint violations).

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

It satisfies the requirements for an integer constant expression in
6.6p8, but it violates the constraint in 6.6p4. (I presume that an
"integer constant expression" must be a "constant expression".)
But since "constant-expression" is a grammatical production,
it doesn't have to satisfy that constraint, and no diagnostic
is required. (A warning is certainly permitted.)

Similarly, this:
int n = INT_MAX + 1;
at block scope doesn't require a diagnostic, though of course it
has undefined behavior -- but at file scope, the initializer is a
constant expression, so that would be a constraint violation.
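
To make the grammatical distinction concrete, here is a small sketch, assuming the reading argued above. It places the same kind of arithmetic in positions where the grammar production is constant-expression (enumerator value, bit-field width, static assertion, case label) and in one where it is merely assignment-expression (the initializer of an automatic object). All names are invented for the example, and none of the expressions overflow.

```c
#include <assert.h>

/* Positions where the grammar calls for a constant-expression. */
enum { ANSWER = 40 + 2 };                     /* enumerator value */
struct flags { unsigned ready : 1 + 1; };     /* bit-field width */
static_assert(ANSWER == 42, "checked during translation");

int classify(int x)
{
    switch (x) {
    case 1 + 1:                               /* case label */
        return ANSWER;
    default:
        return 0;
    }
}

/* Here the grammar only calls for an assignment-expression, even
 * though 40 + 2 looks exactly like the expressions above. */
int block_scope_initializer(void)
{
    int n = 40 + 2;
    return n;
}
```

Under this reading, the constraint in 6.6p4 attaches to the first group only; the initializer in the last function is an ordinary expression that a compiler is free, but not required, to fold.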

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Overflow in a constant expression is not undefined behavior. It's a
constraint violation. But that doesn't apply here, because the
initializer is not a constant expression. (Sorry if I'm repeating
myself.)

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

Any UB in the program would occur during execution, and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.
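
As a side note, code that cannot rule out overflow from its inputs can avoid the run-time UB entirely by testing before it adds. A minimal sketch; `checked_add` is a hypothetical helper, not something from the standard library:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Store a + b in *result and return true if the sum is representable
 * as an int; otherwise return false and leave *result untouched.
 * The range checks themselves never overflow. */
bool checked_add(int a, int b, int *result)
{
    assert(result != NULL);

    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;

    *result = a + b;
    return true;
}
```

With this, something like `argc + 1` becomes `checked_add(argc, 1, &n)`, which has defined behavior for every possible value of argc.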

So no, I do not see how execution according to the rules of the
abstract machine is not guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Whether `foo` could be invoked after, I think, is undefined.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised. I don't
see how it could possibly know that, given that `foo` has
external linkage.

We were presented with a complete translation unit that included a
function definition for "main". It's a complete program. There's no
valid way for some other program to call foo. If an OS provided such
a mechanism, it would be outside the scope of C.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is an expression statement that does nothing;
it does not have undefined behavior.)

[SNIP]

I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
int zero;
zero = INT_MAX;
zero ++;
zero *= 0;
return zero;
}

int main(void) {
return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions. Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 04, 2026 18:36:46

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vspuu$lkmu$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

One advantage of having a single program do the whole thing, is that
error messages can mention the actual text of the line where a problem
was detected, without any pre-processing applied.

Typical preprocessors emit directives that tell the compiler about
the current file name and line number, precisely so that diagnostic
messages can refer to the original text.

For example:

$ cat hello.c
#include <stdio.h>
int main(void) {
printf("Hello world!\n");
}
$ gcc -E hello.c | tail
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 983 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2

# 2 "hello.c"
int main(void) {
printf("Hello world!\n");
}
$

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior, but gcc apparently treats it
very much like a #line directive.

It doesn't really matter whether the preprocessor is a separate program
or not.

In fairness to Kuyper, however, the *text* from the original
source file is lost. E.g.,

term% cat n.c
#include <stdio.h>
#define FOO "hi"; // Note trailing `;`
int
main(void)
{
printf("%s\n", FOO);
return 0;
}
term% clang -fkeep-system-includes -E n.c
# 1 "n.c"
# 1 "<built-in>" 1
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "n.c" 2
#include <stdio.h> /* clang -E -fkeep-system-includes */
# 1 "n.c"
# 2 "n.c" 2

int
main(void)
{
printf("%s\n", "hi";);
return 0;
}
term%

In this example, the preprocessor macro `FOO` has been lost, and
only its expansion remains. The compiler has no information to
give a useful diagnostic.

Ah, but it does, as long as the original file is still there.

$ gcc -c n.c
n.c: In function 'main':
n.c:2:17: error: expected ')' before ';' token
2 | #define FOO "hi"; // Note trailing `;`
| ^
n.c:6:20: note: in expansion of macro 'FOO'
6 | printf("%s\n", FOO);
| ^~~
n.c:6:11: note: to match this '('
6 | printf("%s\n", FOO);
| ^
$

The output of `gcc -E` doesn't include the name FOO, but it does include
the line `# 3 "n.c"`, and that's enough information for the compiler to
open the original source file and copy information from it into an error message.

(This is perhaps straying slightly off-topic, since the standard
only requires a diagnostic, but it's still interesting to see how
actual compilers do things.)

$ cat n.c
#include <stdio.h>
#define FOO "hi"; // Note trailing `;`
int
main(void)
{
printf("%s\n", FOO);
return 0;
}
$ gcc -E n.c >| n-preprocessed.c
$ grep FOO n-preprocessed.c
$ tail n-preprocessed.c
# 2 "n.c" 2

# 3 "n.c"
int
main(void)
{
printf("%s\n", "hi";);
return 0;
}
$ gcc -c n-preprocessed.c
n.c: In function 'main':
n.c:6:24: error: expected ')' before ';' token
6 | printf("%s\n", FOO);
| ~ ^
| )
$

And if I rename n.c before compiling n-preprocessed.c, the error
message doesn't include that line of code.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Dan Cross@3:633/10 to All on Friday, June 05, 2026 02:47:35

In article <10vsrpo$men2$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[snip]
Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

So what was his 99.9% all about? Nobody has a clue, except they are
certain that what I think it is is wrong!

Have you thought about, I don't know, maybe asking him?

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

Of course not, because 99.9% of C programmers are not idiots.
Your record of guessing incorrectly what other people think is
unbroken. I suggest you stop trying.

This is what Tim said:

"If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether. It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty."

It sounds to me very much as though he expects 99.9% to know all C's
precedences by heart and to never need to use superfluous brackets (or
'more than needed' if 'superfluous' is still too subjective).

But of course, I am wrong and he is right, and you will defend his view
(a subjective one) to the death.

You omitted some of what reads to me like fairly important
context before the part you posted:

|This statement illustrates the problem with examples that you give.
|Not only is the presumed reader sort of arbitrarily naive, he or she
|is apparently incapable of learning. Everyone who has ever learned
|to program has had an experience of a program doing something other
|than what was expected, because of a misunderstanding about how the
|language works. When that happens, most people simply learn about
|their misunderstanding and correct it. The readers in your examples
|are like people who started programming after developing Alzheimer's
|disease (and no offense meant to anyone afflicted with Alzheimer's).
|Maybe there are such people, whether or not caused by a medical
|condition, but it doesn't match most programmers' experience, and in
|any case is not worth worrying about. If someone can't understand
|the rules of the road they shouldn't be behind the wheel of a car.

I don't presume to speak for him, but his point appears to be
that most programmers (999 out of a 1000) learn from their
mistakes. Part of that may be developing techniques to prevent
future recurrence of those mistakes.

Programmers make mistakes; it happens all the time. Many C
programmers may well have experienced mistakes with operator
precedence; it's well-known that the rules have some rough
edges. Usually this is fairly easy to spot in testing; it may
result in a momentary head scratching, perhaps a, "huh...that's
weird..." followed by looking at a table or puzzling over the
grammar for a moment, and then an, "ohh....I see." Perhaps
the programmer thinks, "wow, that confused me.... I'm going to
put in some parentheses to make it clear what's going on the
next time I'm in here..." or maybe they don't. That's the part
that is subjective.

The point is, not just most programmers, but most people in
general, make mistakes and then learn from them. If one cannot
learn from those mistakes vis-a-vis a particular activity (like
programming, or maybe driving) then maybe one should not be
doing that activity, whatever it is. I suppose one might
struggle to learn from one's mistakes and still enjoy
programming, perhaps as a hobby. I don't see any harm in that;
driving might be another matter: cars are big, heavy, and go
fast enough to kill someone.

Where you seem to go off the rails in _this_ discussion is what
others have already told you: you are mistaking an expression of
preference with measurable facts. What constitutes "too many"
or "too few" parentheses is not well-defined: one cannot go look
in a textbook and find a definition of "too many" here. And
even though most people agree that `((((((((a * b))))))))` is
"too many", that's still an opinion: someone else may disagree.
_I_ may think that the person who wrote that and anyone who
agrees with them has no taste and an utter lack of class, but
that's nothing more than my opinion.

Here's an example: when I use the ternary operator, I _usually_
wrap the first expression in parens. Necessary? Almost never.
I just like the way it looks, but aesthetics are purely
subjective.

- Dan C.


From Dan Cross@3:633/10 to All on Friday, June 05, 2026 02:49:49

In article <10vsqlu$men2$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 04/06/2026 21:34, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:47, Scott Lurndal wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 17:18, Scott Lurndal wrote:

David Brown <david.brown@hesbynett.no> writes:

On 04/06/2026 15:18, Bart wrote:

(Note that C has its own problems in this area:

   a = b/*p;   // divide b by dereferenced pointer p

Here, /* also happens to start a block comment.)

Here you are objectively wrong.  C does not have a "problem" with
this. The parsing rules of the language are clear - often called
"maximum munch".  The character sequence "/*" is the start of a
comment, it is not two separate operators.

This is where it falls down. It's very clearly a 'gotcha', and
consequence of poorly thought-out design.

It is neither a "gotcha", nor a consequence of poor design.

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

How does that not make it bad design?

The preprocessor would strip everything from the /* until the next
matching */, so a chunk of your program goes missing.

Whatcha talkin' 'bout willis?

What were /you/ talking about? What was your point?

Your inaccurate characterization that a chunk of the program
went "missing". Nothing meaningful is missing (and the comment
remains in the original source file).

So what do you mean, exactly, when you claim that a chunk of the
program (which doesn't include whitespace or comments) is missing
from the output of the preprocessor?

This is the example I gave elsewhere:

---------------------------
There are actually other issues associated with /**/ comments; here
someone forgot to terminate the first comment:

puts("one"); /* comment 1
puts("two"); /* comment 2 */
puts("three"); /* comment 3 */
---------------------------

After preprocessing you're left with this:

puts("one");
puts("three");

That middle puts call is missing, and it's meant to be part of the program.

The middle call is not "missing". It is "commented out." If
that was not deliberate, you might have a bad time, but it's
independent of the preprocessor.

This can also be a consequence of an inadvertent /* sequence such as in
'a = b/*p;'.

Sounds like a bug.

- Dan C.


From Dan Cross@3:633/10 to All on Friday, June 05, 2026 02:54:15

In article <10vt97i$pube$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vspuu$lkmu$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

One advantage of having a single program do the whole thing, is that
error messages can mention the actual text of the line where a problem
was detected, without any pre-processing applied.

Typical preprocessors emit directives that tell the compiler about
the current file name and line number, precisely so that diagnostic
messages can refer to the original text.

In fairness to Kuyper, however, the *text* from the original
source file is lost. E.g.,

term% cat n.c
#include <stdio.h>
#define FOO "hi"; // Note trailing `;`
int
main(void)
{
printf("%s\n", FOO);
return 0;
}
term% clang -fkeep-system-includes -E n.c
# 1 "n.c"
# 1 "<built-in>" 1
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "n.c" 2
#include <stdio.h> /* clang -E -fkeep-system-includes */
# 1 "n.c"
# 2 "n.c" 2

int
main(void)
{
printf("%s\n", "hi";);
return 0;
}
term%

In this example, the preprocessor macro `FOO` has been lost, and
only its expansion remains. The compiler has no information to
give a useful diagnostic.

Ah, but it does, as long as the original file is still there.

Mm, yeah, I suppose, as long as the original is still available.

$ gcc -c n.c
n.c: In function 'main':
n.c:2:17: error: expected ')' before ';' token
2 | #define FOO "hi"; // Note trailing `;`
| ^
n.c:6:20: note: in expansion of macro 'FOO'
6 | printf("%s\n", FOO);
| ^~~
n.c:6:11: note: to match this '('
6 | printf("%s\n", FOO);
| ^
$

The output of `gcc -E` doesn't include the name FOO, but it does include
the line `# 3 "n.c"`, and that's enough information for the compiler to
open the original source file and copy information from it into an error
message.

(This is perhaps straying slightly off-topic, since the standard
only requires a diagnostic, but it's still interesting to see how
actual compilers do things.)

$ cat n.c
#include <stdio.h>
#define FOO "hi"; // Note trailing `;`
int
main(void)
{
printf("%s\n", FOO);
return 0;
}
$ gcc -E n.c >| n-preprocessed.c
$ grep FOO n-preprocessed.c
$ tail n-preprocessed.c
# 2 "n.c" 2

# 3 "n.c"
int
main(void)
{
printf("%s\n", "hi";);
return 0;
}
$ gcc -c n-preprocessed.c
n.c: In function 'main':
n.c:6:24: error: expected ')' before ';' token
6 | printf("%s\n", FOO);
| ~ ^
| )
$

And if I rename n.c before compiling n-preprocessed.c, the error
message doesn't include that line of code.

I feel like there is a Stallman joke in there struggling to get
out, but I can't quite get there.

- Dan C.


From Dan Cross@3:633/10 to All on Friday, June 05, 2026 03:02:07

In article <1BoUR.3$lmCb.1@fx22.iad>, Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]

getch()
{
    register int c, lastst;

    while ((c=getc1())=='/' && !instring)
    {
        if ((c=getc1())!='*')
        {
            pushback(c);
            return('/');
        }
        if (!skipcom)
            {putc('/',fout); putc('*', fout);}
        lastst=0;
        while ( (c = getc1()) != '\0')
        {
            if (lastst && c=='/')
            {
                if (!skipcom)
                    putc('/', fout);
                break;
            }
            if (c=='\n' || !skipcom)
                putc(c, fout);
            lastst = (c=='*');
        }
        if (c=='\0')break;
    }
    return(c);
}

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c

Oh interesting. I don't have a `cpp.c` in my v6 archive.

I wonder what else I'm missing.

- Dan C.


From David Brown@3:633/10 to All on Friday, June 05, 2026 09:29:19

On 04/06/2026 21:29, Bart wrote:

On 04/06/2026 19:54, David Brown wrote:

On 04/06/2026 17:46, Bart wrote:

On 04/06/2026 15:27, David Brown wrote:

On 04/06/2026 15:18, Bart wrote:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion.

So, you're arguing 'more than needed' is a completely different
thing from 'too many'.

Of course they are different things - albeit related things, rather
than /completely/ different.  One is a question of fact, the other a
question of opinion, and they do not always coincide.

It is a fact that "a << (b + c)" has more parentheses than needed.
But I think we are both of the opinion that it does not have "too
many" parentheses - it has an appropriate number of parentheses.

So saying 'too many' of something will be a subjective opinion? OK,
so let's try compiling this bit of C:

   void F(int, int);

   int main() {
       F(1, 2, 3);
   }

8 out of 9 compilers reported 'Too many arguments'.

According to you, that's only their subjective opinion, not an
objective fact?

Again - /please/ stop trying to guess what people say or put words in
their mouths.  I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a purely subjective opinion.  Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still be a subjective opinion - no matter how common that opinion is.

You're saying that:

How can this be /so/ difficult for you?

*  "more than needed" is objective

No, I said that '"(a*a) + (b*b)" has more parentheses than needed in the context of most programming languages' is objective.

*  "too many" is subjective

No, I said that "(a*a) + (b*b) has too many parentheses" is subjective.

The context is /critical/. There are plenty of situations where the
words "more than needed" might turn up in a subjective phrase. There
are plenty of situations where "too many" might turn up in an objective phrase.

It is not those particular words that make the difference between
"subjective" and "objective". "Subjective" means there is a subject -
almost always a human subject - and the judgement or categorisation
depends on that person or persons. "Objective" means there is no person involved, and the judgement or categorisation is independent of any person.

A categorisation of an expression that depends on its meaning in C does
not involve a person - the judgement is mechanical and based solely on
the expression and the C standards. It is therefore objective. Any sufficiently intelligent and literate person will reach the same
decision even if they have never used C or any other programming language.

A categorisation of what people feel is too many parentheses in an
expression is entirely dependent on that person. Some people might be
happy with more, some people might prefer a minimum number allowed by
the language while maintaining the same semantics. Some might prefer
lots but be okay with fewer, or prefer fewer but understand why others
prefer more. Some might draw a hard line and say that more than three nestings is too much, others might have no limits. Some will say it
depends on the circumstances, drawing distinction between code that they
write and code they have to read, or code that is generated
automatically in some way. Clearly, this is all highly subjective.

Even though both are about exactly the same thing: superfluous but
harmless parentheses in an expression.

So you are picking on my choice of words, apparently in order to win
some stupid argument on the internet. Even though the same "too many"
phrase used elsewhere can be objective, according to you.

I don't care about the words - I care that you can make a distinction
between what is factual and objective, and what is opinionated and
subjective.

My suspicion is that you actually have a real, serious problem in this
area. Your programming has been so insular and isolated for so long,
that you are perhaps genuinely unable to make such distinctions - at
least in the context of programming. For you, programming revolves
entirely around /you/ - you designed your language(s), you implemented
it, you use it. Your language, and the programs you have written in it,
are part of you and have no non-subjective existence - and languages and programs that are not yours have only limited existence and relevance to
you. This makes it very difficult for you to distinguish between
objective matters, such as a language's syntax, and subjective matters,
such as coding style. For example, you appear to think that code
written in an unclear style means the syntax is ambiguous, conflating subjective opinion with objective fact. You view the C standards as a
set of guidelines, rather than a contract and specification, because in
your own programming world your language descriptions /are/ a set of guidelines and rough notes that you can change at a whim as easily and
often as you change code written in the language. In your programming
world, everything is subjective because it all comes from your personal
likes and dislikes, and everything seems objective because there are no
other people to have opinions or thoughts.

This looks like a pattern: people here seem to have remarkable trouble debating with me on actual ideas and resort instead to finding hidden significance in some choice of words I happened to use.

For discussions to have any chance of being productive, they have to
share a common language and understanding of terms and concepts.

"Too many parentheses" is subjective, because they affect the ease of
reading the code as a human reader.

And 'more than needed' isn't that?!

In the context it was used, that is correct. "More than needed" means
that some could be removed without changing the semantics of the
expression - its meaning as a C expression.

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Please give a reference for him saying that. (I'll save you the bother,
he has not made any remarks remotely like this in c.l.c. since I have
been here.)

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all completely unambiguous according to the C standard!

Don't presume - you make a fool out of yourself every time you do.


From Tim Rentsch@3:633/10 to All on Friday, June 05, 2026 00:53:39

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsrpo$men2$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[snip]
Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

So what was his 99.9% all about? Nobody has a clue, except they are
certain that what I think it is is wrong!

Have you thought about, I don't know, maybe asking him?

At the risk of saying what may be obvious to everyone, Bart has
shown that he has no interest in having a serious, constructive,
useful, or productive conversation with anyone. His questions
are all rhetorical; he hasn't asked me a straight question
because he isn't really interested in what I would say. In
short, Bart isn't looking for an answer, he's looking for an
argument. My recommendation is just stop responding to him
altogether. My response to him upthread was a sincere effort to
provide a neutral and helpful answer to his question. Maybe my
remarks were helpful to other people, and if they were that's
good. Any further efforts to interact with Bart are not just a
waste of time but actually counterproductive. What Bart needs is
not help with understanding C but a good therapist. In any case
I'm confident that whatever Bart's needs may be, no one responding
to his postings here is in a position to provide them. Please
consider these remarks before responding to him further.


From David Brown@3:633/10 to All on Friday, June 05, 2026 10:34:00

On 04/06/2026 21:13, Lew Pitcher wrote:

On Thu, 04 Jun 2026 21:04:50 +0200, David Brown wrote:

On 04/06/2026 19:47, Janis Papanagnou wrote:

On 2026-06-04 18:18, Scott Lurndal wrote:

Indeed, and in the early days, the compiler itself would never
have seen '/*' - the preprocessor (cpp) would have removed it
from the source before the source reached the first
pass of the compiler (c0).

Curious; was the comment-handling at some point in history removed
from the Cpp-processing? - If so, when was that? And I assume the
semantics are still the same; is that correct?

No, at least since the standardisation of the C language (including K&R
"standard"), "preprocessing" has been an integral part of the C language
and conversion of comments to space characters is done in phase 3 of the
translation. But the C standards do not give an explicit distinction
between "preprocessing" and "compiling" - just different translation
phases. (They do not define a "compiler" at all.) It is not uncommon
for implementations to separate translation into two or more programs,
especially in the good old days when hosts had much less memory, but
logically they are all one implementation. Distinguishing "the compiler
itself" is somewhat artificial.

In historic Unix (Version 7 and before), the preprocessor was implemented
as a separate program ("cpp") from the compiler ("cc"). The compiler itself had no facility to handle preprocessor directives, and was, itself, often divided into two separate programs ("cc0" and "cc1"). All three phases ("cpp", "cc0" and "cc1") were managed by a program ("cc"), although the program for each phase could be invoked independently through manual execution.

When you type "$(CC) main.c -o main" and get a program "main", there are usually a number of programs run in the process. Traditionally (and
still the case for some compilers) there was a split between the "preprocessor" and the "compiler". But such a split is artificial in
terms of the C implementation - as is having the compiler generate
assembly and pass it to a separate assembler, and a separate linker. A
C implementation translates C into a program suitable for running on the target - whether that is done using a single program or multiple
programs is implementation detail.

(In contrast, I have seen embedded C compilers where there was a single program that covered everything - preprocessing, compiling, assembling, linking, and also contained the standard headers and standard library as
part of the monolithic tool rather than separate files.)

What differs from today is that the preprocessor was an optional component, made available for a programmer's convenience.

I am not sure what you mean. I can run code through a C pre-processor
without compiling it today. I can write "manually pre-processed" C code
and compile it today without a pre-processing stage. I have rarely had
use of the former (perhaps debugging some macros), and never had need of
the latter, but it is certainly possible.


From Janis Papanagnou@3:633/10 to All on Friday, June 05, 2026 10:41:10

On 2026-06-05 01:49, Dan Cross wrote:

[...]

[...]

[ ... (INT_MAX+1)*0 ]

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

I'm curious about that "violation"; a violation would require
(at least) two sorts of logical preconditions. - The first is
that all *sequentially* (literally) evaluated sub-expression
values are representable as value - INT_MAX+1 certainly can't
be represented in generated code that conforms to the abstract
*mathematical* value - but is that necessary if _the whole_
expression is (mathematically) just 0 (because of the final
factor). And the second (related) is whether the order of the
sub-expression evaluation is relevant; if we'd assume the
expression evaluation to be considered from right to left then
it would be irrelevant what's inside the parenthesis.

From the standard quotes I cannot really recognize that these
preconditions, how to determine UB/errors/violations, would be
necessary.

I'm no native speaker and I fear my question as formulated was
hard to understand. It's basically the question of the standard
implying (INT_MAX+1)*0 to be analyzed sequentially as written
or whether it could as well analyze it from right to left and
thus recognizing no problem, since from the mathematical view -
but also practically - a concrete representable value of a here
irrelevant sub-expression isn't necessary. Or another try of a
(paraphrased) formulation; for the determination of constraint
violations does the expression have strict (sort of) sequencing
points _after each term_ (and each left-to-right sub-expression
has to be well-defined) or can it be valued/analyzed as a whole
not putting any preconditions about evaluation order etc. when
determining the overall value?

Janis

PS: One yet non-considered question that was part of my original
post was: "Is there any rationale from the _software designer_'s
perspective?"

[...]


From Bart@3:633/10 to All on Friday, June 05, 2026 11:04:51

On 05/06/2026 08:53, Tim Rentsch wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsrpo$men2$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[snip]
Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

So what was his 99.9% all about? Nobody has a clue, except they are
certain that what I think it is is wrong!

Have you thought about, I don't know, maybe asking him?

Asking him straight questions is usually futile. You can probably guess
this from the response below.

Notice he hasn't tried to enlighten anyone about that 99.9%.

That may just have been a throwaway line like when I say 'nobody likes
X', but I would still dispute that, if it's about what I think it is,
it's anything like a super-majority.

At the risk of saying what may be obvious to everyone, Bart has
shown that he has no interest in having a serious, constructive,
useful, or productive conversation with anyone. His questions
are all rhetorical; he hasn't asked me a straight question
because he isn't really interested in what I would say. In
short, Bart isn't looking for an answer, he's looking for an
argument. My recommendation is just stop responding to him
altogether. My response to him upthread was a sincere effort to
provide a neutral and helpful answer to his question. Maybe my
remarks were helpful to other people, and if they were that's
good. Any further efforts to interact with Bart are not just a
waste of time but actually counterproductive. What Bart needs is
not help with understanding C but a good therapist. In any case
I'm confident that whatever Bart's needs may be, no one responding
to his postings here is in a position to provide them. Please
consider these remarks before responding to him further.


From Bart@3:633/10 to All on Friday, June 05, 2026 12:39:43

On 05/06/2026 08:29, David Brown wrote:

On 04/06/2026 21:29, Bart wrote:

You're saying that:

How can this be /so/ difficult for you?

*  "more than needed" is objective

No, I said that '"(a*a) + (b*b)" has more parentheses than needed in the context of most programming languages' is objective.

*  "too many" is subjective

No, I said that "(a*a) + (b*b) has too many parentheses" is subjective.

If anyone is interested (which I doubt; bart-bashing is much more fun),
this is the original context:

TR:

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

BC:

Actual examples of too many parentheses?

TR:

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

Here it is clear that 'too many' was just a paraphrase of 'unnecessary'.
Here is my followup to TR:

BC:

My point was that it could be objective, at least for too many.

For an infix syntax where * has higher priority than +, then it is a
fact that the () in (a*a) + (b*b) are not necessary.

So, assume a minimum number of () needed to properly parse an expression according to intent. Then:

(1) TOO FEW: necessarily has to be subjective. It suggests a desire for
more () than the minimum, but the exact number will vary.

(2) TOO MANY, MORE THAN NEEDED, ETC: These can be objective if referring to
any number of extra () above the minimum. This is the point I made
above, the one I defended.

(3) TOO MANY, MORE THAN NEEDED, ETC: These can also be used in a
judgemental manner, and then they are subjective. This is where a certain
number of extra () are accepted for readability etc, but the exact level
will vary.

If this is the point people have been trying to make, then they've been
doing it incredibly badly, and been unnecessarily unpleasant and insulting.

My own view is that C syntax has too much of (3), but necessarily so
because of the choices made in its operator levels.

The syntaxes I work on tend to have more of (2); () is less often needed
for readability because of more sensible design choices. And IMO less
often needed for overrides too, for the same reasons.

For example, where C has (*P).m or (*Q)[i], I'd write P^.m or Q^[i],
since I chose a postfix rather than prefix dereference operator.

In general, for the same programs, C will probably use at least 20% more parentheses.

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Please give a reference for him saying that.  (I'll save you the bother,
he has not made any remarks remotely like this in c.l.c. since I have
been here.)

Find out what was the subject of the 99.9% (even if that was an
exaggeration). Then we'll talk.

No, he didn't use the word 'machines'; I paraphrased to suggest
supernormal people who know everything and never make mistakes.

You're going to argue about this now?

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

Don't presume - you make a fool out of yourself every time you do.

And you proceed to do exactly the same; Bart must be wrong, but you
don't say about what!


From Tim Rentsch@3:633/10 to All on Friday, June 05, 2026 05:34:20

I didn't read Bart's posting. Unfortunately it seems
true that any continued interaction with his comments
is counterproductive.


From Tim Rentsch@3:633/10 to All on Friday, June 05, 2026 05:49:58

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

One advantage of having a single program do the whole thing, is
that error messages can mention the actual text of the line where
a problem was detected, without any pre-processing applied.

Typical preprocessors emit directives that tell the compiler
about the current file name and line number, precisely so that
diagnostic messages can refer to the original text.

For example:

$ cat hello.c
#include <stdio.h>
int main(void) {
printf("Hello world!\n");
}
$ gcc -E hello.c | tail
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 983 "/usr/include/stdio.h" 3 4

# 2 "hello.c" 2

# 2 "hello.c"
int main(void) {
printf("Hello world!\n");
}
$

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior,

Since it is gcc that is generating the non-directives, for
internal purposes, and gcc that is consuming them, it hardly
seems worth worrying about whether their behavior is defined
or not.


From Tim Rentsch@3:633/10 to All on Friday, June 05, 2026 06:41:23

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

[...]

I'm skipping this and proceeding on to the original question.

But taking a closer look at the standard, I'm not 100% sure that the
language requires a diagnostic, though I think that's the intent.
The relevant constraint is:

Each constant expression shall evaluate to a constant that is
in the range of representable values for its type.

If I squint really hard, I can argue that the entire expression
has to be a constant expression, but it doesn't say that its
subexpressions are constant expressions -- and *if* INT_MAX +
1 evaluates to INT_MIN in the current implementation, then
(INT_MAX + 1) * 0 evaluates to 0 and therefore satisfies the
constraint.

My reasoning is as follows.

To determine if the constraint is satisfied, the compiler must
first evaluate the expression (INT_MAX + 1) * 0.

To evaluate the expression (INT_MAX + 1) * 0, the compiler must
first evaluate the sub-expression (INT_MAX + 1).

Because the expression (INT_MAX + 1) overflows, the behavior is
undefined, and the compiler is free to decide that the value of
the sub-expression (INT_MAX + 1) is, let's say, 12.

The compiler next evaluates the overall expression as 12*0, which
is 0 (an int).

This result of the overall expression satisfies the constraint,
and so the compiler is not obliged to generate a diagnostic.

Going back, when evaluating (INT_MAX + 1), the compiler could
have decided to choose the value 3.14159e47. In that case the
value of the overall expression would be 0.0. This value has
type double, which does not satisfy the constraint that the
result have integer type. Thus if the compiler had made this
decision then a diagnostic would be required.

Overall conclusion: whether a diagnostic is required depends on
what behavior is chosen for the construct (INT_MAX + 1). The
implementation could choose a behavior where the constraint is
satisfied, or it could choose a behavior where the constraint is
not satisfied.

But INT_MAX + 1 could legally trap, for example, and I don't
believe it was intended that a given expression can be a constant
expression or not depending on the vagaries of the behavior of an
instance of UB.

I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.


From David Brown@3:633/10 to All on Friday, June 05, 2026 15:42:59

On 05/06/2026 13:39, Bart wrote:

On 05/06/2026 08:29, David Brown wrote:

On 04/06/2026 21:29, Bart wrote:

You're saying that:

How can this be /so/ difficult for you?

* "more than needed" is objective

No, I said that "(a*a) + (b*b) has more parentheses than needed in
the context of most programming languages" is objective.

* "too many" is subjective

No, I said that "(a*a) + (b*b) has too many parentheses" is subjective.

If anyone is interested (which I doubt; bart-bashing is much more fun),
this is the original context:

I am writing in a detailed and repetitive manner to be sure there are no misunderstandings, not as "bart-bashing".

TR:

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

This is clearly about "too many" or "too few" as a subjective matter -
i.e., in addition to the minimum required for the desired semantics.
(The minimum requirements are objective, so the code has the correct C semantics - additional parentheses are about style and clarity, which
are subjective.)

BC:

Actual examples of too many parentheses?

I assume here we are again talking about "too many" beyond the necessary number. Coming from anyone else, I would happily assume they are
talking about subjective opinions - "Can you give examples of real-world
code where you think there are too many unnecessary parentheses,
resulting in code that is harder to read than it would otherwise be?"
Coming from you, it might also mean the nonsensical question "Can you
give examples of code that objectively has too many unnecessary
parentheses?".

TR:

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

Here it is clear that 'too many' was just a paraphrase of 'unnecessary'.

No, it is not. In the expression "a << (b + c)", there are unnecessary parentheses, but not - IMHO - too many parentheses. That is because "unnecessary" (in this context - and don't generalise from it) is an
objective matter of whether or not the semantics of the expression are affected by the parentheses. "Too many" (in this context) is a
subjective matter of clarity of code. In my opinion, the parentheses
are helpful and there are therefore not too many of them - but as a
matter of C semantics, they are objectively unnecessary.

Again, I am unable to read Tim's mind, and I am not accountable for what
he writes or how he writes it. But to my reading, it is quite clear
that "too many" is /not/ a paraphrase of "unnecessary".

Here is my followup to TR:

BC:

My point was that it could be objective, at least for too many.

Yes, you wrote that. You are wrong. At least, you are wrong until
someone exceeds the 63 levels of nesting that are required to be
supported by conforming compilers, but I do not believe that is
something you are considering.

For an infix syntax where * has higher priority than +, then it is a
fact that the () in (a*a) + (b*b) are not necessary.

Agreed.

So, assume a minimum number of () needed to properly parse an expression according to intent. Then:

No, don't assume that. "Intent" implies reading the mind of the
programmer. There is no such thing as "obvious intent" - there is the objective semantics of what the programmer writes, and the subjective
ease with which people (including the programmer himself/herself) can
read the code and understand the semantics of it. The former depends
solely on the code written, the latter depends significantly on the
people reading it.

Let us rather assume a minimum number of parentheses so that removing
any would change the semantics of the expression. That is an objective measure.

(1) TOO FEW: necessarily has to be subjective. It suggests a desire for
more () than the minimum, but the exact number will vary.

Agreed. (And we would both share the opinion that "a << b + c" has too
few parentheses because we would feel it is easier to read with more parentheses - while we would both think that "a * a + b * b" does not
have too few.)

(2) TOO MANY, MORE THAN NEEDED, ETC: These can be objective if referring to
any number of extra () above the minimum. This is the point I made
above, the one I defended.

Nope.

"a << (b + c)" has "more than needed" - that is objective.

"a << (b + c)" does not have "too many" in an objective sense, because
the extra parentheses have not affected any objective characteristic of
the expression - the semantics are the same. Some people may
subjectively feel there are "too many" because they think "a << b + c"
is clearer - others will have different subjective opinions.

That is the context of the phrases we have had, and how they have been used.

Terms like "too many" or "more than needed" can be used in different
contexts, and have different meanings. If you have a bowl that can hold
6 apples, and you try to put 10 apples in the bowl, that is objectively
"too many". If you write "that expression has more parentheses than
needed to make the meaning clear to readers", then that is a subjective
claim - it does not say anything about the number of parentheses needed
to express the semantics in C (that's objective), but talks about the subjective views of readers.

You cannot take a phrase like these and say "this is always objective"
or "this is always subjective" - the context is always critical.

(3) TOO MANY, MORE THAN NEEDED, ETC: These can also be used in a
judgemental manner, and then they are subjective. This is where a certain
number of extra () are accepted for readability etc, but the exact level will vary.

If this is the point people have been trying to make, then they've been doing it incredibly badly, and been unnecessarily unpleasant and insulting.

I cannot speak for the intentions of others, but it has certainly been
very frustrating trying to get you to understand the distinction between objective facts and subjective opinions, and trying to get you to stop re-writing other people's words and to stop taking partial quotations
out of context and wildly and inaccurately generalising them.

My own view is that C syntax has too much of (3), but necessarily so
because of the choices made in its operator levels.

That's a subjective opinion. I would agree with it, to at least some
extent - some of the precedence order is not as I would have picked.
But given that there are situations where I would include additional parentheses in C code despite agreeing with the precedence order, I
don't think the C syntax rule choices are the issue. I don't believe I
would use fewer parentheses even if << and >> had the same precedence
level as * and /, or if the bitwise operators had higher precedence than equality and other relational operators.

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Please give a reference for him saying that. (I'll save you the
bother, he has not made any remarks remotely like this in c.l.c. since
I have been here.)

Find out what was the subject of the 99.9% (even if that was an exaggeration). Then we'll talk.

Again, I am not responsible for what Tim (or anyone else) writes. If
you have asked him for clarification, and he has not given a
satisfactory answer, there's little more to do.

No, he didn't use the word 'machines'; I paraphrased to suggest
supernormal people who know everything and never make mistakes.

You're going to argue about this now?

Normally there is nothing wrong with paraphrasing, though in this
discussion it would make a lot more sense to be precise about
quotations. However, wildly exaggerating what someone says is not "paraphrasing". It is misrepresenting them, and is dishonest when done intentionally and knowingly.

Presumably, the same 99.9% will not use indentation, and will write
their programs all on one line anyway, because it is still after all
completely unambiguous according to the C standard!

Don't presume - you make a fool out of yourself every time you do.

And you proceed to do exactly the same; Bart must be wrong, but you
don't say about what!

I am not presuming - I was making a comment based on past history. It
would be nice if it changed, because you stop trying to guess
what people think or might say, and stop distorting what they write.
Put a bit more effort into reading people's posts, and less effort into
the paranoia, and I'm sure you'll feel the threads are more productive.


From Scott Lurndal@3:633/10 to All on Friday, June 05, 2026 14:04:19

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1BoUR.3$lmCb.1@fx22.iad>, Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]

<snip>

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c

Oh interesting. I don't have a `cpp.c` in my v6 archive.

I wonder what else I'm missing.

For your archive, cpp.c

#
# include <stdio.h>
/* C command */

# define SBSIZE 15000
# define SYMSIZ 1500
# define TOKLEN 16
# define DROP (-2)
# define SAME 0
# define MAXINC 10
char sbf[SBSIZE];
# define CHSPACE 1000
char ts[CHSPACE+50];
# define EXPSIZE 500
char *strdex(), *copy(), *calloc(), *token(), *coptok();
char *tsa ts;
char *tsp ts;
char *fnames[MAXINC];
# define LINELEN 512
FILE *fin;
FILE *fout;
int instring;
char direct[50];
int nd 1;
char *dirs[10] {direct, 0};
char nfil[100];
int pflag;
int depth;
int skipcom;
FILE *fins[MAXINC];
int ifno;
char *lp;
char *line;
# define NPREDEF 20
char *prespc[NPREDEF];
char **predef prespc;
char *punspc[NPREDEF];
char **prund punspc;
char **predp;
int lineno[MAXINC];
int exfail;
struct symtab {
char name[TOKLEN];
char *value;
} *symtab, *lookup();
struct symtab *defloc;
struct symtab *udfloc;
struct symtab *incloc;
struct symtab *ifloc;
struct symtab *elsloc;
struct symtab *eifloc;
struct symtab *ifdloc;
struct symtab *ifnloc;
struct symtab *sysloc;
struct symtab *lneloc;
struct symtab *prdloc;
int trulvl;
int flslvl;
char *stringbuf;

mainpp(argc,argv)
char *argv[];
{
int i;
# ifdef tgp
int ifbrk;
# endif
char ln[LINELEN];
register int c;
register char *rlp;
char *sp;
struct symtab stab[SYMSIZ];

fin = stdin;
fout = stdout;
# ifdef unix
fnames[ifno=0] = "";
# endif
# ifdef gcos
fnames[ifno=0] = "s*";
# endif
# ifdef ibm
fnames[ifno=0] = "";
# endif
for(i=1; i<argc; i++)
{
switch(argv[i][0])
{
case '-':
switch(argv[i][1])
{
case 'P':
pflag++;
case 'E':
continue;
case 'D':
if (predef>prespc+NPREDEF)
{
error("too many -D options, ignoring %s",argv[i]);
continue;
}
*predef++ = argv[i]+2;
continue;
case 'U':
if (prund>punspc+NPREDEF)
{
error("too many -U options, ignoring %s",argv[i]);
continue;
}
*prund++ = argv[i]+2;
continue;
case 'I':
if (nd>8)
error("excessive -I file (%s) ignored",argv[i]);
else
dirs[nd++] = argv[i]+2;
continue;
case '\0': continue;
default:
error("unknown flag %s", argv[i]);
continue;
}
default:
if (fin==stdin)
{
fin = fopen(argv[i], "r");
if (fin==NULL)
{
error("No source file %s",argv[i]);
exit(8);
}
fnames[ifno]=argv[i];
strcpy(direct, argv[i]);
for(sp=direct; *sp; sp++);
while (sp>direct && *sp != '/') sp--;
# ifdef unix
if (sp==direct)
*sp++ = '.';
# endif
*sp=0; /* direct now has place where source file is */
}
else
if (fout==stdout)
{
fout= fopen(argv[i], "w");
if (fout==NULL)
{
error("Can't write %s", argv[i]);
exit(8);
}
}
else
error("extraneous name %s", argv[i]);
}
}

fins[ifno]=fin;
exfail = 0;
/* after user -I files here are the standard include libraries */
# ifdef unix
dirs[nd++] = "/usr/include";
# endif
# ifdef gcos
dirs[nd++] = "cc";
# endif
# ifdef ibm
dirs[nd++] = "stdio.";
# endif
/* dirs[nd++] = "/compool"; */
dirs[nd++] = 0;
symtab = stab;
for (c=0; c<SYMSIZ; c++) {
stab[c].name[0] = '\0';
stab[c].value = 0;
}
insym(&defloc, "define");
insym(&udfloc, "undef");
insym(&incloc, "include");
insym(&elsloc, "else");
insym(&eifloc, "endif");
insym(&ifdloc, "ifdef");
insym(&ifnloc, "ifndef");
insym(&ifloc, "if");
# ifdef unix
insym(&sysloc, "unix");
# endif
# ifdef gcos
insym (&sysloc, "gcos");
# endif
# ifdef ibm
insym (&sysloc, "ibm");
# endif
insym(&lneloc, "line");
predp=predef;
while (predp>prespc)
if (sp=strdex(*--predp, '='))
{
*sp++=0;
stsym(*predp, sp);
}
else
insym(&prdloc, *predp);
predp=prund;
while (predp>punspc)
{
if (sp=strdex(*--predp, '='))
*sp++=0;
lookup(*predp, DROP);
}
stringbuf = sbf;
trulvl = 0;
flslvl = 0;
line = ln;
lineno[0] = 1;
if (pflag==0) fprintf(fout, "# 1 \"%s\"\n", fnames[ifno]);
while(getline()) {
skipcom=0;
if (ln[0] != '#' && flslvl==0)
{
# ifdef tgp
ifbrk= checklen(line);
# endif
for (rlp = line; c = *rlp++;)
putc(c, fout);
# ifdef tgp
if (ifbrk)
fprintf(fout,"\n# %d",lineno[ifno]);
# endif
}
putc('\n', fout);
}
# ifdef tgp
checklen(line);
# endif
for(rlp=line; c = *rlp++;)
putc(c,fout);
}

getline()
{
register int c, sc, state;
struct symtab *np;
char *namep, *filname, **dirp;
int filok, inctype;

lp = line;
*lp = '\0';
state = 0;
if ((c=getch()) == '#')
state = 1;
while (c!='\n' && c!='\0') {
if (letter(c)) {
namep = lp;
sch(c);
while (letnum(c=getch()))
sch(c);
sch('\0');
lp--;
if (state==6)
{
lookup(namep, DROP);
goto out;
}
if (state>3 && state <6) {
if (flslvl==0 &&(state+!lookup(namep,-1)->name[0])==5)
trulvl++;
else
flslvl++;
out:
while (c!='\n' && c!= '\0')
c = getch();
return(c);
}
if (state==3) /* include */
if (*namep != '"' && *namep != '<')
{
error("Bad include syntax", 0);
state=1;
}
if (state!=2 || flslvl==0)
{
pushback(c);
np = lookup(namep, state);
c = getch();
}
if (state==1) {
if (np==defloc)
skipcom = state = 2;
else if (np==incloc)
state = 3;
else if (np==ifnloc)
state = 4;
else if (np==ifdloc)
state = 5;
else if (np==eifloc) {
if (flslvl)
--flslvl;
else if (trulvl)
--trulvl;
else errback("If-less endif",0);
goto out;
}
else if (np==elsloc) {
if (flslvl)
--flslvl? ++flslvl : ++trulvl;
else if (trulvl)
{++flslvl; --trulvl;}
else
errback("If-less else",0);
goto out;
}
else if (np==udfloc) {
state=6;
}
else if (np==ifloc) {
/*
if (flslvl ==0 && yyparse())
*/ error("IF not implemented, true assumed",0); if (1)
trulvl++;
else
flslvl++;
return('\n');
}
else if (np==lneloc)
{
if(pflag==0) fprintf(fout, "# ");
lp=line;
for(; c !='\n' && c != '\0'; c=getch())
if (!pflag)
sch(c);
sch('\0');
return(c);
}
else {
errback("Undefined control",0);
while (c!='\n' && c!='\0')
c = getch();
return(c);
}
} else if (state==2) {
if (flslvl)
goto out;
np->value = stringbuf;
if (c != '\n' && c != 0)
{
savch(c);
while ((c=getch())!='\n' && c!='\0')
{
if (c== '\\')
{
c = getch();
if (c=='\n')continue;
savch('\\');
}
savch(c);
}
}
savch('\0');
return(1);
}
continue;
} else if ((sc=c) == '\'' || sc== '"' || (state==3 && sc== '<')) {
sch(sc);
filname = lp;
inctype = sc=='<';
if (sc== '<')
{
/*
fprintf(fout==stdout?stderr:stdout, "note: include <> obsolete, use \"\"\n");
*/
sc= '>';
}
instring++;
while ((c=getch())!=sc && c!='\n' && c!='\0') {
sch(c);
if (c=='\\')
sch(getch());
}
instring = 0;
if (flslvl)
goto out;
if (state==3) {
if (flslvl)
goto out;
*lp = '\0';
while ((c=getch())!='\n' && c!='\0');
if (ifno+1 >=MAXINC)
error("Unreasonable include nesting",0);
filok=0;
for(dirp=dirs+inctype; *dirp; dirp++)
{
if (filname[0]=='/' || **dirp=='\0')
strcpy(nfil,filname);
else
{
strcpy(nfil,*dirp);
# ifdef unix
strcat(nfil, "/");
# endif
# ifdef gcos
strcat(nfil, "/");
# endif
# ifdef ibm
strcat(nfil, ".");
# endif
strcat(nfil, filname);
}
if ( (fins[ifno+1]=fopen(nfil, "r"))!=NULL)
{
filok=1;
fin = fins[++ifno];
break;
}
}
if (filok==0)
errback("Can't find include file %s", filname);
else
{
if (pflag==0) fprintf(fout, "\n# 1 \"%s\"", filname);
lineno[ifno]=1;
fnames[ifno] = copy(filname);
}
return(c);
}
}
sch(sc=c);
c = getch();
if (isdigit(sc))
{
for (;isalpha(c) || isdigit(c); c=getch())
sch(c);
}
}
sch('\0');
if (state>1)
errback("Control syntax",0);
return(c);
}
insym(sp, namep)
struct symtab **sp;
char *namep;
{
register struct symtab *np;
*sp = np = lookup(namep, 1);
np -> value = np -> name;
}

stsym(namep, valp)
char *namep, *valp;
{
register struct symtab *np;

np = lookup(namep, 1);
np->value = valp;
}

error(s, x)
char *s;
{
FILE *efout;
efout = fout==stdout ? stderr : stdout;
if (fnames[ifno][0])
fprintf(efout,"%s: %d: ", fnames[ifno], lineno[ifno]);
fprintf(efout, s, x);
putc('\n',efout);
exfail++;
}
errback(s,x)
char *s;
{
lineno[ifno]--;
error(s,x);
lineno[ifno]++;
}

sch(c)
{
register char *rlp;

rlp = lp;
if (rlp==line+LINELEN-2)
error("Line overflow", 0);
*rlp++ = c;
if (rlp>line+LINELEN-1)
rlp = line+LINELEN-1;
lp = rlp;
}

savch(c)
{
*stringbuf++ = c;
if (stringbuf-sbf < SBSIZE)
return;
error("Too much defining", 0);
exit(exfail);
}

getch()
{
register int c, lastst;

while ((c=getc1())=='/' && !instring)
{
if ((c=getc1())!='*')
{
pushback(c);
return('/');
}
if (!skipcom)
{putc('/',fout); putc('*', fout);}
lastst=0;
while ( (c = getc1()) != '\0')
{
if (lastst && c=='/')
{
if (!skipcom)
putc('/', fout);
break;
}
if (c=='\n' || !skipcom)
putc(c, fout);
lastst = (c=='*');
}
if (c=='\0')break;
}
return(c);
}
char pushbuff[EXPSIZE];
char *pushp pushbuff;
pushback(c)
{
*++pushp = c;
if (pushp>pushbuff+EXPSIZE) {
error("too much backup", 0);
exit(8);
}
}

getc1()
{
register c;

if (*pushp !=0)
return(*pushp--);
depth=0;
if ((c = getc(fin)) == EOF && ifno>0) {
fclose(fin);
fin = fins[--ifno];
if (pflag==0) fprintf(fout, "\n# %d \"%s\"\n",lineno[ifno], fnames[ifno]);
c = getc1();
if (c=='\n') lineno[ifno]--;
}
if (c==EOF)
return(0);
if (c=='\n' )
lineno[ifno]++;
return(c);
}

struct symtab *
lookup(namep, enterf)
char *namep;
{
register char *np, *snp;
register struct symtab *sp;
int i, c, around;
np = namep;
snp = np+TOKLEN;
around = i = 0;
while ( (c = *np++ ) && (np-snp)<0)
{
i =+ c;
}
i =% SYMSIZ;
sp = &symtab[i];
while (sp->name[0]) {
if (sp->name[0] != DROP)
{
snp = sp->name;
np = namep;
while (*snp++ == *np)
if (*np++ == '\0' || np==namep+TOKLEN) {
if (enterf==DROP)
{
sp->name[0]= DROP;
return(sp);
}
if (!enterf)
subst(namep, sp);
return(sp);
}
}
if (++sp >= &symtab[SYMSIZ])
if (around++)
{
error("too many defines", 0);
exit(exfail);
}
else
sp = symtab;
}
if (enterf>0) {
snp = namep;
for (np = &sp->name[0]; np < &sp->name[TOKLEN];)
if (*np++ = *snp)
snp++;
}
return(sp);
}
char revbuff[200], *bp;
backsch(c)
{
if (bp-revbuff > 200)
error("Excessive define looping", bp--);
*bp++ = c;
}

subst(np, sp)
char *np;
struct symtab *sp;
{
register char *vp;
int macflg;

lp = np;
bp = revbuff;
if (depth++>100)
{
error("define recursion loop on %s", np);
return;
}
if ((vp = sp->value) == 0)
return;
macflg= (*vp == '(');
/* arrange that define unix unix still
has no effect, avoiding rescanning */
while (blank(*vp))
vp++;
if (strcmp(sp->name,vp) == SAME)
{
while (*vp)
sch(*vp++);
return;
}
if (macflg)
expdef(vp);
else
while (*vp)
backsch(*vp++);
while (bp>revbuff)
pushback(*--bp);
}

char *
copy(as)
char as[];
{
register char *otsp, *s;
int i;

otsp = tsp;
s = as;
while(*tsp++ = *s++);
if (tsp >tsa+CHSPACE)
{
# ifdef unix
tsp = tsa = i = calloc(CHSPACE+50,sizeof(char));
if (i== NULL)
# endif
{
error("no space for file names", 0);
exit(8);
}
}
return(otsp);
}

expdef(proto)
char *proto;
{
char buffer[EXPSIZE], *parg[20], *pval[20], name[20], *cspace, *wp;
char protcop[EXPSIZE], *pr;
int narg, k, c;
pr = protcop;
while (*pr++ = *proto++)
if (pr>=protcop+EXPSIZE){
error("define prototype too big", 0);
exit(8);
}
proto= protcop;
for (narg=0; (parg[narg] = token(&proto)) != 0; narg++)
;
/* now scan input */
cspace = buffer;
while ((c=getch()) == ' ');
if (c != '(')
{
error("defined function requires arguments", 0);
return;
}
pushback(c);
for(k=0; pval[k] = coptok(&cspace, buffer+EXPSIZE); k++);
if (k!=narg)
{
error("define argument mismatch");
return;
}
while (c= *proto++)
{
if (!letter(c))
backsch(c);
else
{
wp = name;
*wp++ = c;
while (letnum(*proto))
*wp++ = *proto++;
*wp = 0;
for (k=0; k<narg; k++)
if(strcmp(name,parg[k]) == SAME)
break;
wp = k <narg ? pval[k] : name;
while (*wp) backsch(*wp++);
}
}
}

char *
token(cpp) char **cpp;
{
char *val;
int stc;
stc = **cpp;
*(*cpp)++ = '\0';
if (stc==')') return(0);
while (**cpp == ' ') (*cpp)++;
for (val = *cpp; (stc= **cpp) != ',' && stc!= ')'; (*cpp)++)
{
if (!letnum(stc) || (val == *cpp && !letter(stc)))
{
error("define prototype argument error");
return(0);
}
}
return(val);
}

char *
coptok (cpp, clim) char **cpp, *clim;
{
char *val;
int stc, stop,paren;
paren = stop = 0;
val = *cpp;
if (getch() == ')')
return(0);
while (((stc = getch()) != ',' && stc != ')' ) || paren > 0 || stop >0)
{
if (stc == '\0')
{
error("non terminated macro call", 0);
val = 0;
break;
}
if (stop == 0 && (stc == '"' || stc == '\''))
stop = stc;
else if (stc==stop)
stop=0;
if ( stc == '\\')
{
stc = getch();
if (stop>0 || (stc != ',' && stc != '\\'))
*(*cpp)++ = '\\';
*(*cpp)++ = stc;
}
else
{
*(*cpp)++ = stc;
if (stop==0)
{
if (stc == '(')
paren++;
if (stc == ')')
paren--;
}
}
if (*cpp >= clim)
{
error("define argument too long",0);
exit(8);
}
}
*(*cpp)++ = 0;
pushback(stc);
return(val);
}
letter(c)
{
if (isalpha(c) || c == '_')
return (1);
else
return(0);
}
letnum(c)
{
if (letter(c) || isdigit(c))
return(1);
else
return(0);
}

blank(c)
{
return(c==' ' || c== '\t');
}

char *
strdex(s,c)
char *s;
{
while (*s)
if (*s==c)
return(s);
else
s++;
return(0);
}
# ifdef tgp
# define MAXOUT 80
checklen(sln)
char *sln;
{
/* for tgp: scans string sln, and puts in newlines for blanks,
where it likes, but to make lines less than MAXOUT chars long */

char *p, *s, *st;
int stopc, back, ifdone, c;
st=s=sln;
ifdone=p=stopc=back=0;
while (c= *s++)
{
if (c == '\\')
back=2;
if (back==0)
{
if (stopc== c)
stopc=0;
else
if (c == '"' || c == '\'')
stopc= c;
}
if (back>0)back--;
if (s-st >MAXOUT && p != 0)
{
st=p;
*p= '\n';
ifdone=1;
}

if (stopc==0 && back==0)
if (c==' ') p=s-1;;
}
return(ifdone);
}
# endif

main(argc,argv) char *argv[]; {
exit(mainpp (argc,argv) );
}


From Bart@3:633/10 to All on Friday, June 05, 2026 16:50:21

On 05/06/2026 14:42, David Brown wrote:

On 05/06/2026 13:39, Bart wrote:

"a << (b + c)" has "more than needed" - that is objective.

"a << (b + c)" does not have "too many" in an objective sense, because

OK. Suppose "too many" /is/ subjective; what actual difference does it
make to anything?

I cannot speak for the intentions of others, but it has certainly been
very frustrating trying to get you to understand the distinction between objective facts and subjective opinions,

Why is that even important? I asked:

Actual examples of too many parentheses?

The reply was:

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

I didn't introduce this objective/subjective business. It seems now more
like a ploy to devalue any arguments of mine, and also to evade
answering; I'm still waiting for those examples from TR!

These were his prior comments:

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

So he obviously has his own tolerance level. I would also guess those 'writers' belong to that 0.1%.

I would actually agree that parentheses can add clutter, but not that
the answer is to not use them when they are optional.

In C they are often added because many of us (we are a lot more than 0.1%) need
them to more easily parse code. That doesn't mean we are stupid.

I suggested that minimising parentheses because the result is still 'unambiguous' is equivalent to doing away with indentation for the same reason.

People didn't like that. Yet indentation and extra parentheses /are/
both redundant.


From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 10:49:28

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-05 01:49, Dan Cross wrote:

[...]

[...]

[ ... (INT_MAX+1)*0 ]

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

I'm curious about that "violation"; a violation would require
(at least) two sorts of logical preconditions. - The first is
that all *sequentially* (literally) evaluated sub-expression
values are representable as value - INT_MAX+1 certainly can't
be represented in generated code that conforms to the abstract
*mathematical* value - but is that necessary if _the whole_
expression is (mathematically) just 0 (because of the final
factor). And the second (related) is whether the order of the
sub-expression evaluation is relevant; if we'd assume the
expression evaluation to be considered from right to left then
it would be irrelevant what's inside the parenthesis.

If the expression were evaluated right to left, it would still
compute INT_MAX+1, which is UB.

Let's look at an example where it's not in a context that requires a
constant expression:

int n;
n = (INT_MAX+1)*0;

In the abstract machine, the RHS is evaluated by adding INT_MAX
and 1 (which overflows, UB) and then multiplying the result by 0.

A compiler is allowed, but not required, to reduce the assignment to
`n = 0;`. If it does so, then no overflow occurs at run time --
but the definedness of the behavior is determined independent of
any optimizations. The C standard does not require any particular
behavior. It can set n to 0 because that's a valid consequence
of UB.

Let's take an example where it's definitely in a context that
requires an integer constant expression:

switch (0) {
case (INT_MAX+1)*0:
break;
}

The wording in 6.6 (Constant expressions) is slightly vague.
For example, I would assume that any subexpression of a constant
expression must be a constant expression, but it doesn't actually
say so.

But since, in the abstract machine, (INT_MAX+1)*0 doesn't yield
any defined value, I'd say it violates the constraint that "Each
constant expression shall evaluate to a constant that is in the
range of representable values for its type".

The alternative would be for it to be a constant expression for
implementations that are able to recognize that anything multiplied
by zero is zero (analysis that compilers aren't required to perform),
and not for others.

On the other hand, "An implementation may accept other forms of
constant expressions; however, it is implementation-defined whether
they are an integer constant expression." That probably allows,
but does not require, an implementation to treat (INT_MAX+1)*0 as
a constant expression with the value 0.
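
As a rough sketch of the case-label situation, the function below uses only labels whose values are representable in `int`. The name `classify` is made up for illustration, and the overflowing label is left in a comment because, on the reading argued above, it violates the constraint in 6.6p4.

```c
#include <limits.h>

/* Hypothetical example: every active case label is an integer constant
   expression whose value is representable in int. */
int classify(int x)
{
    switch (x) {
    case (INT_MAX - 1) * 0:   /* evaluates to 0, no overflow involved */
        return 0;
    case INT_MAX:
        return 1;
    /* case (INT_MAX + 1) * 0: is the label under discussion; it is left
       out here because the subexpression INT_MAX + 1 overflows. */
    default:
        return -1;
    }
}
```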

From the standard quotes I cannot really recognize that these
preconditions, how to determine UB/errors/violations, would be
necessary.

I'm no native speaker and I fear my question as formulated was
hard to understand. It's basically the question of the standard
implying (INT_MAX+1)*0 to be analyzed sequentially as written
or whether it could as well analyze it from right to left and
thus recognizing no problem, since from the mathematical view -
but also practically - a concrete representable value of a here
irrelevant sub-expression isn't necessary. Or another try of a
(paraphrased) formulation; for the determination of constraint
violations does the expression have strict (sort of) sequencing
points _after each term_ (and each left-to-right sub-expression
has to be well-defined) or can it be valued/analyzed as a whole
not putting any preconditions about evaluation order etc. when
determining the overall value?

PS: One yet non-considered question that was part of my original
post was: "Is there any rationale from the _software designer_'s perspective?"

From a programmer's perspective, it's good to have consistent
rules rather than leaving the decision of whether an expression
is a constant expression up to the undocumented vagaries of how
clever a compiler happens to be.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 11:01:24

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior,

Since it is gcc that is generating the non-directives, for
internal purposes, and gcc that is consuming them, it hardly
seems worth worrying about whether their behavior is defined
or not.

I wasn't worried. I just mentioned it in passing.

You quoted most of the article, but snipped relevant context in
the middle of a sentence.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 11:09:34

Bart <bc@freeuk.com> writes:

On 05/06/2026 08:29, David Brown wrote:

On 04/06/2026 21:29, Bart wrote:

[...]

TR:

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

BC:

Actual examples of too many parentheses?

TR:

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

Here it is clear that 'too many' was just a paraphrase of
'unnecessary'.

No, it is clear that "too many" and "unnecessary" have two different
meanings.

I think you and I agree that the parentheses in `a << (b + c)`
are *unnecessary* (in the specific sense that they do not affect
the semantics of the expression), but they are not *too many*
(in the sense that they are helpful to most human readers).
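
A small sketch of that distinction, with hypothetical function names. Since `+` binds more tightly than `<<`, the first two functions compute the same value; only the third, which groups the shift first, differs.

```c
/* The parentheses do not change the parse: + binds tighter than <<. */
int shift_without_parens(int a, int b, int c) { return a << b + c; }
int shift_with_parens(int a, int b, int c)    { return a << (b + c); }

/* Here the parentheses are semantically necessary. */
int shift_then_add(int a, int b, int c)       { return (a << b) + c; }
```

Whether the second form is easier to read than the first is the subjective part; that the two compute the same thing is not.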

The idea that "too many" and "unnecessary" mean the same thing
is your own invention.

[...]

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Please give a reference for him saying that. (I'll save you the
bother, he has not made any remarks remotely like this in
c.l.c. since I have been here.)

Find out what was the subject of the 99.9% (even if that was an exaggeration). Then we'll talk.

Only Tim can clarify that point, and he's made it clear that he's
not interested in doing so. Please don't complain to the rest of
us about that.

No, he didn't use the word 'machines'; I paraphrased to suggest
supernormal people who know everything and never make mistakes.

You're going to argue about this now?

Bart, when you make ridiculous and/or false statements, people are going
to argue with you. When you double down on such statements, people are
going to continue to argue with you.

Your use of the word "machines" was ridiculous and false.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 11:24:52

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

[...]

I'm skipping this and proceeding on to the original question.

Why?

You made a statement, "gcc disagrees with you". I demonstrated,
in text that you snipped, that gcc does in fact agree with me.
You were wrong. I don't know the basis of your error, so I asked.
Or maybe I'm missing something, and you had a valid point that I
didn't understand.

You're not required to answer my question, which I think was
an extremely reasonable one, but quoting it and then explicitly
refusing to answer it is pointlessly rude.

I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.

[SNIP]

I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Tim Rentsch@3:633/10 to All on Friday, June 05, 2026 11:53:05

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

The line `# 2 "hello.c"` is, according to the C standard, a
"non-directive", which is a kind of directive. Executing a
non-directive has undefined behavior,

Since it is gcc that is generating the non-directives, for
internal purposes, and gcc that is consuming them, it hardly
seems worth worrying about whether their behavior is defined
or not.

I wasn't worried. I just mentioned it in passing.

You quoted most of the article, but snipped relevant context in
the middle of a sentence.

It wasn't relevant to what I wanted to say.


From Bart@3:633/10 to All on Friday, June 05, 2026 20:29:16

On 05/06/2026 19:09, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 05/06/2026 08:29, David Brown wrote:

On 04/06/2026 21:29, Bart wrote:

[...]

TR:

Sadly the idea of writing in a way that is "most easily understood"
has resulted in a race to the bottom, where writers are more and
more encouraged to take the view that (some) readers are pretty
much arbitrarily stupid, with the result that expressions become
littered with scads of unnecessary parentheses that actually
detract from ease of reading. Good writing is always a balance
between too much and too little.

BC:

Actual examples of too many parentheses?

TR:

The point of my comment is that either too many or too few is a
subjective judgment, not an objective one.

Here it is clear that 'too many' was just a paraphrase of
'unnecessary'.

No, it is clear that "too many" and "unnecessary" have two different meanings.

I was replying to a comment that used "unnecessary" and "too much".
Presumably they are connected.

Maybe I should have asked for examples of "too much parentheses"!

(How would you have phrased it? Bear in mind you will have had the benefit of
dozens of posts showing the pitfalls in this group of choosing words
that people will seize upon mercilessly.)

The idea that "too many" and "unnecessary" mean the same thing
is your own invention.

But "too much" and "unnecessary" are perfectly fine!

[...]

Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Please give a reference for him saying that. (I'll save you the
bother, he has not made any remarks remotely like this in
c.l.c. since I have been here.)

Find out what was the subject of the 99.9% (even if that was an
exaggeration). Then we'll talk.

Only Tim can clarify that point, and he's made it clear that he's
not interested in doing so. Please don't complain to the rest of
us about that.

No, he didn't use the word 'machines'; I paraphrased to suggest
supernormal people who know everything and never make mistakes.

You're going to argue about this now?

Bart, when you make ridiculous and/or false statements, people are going
to argue with you. When you double down on such statements, people are
going to continue to argue with you.

Your use of the word "machines" was ridiculous and false.

But this statement from Tim isn't ridiculous at all:

"If someone really can't learn the rules of expression syntax for the
language they are using, they should be advised to try a different
language, or perhaps give up programming altogether. It's silly to
worry about something that 999 people out of a 1000 (and the actual
numbers are undoubtedly much higher) are able to navigate without
difficulty."

999 out of 1000? And he says 'much higher' so, what, 99999 out of 100000?

If C programmers were really that perfect, then they probably /are/
machines (ie. AI).

But I'm curious: why has nobody but me picked up on this exaggeration?


From Chris M. Thomasson@3:633/10 to All on Friday, June 05, 2026 14:27:06

On 6/5/2026 5:58 AM, Waldek Hebisch wrote:

Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:

On 6/4/2026 4:44 PM, Bart wrote:

On 05/06/2026 00:09, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

On 04/06/2026 19:54, David Brown wrote:

[...]

Again - /please/ stop trying to guess what people say or put words
in their mouths. I can't remember ever seeing you do so accurately.

This is what you actually said:

It is an objective fact, therefore, that "(a*a) + (b*b)" has more
parentheses than needed in the context of most programming languages.

"(a*a) + (b*b) has too many parentheses", on the other hand, is a
purely subjective opinion. Even if it is true that this is "commonly agreed
to" (and AFAIK you have no basis for that claim), that would still
be a subjective opinion - no matter how common that opinion is.

You're saying that:

* "more than needed" is objective
* "too many" is subjective

Stop it. He's not saying that.

That is EXACTLY what he's saying: "It is an OBJECTIVE fact .. has more
... than needed", and:

"has too many ... is ... purely subjective".

You're taking phrases out of context and making false claims that the
full statement was far more general than it actually was.

And this is exactly what other people are doing.

Taken literally, your statement implies that you admit that that's
what you're doing. Is that what you meant? If so, I suggest you
*stop* making such false claims. If not, what did you actually mean?

So I used TOO MANY instead of MORE THAN NEEDED to describe the exact
same phenomenon.

That's not the problem. There is an actual meaningful distinction
here, between what's needed by the compiler and what's useful to
improve clarity for human readers. I have found some of what you've
written to be unclear about that distinction.

Can we agree that the question of whether parentheses in a C
expression are necessary to the compiler can be answered objectively?
Can we agree that the question of whether extra parentheses are
helpful to a human reader is at least partly subjective, and
varies from case to case? Is there really anything else that we
fundamentally disagree about?

(1) Why are you all making such a big fucking deal of this?

Why are you?

I didn't start this business of something being subjective or objective,
or suggesting that one turn of phrase to discuss the same thing was
subjective and the other objective (implying that a subjective opinion
had less worth). TR started that and several people backed him up.

Myself I wouldn't even use those terms. My point was that some overuses
of () for commonly known precedences are more overkill than others.

If that's subjective then so be it; it is not some fundamental law of
the universe. I would just call it common sense.

> Why are you?

Since you ask, I was defending my point of view then got sidetracked by
this subjective/objective nonsense. I notice that TR has disappeared
from this subthread.

Wrt the number of ()'s? Might as well go to sleep with the following
song playing in the background:

(The Fate of Ophelia - Taylor Swift (Lyrics) Charlie Puth ft. Selena
Gomez, the weekd, ariana grande)

AFAICS outer parentheses there are excessive, inner ones look OK.

That's fine. Btw, have you ever looked at some of the generated code
from the chaos pp lib?


From Dan Cross@3:633/10 to All on Saturday, June 06, 2026 03:10:20

In article <10vt7b9$pi3s$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

    #include <limits.h>

    int
    foo(){
        int zero = (INT_MAX+1)*0;
        return zero;
    }

    int
    main(){
        return 0;
    }

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

It does so for programs in general, not just strictly conforming
ones. If a program has undefined behavior, all bets are off,
but for example a program that evaluates `printf("%d\n", INT_MAX)`
is not strictly conforming, but it's fully subject to 5.1.2.4.

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

I don't think an integer constant expression can have undefined
behavior. INT_MAX+1 and 1/0 are not constant expressions, because
neither "evaluate(s) to a constant that is in the range of
representable values for its type".

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

That's a bold claim, but I think I see why you're saying that.

The program in question, quoted above, has:

int zero = (INT_MAX+1)*0;

`(INT_MAX+1)*0` is not a constant expression, not because of the
overflow, but because a constant expression is not required in
that context. "constant-expression" is defined by a production in
the grammar (it reduces to "conditional-expression"). Even in

int n = 42;

42 is not a constant expression, because the grammar doesn't
call for a constant expression in that context -- even though it
looks like one. Similarly, in `a + b * c`, `a + b` looks like an
additive expression, but it isn't one. (Not a perfect analogy.)
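
For contrast, here is a small sketch of places where the grammar itself does call for constant-expression (enumerator values, bit-field widths and static assertions, plus case labels), next to an ordinary initializer, which goes through assignment-expression. All identifiers are made up for illustration.

```c
#include <limits.h>

/* Contexts whose grammar names constant-expression directly. */
enum { LIMIT = INT_MAX - 1 };            /* enumerator value */
struct flags { unsigned ready : 1; };    /* bit-field width */
_Static_assert(INT_MAX - 1 < INT_MAX, "static assertion operand");

int initializer_context(void)
{
    /* Here the grammar asks only for an assignment-expression, even
       though 42 looks like a constant expression. */
    int n = 42;
    return n;
}
```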

Right; I see what you mean. In this case, the
`assignment-expression` production applies, not
`constant-expression`.

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Yes, there are constructs that are explicitly UB at translation time.
(I think that's unfortunate, and there are efforts to clear up some
such cases in C2y.)

It's unclear to me how it could be any other way. If UB was
_only_ an issue at runtime, then how could a compiler take
advantage of it to perform optimizations during translation?
We know that compilers do this.

Signed integer overflow is not one of those constructs.

This I'm not sure I agree with. If the compiler detects signed
integer overflow in (perhaps not relevant in _this_ example) an
integer constant expression, I still don't see anything that
makes that anything other than UB. It's a constraint violation,
sure, but nothing says it is not also UB.

Any undefined behavior from evaluating INT_MAX+1 happens during
execution (barring constraint violations).

I'm not sure the standard says that. The standard says this
happens during _evaluation_, and that evaluation must be
performed in accordance with the rules of the abstract
machine. But it doesn't precisely specify _when_ evaluation
takes place, and in particular, there are places in the standard
that explicitly mention evaluation during translation. I still
don't see anything that prohibits a compiler from evaluating
that expression at compile time (indeed, it clearly does, as it
generates a diagnostic about the overflow).

I suppose that changes the matter: does the language merely
leave that unspecified, in which case, this program is not
strictly conforming, or does it say that it _cannot_ make any
translation-time decisions about it? I cannot find a satisfying
argument for the latter.

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

It satisfies the requirements for an integer constant expression in
6.6p8, but it violates the constraint in 6.6p4. (I presume that an
"integer constant expression" must be a "constant expression".)
But since "constant-expression" is a grammatical production,
it doesn't have to satisfy that constraint, and no diagnostic
is required. (A warning is certainly permitted.)

Fair point. Its grammatical position makes it an
assignment-expression. I clearly misinterpreted that before.

Similarly, this:
int n = INT_MAX + 1;
at block scope doesn't require a diagnostic, though of course it
has undefined behavior -- but at file scope, the initializer is a
constant expression, so that would be a constraint violation.

Right. The semantics of this are defined in sec 6.7.11 para 5.

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Overflow in a constant expression is not undefined behavior. It's a
constraint violation. But that doesn't apply here, because the
initializer is not a constant expression. (Sorry if I'm repeating
myself.)

Where does it say that UB and constraint violations are mutually
exclusive? I don't see any such statement in the standard. Am
I missing it?

The standard says that if a constraint is violated, a diagnostic
must be emitted, regardless of whether or not the constraint
violation is the result of something that is UB or not; that is, if
a constraint violation occurs due to something that is UB, the
implementation must still emit a diagnostic: UB is not an escape
hatch from that requirement.

It also says, 'If a "shall" or "shall not" requirement that
appears outside of a constraint or runtime-constraint is
violated, the behavior is undefined. Undefined behavior is
otherwise indicated in this document by the words "undefined
behavior" or by the omission of any explicit definition of
behavior.' However, that does not preclude such behavior being
undefined; it just means that the words "shall" and "shall not"
in a constraint violation do not a priori describe behavior vis
definition.

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

Any UB in the program would occur during execution,

I suppose; but it's not clear to me that UB is tied _only_ to
execution time.

The standard is explicit that there _are_ things that are
evaluated at translation time, like the initializer for an
object with storage class `constexpr`. It is not clear to me that
a compiler is otherwise _prohibited_ from evaluating an
expression during translation; indeed, one could imagine it
doing so to perform constant folding, and I do not believe there
exists any normative text defining it as such.

I realize this is an extreme interpretation, and not one that is
widely shared. Personally, I think it's rather silly.

However, that is _a_ danger of the informality of the C
specification; it does not define the semantics of the abstract
machine in the formally precise way that, say, the SML spec
defines that language's semantics. Rather, it informally
specifies them in prose, and that prose is ambiguous.

Probably much good would be done if C's semantics _were_
rigorously defined, but they are not. Thus, they are open to
radical interpretation, and as extreme as those may be, I do not
see how the normative text of the standard explicitly
_prohibits_ them.

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

Regardless, I think you highlighted an actual problem with the
spec; I don't think that behavior is _explicitly_ prohibited,
therefore, it is likely undefined, but at a minimum unspecified,
whether it actually could happen. If the argument against that
is that this renders the language essentially unusable, then
my response is, "yeah, well, welcome to programming in C in the
2020s." Most compilers would never be that extreme, but I see
no evidence that it would be an invalid reading of the
literal text of the standard if they did.

So no, I do not see how execution according to the rules of the
abstract machine is not guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised. I don't
see how it could possibly know that, given that `foo` has
external linkage.

We were presented with a complete translation unit that included a
function definition for "main". It's a complete program. There's no
valid way for some other program to call foo. If an OS provided such
a mechanism, it would be outside the scope of C.

Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

[SNIP]

I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
    int zero;
    zero = INT_MAX;
    zero ++;
    zero *= 0;
    return zero;
}

int main(void) {
    return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

- Dan C.

From Dan Cross@3:633/10 to All on Saturday, June 06, 2026 03:22:03

In article <86bjdpayv0.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[snip]
But taking a closer look at the standard, I'm not 100% sure that the
language requires a diagnostic, though I think that's the intent.
The relevant constraint is:

Each constant expression shall evaluate to a constant that is
in the range of representable values for its type.

If I squint really hard, I can argue that the entire expression
has to be a constant expression, but it doesn't say that its
subexpressions are constant expressions -- and *if* INT_MAX +
1 evaluates to INT_MIN in the current implementation, then
(INT_MAX + 1) * 0 evaluates to 0 and therefore satisfies the
constraint.

My reasoning is as follows.

To determine if the constraint is satisfied, the compiler must
first evaluate the expression (INT_MAX + 1) * 0.

To evaluate the expression (INT_MAX + 1) * 0, the compiler must
first evaluate the sub-expression (INT_MAX + 1).

Because the expression (INT_MAX + 1) overflows, the behavior is
undefined, and the compiler is free to decide that the value of
the sub-expression (INT_MAX + 1) is, let's say, 12.

The compiler next evaluates the overall expression as 12*0, which
is 0 (an int).

This result of the overall expression satisfies the constraint,
and so the compiler is not obliged to generate a diagnostic.
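
To make the steps above concrete, here is a minimal sketch (not from the thread) that performs the same arithmetic in a wider type, where no overflow occurs. The names `widened_sum` and `widened_product` are hypothetical, and the sketch assumes `long long` is wider than `int`, which the `#if` checks. It only shows that the mathematical value of the full expression is 0 while the intermediate value exceeds `INT_MAX`; it makes no claim about what a compiler must do with the `int` version.

```c
#include <limits.h>

#if LLONG_MAX <= INT_MAX
#error "this sketch assumes long long is wider than int"
#endif

/* Hypothetical helper: the intermediate sum INT_MAX + 1, computed in
   long long so that the addition itself cannot overflow. */
long long widened_sum(void) {
    return (long long)INT_MAX + 1;
}

/* Hypothetical helper: the full expression (INT_MAX + 1) * 0, again
   computed in long long. */
long long widened_product(void) {
    return widened_sum() * 0;
}
```

The intermediate result is not representable as an `int`, yet the product is, which is the gap the argument above turns on.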

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.

I do not believe that that is the intent. But it _is_
conformant with the text of the standard.

This is a problem with the C standard: it is insufficiently
precise, as the semantics of the language are not formally
defined.

[snip]
I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The same could be said to you, as well. There exists a reading
of the standard by which your `foo`-containing program is not
strictly conforming. But that way lies madness; C is not a
formally specified language. Given that as an objective fact,
we must accept intent, consistency, and other "soft" aspects
when considering its definition.

That sort of sucks, but here we are.

- Dan C.

From Dan Cross@3:633/10 to All on Saturday, June 06, 2026 03:44:26

In article <10vu703$11s5q$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 05/06/2026 08:53, Tim Rentsch wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsrpo$men2$2@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 04/06/2026 22:06, Keith Thompson wrote:

Bart <bc@freeuk.com> writes:

[snip]
Tim Rentsch I'm sure will prefer the latter because 99.9% of C
programmers are machines, according to him.

Tim didn't say or imply that.

So what was his 99.9% all about? Nobody has a clue, except they are
certain that what I think it is is wrong!

Have you thought about, I don't know, maybe asking him?

Asking him straight questions is usually futile. You can probably guess
this from the response below.

I agree that that response was both unhelpful and hypocritical.

Notice he hasn't tried to enlighten anyone about that 99.9%.

I think my explanation was actually pretty close. YMMV.

That may just have been a throwaway line like when I say 'nobody likes
X', but I would still dispute that, if it's about what I think it is,
it's anything like a super-majority.

The point still stands. You should know your audience:
comp.lang.c is a forum that prizes a certain kind of semantic
precision. Perhaps your intent when you say things of the form,
"X has too many parentheses" is to be informal; it will
certainly not be taken that way here. And you _do_ have a track
record of being wrong enough that you are unlikely to be
afforded the benefit of the doubt.

At the risk of saying what may be obvious to everyone, Bart has
shown that he has no interest in having a serious, constructive,
useful, or productive conversation with anyone. His questions
are all rhetorical; he hasn't asked me a straight question
because he isn't really interested in what I would say. In
short, Bart isn't looking for an answer, he's looking for an
argument. My recommendation is just stop responding to him
altogether. My response to him upthread was a sincere effort to
provide a neutral and helpful answer to his question. Maybe my
remarks were helpful to other people, and if they were that's
good. Any further efforts to interact with Bart are not just a
waste of time but actually counterproductive. What Bart needs is
not help with understanding C but a good therapist. In any case
I'm confident that whatever Bart's needs may be, no one responding
to his postings here is in a position to provide them. Please
consider these remarks before responding to him further.

Generally speaking, AFAIK, none of the regular posters here are
qualified mental health professionals; as such, we should all
avoid making armchair psychological diagnoses, the
occasional mildly off-color joke aside ("that's crazy!").

Stick to C, Tim.

- Dan C.

From Dan Cross@3:633/10 to All on Saturday, June 06, 2026 03:45:04

In article <86jysdb1yr.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

I didn't read Bart's posting. Unfortunately it seems
true that any continued interaction with his comments
is counterproductive.

As is your response. I, for one, can conceive of no purpose to
it other than to goad him. Do better.

- Dan C.

From Dan Cross@3:633/10 to All on Saturday, June 06, 2026 03:49:30

In article <DHAUR.47540$0o1c.29921@fx08.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1BoUR.3$lmCb.1@fx22.iad>, Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]

<snip>

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c

Oh interesting. I don't have a `cpp.c` in my v6 archive.

I wonder what else I'm missing.

[snip]

Thanks! This is an artifact definitely worth preserving. As
far as I know, it's not in any of the extant V6 archives. I'll
shoot you an email, if that's ok.

- Dan C.

From Janis Papanagnou@3:633/10 to All on Saturday, June 06, 2026 07:39:20

On 2026-06-06 05:44, Dan Cross wrote:

In article <10vu703$11s5q$1@dont-email.me>, Bart <bc@freeuk.com> wrote:

On 05/06/2026 08:53, Tim Rentsch wrote:

[...]

[...]

[...]

Generally speaking, AFAIK, none of the regular posters here are
qualified mental health professionals; as such, we should all
avoid making armchair psychological diagnoses, the
occasional mildly off-color joke aside ("that's crazy!").

Do we need to know about the particle physics mechanics of
H -> He fusion or Einstein's E = m c^2 to understand that
our sun is emitting energy, giving us light and warms us?

Janis :-}

From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 23:50:49

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vt7b9$pi3s$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <10vsnl7$lkmu$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

N3220 5.1.2.4, Program semantics.

It defines the *observable behavior* of a program, which consists of
accesses to volatile objects, data written to files, and I/O dynamics of
interactive devices.

Yes, but it does so for strictly-conforming programs with no UB.

It does so for programs in general, not just strictly conforming
ones. If a program has undefined behavior, all bets are off,
but for example a program that evaluates `printf("%d\n", INT_MAX)`
is not strictly conforming, but it's fully subject to 5.1.2.4.
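
A minimal sketch of that distinction, with hypothetical function names that are not from the thread. Both functions are well defined on any conforming hosted implementation; only the first produces text that depends on an implementation-defined value. The sketch writes into a buffer with `snprintf` rather than calling `printf`, purely so the result is easy to inspect.

```c
#include <limits.h>
#include <stdio.h>

/* The text produced depends on the implementation-defined value of
   INT_MAX, so a program that emits it is not strictly conforming,
   although its behavior is fully defined. */
int format_int_max(char *buf, size_t size) {
    return snprintf(buf, size, "%d", INT_MAX);
}

/* The text produced is "42" on every conforming implementation. */
int format_fixed(char *buf, size_t size) {
    return snprintf(buf, size, "%d", 42);
}
```

Either way, 5.1.2.4 governs what the program is allowed to write.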

To understand conformance, we have to jump over to section 4,
which explicitly says that, 'Undefined behavior is otherwise
indicated in this document by the words "undefined behavior" or
by the omission of any explicit definition of behavior.' As it
does not say that a program with an instance of undefined
behavior in an integer constant expression that is not executed
must otherwise behave in any given manner, what the program does
is undefined. A constraint violation mandates a diagnostic, but
beyond that, the standard is (AFAICT) silent.

I don't think an integer constant expression can have undefined
behavior. INT_MAX+1 and 1/0 are not constant expressions, because
neither "evaluate(s) to a constant that is in the range of
representable values for its type".

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

That's a bold claim, but I think I see why you're saying that.

The program in question, quoted above, has:

int zero = (INT_MAX+1)*0;

`(INT_MAX+1)*0` is not a constant expression, not because of the
overflow, but because a constant expression is not required in
that context. "constant-expression" is defined by a production in
the grammar (it reduces to "conditional-expression"). Even in

int n = 42;

42 is not a constant expression, because the grammar doesn't
call for a constant expression in that context -- even though it
looks like one. Similarly, in `a + b * c`, `a + b` looks like an
additive expression, but it isn't one. (Not a perfect analogy.)

Right; I see what you mean. In this case, the
`assignment-expression` production applies, not
`constant-expression`.

Undefined Behavior, in turn, is not defined as specific only to
execution: the standard simply says that it is "behavior, upon
use of a *nonportable or erroneous program construct*..." for
which there are no requirements, and there are examples of
things that are explicitly UB at translation time, such as
improperly terminated lexemes and so forth.

Yes, there are constructs that are explicitly UB at translation time.
(I think that's unfortunate, and there are efforts to clear up some
such cases in C2y.)

It's unclear to me how it could be any other way. If UB was
_only_ an issue at runtime, then how could a compiler take
advantage of it to perform optimizations during translation?
We know that compilers do this.

There are instances of undefined behavior that depend on specific
characteristics of a source file, not on run-time behavior.
The first example I found (N3220) is in the description of
translation phase 4, 5.1.1.2:

If a character sequence that matches the syntax of a universal
character name is produced by token concatenation (6.10.5.3), the
behavior is undefined.

That's something that can be detected during compilation. It would
be far better if it were either well defined or a syntax rule
violation. And in fact the latest C2y draft doesn't have that
wording. There's an ongoing effort to clean up this kind of thing.

That's not the kind of UB I'm talking about.

Signed integer overflow is not one of those constructs.

This I'm not sure I agree with. If the compiler detects signed
integer overflow in (perhaps not relevant in _this_ example) an
integer constant expression, I still don't see anything that
makes that anything other than UB. It's a constraint violation,
sure, but nothing says it is not also UB.

An implementation can choose to successfully translate a program that
violates a constraint. In my opinion, the resulting program has (or
should be considered to have) undefined behavior, but the standard
doesn't explicitly say so. My argument is based on the definition
of "constraint": "restriction, either syntactic or semantic,
by which the exposition of language elements is interpreted".
If a constraint is violated, I argue that there is no basis for
interpreting the exposition of language elements, and therefore no
definition of the behavior.

Other interpretations are possible.

So if an overflow in an ICE has undefined behavior, it's merely
an instance of this more general principle, which might even not
be valid.

An unambiguous case is:

case INT_MAX+1:

That's a constraint violation. The expression is required to be an
ICE, but it doesn't "evaluate to a constant that is in the range
of representable values for its type" (unless you want to argue
that it can evaluate to INT_MIN for a particular implementation,
but I really dislike the implications of that). If there's UB,
it's because of the constraint violation. (In fact I'd expect most
compilers to reject it, so there's no behavior at all.)

On the other hand, this:

int n = INT_MAX;
n++;

has undefined behavior and is not a constraint violation. A note on
the definition of "undefined behavior" says:

Possible undefined behavior ranges from ignoring the situation
completely with unpredictable results, to behaving during
translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of a
diagnostic message), to terminating a translation or execution (with
the issuance of a diagnostic message).

So a compiler can reject it *if* it can prove that the undefined
behavior will always occur. The standard is not 100% clear about
whether it can be rejected if the code is never executed, or is
executed conditionally, but I think that's not permitted, or at least
it shouldn't be. Rejecting code because the compiler can't prove
the behavior is defined has some very unpleasant implications.
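
A sketch of the conditional case, using hypothetical names that are not from the thread. In `increment` the overflow can occur for exactly one argument value, so a translator cannot know from the function alone whether any call misbehaves. `checked_increment` tests for the overflow first, so no call to it has undefined behavior.

```c
#include <limits.h>
#include <stdbool.h>

/* Undefined behavior only if called with n == INT_MAX; every other
   call is well defined. */
int increment(int n) {
    return n + 1;
}

/* Tests for the overflow before it can happen, so no call has
   undefined behavior. Returns false and leaves *out untouched when
   n + 1 is not representable as an int. */
bool checked_increment(int n, int *out) {
    if (n == INT_MAX) {
        return false;
    }
    *out = increment(n);
    return true;
}
```

Rejecting `increment` outright would mean rejecting nearly every function that does signed arithmetic on values unknown at translation time.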

Any undefined behavior from evaluating INT_MAX+1 happens during
execution (barring constraint violations).

I'm not sure the standard says that. The standard says this
happens during _evaluation_, and that evaluation must be
performed in accordance with the rules of the abstract
machine. But it doesn't precisely specify _when_ evaluation
takes place, and in particular, there are places in the standard
that explicitly mention evaluation during translation. I still
don't see anything that prohibits a compiler from evaluating
that expression at compile time (indeed, it clearly does, as it
generates a diagnostic about the overflow).

I suppose that changes the matter: does the language merely
leave that unspecified, in which case, this program is not
strictly conforming, or does it say that it _cannot_ make any
translation-time decisions about it? I cannot find a satisfying
argument for the latter.

Ok, given:

case INT_MAX+1:

a compiler could issue the required diagnostic for the constraint
violation as a non-fatal warning, then generate code that executes
an ADD instruction with operands INT_MAX and 1. That would be
conforming but silly. The compiler has to determine that INT_MAX+1
overflows anyway so it can issue the diagnostic.

Furthermore, the expression above is obviously an integer
constant expression as defined by sec 6.6 para 8. Section 6.6,
para 4, reads in part, "Each constant expression shall evaluate
to a constant that is in the range of representable values for
its type." The expression, `(INT_MAX+1)*0` violates this
constraint, and so therefore a diagnostic is mandated as per
sec 5.1.1.3 para 1. That it appears in code that is not
obviously called from `main` doesn't change that.

It satisfies the requirements for an integer constant expression in
6.6p8, but it violates the constraint in 6.6p4. (I presume that an
"integer constant expression" must be a "constant expression".)
But since "constant-expression" is a grammatical production,
it doesn't have to satisfy that constraint, and no diagnostic
is required. (A warning is certainly permitted.)

Fair point. Its grammatical position makes it an
assignment-expression. I clearly misinterpreted that before.

Similarly, this:
int n = INT_MAX + 1;
at block scope doesn't require a diagnostic, though of course it
has undefined behavior -- but at file scope, the initializer is a
constant expression, so that would be a constraint violation.
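
A sketch of the contexts being contrasted, assuming the grammatical reading argued for here. All names are hypothetical, and every value is deliberately kept in range so that nothing in the sketch violates a constraint or overflows.

```c
#include <limits.h>

/* File scope: the initializer has to be a constant expression, so the
   range constraint in 6.6 applies to it. INT_MAX - 1 is in range. */
int at_file_scope = INT_MAX - 1;

/* An enumeration constant also requires a constant expression. */
enum { LIMIT = INT_MAX - 1 };

int classify(int n) {
    /* Block scope, automatic storage: the grammar asks only for an
       assignment-expression, so this initializer is an ordinary
       expression even though it looks constant. */
    int at_block_scope = INT_MAX - 1;

    switch (n) {
    case INT_MAX:               /* a case label requires an integer
                                   constant expression */
        return 2;
    default:
        return n == at_block_scope;
    }
}
```

Replacing `INT_MAX - 1` with `INT_MAX + 1` would, on this reading, be a constraint violation in the first two places and in a case label, but only run-time undefined behavior in the block-scope initializer.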

Right. The semantics of this are defined in sec 6.7.11 para 5.

Moreover, sec 6.6 para 17 says that, "the semantic rules for
evaluation of a constant expression are the same as for
nonconstant expressions." This brings us back to 5.1.2.4,
though I submit that para (4) is a stronger argument for what
you and Tim are saying, as it reads in part, "An actual
implementation is not required to evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or through volatile access to an object)." I interpret
this to mean that, if the implementation can determine that
there is no way that `foo` can be called, it does not _have_ to
evaluate the above expression. However, it must satisfy the
range constraint from section 6.6, so it likely will, and in any
event, the standard does not say that it, "shall not" evaluate
it, or when.

Overflow in a constant expression is not undefined behavior. It's a
constraint violation. But that doesn't apply here, because the
initializer is not a constant expression. (Sorry if I'm repeating
myself.)

Where does it say that UB and constraint violations are mutually
exclusive? I don't see any such statement in the standard. Am
I missing it?

It doesn't.

As a practical matter, when I look at C code, if it violates a
constraint, I typically don't care about its behavior. I want it
to be rejected at compile time (unless it's deliberately taking
advantage of a documented extension). I'll fix it rather than
worrying about its behavior.

(Unless the code has somehow gotten into production and it's my
job to analyze how it misbehaves.)

Yes, a program that violates a constraint can have run-time behavior if
the compiler chooses not to reject it, and that behavior may be
undefined.

The standard says that if a constraint is violated, a diagnostic
must be emitted, regardless of whether or not the constraint
violation is the result of something that is UB or not; that is, if
a constraint violation occurs due to something that is UB, the
implementation must still emit a diagnostic: UB is not an escape
hatch from that requirement.

Right.

It also says, 'If a "shall" or "shall not" requirement that
appears outside of a constraint or runtime-constraint is
violated, the behavior is undefined. Undefined behavior is
otherwise indicated in this document by the words "undefined
behavior" or by the omission of any explicit definition of
behavior.' However, that does not preclude such behavior being
undefined; it just means that the words "shall" and "shall not"
in a constraint violation do not a priori describe behavior vis
definition.

Right.

Once the compiler does that, if it does, and observes UB, the
standard is silent on what requirements it imposes, which means
the behavior is undefined. I see no reason it couldn't arrange
to invoke `foo` at that point.

Any UB in the program would occur during execution,

I suppose; but it's not clear to me that UB is tied _only_ to
execution time.

The standard is explicit that there _are_ things that are
evaluated at translation time, like the initializer for an
object with storage class `constexpr`. It is not clear to me that
a compiler is otherwise _prohibited_ from evaluating an
expression during translation; indeed, one could imagine it
doing so to perform constant folding, and I do not believe there
exists any normative text defining it as such.

Certainly a compiler can, but need not, evaluate any expression at
compile time if it's able to:

int n;
n = 2 + 2;

I'd be surprised to see an ADD instruction in the generated code, but
a naive compiler could certainly generate one. For that matter, a
perverse compiler could generate code that adds 3 and 1 or divides 28
by 7. Anything that implements the required *observable behavior*
(5.1.2.4 Program semantics) is acceptable. Executing an ADD
instruction is not part of the observable behavior.
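
As a rough illustration with hypothetical function names: the two functions below are written differently, but neither performs any I/O or volatile access, so a caller can observe nothing except the returned value. A translator is free to compile either one as if it were the other, or as a plain `return 4`.

```c
/* The straightforward reading of `n = 2 + 2;`. */
int four_by_addition(void) {
    int n;
    n = 2 + 2;
    return n;
}

/* What a "perverse" translation of the same statement might
   effectively compute instead. */
int four_by_division(void) {
    int n;
    n = 28 / 7;
    return n;
}
```

How the 4 is produced is not observable behavior; only what the program does with it can be.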

I realize this is an extreme interpretation, and not one that is
widely shared. Personally, I think it's rather silly.

However, that is _a_ danger of the informality of the C
specification; it does not define the semantics of the abstract
machine in the formally precise way that, say, the SML spec
defines that language's semantics. Rather, it informally
specifies them in prose, and that prose is ambiguous.

There have been attempts to define C's semantics formally, but
those attempts are not part of the standard. Fully defining C's
semantics formally rather than in English would, I imagine,
be a *lot* of work -- and fewer people would be able to understand
the specification or work on it.

Probably much good would be done if C's semantics _were_
rigorously defined, but they are not. Thus, they are open to
radical interpretation, and as extreme as those may be, I do not
see how the normative text of the standard explicitly
_prohibits_ them.

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Regardless, I think you highlighted an actual problem with the
spec; I don't think that behavior is _explicitly_ prohibited,
therefore, it is likely undefined, but at a minimum unspecified,
whether it actually could happen. If the argument against that
is that this renders the language essentially unusable, then
my response is, "yeah, well, welcome to programming in C in the
2020s." Most compilers would never be that extreme, but I see
no evidence that it would be an invalid reading of the
literal text of the standard if they did.

So no, I do not see how execution according to the rules of the
abstract machine is not guaranteed, here. I certainly see no
way in which this can be regarded as a strictly conforming
program.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

Probably the compiler generated the trap code because it didn't
(yet?) know whether foo is ever called. If it were clever enough
to prove that foo is never called, it could generate no code for
it at all.

The note on the definition of undefined behavior is a bit vague.
It permits terminating a translation in response to UB, but that
doesn't address exactly when it can do so. I believe it can do so
only when it can prove that the UB always occurs, but that's not
clearly stated.

However, the behavior of the program as a whole is clearly defined.
It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

Another argument (subject to interpretation of wording): Undefined
behavior is "behavior, **upon use** of a nonportable or erroneous
program construct or of erroneous data, for which this document
imposes no requirements". The overflowing expression within foo()
is never *used*, so there is no undefined behavior.

To put it another way, undefined behavior is behavior. Something
that never occurs is not behavior.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

The requirements for "observable behavior" cover both files and
interactive devices.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Whether foo() has external linkage or internal
linkage doesn't change that.

I disagree. There's no possible way for the implementation to
know whether a function with external linkage will be ultimately
invoked or not; consider a system that supports loadable shared
modules. Nothing prevents even this simple program from being
compiled as a shared module, dynamically loaded, the loading
program explicitly searching for and finding the symbol
corresponding to the `foo` function, and invoking it.

Remember that linking is translation phase 8. The compiler is not
the entire implementation.

Exactly my point. The compiler cannot know how `foo` might be
used, or how the translated object might be exercised.
I don't see how it could possibly know that, given that `foo`
has external linkage.

We were presented with a complete translation unit that included a
function definition for "main". It's a complete program. There's no
valid way for some other program to call foo. If an OS provided such
a mechanism, it would be outside the scope of C.

Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

An implementation can certainly evaluate the initializer at
translation time, deduce that the behavior would be undefined
*if the initializer were evaluated*, and blame it on the program.
That doesn't mean it can reject a strictly conforming program.

Hence, the compiler _must_ treat with UB as written, which is
why `ubsan` inserts trapping code in `foo`.

I don't know what "_must_ treat with UB" means.

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid. But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. It can
reject it if it can prove that it *always* has undefined behavior
during execution.

What I'm saying is that, `foo` has undefined behavior _period_.
That's manifest in an integer constant expression, whether it is
executed at runtime or not. I believe that the standard forces
the expression to be evaluated at translation time, via the
"shall" mandate when checking the constraint on the range in sec
6.6 para 4. Further, that evaluation must happen in accordance
with the rules of the abstract machine, as per 5.1.2.4 para 17.
The diagnostic is mandated, as is the translation-time
evaluation. The expression itself manifestly exhibits UB,
and so therefore the result of the rest of the translation is
undefined.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

If a compiler sees a subexpression INT_MAX+1 it can attempt to
evaluate it at compile time. But it can't just blindly add the
values if overflow would cause a fatal trap, crashing the compiler.
That would be a serious compiler bug. The behavior *of the compiler*
is not undefined.

[SNIP]

I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
    int zero;
    zero = INT_MAX;
    zero ++;
    zero *= 0;
    return zero;
}

int main(void) {
    return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Yes, that's what I mean by it.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Certainly.

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

Yes, if foo were called in every possible execution of the program,
the program's behavior would be undefined. The compiler could also
reject it.

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

I believe there is. The program is strictly conforming, which means,
among other things, that it does not produce output depending on any
undefined behavior. There is no undefined behavior because foo() is
never called.

A *strictly conforming program* shall use only those features of the
language and library specified in this document. It shall not
produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.

...

A *conforming hosted implementation* shall accept any strictly
conforming program.

An implementation that rejects the program quoted at the top of this
article is non-conforming.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

And I'd submit a bug report.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Friday, June 05, 2026 23:56:52

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

[...]

How would an expression (appearing in a context that requires an
integer constant expression) not "evaluate to a constant that is in
the range of representable values for its type" other than by UB?
I can't think of an example, but I'd be interested in seeing one.

Note in particular that UINT_MAX+1U is well defined, not an overflow.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Saturday, June 06, 2026 15:13:30

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <DHAUR.47540$0o1c.29921@fx08.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1BoUR.3$lmCb.1@fx22.iad>, Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]

<snip>

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c

Oh interesting. I don't have a `cpp.c` in my v6 archive.

I wonder what else I'm missing.

[snip]

Thanks! This is an artifact definitely worth preserving. As
far as I know, it's not in any of the extant V6 archives. I'll
shoot you an email, if that's ok.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Saturday, June 06, 2026 17:53:01

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <DHAUR.47540$0o1c.29921@fx08.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1BoUR.3$lmCb.1@fx22.iad>, Scott Lurndal <slp53@pacbell.net> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]

<snip>

Yeah, that's from `cc.c`, right?

No, it's from cpp.c

$ ls /work/reference/collegetapes/sltape/v6cc/
c0.c c00.c c01.c c02.c c03.c c04.c c05.c c1.h
c10.c c11.c c12.c c13.c c2.h c20.c c21.c cc.c cpp.c

Oh interesting. I don't have a `cpp.c` in my v6 archive.

I wonder what else I'm missing.

[snip]

Thanks! This is an artifact definitely worth preserving. As
far as I know, it's not in any of the extant V6 archives. I'll
shoot you an email, if that's ok.

A version of cpp that was used with the portable C compiler (PCC)
is here.

It has a -C option to preserve comments in the processed output.

https://github.com/IanHarvey/pcc/blob/master/cc/cpp/cpp.c

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Saturday, June 06, 2026 15:47:07

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From dave_thompson_2@3:633/10 to All on Saturday, June 06, 2026 19:02:11

On Mon, 1 Jun 2026 09:52:08 +0200, David Brown
<david.brown@hesbynett.no> wrote:

On 31/05/2026 19:11, Bart wrote:

...

Actual examples of too many parentheses?

Any source code written in LISP :-)

(And for too few parentheses, any source code in Forth.)

FORTH uses parentheses for stack diagrams -- a semi-standard type of comment/documentation -- and of course good code (using my subjective definition of good :-) ) always has sufficient documentation :-)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Saturday, June 06, 2026 16:15:05

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

PS: One yet non-considered question that was part of my original
post was: "Is there any rationale from the _software designer_'s perspective?"

I didn't respond to your original question because it was based on a misconception. Whether a given expression is a constant expression,
in the sense of needing to satisfy the constraints of 6.6, depends
not on the form of the expression but on the context in which it
appears. The 6.6 constraints apply only in situations where the C
standard expressly requires a constant expression. Other cases,
such as a use like this

int
whatever(){
    int r = (int)(-1u/2) + 1;
    return r;
}

do not need to satisfy the 6.6 constraints, because the C standard
doesn't require a constant expression in that context. (Note that
the initializing expression for 'r' does overflow the range of int
in implementations where UINT_MAX == INT_MAX*2+1.)

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Saturday, June 06, 2026 16:36:14

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

C99 DR 261 looks similar to what you're talking about.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_261.htm

The Committee Response section says:

In general, the interpretation of an expression for constantness
is context sensitive. For any expression which contains only
constants:

- If the syntax or context only permits a constant expression, the
constraints of 6.6#3 and 6.6#4 shall apply.
- Otherwise, if the expression meets the requirements of 6.6
(including any form accepted in accordance with 6.6#10), it is a
constant expression.
- Otherwise it is not a constant expression.

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. Calling 2+2
a constant expression has no effect on the semantics, and does not
require or forbid the implementation from, for example, generating
an ADD instruction. The distinction would matter for an expression
that has UB and/or does not yield a value of the type, but that
falls through to the third bullet.

I found another interesting tidbit, C90 DR 031, relevant to another
point I made elsethread:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_031.html

case (INT_MAX*4)/4: is a constraint violation.
When subclause 6.4 says on page 55, lines 11-12:
Each constant expression shall evaluate to a constant that is in
the range of representable values for its type.
the Committee's judgement of the intent is that the
``representable'' requirement applies to each subexpression of a
constant expression, as shown in the third example. A constant
expression is meant as defined by the syntax rules.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Saturday, June 06, 2026 16:43:53

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

I claim that an expression that looks like a constant expression
*isn't* a constant-expression if it doesn't appear in a context
that requires a constant-expression.

Right. This question came up years ago in a Defect Report. The
response from the Committee was basically the same as what you
said: the 6.6 constraints for constant expressions apply only in
situations where the C standard expressly requires a constant
expression. (I don't have the DR in front of me; I'm summarizing
based on memory, but am confident the actual wording is consistent
with what I just said.)

C99 DR 261 looks similar to what you're talking about.

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_261.htm

The Committee Response section says:

In general, the interpretation of an expression for constantness
is context sensitive. For any expression which contains only
constants:

- If the syntax or context only permits a constant expression, the
constraints of 6.6#3 and 6.6#4 shall apply.
- Otherwise, if the expression meets the requirements of 6.6
(including any form accepted in accordance with 6.6#10), it is a
constant expression.
- Otherwise it is not a constant expression.

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. [...]

Right. The key point is that the constraints need to be satisfied
only in situations where the C standard expressly requires a
constant expression. Whether a given expression is called a
"constant expression" doesn't matter; all that does matter is
whether the constraints need to be satisfied.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Saturday, June 06, 2026 17:41:34

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

That's close to what I claimed, but the second bullet point differs.
My claim was that, given:

n = 2+2;

2+2 is not a constant expression because the grammar doesn't require
a constant expression in that context. The Committee's opinion
(at least at the time) was that it is a constant expression because
it meets the requirements of 6.6.

But I *think* it's a distinction without a difference. [...]

Right. The key point is that the constraints need to be satisfied
only in situations where the C standard expressly requires a
constant expression. Whether a given expression is called a
"constant expression" doesn't matter; all that does matter is
whether the constraints need to be satisfied.

Well, it matters a little bit, at least to me, even though the
distinction doesn't seem to affect the validity or semantics of
any C code.

A clear and unambiguous definition of what is or is not a "constant
expression" would make the language just a bit easier to understand
and explain. I'd even be satisfied with the definition given in
the DR *if* it were clearly expressed in the standard.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Saturday, June 06, 2026 18:06:37

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
}

int
main(){
    return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

[...]

foo() has undefined behavior if it's called, so replacing its
body with trapping code is valid.

Right.

But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. [...]

Right.

In your example, `foo` clearly exhibits UB; I think your
argument is whether that has a realized effect or not, since the
UB is not invoked. I'm saying that in general a compiler cannot
possibly know that when it compiles `foo`, and is free to assume
the worst.

foo() exhibits UB if and only if it's called during execution.

Right.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Sunday, June 07, 2026 13:37:35

In article <1100gbk$1lt8i$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

[...]

How would an expression (appearing in a context that requires an
integer constant expression) not "evaluate to a constant that is in
the range of representable values for its type" other than by UB?

It wouldn't. But because it's UB, it could evaluate to
anything, including something that didn't violate the
constraint.

I can't think of an example, but I'd be interested in seeing one.

In terms of a practical, working compiler? I doubt that one
exists.

Note in particular that UINT_MAX+1U is well defined, not an overflow.

Yes.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Sunday, June 07, 2026 15:09:43

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100gbk$1lt8i$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

[...]

How would an expression (appearing in a context that requires an
integer constant expression) not "evaluate to a constant that is in
the range of representable values for its type" other than by UB?

It wouldn't. But because it's UB, it could evaluate to
anything, including something that didn't violate the
constraint.

I can't think of an example, but I'd be interested in seeing one.

In terms of a practical, working compiler? I doubt that one
exists.

I actually meant in terms of the standard, not of any particular
compiler.

I can't think of an example, but maybe someone else can.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Monday, June 08, 2026 02:20:51

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.

But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

Indeed, I would go so far as to suggest that _most_ instances of
UB are detected and used (by the translator) during translation.

So to say that, "this program doesn't have UB because the
statement that contains UB is never executed" doesn't make a lot
of sense to me. It would be closer to being correct if one said
"this program is unaffected by UB since the expression that has
UB is never evaluated when the program executes": again, in this
case (as, I suspect, in most cases) the UB simply _is_: the
expression `INT_MAX + 1` does not become well-defined just
because it is never executed.

Second, there's this notion that the standard is just
underspecified with respect to these matters, specifically, it
does not _prohibit_ a translator from implementing an emulator
for the abstract machine that evaluates code at translation
time. Indeed, I suspect that _most_ compilers do something
largely analogous to that; that's how they detect UB so that
they can take advantage of it when optimizing. But if that's
the case, then nothing prohibits them from relieving themselves
of their obligation to follow the standard once they observe
that some bit of code has UB.

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

foo()'s behavior would be undefined if it were called. It *isn't*
called, so there's no actual UB. The program does not violate any
of the other requirements for strict conformance.

I understand _what_ you're saying: despite the expression itself
manifesting undefined behavior, in this case it's not UB because
`foo` is never executed. What I'm saying is that I don't see
anything in the standard that restricts UB to _only_ executed
code. A reputable compiler obviously instruments `foo` with
code to trap into ubsan; if it's not UB, since it's not
executed, then why do so? Granted, that's not evidence of
anything other than the behavior of those compilers, but still.

Probably the compiler generated the trap code because it didn't
(yet?) know whether foo is ever called. If it were clever enough
to prove that foo is never called, it could generate no code for
it at all.

The note on the definition of undefined behavior is a bit vague.
It permits terminating a translation in response to UB, but that
doesn't address exactly when it can do so. I believe it can do so
only when it can prove that the UB always occurs, but that's not
clearly stated.

Exactly. That it's not clearly stated makes me believe that it
is open to interpretation.

However, the behavior of the program as a whole is clearly defined.

Is it? I am unable to locate where the standard _actually says
that it is_. That is my whole point.

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

Another argument (subject to interpretation of wording): Undefined
behavior is "behavior, **upon use** of a nonportable or erroneous
program construct or of erroneous data, for which this document
imposes no requirements". The overflowing expression within foo()
is never *used*, so there is no undefined behavior.

To put it another way, undefined behavior is behavior. Something
that never occurs is not behavior.

And yet the standard does not say that. That is an
interpretation; I assume it is universally shared, but if we
want to limit ourselves to what the standard _actually says_ it
is woefully underspecified in this regard.

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

It is clearly the _intent_ that this be a strictly conforming
program. The C standard, as an imprecise, informal document,
cannot guarantee it.

If the usual "Hello, world" program prints "Hello, world" followed
by "Goodbye", the implementation is non-conforming. If it formats
my hard drive after printing "Goodbye", it's non-conforming and
dangerous.

Two separate things. My point earlier was that code can
obviously run after `main` terminates. Moreover, I can't
imagine what would _prevent_ a runtime system that invokes
`main` from doing something like printing, "PROGRAM STOPPED"
after `main` returned. C imposes no requirements here.

Yes, it does. An OS can print "PROGRAM STOPPED", but not as part
of the execution of the program. On my system, a shell prompt is
printed after a program terminates, but not by the program. If I
execute a "hello, world" program with its output redirected to a file
(on a system that supports that), the resulting file cannot contain
"PROGRAM STOPPED". The requirements in 5.1.2.4 specify both what
the execution of a program must do and what it must not do.

Files are a separate case. There's no guarantee that the
standard output refers to a file; it may well refer to an
"interactive device", the semantics of which are (necessarily)
unspecified.

The requirements for "observable behavior" cover both files and
interactive devices.

Ok, but irrelevant.

Here's an example: consider an interactive user who uses a
screen reader device. Suppose that user makes use of an
implementation that includes runtime support for that device,
and that precedes invocation of `main` with a command sequence
causing the screen reader to (perhaps) change intonation; and
succeeds return from main by outputting another command sequence
that resets to the original state.

I do not see how C could prohibit that, assuming that the
implementation takes care to detect whether standard output
really refers to the screen reader, and does not emit the control
sequences if output is redirected to a file. Another user who
runs that same program without a screen reader may see the
standard text printed on the screen, without the control
sequence sandwich.

I don't think a conforming implementation can prohibit that kind
of thing.

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

[snip for size]
Given an excessively pedantic and literal reading of the text of
the standard, I don't think an implementation is explicitly
prohibited from evaluating the initializer at translation time,
deducing that the behavior is undefined, and blaming it on the
program, at which point, all bets are off.

An implementation can certainly evaluate the initializer at
translation time, deduce that the behavior would be undefined
*if the initializer were evaluated*, and blame it on the program.
That doesn't mean it can reject a strictly conforming program.

This is circular reasoning. You're saying that something that
is provably UB in this program cannot prevent that program from
being strictly conforming because the program is strictly
conforming.

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

Does any compiler actually do this? No, probably not. Does the
standard explicitly prevent it? I haven't seen an argument for
that which does not rely on either history or a subjective
interpretation.

foo is a function. foo does not have undefined behavior; it has no
behavior at all. A *call* to foo during execution has undefined
behavior. (`foo;` is a statement-expression that does nothing;
it does not have undefined behavior.)

The _evaluation_ of that expression in `foo` has undefined
behavior. The standard does not say that it _cannot_ be
evaluated at translation time.

If a compiler sees a subexpression INT_MAX+1 it can attempt to
evaluate it at compile time. But it can't just blindly add the
values if overflow would cause a fatal trap, crashing the compiler.
That would be a serious compiler bug. The behavior *of the compiler*
is not undefined.

I did not say that the behavior of the _compiler_ is undefined.

I said that a translator is not prohibited from evaluating the
expression at translation time, observing that the behavior is
undefined, and erroring out. There is no reason a translator
cannot include a simple emulator for the abstract machine as
specified in the standard for that purpose; its behavior would
not be undefined, but it could detect undefined behavior.
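
As a rough sketch of what one step of such an emulator might look like
(the name `checked_add` is hypothetical, not anything from the
standard or from a real compiler), the addition can be tested for
overflow before it is performed, so the translator's own behavior
stays fully defined:

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b in *result when the sum is
   representable as an int. Returns false, without performing the
   addition, when the abstract machine would overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* evaluating a + b would be undefined */
    *result = a + b;
    return true;
}
```

With this, `checked_add(INT_MAX, 1, &r)` reports the overflow instead
of executing it. What the translator then does with that knowledge is
the question under discussion.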

[SNIP]
I think the question of whether the initializer is a
constant-expression or not has caused some not entirely relevant
confusion.

Here's another example that avoids that issue.

#include <limits.h>

int foo(void) {
    int zero;
    zero = INT_MAX;
    zero ++;
    zero *= 0;
    return zero;
}

int main(void) {
    return 0;
}

Given my grammatical argument above, I would say that this program
has no constant expressions.

Agreed, if by "constant expressions" you mean those mandated to
use the `constant-expression` grammatical production.

Yes, that's what I mean by it.

Whether that argument is correct or
not, it certainly has no constant expressions that violate any
constraint or that have undefined behavior. Evaluating `zero ++`
(which doesn't even pretend to be a constant expression) would have
run-time undefined behavior -- *if* foo() were ever called.

Let me turn this around in two ways: suppose that the
translation unit _only_ included `foo`. Could the compiler
deduce that the behavior of `foo`, if called, is undefined? If
not, why not?

Certainly.

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

Second, suppose that `foo` _were_ called, could the compiler
replace this with a program that was the equivalent of,
`int main(void) {printf("check your nose"); abort();}`? If so
why? If not, why not?

Yes, if foo were called in every possible execution of the program,
the program's behavior would be undefined. The compiler could also
reject it.

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

And given this translation unit, I don't think there's any way to
construct a multi-TU program that calls foo, so a compiler *can*
determine that foo is never called (but there's no requirement to
do so, or to make any use of that information).

This is the crux of my point, as well. There's no requirement
for the translator to _not_ evaluate the expression and become
privy to UB.

I believe there is. The program is strictly conforming, which means,
among other things, that it does not produce output depending on any
undefined behavior. There is no undefined behavior because foo() is
never called.

You _say_ the program is strictly conforming. The brunt of what
I am saying is that I do not believe that the text of the
standard actually guarantees that. It is an assumption. Not an
unreasonable one, mind, but it's not guaranteed.

A *strictly conforming program* shall use only those features of the
language and library specified in this document. It shall not
produce output dependent on any unspecified, undefined, or
implementation-defined behavior, and shall not exceed any minimum
implementation limit.

...

A *conforming hosted implementation* shall accept any strictly
conforming program.

An implementation that rejects the program quoted at the top of this
article is non-conforming.

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
    return (INT_MAX + 1) * 0;
}

int
main(void)
{
    (void)zero();
    return 0;
}

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.

Would it be stupid if a compiler did that? Yes. Do existing
compilers do so? No, not that I'm aware of. Would some dweeb
nerd compiler douche who thinks this would make a compiler
benchmark some microfraction of a percent faster take advantage
of that? I absolutely think so, yes.

And I'd submit a bug report.

I would go further and chuck that compiler in the trashcan.

However, I can find no normative text in the standard preventing
it.

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere near as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

- Dan C.


From Dan Cross@3:633/10 to All on Monday, June 08, 2026 02:33:58

In article <1104q77$2qkh5$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100gbk$1lt8i$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

[...]

How would an expression (appearing in a context that requires an
integer constant expression) not "evaluate to a constant that is in
the range of representable values for its type" other than by UB?

It wouldn't. But because it's UB, it could evaluate to
anything, including something that didn't violate the
constraint.

I can't think of an example, but I'd be interested in seeing one.

In terms of a practical, working compiler? I doubt that one
exists.

I actually meant in terms of the standard, not of any particular
compiler.

I can't think of an example, but maybe someone else can.

[...]

Oh. Well, I suppose something that relied on _IB_, like
conversion from a large unsigned integer type to a smaller
signed integer type that led to a trap, might fall into that
category.

- Dan C.


From Tim Rentsch@3:633/10 to All on Sunday, June 07, 2026 22:34:52

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

But (I'm reasonably sure that)
an implementation cannot reject a program just because it can't
prove that it has no undefined behavior during execution. [...]

Right.

Expanding on that, there is no requirement even to try to
prove such a conjecture. An implementation could simply
give a warning like "there may be undiagnosed constraint
violations in this compilation", and accept the TU no
matter what (except of course for the dreaded #error
preprocessing directive, which if encountered in a live
portion of the translation must result in a rejection).
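
A small illustration of the "live portion" distinction, as a sketch:
the function name `translation_succeeded` is made up for the example.
A `#error` inside a group skipped by conditional inclusion has no
effect, while the same directive in a live portion forces rejection.

```c
/* This group is skipped, so its #error is never acted on and the
   translation unit can still be accepted. */
#if 0
#error "not reached: this group is skipped by the preprocessor"
#endif

/* Uncommenting the next line puts #error in a live portion of the
   translation unit, and translation must then fail. */
/* #error "reached: translation must fail" */

int translation_succeeded(void)
{
    return 1;
}
```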

I presume none of what I'm saying here is news to the usual
suspects; mostly I'm saying it just to remind myself.


From Tim Rentsch@3:633/10 to All on Monday, June 08, 2026 00:16:43

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86bjdpayv0.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[snip]
But taking a closer look at the standard, I'm not 100% sure that the
language requires a diagnostic, though I think that's the intent.
The relevant constraint is:

Each constant expression shall evaluate to a constant that is
in the range of representable values for its type.

If I squint really hard, I can argue that the entire expression
has to be a constant expression, but it doesn't say that its
subexpressions are constant expressions -- and *if* INT_MAX +
1 evaluates to INT_MIN in the current implementation, then
(INT_MAX + 1) * 0 evaluates to 0 and therefore satisfies the
constraint.

My reasoning is as follows.

To determine if the constraint is satisfied, the compiler must
first evaluate the expression (INT_MAX + 1) * 0.

To evaluate the expression (INT_MAX + 1) * 0, the compiler must
first evaluate the sub-expression (INT_MAX + 1).

Because the expression (INT_MAX + 1) overflows, the behavior is
undefined, and the compiler is free to decide that the value of
the sub-expression (INT_MAX + 1) is, let's say, 12.

The compiler next evaluates the overall expression as 12*0, which
is 0 (an int).

This result of the overall expression satisfies the constraint,
and so the compiler is not obliged to generate a diagnostic.

The text of the standard explicitly carves this out; or, rather,
it attempts to. If the result of an expression is not
representable in the target type, _regardless of whether that's
due to UB or not_, a diagnostic is required.

That's right. However, the key point here is that it is the
implementation that determines (following the semantic rules given
in the C standard) what the value is, and so whether the value is
representable in the type of the expression. Because of the
undefined behavior present in the expression in question, the
implementation is free to choose a value that /is/ representable,
and of the appropriate type, in which case no diagnostic is
required.

But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.

Right. In fact the reasoning doesn't have to be so elaborate.
Just by looking at the types of the operands, a compiler can
determine the result is type int. Then, just by noticing the
multiplication by 0, a compiler could decide that the result is
zero, because the compiler is free to assume that there was no
undefined behavior in the left-hand side expression. Whether the
left-hand side expression has undefined behavior doesn't even have
to be checked to decide that (INT_MAX+1)*0 is 0, and so it can
satisfy the constraints of an integer constant expression.
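
To make the two approaches concrete, here is a rough model (not how
any real compiler is written; `strict_eval` and `shortcut_eval` are
hypothetical names). The strict version checks every intermediate
value of `(a + b) * c` against the range of int; the shortcut version
folds a multiplication by zero without looking at the left operand.
The model does its arithmetic in long long and assumes that type is
wider than int, which is typical but not guaranteed.

```c
#include <limits.h>
#include <stdbool.h>

/* Strict model: evaluate (a + b) * c in a wider type and report
   whether every intermediate value fits in int. */
bool strict_eval(int a, int b, int c, int *value)
{
    long long sum = (long long)a + b;
    if (sum < INT_MIN || sum > INT_MAX)
        return false;               /* subexpression would overflow */
    long long product = sum * c;
    if (product < INT_MIN || product > INT_MAX)
        return false;
    *value = (int)product;
    return true;
}

/* Shortcut model: a multiplication by zero is folded to zero without
   examining the left operand at all. */
bool shortcut_eval(int a, int b, int c, int *value)
{
    (void)a;
    (void)b;
    if (c != 0)
        return false;               /* this sketch only handles c == 0 */
    *value = 0;
    return true;
}
```

For `(INT_MAX + 1) * 0` the strict model reports a failure and the
shortcut model reports the value 0, which is the disagreement in a
nutshell.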

I do not believe that that is the intent. But it _is_
conformant with the text of the standard.

I think your intuition is leading you astray. The people who
wrote the C standard have gone to great lengths to say (write)
what they mean and mean what they say (write). I don't see any
evidence to suggest that this property doesn't apply in the
situation being discussed.

This is a problem with the C standard: it is insufficiently
precise, as the semantics of the language are not formally
defined.

On the contrary, the C standard is quite precise: when a program
construct is encountered that the Standard identifies as having
undefined behavior, the Standard IMPOSES NO REQUIREMENTS on what
behavior may result. That rule may not be what someone wants, but
there isn't any question about what is allowed, which is anything
at all.

[snip]
I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The same could be said to you, as well. There exists a reading
of the standard by which your `foo`-containing program is not
strictly conforming.

The difference is my reading is based on what the C standard
actually says, and not on any guesses about "intent" or whether
the result "makes sense". When reading the C standard it's
important to develop the habit of reading the text as neutrally as
one can, and not inject any subconscious ideas about what it ought
to be saying.

But that way lies madness; C is not a
formally specified language. Given that as an objective fact,
we must accept intent, consistency, and other "soft" aspects
when considering its definition.

The C standard is not written in formal mathematical language, but
it is written in formal English. Certainly there are places in
the standard where what is said does a poor job of conveying what
is expected. But the particular case of (INT_MAX+1)*0 is not one
of them.


From Dan Cross@3:633/10 to All on Monday, June 08, 2026 12:41:11

In article <8633yxa4dg.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[snip]
But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.

Right. In fact the reasoning doesn't have to be so elaborate.
Just by looking at the types of the operands, a compiler can
determine the result is type int. Then, just by noticing the
multiplication by 0, a compiler could decide that the result is
zero, because the compiler is free to assume that there was no
undefined behavior in the left-hand side expression. Whether the
left-hand side expression has undefined behavior doesn't even have
to be checked to decide that (INT_MAX+1)*0 is 0, and so it can
satisfy the constraints of an integer constant expression.

I understand why you are saying this. The relevant part of the
syntax is,

multiplicative-expression:
...
multiplicative-expression * cast-expression
...

Since UB imposes no requirements of any kind, the translator is
free to assume that the evaluation of the
`multiplicative-expression` component is well-defined, and
then multiplication by a `cast-expression` that evaluates to 0
is just 0.

But it is a good exercise to back that up by the letter of the
standard. Recall that in this context, Keith was talking about
`case` labels.

The grammar given in the standard explicitly says that, in that
position, the expression must be a `constant-expression` (sec
6.8.2 ["Labeled statements"] para 1, "Syntax"). Constant
expressions must be evaluated in accordance with the semantic
rules of the abstract machine (sec 6.6 para 17), the rules for
which are spelled out in sec 5.1.2.4, specifically para 1, which
reads, "The semantic descriptions in this document describe the
behavior of an abstract machine in which issues of optimization
are irrelevant" and para 4, which says that "all expressions
are evaluated as specified by the semantics."

Para (4) goes on to say, "an actual implementation is not
required to evaluate part of an expression if it can deduce that
its value is not used and that no needed side effects are
produced (including any caused by calling a function or through
volatile access to an object)." So a translator is free to
replace, e.g., `(2 + 2)*0` with 0.
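
For the well-defined case, a short sketch (the function `classify` is
hypothetical): `(2 + 2) * 0` is an integer constant expression with no
overflow anywhere, so it is a valid `case` label with the value 0
however the translator chooses to compute it.

```c
/* (2 + 2) * 0 is an integer constant expression with no overflow,
   so it is a valid case label whose value is 0. */
int classify(int x)
{
    switch (x) {
    case (2 + 2) * 0:
        return 1;   /* reached when x == 0 */
    default:
        return 0;
    }
}
```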

But that doesn't mean that the presence of UB in this context is
insignificant; in fact, this entire interpretation rests on it.

I do not believe that that is the intent. But it _is_
conformant with the text of the standard.

I think your intuition is leading you astray. The people who
wrote the C standard have gone to great lengths to say (write)
what they mean and mean what they say (write). I don't see any
evidence to suggest that this property doesn't apply in the
situation being discussed.

This is not an argument; it's an assertion, based on your own
intuition and your feelings about the text of the standard and
the level of precision you assume it takes.

In this specific context, Keith raised a valid point: how can
the constraint mentioned in sec 6.6 para 4 ever be violated
_without_ UB? And since C imposes no requirement _at all_ with
respect to how a translator assesses the behavior of evaluating
a UB-bearing expression, then how can the diagnostic requirement
for a constraint violation ever be fulfilled here?

My example is this:

constexpr int A = ~0U;

The type of the rhs is `unsigned int` and the value is not representable
in a signed int. As expected, this fails to compile, with a
diagnostic about the constraint violation:

```
term% cc -std=c23 -c constraint.c
constraint.c:1:19: error: constexpr initializer evaluates to 4294967295 which is not exactly representable in type 'const int'
    1 | constexpr int A = ~0U;
      |                   ^
1 error generated.
term%
```

Adding an `(int)` cast takes advantage of IB, but allows the
program to compile:

```
term% cat constraint.c
constexpr int A = (int)~0U;
term% clang -Werror -pedantic -std=c23 -c constraint.c
term%
```
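
The representability test behind the first diagnostic can be sketched
in ordinary C. This is only an illustration: `fits_in_int` is a
hypothetical helper, not something from the standard or from clang.

```c
#include <limits.h>
#include <stdbool.h>

/* True when the unsigned value can be stored in an int unchanged. */
bool fits_in_int(unsigned int u)
{
    return u <= (unsigned int)INT_MAX;
}
```

`fits_in_int(~0U)` is false, which corresponds to the rejected
initializer. The `(int)` cast does not make the value fit; it converts
it, and the result of that conversion is implementation-defined.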

This is a problem with the C standard: it is insufficiently
precise, as the semantics of the language are not formally
defined.

On the contrary, the C standard is quite precise:

Consider that your definition of "precise" may not be shared.
This is an example of your own subjectivity.

when a program
construct is encountered that the Standard identifies as having
undefined behavior, the Standard IMPOSES NO REQUIREMENTS on what
behavior may result. That rule may not be what someone wants, but
there isn't any question about what is allowed, which is anything
at all.

My statement, that you quoted and responded to, was not limited
to undefined behavior. Rather, it was a general statement about
the imprecision of the C standard.

For whatever reason you have chosen to take that general
statement, and respond to it by posting about one thing that is
clearly defined in the standard, and that further, I believe
every participant in this discussion agrees on. But clarity and
precision on a single point does not mean that the entire
standard, taken as a whole, is similarly precise and clear.

In fact, what I have been arguing all along is exactly what you
wrote above: the standard imposes _no requirements_ on the
resultant behavior of a program when a translator encounters
a program construct (for example, something like `INT_MAX + 1`,
whether immediately multiplied by zero or not) in the course of
translating that program. The C standard does _not_ explicitly
say otherwise.

Unfortunately, the C standard is simply not a precise, formal
document. This is well-known, and it's hardly C's fault: indeed
most of the applications of formalized descriptions of PL
semantics to practical programming languages postdate C's
invention; Dana Scott didn't introduce the term "operational
semantics" until 1970, and it didn't start to make a serious
impact on languages until later.

That you would limit that statement to UB only betrays a lack of
understanding of what you responded to. Whether that is my
fault or yours, I don't know.

[Note: I feel obliged to say that this is not the fault of the C
committee, Dennis Ritchie, Ken Thompson, Brian Kernighan, PJ
Plauger, Jean-Heyd Meneide, or anyone else; rather, it is an
unfortunate consequence of history, and one that cannot
reasonably be corrected.]

[snip]
I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The same could be said to you, as well. There exists a reading
of the standard by which your `foo`-containing program is not
strictly conforming.

The difference is my reading is based on what the C standard
actually says, and not on any guesses about "intent" or whether
the result "makes sense". When reading the C standard it's
important to develop the habit of reading the text as neutrally as
one can, and not inject any subconscious ideas about what it ought
to be saying.

Your reading of the standard is subjective and, as far as I can
tell, based on your own intuitions and presumptions of intent.
Indeed we see direct evidence of this in the message I quoted
above, where you ascribe a certain type of formality to that
document, that is not warranted. Notice that your response to
my post about _an_ interpretation of that document is not to
point out how the text invalidates that reading, but rather, an
admonition on how to read the standard.

You would do well to take a moment and examine your own
preconceptions in how you read that document and approach the C
language.

But that way lies madness; C is not a
formally specified language. Given that as an objective fact,
we must accept intent, consistency, and other "soft" aspects
when considering its definition.

The C standard is not written in formal mathematical language, but
it is written in formal English.

Correct.

The C standard is written in the formal _register_; that is a
matter of voice, but is dramatically different than a formal
document with rigorously defined semantics. It is full of terms
of art, and things that are imprecisely and informally specified
in prose. The C standard strives, but sadly falls short in many
places.

Plenty of documents are written in a formal register and are
still ambiguous and imprecise. That is one of the problems with
trying to define something like a programming language precisely
in prose.

Tynyanov wrote a rather famous story about a non-existent officer
who was accidentally manifested by a clerk's slip of the pen. He didn't
exist, yet rose to become one of the Czar's favorites, as he
never made a mistake (since he did not exist).

Perhaps you will meditate on the implied analogy.

Certainly there are places in
the standard where what is said does a poor job of conveying what
is expected. But the particular case of (INT_MAX+1)*0 is not one
of them.

This is a strawman. You are correct that the standard is clear
as to the meaning, or more precisely lack thereof, of
`(INT_MAX + 1)*0`, when considered as an isolated expression.
What is much less clear are the implications of that when
translated.

- Dan C.


From Tim Rentsch@3:633/10 to All on Monday, June 08, 2026 08:35:42

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

[...]

I'm skipping this and proceeding on to the original question.

Why?

gcc is not authoritative. I didn't want to get into an argument
about whether gcc is conforming, or which version of gcc was used,
or any similar distractions. The C standard /is/ authoritative,
and I thought it would save time to cut to the chase.

You made a statement, "gcc disagrees with you". I demonstrated,
in text that you snipped, that gcc does in fact agree with me.

No, you didn't.

You were wrong.

No, I wasn't. Your testing was faulty.

I don't know the basis of your error, so I asked.
Or maybe I'm missing something, and you had a valid point that I
didn't understand.

I'm offended that you think I have an obligation to remedy your
habit of lazy thinking, especially when as here the answer was
staring you right in the face, and you simply ignored it.

You're not required to answer my question, which I think was
an extremely reasonable one, but quoting it and then explicitly
refusing to answer it is pointlessly rude.

I wasn't refusing to answer. What I was doing was trying to
answer the original question, and answer it in a way that wouldn't
get lost in pointless bickering. Silly me.

I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.

I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.

In investigating this question, I have run compilations using
multiple versions of gcc, on two different platforms. I have looked
carefully through the gcc man page. I have also run compilations
using multiple versions of clang, on two different platforms. After
doing all that, I ran compilations using godbolt, so I could check
the latest, or maybe almost latest, versions of gcc and clang. All
the different versions of gcc and clang that I have tried support my
hypothesis that gcc (and now also clang) interpret the C standard so
as to conclude that conforming to the C standard need not require a
diagnostic for situations like the code under discussion.

I'd like to ask you to do two things. First, read through the
reasoning given in my previous post, try to assess whether that
reasoning is sound, and post the results of your contemplations.
Second, look again at the question of whether gcc (and also clang,
if you're up to it) support the hypothesis that a conforming
implementation need not give a diagnostic for code like that under
discussion. See if you can find a way of framing the question that
supports my statement, rather than simply looking for one that
supports your preconceived ideas. Post the results of your
investigations, both what other experiments you tried, and what your
assessment is of the results you got.

Do these two things and I will endeavor to explain my views on the
questions you have raised here, if such explanations are still
needed after your further examinations and comments.

[SNIP]

I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.

Now it is you who is changing the subject. Besides not being on
point to the question being considered, it's a silly argument, and I
would hope you are smart enough to realize that. However, if you do
what I have asked in the previous paragraph, I can try to explain
why I think your views on this unrelated matter are wrongheaded.


From Dan Cross@3:633/10 to All on Monday, June 08, 2026 17:33:42

In article <86y0gp82pd.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

[...]

I'm skipping this and proceeding on to the original question.

Why?

gcc is not authoritative.

You, Tim, wrote the words, "gcc disagrees with you."

If you didn't want to bring GCC into it, because it is not
authoritative (which is true), then why did you mention it in
the first place?

I didn't want to get into an argument
about whether gcc is conforming, or which version of gcc was used,
or any similar distractions.

You opened that door and walked through it.

The C standard /is/ authoritative,
and I thought it would save time to cut to the chase.

Then you should have done that from the start, and not mentioned
GCC.

[snip]
I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.

I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.

I am mystified as to why you are bringing my name into this, and
why you think "I may not like the consequences", or even what
that means. In any event, you are evidently laboring under some
assumption about what I think about this matter that is probably
incorrect.

Because I am not you, I cannot know this for a fact, let alone
why it may be. Regardless, I suggest you don't do that, or at a
minimum seek clarity from the referent of your assumptions,
before making claims about what they may think.

In investigating this question, I have run compilations using
multiple versions of gcc, on two different platforms. I have looked
carefully through the gcc man page. I have also run compilations
using multiple versions of clang, on two different platforms. After
doing all that, I ran compilations using godbolt, so I could check
the latest, or maybe almost latest, versions of gcc and clang. All
the different versions of gcc and clang that I have tried support my
hypothesis that gcc (and now also clang) interpret the C standard so
as to conclude that conforming to the C standard need not require a
diagnostic for situations like the code under discussion.

It appears that you are appealing to a certain kind of semantic
precision, that is itself based on a number of assumptions that
are unstated, but that are implicit in your writing. Further,
you give every indication of believing that a reader should
simply intuitively know.

In fact, both GCC and clang (the versions I tried on the
platforms I tried on) emit a diagnostic for the code under
consideration. Your assertion appears to be that that is
unrelated to the constraint in section 6.6 para 4, which seems
accurate.

But you did not say that: instead, you just made a vague
statement that "gcc disagrees with you." That's not useful, and
no one can reasonably know what you meant unless you elaborated
on it.

When it was pointed out to you that in fact GCC generates a
diagnostic, you had an opportunity to clarify that it was not in
response to the aforementioned constraint violation. You chose
not to do so, and instead arrogantly accused others of
laziness and a lack of willingness to understand.

Insisting that your readers adhere to some arbitrary level of
semantic precision you seem to fancy yourself expressing is not
actually a sign of true expertise. Real expertise is most
readily demonstrated through effective communication.

I'd like to ask you to do two things. First, read through the
reasoning given in my previous post, try to assess whether that
reasoning is sound, and post the results of your contemplations.

Second, look again at the question of whether gcc (and also clang,
if you're up to it) support the hypothesis that a conforming
implementation need not give a diagnostic for code like that under
discussion. See if you can find a way of framing the question that
supports my statement, rather than simply looking for one that
supports your preconceived ideas. Post the results of your
investigations, both what other experiments you tried, and what your
assessment is of the results you got.

Do these two things and I will endeavor to explain my views on the
questions you have raised here, if such explanations are still
needed after your further examinations and comments.

It is rather cavalier to make imperative statements to others
regarding how they must spend their time.

[SNIP]

I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.

Now it is you who is changing the subject. Besides not being on
point to the question being considered, it's a silly argument, and I
would hope you are smart enough to realize that. However, if you do
what I have asked in the previous paragraph, I can try to explain
why I think your views on this unrelated matter are wrongheaded.

Is it a silly argument?

Perhaps Keith has some reason for suggesting that such an
interpretation is valid. I'm not aware of what that might
be, but I suspect you are not, either. But without even knowing
what the argument is, how would you know?

You are the one admonishing others to look at the letter of the
standard ("My conclusions are based on what the C standard
actually says..."), yet here you dismiss as "a silly argument",
a thing brought up by someone who has demonstrated that they
generally know what they're talking about, and you have done so
without even bothering to ask what they might be referring to.

In fact, I think this fits a pattern of behavior I observe from
you fairly consistently. You decide on an interpretation,
declare it correct, and appear to scoff at anyone else who does
not immediately share that interpretation as being "lazy" or
worse.

Ironically, you yourself do not do well when you are shown to be
wrong about something; cf your bizarre statement about Rust not
being strongly typed. This does not do well for your
credibility; everyone makes mistakes now and again, and you are
no different, but your seeming inability to admit to it when it
is obvious decreases faith in your interpretations when they are
not obvious.

You would do well to express more humility, and consider how
others might perceive you based on the way you talk to them.

- Dan C.


From Dan Cross@3:633/10 to All on Monday, June 08, 2026 17:37:52

In article <1106d97$huo$1@reader1.panix.com>,
Dan Cross <cross@spitfire.i.gajendra.net> wrote:

My example is this:

constexpr int A = ~0U;

The type of the rhs is `int` and the value is not representable

*sigh* "The type of the rhs is `unsigned int` and the value is
not representable in a `signed int`."

Perhaps,

constexpr int A = (unsigned int)INT_MAX + 1;

...is an even better example.
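
To make the arithmetic behind both initializers concrete, here is a minimal sketch in plain C. The helper names (`fits_in_int`, `all_ones`, `int_max_plus_one`) are made up for illustration and are not from the thread. It assumes the usual case where UINT_MAX is larger than INT_MAX, and it only checks at run time that both values lie above INT_MAX; it says nothing about what a given compiler diagnoses for the constexpr declarations themselves.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if an unsigned value is representable as int. */
bool fits_in_int(unsigned int v)
{
    return v <= (unsigned int)INT_MAX;
}

/* The two initializer values from the examples above.  Both are computed
   in unsigned arithmetic, so neither computation itself overflows. */
unsigned int all_ones(void)
{
    return ~0U;
}

unsigned int int_max_plus_one(void)
{
    return (unsigned int)INT_MAX + 1U;
}
```

Both values are perfectly good `unsigned int` values; the problem only arises when they are used to initialize a `constexpr int`.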

- Dan C.


From Keith Thompson@3:633/10 to All on Monday, June 08, 2026 12:39:04

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

A call to foo() would have undefined behavior if it occurred. There
is no call to foo().

Similarly:

int a = ..., b = ...;
int c;
if (b != 0) {
    c = a / b;
}
else {
    c = 0;
}

A division by zero would have undefined behavior if it occurred,
but it never occurs. A compiler cannot reject the above code
because of UB that never happens.

[...]

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

How does the standard guarantee *anything*?

This strictly conforming program:

int main(void) { return 0; }

when executed returns a status of 0 from main and does nothing else.
Adding an uncalled function to the same source file doesn't change
that.

[...]

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

The standard doesn't say what UB is meant for. It says what UB
*is*, and what constructs lead to it (by omission in some cases).
Any optimization tricks played by compiler implementers must be
based on that specification.

[...]

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

No, it couldn't. It must emit "hello, world\n" in some form.
It must emit the character 'h' as represented in the execution
character set, followed by 'e', and so on.

[...]

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

It's not UB if it's never called. Behavior that doesn't happen is
not behavior.

I did not presuppose that the program is strictly conforming.
I read the source code and determined that it meets the standard's
definition of a strictly conforming program.

[...]

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

The qualification "if called" is the whole point.

[...]

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

"UB can time-travel" is perhaps an oversimplification. An example is
a bug that occurred in the Linux kernel, something like:

void func(int *ptr) {
    do_something_with(*ptr);
    if (ptr != NULL) {
        blah();
    }
}

The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
not null, and elided the test on the following line.

But even assuming that's valid, a compiler absolutely cannot assume that
an instance of UB always executes when, according to the semantics of the
program, it provably never executes.
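
For contrast, here is a minimal sketch of the ordering that avoids the problem: test the pointer before the first dereference. The function name `read_or_default` and its fallback parameter are hypothetical, chosen only for illustration; this is not the actual kernel code or its fix.

```c
#include <stddef.h>

/* Hypothetical rewrite: the null test comes before any dereference,
   so no path through the function dereferences a null pointer and
   the compiler has no basis for removing the test. */
int read_or_default(const int *ptr, int fallback)
{
    if (ptr == NULL) {
        return fallback;
    }
    return *ptr;
}
```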

[...]

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
    return (INT_MAX + 1) * 0;
}

int
main(void)
{
    (void)zero();
    return 0;
}

That's an interesting point. A more terse example:

#include <limits.h>
int main(void) {
    int unused = INT_MAX + 1;
}

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.

The program clearly has undefined behavior when executed, but no
output depends on that undefined behavior. In my humble opinion,
this demonstrates a flaw in the standard's definition of "strictly
conforming program". (As a programmer: Don't do that.)
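
One way to follow that advice is to test for overflow before performing the addition. The sketch below is only an illustration: `checked_add` is a made-up name, and the range test is the usual textbook one, not something prescribed by the standard.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *result and returns true when the
   sum is representable as int; returns false (leaving *result untouched)
   when the addition would overflow.  The comparisons themselves never
   overflow because INT_MAX - b and INT_MIN - b stay in range for the
   sign of b that each one is guarded by. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *result = a + b;
    return true;
}
```

With this, `checked_add(INT_MAX, 1, &r)` reports failure instead of evaluating `INT_MAX + 1`.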

[...]

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere near as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

"There are only two kinds of languages: the ones people complain
about and the ones nobody uses."
-- Bjarne Stroustrup

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Monday, June 08, 2026 13:40:56

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:

case (INT_MAX + 1) * 0:

must be diagnosed at compile time.

gcc disagrees with you.

What makes you think so?

[...]

I'm skipping this and proceeding on to the original question.

What question? I made a statement.

Why?

gcc is not authoritative. I didn't want to get into an argument
about whether gcc is conforming, or which version of gcc was used,
or any similar distractions. The C standard /is/ authoritative,
and I thought it would save time to cut to the chase.

I never said gcc is authoritative. *You* brought gcc into the
discussion.

It is a fact that gcc issues a diagnostic for that case label.
It is a fact that it's a non-fatal warning with "-pedantic" and a
fatal error with "-pedantic-errors", which implies, as I understand
it, that the authors of gcc believe that the diagnostic is required
by the standard.

You made a statement, "gcc disagrees with you". I demonstrated,
in text that you snipped, that gcc does in fact agree with me.

No, you didn't.

Yes, I did.

You were wrong.

No, I wasn't. Your testing was faulty.

Yes, you were. My testing was not faulty.

What exactly did you mean by "gcc disagrees with you"? I
think it's sufficiently obvious that gcc does not have opinions,
so you presumably were speaking figuratively in some sense.
Do you not see the same diagnostic I saw?

I don't know the basis of your error, so I asked.
Or maybe I'm missing something, and you had a valid point that I
didn't understand.

I'm offended that you think I have an obligation to remedy your
habit of lazy thinking, especially when as here the answer was
staring you right in the face, and you simply ignored it.

OK. I'm offended by your superior attitude. I'm offended by your
refusal to consider that you might have made a mistake. I'm offended
by your refusal to explain what you meant by an unclear statement
after I repeatedly ask you to do so. I'm offended by your apparent
assumption that if the rest of us just *think really hard*, we'll
inevitably agree with you.

You're not required to answer my question, which I think was
an extremely reasonable one, but quoting it and then explicitly
refusing to answer it is pointlessly rude.

I wasn't refusing to answer. What I was doing was trying to
answer the original question, and answer it in a way that wouldn't
get lost in pointless bickering. Silly me.

I'm assuming that by "the original question", you're referring to
my *statement* that a diagnostic is required for the above case label.
If you have some other "original question" in mind, please specify
it. Please do not insult me by assuming that I'll know exactly
what you mean if I just reread what you wrote and think hard enough.

If you were trying to answer the "original question", you failed.
You expressed your supposed disagreement by asserting, without
further explanation, that gcc disagrees with me -- when, in fact,
it does not, and when gcc's behavior is not directly relevant to
the original statement anyway (since, as you correctly point out,
gcc is not authoritative).

I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.

I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.

That explanation is not relevant to your claim that gcc disagrees
with me, which is what I asked you about.

In investigating this question, I have run compilations using
multiple versions of gcc, on two different platforms. I have looked
carefully through the gcc man page. I have also run compilations
using multiple versions of clang, on two different platforms. After
doing all that, I ran compilations using godbolt, so I could check
the latest, or maybe almost latest, versions of gcc and clang. All
the different versions of gcc and clang that I have tried support my
hypothesis that gcc (and now also clang) interpret the C standard so
as to conclude that conforming to the C standard need not require a
diagnostic for situations like the code under discussion.

You've told us what you concluded from your compilations using godbolt.
You haven't told us what those compilations actually told you.

On the off chance that you're willing to answer a straightforward
question:

Here's one result I got on my system:

$ gcc16 --version | head -n 1
gcc16 (GCC) 16.1.0
$ cat c.c
#include <limits.h>
int main(void) {
    switch(0) {
        case (INT_MAX + 1) * 0:
            break;
    }
}
$ gcc16 -std=c23 -pedantic-errors -c c.c
c.c: In function ‘main’:
c.c:4:23: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
    4 |         case (INT_MAX + 1) * 0:
      |                       ^
c.c:4:9: error: overflow in constant expression [-Woverflow]
    4 |         case (INT_MAX + 1) * 0:
      |         ^~~~
$

gcc emitted a fatal error message on that case label. Have you
seen any version of gcc, either on your system or on godbolt,
*not* issue a fatal error message when invoked on that source with
"-std=cNN -pedantic-errors" (NN=23, or any valid value you like)?
If so, have you seen it not at least issue a warning?

If not, what is the basis for your claim that gcc disagrees with me?

It's conceivable that what you meant is that gcc happens to issue
a diagnostic, but is not required to. If so, then (a) that's
sufficiently subtle that any reasonable person would have explained
that point, and (b) given that gcc produces a diagnostic, I see no
basis to assume that gcc "thinks" it's not required to do so.

I'd like to ask you to do two things. First, read through the
reasoning given in my previous post, try to assess whether that
reasoning is sound, and post the results of your contemplations.
Second, look again at the question of whether gcc (and also clang,
if you're up to it) support the hypothesis that a conforming
implementation need not give a diagnostic for code like that under
discussion. See if you can find a way of framing the question that
supports my statement, rather than simply looking for one that
supports your preconceived ideas. Post the results of your
investigations, both what other experiments you tried, and what your
assessment is of the results you got.

You made a very simple claim, that gcc disagrees with me. I'm asking
you about *that statement*. Do you still assert that gcc disagrees
with me? (That is not a question about the C standard.)

Do these two things and I will endeavor to explain my views on the
questions you have raised here, if such explanations are still
needed after your further examinations and comments.

[SNIP]

I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.

The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.

Now it is you who is changing the subject. Besides not being on
point to the question being considered, it's a silly argument, and I
would hope you are smart enough to realize that. However, if you do
what I have asked in the previous paragraph, I can try to explain
why I think your views on this unrelated matter are wrongheaded.

Please be less condescending.

Leaving gcc aside, my original statement was that a case label like:

case (INT_MAX + 1) * 0:

is a constraint violation (and therefore that it requires a diagnostic).
It's possible that I'm mistaken on that point. The constraint I claim
it violates is that "Each constant expression shall evaluate to a
constant that is in the range of representable values for its type."

We could have discussed that much more briefly if you hadn't dragged
gcc into it.

I acknowledge that it can also be reasonably argued that the
expression as a whole *can*, for a particular implementation, yield
a result of 0, and therefore that a diagnostic is not required *for
such an implementation*.

The committee response to C90 DR #031 contradicts that argument:

case (INT_MAX*4)/4: is a constraint violation.
When subclause 6.4 says on page 55, lines 11-12:

Each constant expression shall evaluate to a constant that is in the
range of representable values for its type.

the Committee's judgement of the intent is that the
``representable'' requirement applies to each subexpression of
a constant expression, as shown in the third example. A constant
expression is meant as defined by the syntax rules.

My judgement of the intent agrees with the Committee's, and, as
far as I can tell, with gcc's.

(I do think that the wording in the standard could and should be
improved.)
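
For comparison, here is a small sketch of case labels in which every subexpression stays within the range of int, so the question of a diagnostic does not arise. The function name `classify` and the particular labels are invented for illustration.

```c
#include <limits.h>

/* Hypothetical example: each case label is an integer constant
   expression whose subexpressions are all representable as int. */
int classify(int x)
{
    switch (x) {
    case (INT_MAX - 1) + 1:   /* INT_MAX - 1 is in range, and so is the sum */
        return 1;
    case (INT_MAX / 4) * 0:   /* INT_MAX / 4 is in range; the product is 0 */
        return 2;
    default:
        return 0;
    }
}
```

Swapping the second label for `(INT_MAX + 1) * 0` gives the same final value of 0, but introduces the out-of-range subexpression that DR #031 is about.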

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Monday, June 08, 2026 14:05:06

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]

The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.

I made the above statement to demonstrate that just following the exact
wording of the standard, without thinking about the (sometimes unclear)
intent behind it, can lead to absurd results.

I've discussed this particular glitch before, but it's been a while.

N3220 6.5.1 says:

An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.

I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.

This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:

*punctuator*: one of
[ ] ( )
[snip]

A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.

Consider this expression statement:

42;

Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".

For that matter, consider:

(void)0;

It's "obvious" that `(void)0` is an expression. It consists of one
operator `(void)` and one operand `0` (I'll ignore the fact that
the definition uses plurals for both), but it does not specify
computation of a value, or designate an object or a function,
or generate side effects, or perform a combination thereof.

The fact that the standard's definition of "expression" is flawed is
not much of a problem in practice. Virtually everyone, implementers
and programmers, assumes the obvious intent. Nobody believes that
`42` isn't an expression. But it is my strongly held opinion that
the wording should be improved in a future edition of the standard.
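
As a small illustration of that shared assumption, the sketch below uses `42` where the grammar requires an expression and asks the compiler for its type via C11 `_Generic`. The function name is made up, and this shows only how implementations behave in practice, not what the definition in 6.5.1 literally says.

```c
/* Hypothetical demonstration: _Generic requires an expression as its
   controlling operand, and compilers accept the bare literal 42 there,
   treating it as an expression of type int. */
int literal_is_int_expression(void)
{
    (void)0;   /* an expression statement whose value is discarded */
    /* A bare `42;` statement is accepted as well, though many compilers
       warn that it has no effect when extra warnings are enabled. */
    return _Generic(42, int: 1, default: 0);
}
```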

I think it should say something to the effect that the meaning
of the term "expression" is defined by the grammar. The current
wording that claims to be the definition of the term could, with
a few tweaks, still be turned into a valid normative statement
*about* expressions.

I have a similar issue with the standard's definition of "value":
"precise meaning of the contents of an object when interpreted as
having a specific type". It's obvious that the result of evaluating
a non-void expression (such as the infamous `42`) is a "value",
but the definition implies that a "value" can only be the meaning
of the contents of an object. Nobody is actually misled by the
current definition, but it should be improved.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Dan Cross@3:633/10 to All on Monday, June 08, 2026 23:15:48

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

A call to foo() would have undefined behavior if it occurred.

What I'm really trying to get at is that the behavior of
`int zero = (INT_MAX + 1)*0;` is undefined in all cases. There
is no input for which it is valid at all. It is qualitatively
different than other examples where UB cannot be detected
_except_ at runtime.

In particular, it does not become defined just because it's in a
function that is not called; the behavior is UB on its face. It
is utterly meaningless as far as C is concerned; it is what
Regehr calls a "Type 3" function in his taxonomy at
https://blog.regehr.org/archives/213: it literally has no
definition.

There
is no call to foo().

What I am further saying is that I do not see where the C
standard puts additional constraints on an implementation so
that it _must_ accept a program with such a construct in it, as
sensible as that may otherwise be (I actually don't think that
is very sensible, but that's my opinion). The specific wording
of the standard appears to allow a compiler to halt translation
if it observes that expression, whether it's in a function that
is called or not.

I readily concede that I may be wrong. But the arguments I have
heard opposing this interpretation are not well-supported by the
text. I would be happy if someone could provide such an
argument that did not ultimately rely on either intuition or
assumptions about reasonable behavior, but so far, none have
been proffered.

Similarly:

int a = ..., b = ...;
int c;
if (b != 0) {
    c = a / b;
}
else {
    c = 0;
}

A division by zero would have undefined behavior if it occurred,
but it never occurs. A compiler cannot reject the above code
because of UB that never happens.

This I also agree with. But assuming this is in some function
that is otherwise well-defined, this is what Regehr calls a
"Type-1" function: there is no input for which it is undefined.

In this regard, it is qualitatively different than the `foo`
example that is the subject of this thread. I suggest that that
qualitative difference actually matters.
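
To make the distinction concrete, here is a rough sketch of the first two categories as I understand Regehr's taxonomy; the function names are mine, not his. Note one assumption beyond the quoted snippet: for `int` division, `INT_MIN / -1` also overflows, so a function that is meant to be defined for every input has to exclude that case as well as a zero divisor.

```c
#include <limits.h>

/* Type 1: behavior is defined for every possible input.
   Both undefined cases of int division are filtered out first. */
int div_type1(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        return 0;
    }
    return a / b;
}

/* Type 2: behavior is defined for some inputs and undefined for others
   (b == 0, or a == INT_MIN with b == -1). */
int div_type2(int a, int b)
{
    return a / b;
}

/* Type 3 would be a function such as the `foo` under discussion, whose
   body evaluates (INT_MAX + 1) * 0: there is no input for which that
   expression is defined. */
```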

[...]

It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.

I have yet to find or be shown a way in which the standard
actually guarantees that.

How does the standard guarantee *anything*?

The thrust of what I have been driving at is that the standard
actually guarantees a lot less than people take for granted.

This strictly conforming program:

int main(void) { return 0; }

when executed returns a status of 0 from main and does nothing else.

Actually, does it? It also implicitly closes the standard
input, output, and error streams. That could have side effects.

Adding an uncalled function to the same source file doesn't change
that.

But it's not _just_ an uncalled function. It's an uncalled
function that is manifestly gibberish because there is no input
for which that expression is well-defined.

I have not found evidence that the standard explicitly prohibits
a pathological compiler from doing something unexpected in that
case. An adversarial read of the standard could allow a
compiler to treat this in a manner similar to a syntax error.

[...]

There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.

The standard doesn't say what UB is meant for. It says what UB
*is*, and what constructs lead to it (by omission in some cases).
Any optimization tricks played by compiler implementers must be
based on that specification.

Yes. Just so. And it also says that anything not explicitly
stated in the standard is UB.

As we all know, the definition of UB in the standard is,
"behavior, upon use of a nonportable or erroneous program
construct or of erroneous data, for which this document imposes
no requirements."

Behavior is defined as, "external appearance or action". Note
that this does not explicitly state that "behavior" is only
applicable during execution, and we know that the standard, as
written today, says that some behaviors are "undefined" _at
translation time_. I cannot find something forbidding an
implementation from interpreting "external appearance or action"
to refer to the success or failure of translation and production
of an associated artifact. Translation phase 7 then says that,
after all of the preprocessing and so forth, "the resulting
tokens are syntactically and semantically analyzed and
translated as a translation unit." As written, a compiler could
certainly detect that that expression, whether executed or not,
is UB.

Indeed, sec 3.5.3 para 2, "Note 1 to entry", explicitly mentions
terminating translation as one of a few sample "undefined
behaviors". It doesn't say that the compiler _has_ to do that,
but does not say that it _must not_, either.

Sec 3.5.3 para 4 ("Note 3 to entry") is the closest I see to
mandating the interpretation you and Rentsch have taken, but
that is specific to _execution time_, not _translation time_,
and the latter is not outright banned from responding to UB: the
text of the standard imposes no requirements in this context.
Dare I say that the translation-time behavior is undefined?

[...]

I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.

Exactly. It could also emit the string, "GOODBYE WORLD."

No, it couldn't. It must emit "hello, world\n" in some form.
It must emit the character 'h' as represented in the execution
character set, followed by 'e', and so on.

I didn't say that it wouldn't; I was referring specifically to
the behavior on closing stdout. You are right, it must emit
something corresponding to, "hello, world\n"; but what it does
after that is up to the implementation. We agree that it could
emit a terminal reset sequence; there is no reason that sequence
couldn't be, "GOODBYE WORLD." It'd be a weird one, but it's not
impossible.

[...]

This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.

It's not UB if it's never called. Behavior that doesn't happen is
not behavior.

See above. The standard simply does not say that. The standard
merely says that behavior is something that manifests as
"external appearance or action." Translation is certainly an
action with an "external appearance" and nothing says that
behavior _during translation_ is any less "behavior" than
behavior during execution. In fact, the standard explicitly
mentions undefined behavior and translation.

I did not presuppose that the program is strictly conforming.

Well, you kinda did: you said that the program is strictly
conforming, and then said that it must be accepted because it is
strictly conforming. That acceptance is predicated on it being
strictly conforming.

I read the source code and determined that it meets the standard's
definition of a strictly conforming program.

I have presented what I think is an equally valid, alternative
reading of the text of the standard where that does not hold.

That reading is, admittedly, adversarial. That does not mean it
is wrong. I am saying that this is a weakness of the standard,
not a good interpretation.

40 years ago, people thought that the idea of a post-modern
compiler time-travelling in the pursuit of optimization when UB
is detected during translation was an adversarial read of the
standard. And yet, here we are.

[...]

Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.

The qualification "if called" is the whole point.

Except it's not. The behavior of that expression is simply
undefined; whether executed or not, there's no way it _could_ be
defined.

[...]

UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.

"UB can time-travel" is perhaps an oversimplification.

An example is
a bug that occurred in the Linux kernel, something like:

void func(int *ptr) {
    do_something_with(*ptr);
    if (ptr != NULL) {
        blah();
    }
}

The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
not null, and elided the test on the following line.
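For what it's worth, a minimal sketch of the conventional fix, assuming hypothetical stand-ins for `do_something_with` and `blah` (neither is defined in the example above, and `func_checked` and `blah_calls` are names invented here): test the pointer before anything dereferences it, so there is no earlier `*ptr` from which the compiler could infer that `ptr` is non-null.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the helpers named in the example. */
int blah_calls = 0;
void blah(void) { blah_calls++; }
void do_something_with(int value) { (void)value; }

/* Returns 0 if ptr was usable, -1 if it was null. */
int func_checked(int *ptr)
{
    if (ptr == NULL) {
        return -1;              /* nothing has been dereferenced yet */
    }
    do_something_with(*ptr);    /* only reached with a non-null ptr */
    blah();
    return 0;
}
```

With the test first, the null case never reaches the dereference, so the question of what the compiler may assume does not arise.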

But even assuming that's valid, a compiler absolutely cannot assume that
an instance of UB always executes when, according to the semantics of the
program, it provably never executes.

Time travel is a term of art, here. I posted this elsewhere in
the thread, and I think he does a much better job explaining it
than I can:
https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

Reading a bit more, I think that C23 sec 3.5.3 para 4 appears
to be trying to rein that in. Hope springs eternal.

[...]

So any program that produces no output at all is strictly
conforming? Then what about this?

#include <limits.h>

int
zero(void)
{
return (INT_MAX + 1) * 0;
}

int
main(void)
{
(void)zero();
return 0;
}

That's an interesting point. A more terse example:

#include <limits.h>
int main(void) {
int unused = INT_MAX + 1;
}

Sure. Or consider this program:

```
#include <limits.h>

int
foo(int a)
{
extern int int_max;
int_max = INT_MAX + 1;
return int_max;
}

int
main(void)
{
return 0;
}
```

Suppose that no definition for `int_max` is provided; is this a
strictly conforming program? Consider section 6.9.1, which
describes external definitions. The relevant paragraph is 5,
which reads in part, "If an identifier declared with external
linkage is used in an expression somewhere in the entire program
there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one."

But as has been argued, `int_max` is not actually _used_, since
`foo` is never called. If that holds, then this ought to be
accepted by a conforming implementation. Yet, this fails to
build with both gcc and clang, clearly both consider `int_max`
to be "used". Ok, so what about this?

#include <limits.h>

int
foo(int a)
{
extern int int_max;
if ((INT_MAX + 1)*0) {
int_max = INT_MAX + 1;
}
return 0;
}

int
main(void)
{
return 0;
}

This _does_ build.

So it appears that, at least for `gcc` and `clang`, merely not
calling `foo` is insufficient.

This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.

The program clearly has undefined behavior when executed, but no
output depends on that undefined behavior. In my humble opinion,
this demonstrates a flaw in the standard's definition of "strictly
conforming program". (As a programmer: Don't do that.)

That's kind of what I'm saying. Though this interpretation
hinges on whether the absence of output can be defined as output
in some sense; in this case, the compiler could emit code that
says, "this program has UB", and I think that would be fine with
respect to the standard.

But the standard says that an implementation can stop
translating a program if it detects UB, and nothing appears to
limit that to functions that have been called from `main`.

[...]

In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere near as
precise as it should be.

I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.

"There are only two kinds of languages: the ones people complain
about and the ones nobody uses."

Yup.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Monday, June 08, 2026 18:51:58

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

I started to compose a followup, but I found that I was mostly
repeating things I've already written.

I see no semantic difference between code in a function that's never
called and code that simply isn't in the program. Neither allows
an implementation to reject a strictly conforming program -- and
yes, the program we've been discussing is as strictly conforming as
`int main(void){}`.

There's nothing special about functions as units of a program
subject to undefined behavior. These two programs are semantically
equivalent:
void foo(void) { do_something(); }
int main(void) { foo(); }
and
int main(void) { do_something(); }

A simpler demonstration program might be:

#include <limits.h>
int main(void) {
return 0;
INT_MAX+1;
}

I assert that it is strictly conforming.

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)
It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Monday, June 08, 2026 23:05:24

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-01 00:54, Keith Thompson wrote:

[...]

Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.
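As an aside, a minimal sketch of how a program can stay inside defined behavior here, assuming two hypothetical helpers (`add_would_overflow` and `checked_add` are names made up for illustration, not anything from the standard library): the test is done with subtractions that cannot themselves overflow, so `INT_MAX + 1` is never evaluated in the abstract machine.

```c
#include <limits.h>
#include <stdbool.h>

/* True if a + b would fall outside the range of int. */
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow for b > 0 */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* INT_MIN - b cannot overflow for b < 0 */
    }
    return false;                 /* adding zero never overflows */
}

/* Stores a + b in *sum and returns true, or returns false untouched. */
bool checked_add(int a, int b, int *sum)
{
    if (add_would_overflow(a, b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

Under that assumption, `checked_add(INT_MAX, 1, &s)` reports failure rather than evaluating the overflowing sum, so a later `* 0` never comes into it.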

This is something I really don't get in the actual C-logic...

Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]

There's an important distinction to make here. Consider this
program:

#include <limits.h>

int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}

int
main(){
return 0;
}

This program does not transgress the bounds of undefined behavior.

To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)

Ok.

[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.

I explained the context of my previous statements above. Sorry for
not saying that in the original message.

In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).

The semantics described in the ISO C standard don't admit that
possibility.

I have read through much of what has been said in the subthread
following this posting. I expect I will not be responding to much
of it; my overall sense is that the discussion is mostly confused.
I would like to say one thing here, and see if that helps things.

Could you please point to where it says this, in the C standard?

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

The logic here is backwards. The C standard is prescriptive: it
says what _does_ happen, not what _doesn't_ happen. If one wants
to establish that some "action" takes place, it is necessary to
find a passage, or passages, in the C standard that, if all are
taken together, shows that the "action" occurs, or at least that it
can occur. The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur. If there is no such chain of reasoning, naming
the pertinent passages in the C standard, to establish a possible
call, then there is no possible call. In other words the burden of
proof for a claim that some action could occur rests on whoever is
making the claim; there is no need to look for something in the C
standard that says something cannot occur.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Tim Rentsch@3:633/10 to All on Tuesday, June 09, 2026 00:54:08

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86y0gp82pd.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.

I am mystified as to why you are bringing my name into this, and
why you think "I may not like the consequences", or even what
that means. In any event, you are evidently laboring under some
assumption about what I think about this matter that is probably
incorrect.

In a response to another posting of mine, you wrote this:

But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.

I do not believe that that is the intent. But it _is_
conformant with the text of the standard.

I based my statement that begins "An explanation that Dan Cross
agreed with, ..." on those two paragraphs.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Tuesday, June 09, 2026 09:46:01

In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.

Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.

I disagree. That's not a sensible interpretation of what the
standard says.

I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.

I started to compose a followup, but I found that I was mostly
repeating things I've already written.

Yeah, I feel we're going around in circles, here.

I see no semantic difference between code in a function that's never
called and code that simply isn't in the program. Neither allows
an implementation to reject a strictly conforming program -- and
yes, the program we've been discussing is as strictly conforming as
`int main(void){}`.

That's the crux of the issue. I'm not convinced that it is. I
can see an argument for it (and it's a pretty strong one) but I
can see an argument against, and the standard as written is
underspecified in my opinion. Really, that's it.

There's nothing special about functions as units of a program
subject to undefined behavior. These two programs are semantically
equivalent:
void foo(void) { do_something(); }
int main(void) { foo(); }
and
int main(void) { do_something(); }

A simpler demonstration program might be:

#include <limits.h>
int main(void) {
return 0;
INT_MAX+1;
}

I assert that it is strictly conforming.

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)

That specific instance is not, no; that's in a note as you point
out. I believe deriving it from the normative text is based on
UB imposing no requirement at all on the implementation.

It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

...assuming the program is strictly conforming.

I have arrived at the same place you are with your "42 is not an
expression" example. The wording of the standard could be
improved to avoid things like this.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Tuesday, June 09, 2026 10:08:09

In article <86pl2087z3.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86y0gp82pd.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[...]

I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.

I am mystified as to why you are bringing my name into this, and
why you think "I may not like the consequences", or even what
that means. In any event, you are evidently laboring under some
assumption about what I think about this matter that is probably
incorrect.

In a response to another posting of mine, you wrote this:

But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.

I do not believe that that is the intent. But it _is_
conformant with the text of the standard.

I based my statement that begins "An explanation that Dan Cross
agreed with, ..." on those two paragraphs.

Nothing in those two paragraphs asserts that I am unhappy with
the consequences; I neither like nor dislike the "consequences."
I simply don't think that was the intent of people who wrote the
standard.

Before asserting a subjective interpretation of what someone
else feels about a thing, you should seek to clarify if what you
intend to say is accurate. Better yet, just don't do it. And
of course, what I think about the matter is irrelevant to what
you wrote to Keith, which I found sufficiently distasteful that
I rather wish you hadn't mentioned my name in it at all.

The rest of my earlier response stands.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Tuesday, June 09, 2026 10:19:21

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[snip]

I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.

The logic here is backwards. The C standard is prescriptive: it
says what _does_ happen, not what _doesn't_ happen.

The definition of undefined behavior in the standard says that
it _imposes no requirements._ It is explicit that it says it
mandates neither "what _does_ happen" nor "what _doesn't_
happen."

If one wants
to establish that some "action" takes place, it is necessary to
find a passage, or passages, in the C standard that, if all are
taken together, shows that the "action" occurs, or at least that it
can occur.

So you're saying that the proverbial nasal demons quip about UB
is incorrect, since it's not prescribed by the standard. Thanks
for clarifying that.

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}

void
foo(void)
{
printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

If there is no such chain of reasoning, naming
the pertinent passages in the C standard, to establish a possible
call, then there is no possible call. In other words the burden of
proof for a claim that some action could occur rests on whoever is
making the claim; there is no need to look for something in the C
standard that says something cannot occur.

See above.

- Dan C.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 09, 2026 15:17:29

On 2026-06-08 23:05, Keith Thompson wrote:

[...]
I've discussed this particular glitch before, but it's been a while.

N3220 6.5.1 says:

An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.

I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.

This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:

*punctuator: one of
[ ] ( )
[snip]

A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.

Consider this expression statement:

42;

Is `42` an expression? Clearly it's intended to be, but there is no operator, and therefore there is no operand, so it doesn't meet the standard's definition of the word "expression".

Above you used the term "expression statement", and then compare the
"42" to an "expression".

I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.

I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.

Usually I'd expect above "expression-statement" to serve some purpose, semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)

Or do these stand-alone values (the "expression-statement") have some practically useful semantics?

In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.

What purpose serve such stand-alone numbers in places where statements
are expected?

[...]

The fact that the standard's definition of "expression" is flawed is
not much of a problem in practice. Virtually everyone, implementers
and programmers, assumes the obvious intent. Nobody believes that
`42` isn't an expression. But it is my strongly held opinion that
the wording should be improved in a future edition of the standard.

I think it should say something to the effect that the meaning
of the term "expression" is defined by the grammar. The current
wording that claims to be the definition of the term could, with
a few tweaks, still be turned into a valid normative statement
*about* expressions.

I have a similar issue with the standard's definition of "value":
"precise meaning of the contents of an object when interpreted as
having a specific type". It's obvious that the result of evaluating
a non-void expression (such as the infamous `42`) is a "value",
but the definition implies that a "value" can only be the meaning
of the contents of an object. Nobody is actually misled by the
current definition, but it should be improved.

Janis

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Tuesday, June 09, 2026 14:53:35

On 09/06/2026 14:17, Janis Papanagnou wrote:

On 2026-06-08 23:05, Keith Thompson wrote:

[...]
I've discussed this particular glitch before, but it's been a while.

N3220 6.5.1 says:

An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.

I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.

This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:

*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.

Consider this expression statement:

42;

Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".

Above you used the term "expression statement", and then compare the
"42" to an "expression".

I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.

I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.

Usually I'd expect above "expression-statement" to serve some purpose, semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)

Or do these stand-alone values (the "expression-statement") have some practically useful semantics?

In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.

What purpose serve such stand-alone numbers in places where statements
are expected?

I think it is just difficult for the syntax to ban certain expressions
and not others. How would you express that in the grammar?

If you ramp up the warnings, then you'll get messages like 'statement
with no effect' or 'computed value not used', since sometimes there are
side-effects that are needed:

f() + g();

f() and g() both do something, but nothing is done with their sum.
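To make that concrete, a small sketch with hypothetical call counters (`f_calls`, `g_calls` and `demo` are invented for illustration): both calls happen for their side effects and only the sum is thrown away. The `(void)` cast is there purely to keep the 'value not used' warning quiet; the bare `f() + g();` form behaves the same way.

```c
/* Hypothetical counters so the side effects can be observed. */
int f_calls = 0;
int g_calls = 0;

int f(void) { f_calls++; return 1; }
int g(void) { g_calls++; return 2; }

void demo(void)
{
    /* Expression statement: f and g are both called, the sum is discarded. */
    (void)(f() + g());

    /* A plain call is an expression statement too; its result is discarded. */
    f();
}
```

After one call to `demo()`, `f` has run twice and `g` once, even though none of the returned values was ever used.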

In my projects, such standalone expressions are always a hard error. The
main exceptions include (using C syntax):

f();
++a;
a = b;

These are expressions that can return values, but that can sensibly be
used standalone too. (I don't support value-returning compound assignments.)

(I first introduced this check because in the past, if I'd been writing
some C, I might write 'a = b' instead of 'a := b'. The first does
nothing (compares then discards result), but it is not what I'd intended.)

Anyway, I don't have it as a syntax violation either.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 09, 2026 16:05:03

On 2026-06-08 14:41, Dan Cross wrote:

[...]

Unfortunately, the C standard is simply not a precise, formal
document. This is well-known, and it's hardly C's fault: indeed
most of the applications of formalized descriptions of PL
semantics to practical programming languages postdates C's
invention; Dana Scott didn't introduce the term, "operational
semantics" until 1970, and it didn't start to make a serious
impact on languages until later.

Disclaimer: I haven't read Dana Scott's source that you refer to.
Myself I've heard that term at university during the early 1980's.
In 1970 my "knowledge" about computers was on Star-Trek level only.

I just want to point out Algol 68's formal specification (pre-1970).

And provide this quote on "Operational Semantic" (from Wikipedia):
"The concept of operational semantics was used for the first time
in defining the semantics of Algol 68."

But Algol 68 was certainly outstanding here, concerning its formal
specification, compared to most other languages back in those days.

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Tuesday, June 09, 2026 16:30:23

On 2026-06-09 15:53, Bart wrote:

On 09/06/2026 14:17, Janis Papanagnou wrote:

On 2026-06-08 23:05, Keith Thompson wrote:

[...]
I've discussed this particular glitch before, but it's been a while.

N3220 6.5.1 says:

An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.

I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.

This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:

*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.

Consider this expression statement:

42;

Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".

Above you used the term "expression statement", and then compare the
"42" to an "expression".

I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.

I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.

Usually I'd expect above "expression-statement" to serve some purpose,
semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)

Or do these stand-alone values (the "expression-statement") have some
practically useful semantics?

In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.

What purpose serve such stand-alone numbers in places where statements
are expected?

I think it is just difficult for the syntax to ban certain expressions
and not others. How would you express that in the grammar?

Well, I'd do that as it's done in other languages.

Define _statements_ and define _expressions_. And defined expressions
in contexts where a sensible operational semantics can be defined (as
in mathematical formulas, actual function parameter lists, etc.), but
not in places where statements are expected.

If you ramp up the warnings, then you'll get messages like 'statement
with no effect' or 'computed value not used', since sometimes there
are side-effects that are needed:

   f() + g();

f() and g() both do something, but nothing is done with their sum.

Right. And I wouldn't allow a mathematical formula where the results
are calculated but not used, here an expression, as a statement.

But your example may indeed lead to the actual answer to my question;
when writing just

f();

There's no distinction of procedures and functions in "C". One cannot
tell whether that f() is a "procedure" (i.e. a function with no return
value, or one with return value but the call just relying on the side
effects). In "C" any value of f() just gets discarded in this context.

That of course doesn't mean that it couldn't be handled by the compilers
and sensibly defined by the language, depending on how f() is actually
defined. After all, 'f();' is not the same case as '42;'.

But okay, we're talking about "C" here - so one's own design preferences
are irrelevant here anyway.

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Tuesday, June 09, 2026 17:13:10

On 09/06/2026 16:30, Janis Papanagnou wrote:

On 2026-06-09 15:53, Bart wrote:

On 09/06/2026 14:17, Janis Papanagnou wrote:

On 2026-06-08 23:05, Keith Thompson wrote:

[...]
I've discussed this particular glitch before, but it's been a while.

N3220 6.5.1 says:

   An *expression* is a sequence of operators and operands that
   specifies computation of a value, or that designates an object
   or a function, or that generates side effects, or that performs
   a combination thereof.

I believe the wording is unchanged from C90 up to the latest C202y
draft.  Since the word "expression" is in italics, this is the
standard's definition of the word.

This is a flawed definition.  The terms "operator" and "operand"
are defined in 6.4.6:

   *punctuator*: one of
   [ ] ( )
   [snip]
   A punctuator is a symbol that has independent syntactic and semantic
   significance. Depending on context, it may specify an operation to
   be performed (which in turn may yield a value or a function
   designator, produce a side effect, or some combination thereof) in
   which case it is known as an *operator* (other forms of operator also
   exist in some contexts). An *operand* is an entity on which an
   operator acts.

Consider this expression statement:

   42;

Is `42` an expression?  Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".

Above you used the term "expression statement", and then compare the
"42" to an "expression".

I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.

I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.

Usually I'd expect above "expression-statement" to serve some purpose,
semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)

I don't see why you would expect that. Statements do not have to have observable behaviour - indeed, I don't think any statements in C have observable behaviour in themselves. A "statement" in C is basically
something that does not produce a value - "return", "if ...", "for...",
or it is an "expression statement". Expression statements are the most
common type of statement, I would guess (without having calculated statistics.)

Expressions do not have to have observable behaviour. "x = y + z;" is a perfectly good expression statement, but has no observable behaviour
(unless x, y or z are volatile). Most statements, and most expressions,
do not have observable behaviour. (Again, I have no statistics, but I
think this would be the solid majority of statements and expressions.)

Of course most statements and expressions /contribute/ to later
observable behaviour - such as printing out the result of a calculation.
Otherwise they are not much use (and compilers can eliminate or reduce
them, if the compiler is sure that there is no effect on observable behaviour).

Or do these stand-alone values (the "expression-statement") have some
practically useful semantics?

In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.

What purpose serve such stand-alone numbers in places where statements
are expected?

I think it is just difficult for the syntax to ban certain expressions
and not others. How would you express that in the grammar?

Agreed.

"42" is an expression of type "int", and so is 'printf("Hello\n")'. How
(and why) would a language distinguish between them and allow one but
not the other?

Well, I'd do that as it's done in other languages.

Define _statements_ and define _expressions_.

C defines statements and expressions. One type of statement is the "expression statement", consisting of an expression followed by a
semi-colon. The expression is optional - if it is missing, you have a
null statement.

And defined expressions
in contexts where a sensible operational semantics can be defined (as
in mathematical formulas, actual function parameter lists, etc.), but
not in places where statements are expected.

So where would "printf" fit in this picture? A printf call gives a
result - it is an expression. It also has side-effects and observable behaviour. "while (false) ;" is a valid statement, with no
side-effects. The distinction you want to make does not exist in C.
(And I don't think C is special in that regard.)

If you ramp up the warnings, then you'll get messages like 'statement
with no effect' or 'computed value not used', since sometimes there
are side-effects that are needed:

   f() + g();

f() and g() both do something, but nothing is done with their sum.

Right. And I wouldn't allow a mathematical formula where the results
are calculated but not used, here an expression, as a statement.

If the definitions of "f" and "g" are not visible to the compiler at the
time, how could the compiler know that they have no side-effects? Lots
of operators have side-effects - if you want to allow "x = y;" but
disallow "x + y;" you are going to have to have a lot of special cases
and extra grammar, syntax or constraint rules. It is better to do as C
does, and allow expression statements in the language and let compilers
and other tools help developers spot their mistakes.
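
To make the `f() + g();` case concrete, here is a minimal sketch. The counter and the bodies of `f` and `g` are invented for illustration only. Both calls are evaluated for their side effects, and the sum is simply discarded; a compiler with warnings enabled may still flag the unused value.

```c
#include <stdio.h>

/* Hypothetical functions, invented for this illustration. */
static int calls = 0;

static int f(void) { calls++; return 1; }
static int g(void) { calls++; return 2; }

/* Returns how many calls the expression statement made. */
int run_expression_statement(void)
{
    calls = 0;
    f() + g();   /* expression statement: both calls happen, the sum is discarded */
    return calls;
}
```

Calling `run_expression_statement()` reports two calls, even though nothing is done with the value of the addition.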

But your example may indeed lead to the actual answer to my question;
when writing just

  f();

There's no distinction of procedures and functions in "C". One cannot
tell whether that f() is a "procedure" (i.e. a function with no return
value, or one with return value but the call just relying on the side effects). In "C" any value of f() just gets discarded in this context.

Yes.

It is certainly possible for a language to distinguish between "pure functions" and functions/procedures with side-effects. (C actually lets
you do that, with the [[reproducible]] and [[unsequenced]] attributes in
C23, or compiler extensions before C23.) These can aid compiler static
error checking and optimisation, but do not affect the grammar of the language.
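
As a rough illustration of the pre-C23 compiler-extension route, GCC and Clang accept `__attribute__((const))` and `__attribute__((pure))` on function declarations. The function names below are made up, and this sketch assumes one of those compilers. The C23 attributes `[[unsequenced]]` and `[[reproducible]]` play a broadly similar role, though their exact rules differ.

```c
#include <stddef.h>

/* GCC/Clang extension: the result depends only on the argument values. */
__attribute__((const)) int square(int x)
{
    return x * x;
}

/* GCC/Clang extension: no side effects, but the function may read
   memory through its pointer argument. */
__attribute__((pure)) int sum(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A statement such as `square(3);` is still grammatically valid; the attribute only puts the compiler in a better position to warn that the call has no effect.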

That of course doesn't mean that it couldn't be handled by the compilers
and sensibly defined by the language, depending on how f() is actually
defined. After all, 'f();' is not the same case as '42;'.

But okay, we're talking about "C" here - so one's own design preferences
are irrelevant here anyway.

Janis

[...]

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From tTh@3:633/10 to All on Tuesday, June 09, 2026 19:27:50

On 6/9/26 15:53, Bart wrote:

   f() + g();

f() and g() both do something, but nothing is done with their sum.

I've just one question : why did you waste your life time
with a lot of non-sense questions ?

--
** **
* tTh des Bourtoulots *
* http://maison.tth.netlib.re/ *
** **

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Tuesday, June 09, 2026 19:19:07

On 09/06/2026 18:27, tTh wrote:

On 6/9/26 15:53, Bart wrote:

   f() + g();

f() and g() both do something, but nothing is done with their sum.

  I've just one question : why did you waste your life time
  with a lot of non-sense questions ?

I didn't ask any question.

You, on the other hand, did.

I take it that you don't understand what is being discussed, and why. In
that case you're wasting /your/ time posting.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 09, 2026 15:07:54

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

[...]

The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)

That specific instance is not, no; that's in a note as you point
out. I believe deriving it from the normative text is based on
UB imposing no requirement at all on the implementation.

No, the standard imposes no requirements on the *behavior*.
It still imposes requirements on the implementation.

The requirements imposed on an implementation are of a different
kind than the requirements imposed on a running program.
(An implementation might not even be written in C.)

For example, if a program dies with a segfault, it's likely due to
the program having undefined behavior. If a compiler dies with a
segfault, it's always a bug in the compiler (though the standard
doesn't say this).

If, as I suggest, the word "behavior" ("external appearance or
action") refers only to the behavior of a running program, then I
don't see how the non-normative permission to terminate a translation
follows from any normative text.

One possible argument is the statement in Section 4 that "A
*conforming hosted implementation* shall accept any strictly
conforming program", which *might* imply that a conforming hosted implementation is permitted to reject (not accept) any program that
is not strictly conforming. I'm not comfortable with that argument.

It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.

...assuming the program is strictly conforming.

Or deriving the fact that a program is strictly conforming by reading
the program and the definition of "strictly conforming program".

I have arrived at the same place you are with your "42 is not an
expression" example. The wording of the standard could be
improved to avoid things like this.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 09, 2026 15:12:42

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Actually, no, a reference to a function is not necessary. A
well-publicized issue in a C++ compiler a couple of years ago
was something along the lines of this:

```
#include <stdio.h>

void foo(void);

int
main(void)
{
    for (;;);
}

void
foo(void)
{
    printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,
and is it relevant to C?

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),
but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 09, 2026 15:22:06

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
[...]

Above you used the term "expression statement", and then compare the
"42" to an "expression".

I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.

Sorry, I thought that would be clear enough.

Syntactically, an expression-statement is an optional expression
followed by a semicolon (N3220 6.8.4, glossing over an irrelevant
detail). I merely used it as an easy way to establish a context in
which 42 is obviously a full expression (defined as "an expression
that is not part of another expression, nor part of a declarator
or abstract declarator").

An expression-statement where the expression has no side effects
is not useful, but it's permitted. C tends not to ban things just
because they're not useful. `42;` is useful only to illustrate
the point I was making about expressions.

Since a function call is an expression, this is an expression-statement:

printf("hello, world\n");

[...]

To be clear, I have zero doubt that 42 is an expression. My concern
is that the C standard's English definition of "expression" doesn't
quite say so. I advocate improving the wording so it expresses
the obvious and universally agreed intent.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Tuesday, June 09, 2026 18:29:38

On 2026-06-08 21:25, Waldek Hebisch wrote:

Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

...

In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.

So there are two things that are at play here.

First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."

The module in question defined a function with a line that contained the expression-statement

1/0;

and that statement was absolutely guaranteed to be executed if the
function was called. However, since the module did not contain any calls
to that function, the committee ruled that an implementation was not
allowed to refuse to translate it.

If linked to another module that contained a call to that function,
whether or not the implementation could refuse translation depends upon
what could be said about the call:

1. If the call to that function was guaranteed to be executed upon
starting the program, the implementation may refuse translation.

2. If the call to that function was guaranteed to never be executed, the undefined behavior associated with 1/0 has no effect.

3. If the call to that function might or might not be executed, the
undefined behavior associated with 1/0 cannot have effect until
execution of that call becomes inevitable.
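
A minimal sketch of cases 2 and 3, under the reading given above. The name `foo` comes from the quoted resolution; `run` and its parameter are hypothetical. The module must translate even though `foo` would have undefined behaviour if executed, and a caller that never reaches the call is unaffected. Compilers typically warn about the division.

```c
/* Hypothetical module: foo() has undefined behaviour if it is ever executed. */
int foo(void)
{
    1/0;        /* UB only when this statement is actually evaluated */
    return 0;
}

/* Calls foo() only when asked to; run(0) never reaches the division. */
int run(int call_foo)
{
    if (call_foo)
        return foo();
    return 0;
}
```

With `run(0)` the call is never executed, so the division plays no part in the program's behaviour.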

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 09, 2026 15:34:06

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]

Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.

Both approaches are valid.
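
For the C side, a small sketch of what checking that result looks like. The wrapper name `say_hello` and its 0/-1 return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Returns 0 on success, -1 if printf reported an output error. */
int say_hello(void)
{
    int written = printf("Hello, world\n");
    if (written < 0)
        return -1;   /* printf returns a negative value on error */
    return 0;
}
```

Written as a bare expression statement, `printf("Hello, world\n");` discards that value, which is the common case being described.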

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Tuesday, June 09, 2026 16:01:14

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a
conforming implementation."

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Wednesday, June 10, 2026 09:04:26

On 10/06/2026 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]

Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

I don't know enough about Ada to be sure, but Pascal does not do this -
see below.

In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned the number of characters printed would be a function, used in an expression,
not a procedure used in a statement.

The rough equivalent of the distinction between Pascal procedures and functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to distinguish between void and non-void like this. What cannot easily be
done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2). You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.

Both approaches are valid.

Indeed they are.

It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)
As far as I remember, Pascal does not make that distinction.

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Bart@3:633/10 to All on Wednesday, June 10, 2026 11:10:29

On 10/06/2026 08:04, David Brown wrote:

On 10/06/2026 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]

Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

I don't know enough about Ada to be sure, but Pascal does not do this -
see below.

In both languages, functions and procedures are distinct.  Functions
return values; procedures do not.  An expression cannot be turned
into a statement just by adding a semicolon.  A function call is
an expression.  A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

Sure.  But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/.  A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.

The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type.  It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this.  What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2).

In the C201x draft, 6.8.4p2 is about selection statements.

You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".

That's not quite the same thing. If I write:

int a;
a;

then gcc -Wall will report a warning. But write it as (void)a, then it doesn't.

While this is awkward to express in a language's grammar, it can choose
to list the kinds of expressions that /are/ allowed to be statements,
rather than leave it to the whim of an implementation. (The ones that
aren't allowed would be a much bigger, unlimited set.)

For example:

E(...); // function call
++E; // increment
E = E; // assignment (and compound assignment)

E is any expression term. Here, the call/increment/assignment is the
top-level AST node.

(I do this in my stuff, and there I can override the restriction using
'eval': eval a + b, which turns it into an allowed form.

Mainly this is for convenience of testing, but it was also used to
ensure an expression ended up in the primary register for subsequent
inline assembly.)
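
One way such a whitelist could be checked is on the top-level node of the parsed expression. This is only a sketch with invented node kinds and function name; it is not taken from any real compiler, including the one described above.

```c
#include <stdbool.h>

/* Hypothetical node kinds for a toy AST. */
enum node_kind {
    NODE_CALL,        /* E(...)  */
    NODE_INCREMENT,   /* ++E     */
    NODE_ASSIGN,      /* E = E, and compound assignment */
    NODE_ADD,         /* E + E   */
    NODE_CONSTANT     /* 42      */
};

/* True if an expression whose top-level node is `kind` may stand
   alone as a statement under the whitelist approach. */
bool allowed_as_statement(enum node_kind kind)
{
    switch (kind) {
    case NODE_CALL:
    case NODE_INCREMENT:
    case NODE_ASSIGN:
        return true;
    default:
        return false;
    }
}
```

Under this scheme `f();` passes, while `f() + g();` and `42;` are rejected because their top-level nodes are an addition and a constant.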

For I/O, the equivalent of printf is a procedure.  In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).  In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.

Both approaches are valid.

Indeed they are.

Distinguishing between function and procedure is incredibly rare in
modern languages. There the preoccupation seems to be to unify
everything: everything is a function, even if-statements and loops.
Every function is a closure, etc. I do not consider that useful.

It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour.  (A "pure procedure" would not do anything.)
As far as I remember, Pascal does not make that distinction.

This goes the other way and is a better idea!

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Wednesday, June 10, 2026 03:17:41

David Brown <david.brown@hesbynett.no> writes:

On 10/06/2026 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

I don't know enough about Ada to be sure, but Pascal does not do this
- see below.

You seem to disagree with me, but then you describe most of what
I wrote. I'm not sure where you disagree, or where our signals
got crossed.

Ada and Pascal don't have expression statements. The Pascal
(writeln(...)) and Ada (Put_Line(...)) constructs most similar
to C's printf("Hello\n") are procedure calls. 42 can't be made into
a statement by adding a semicolon. Neither can any function call.
But a procedure call can. That's how and why Pascal and Ada allow
one but not the other. (And both languages deliberately make it
awkward to ignore the value returned by a function.)

In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.

Right, and a Pascal function that prints its argument and returns an
integer value could not be used by itself as a statement.

The rough equivalent of the distinction between Pascal procedures and functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two expressions of type "int" (or any other general non-void type).

Right. Which is why the I/O and similar subroutines that you'd want to
use as statements are procedures, not functions.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.15
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Wednesday, June 10, 2026 13:29:13

On 10/06/2026 12:10, Bart wrote:

On 10/06/2026 08:04, David Brown wrote:

On 10/06/2026 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]

Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

I don't know enough about Ada to be sure, but Pascal does not do this
- see below.

In both languages, functions and procedures are distinct.  Functions
return values; procedures do not.  An expression cannot be turned
into a statement just by adding a semicolon.  A function call is
an expression.  A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

Sure.  But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/.  A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.

The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type.  It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this.  What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2).

In the C201x draft, 6.8.4p2 is about selection statements.

C23 is the latest C standard, so that was what I was using (n3220.pdf).
It is unfortunate that C23 has slightly different numbers for some
sections - the standards authors have previously managed a higher
consistency between versions. Section 6.8.3p2 is the number for C11 (as
you have probably found already).

You can, arguably, say that C also requires all statements to be of
"void" type, just like Pascal - but the cast-to-void is done
implicitly to treat "expr;" as "(void) expr;".

That's not quite the same thing. If I write:

   int a;

(Just to be clear that we agree - "int a;" is a declaration, not a
statement, expression, or expression statement.)

   a;

then gcc -Wall will report a warning. But write it as (void)a, then it doesn't.

Yes. But that's a matter of warnings and conventional idioms, not the C
language. "a;" and "(void) a;" both mean the same thing in the C
language. gcc, like many compilers, has warnings on unused variables
and parameters, and set-but-unused variables, as these are often the
result of mistakes in the code. And tools that have such warnings have
ways to mark intentionally unused variables and parameters - such as
__attribute__((unused)) or C23's "[[maybe_unused]]". A common idiom
is that casting an expression or variable to void tells the compiler
that you know the variable or parameter is unused, and only evaluated
for its side-effects (if any).

While this is awkward to express in a language's grammar, it can choose
to list the kinds of expressions that /are/ allowed to be statements,
rather than leave it to the whim of an implementation. (The ones that
aren't allowed would be a much bigger, unlimited set.)

Yes, a language could do that. In C, the language chooses to allow expressions of any type - that's the simplest to express!

For example:

   E(...);   // function call
   ++E;      // increment
   E = E;    // assignment (and compound assignment)

E is any expression term. Here, the call/increment/assignment is the top-level AST node.

(I do this in my stuff, and there I can override the restriction using 'eval': eval a + b, which turns it into an allowed form.

Mainly this is for convenience of testing, but it was also used to
ensure an expression ended up in the primary register for subsequent
inline assembly.)

A better choice for a language that wanted to restrict the kinds of expressions that can be used as statements would be to do as Pascal does
- allow only what C would consider "void" expressions as statements, and
make things like assignment void expressions. Saying that "x = 1" is an expression of type "int" that can be used as a statement while "x + 1"
is an expression of type "int" that cannot be used as a statement would
likely require significant complication in the language rules to work
well. Saying that "x = 1" is a void expression and can therefore be
used as a statement, while "x + 1" is a non-void expression and can
therefore not be used as a statement, is simple and clear. The cost -
or the benefit, depending on your viewpoint and preferences - is that it
is no longer possible to write "x = y = 1" or "while (x = read())...".

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.

Both approaches are valid.

Indeed they are.

Distinguishing between function and procedure is incredibly rare in
modern languages. There the preoccupation seems to be to unify
everything: everything is a function, even if-statements and loops.
Every function is a closure, etc. I do not consider that useful.

Fair enough. There are pros and cons to any such choices.

It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do
anything.) As far as I remember, Pascal does not make that distinction.

This goes the other way and is a better idea!

I personally think the "purity" of a function/procedure is a more
important distinction than whether or not it evaluates to a non-void.
But it is hard to see how it would work well in a compiled imperative
language - an emphasis on pure functions is more the domain of functional
programming languages. But a discussion on that would be more for
comp.lang.misc than comp.lang.c.


From David Brown@3:633/10 to All on Wednesday, June 10, 2026 13:43:01

On 10/06/2026 12:17, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 10/06/2026 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

I don't know enough about Ada to be sure, but Pascal does not do this
- see below.

You seem to disagree with me, but then you describe most of what
I wrote. I'm not sure where you disagree, or where our signals
got crossed.

It was most likely a misunderstanding or misinterpretation of what you
wrote - or what I wrote in the earlier post. We agree on how Pascal
(and, AFAIUI, Ada) work, and we can let that stand as a clarification
rather than risk yet another endless thread on the details of exactly
what words were used.

Ada and Pascal don't have expression statements. The Pascal
(writeln(...)) and Ada (Put_Line(...)) constructs most similar
to C's printf("Hello\n") are procedure calls. 42 can't be made into
a statement by adding a semicolon. Neither can any function call.
But a procedure call can. That's how and why Pascal and Ada allow
one but not the other. (And both languages deliberately make it
awkward to ignore the value returned by a function.)

Agreed.

In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.

Right, and a Pascal function that prints its argument and returns an
integer value could not be used by itself as a statement.

Agreed.

The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

Right. Which is why the I/O and similar subroutines that you'd want to
use as statements are procedures, not functions.

That is often the case, but is certainly not required by the language.
Even standard functions can have side-effects (like "random"), though idiomatic Pascal typically uses procedures where the results are
obtained by passing result variables by reference.


From Dan Cross@3:633/10 to All on Wednesday, June 10, 2026 12:36:28

In article <110a5vr$b2kq$5@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]

The committee has decided otherwise. The committee's resolution to DR
109 said:

"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."

https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

[...]

That does appear to settle the matter definitively, thanks.

Ok, I was wrong and I concede that the program we've been
discussing is strictly conforming, regardless of however
antagonistic a reader of the standard may be.

- Dan C.


From Dan Cross@3:633/10 to All on Wednesday, June 10, 2026 14:37:01

In article <110a34q$b2kq$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}

void
foo(void)
{
printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,

Yes: https://godbolt.org/z/d1WP4KP99

There was such an outcry when this was discovered that the C++
standard was modified to add a note explicitly allowing,
"trivial infinite loops, which cannot be removed or reordered." https://eel.is/c++draft/intro.progress

That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

and is it relevant to C?

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),

The program above meets the criteria in sec 6.8.6.1 para 4 that
allows an implementation to assume that the loop terminates.
Godbolt link: https://godbolt.org/z/q46o5cYGM

but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.

Regehr alluded to this with his taxonomy of undefined functions.
For a function that is always undefined (a "Type 3" function), a
compiler is under no obligation to even produce a return
instruction for it, and the behavior of a call to such a
function is totally undefined. Nothing stops it from cascading
into whatever the linker happens to put after it.

Therefore, given UB, it is not necessary to have a reference to
some function in a program's source text in order for it to be
executed.

- Dan C.


From Dan Cross@3:633/10 to All on Wednesday, June 10, 2026 18:30:53

In article <110bsqd$9ab$1@reader1.panix.com>,
Dan Cross <cross@spitfire.i.gajendra.net> wrote:

In article <110a34q$b2kq$2@kst.eternal-september.org>,
[snip]
Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}
term% clang --version | sed 1q
FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
term% clang -Wall -Wextra -pedantic -pedantic-errors -pg -fsanitize=undefined -o boo boo.c
boo.c:6:17: warning: overflow in expression; result is -2'147'483'648 with type 'int' [-Winteger-overflow]
6 | return INT_MAX + 1;
| ~~~~~~~~^~~
1 warning generated.
term% ./boo
boo.c:6:17: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior boo.c:6:17
term%
```

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

- Dan C.


From Keith Thompson@3:633/10 to All on Wednesday, June 10, 2026 14:08:52

David Brown <david.brown@hesbynett.no> writes:
[...]

In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2). You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".

[...]

In an expression statement, the expression is "evaluated as a void
expression for its side effects". I think that's equivalent to
converting (not casting!) it to void, but the standard doesn't describe
it that way.

6.3.2.2: "If an expression of any other type [other than void]
is evaluated as a void expression, its value or designator is
discarded."

But statements have no type.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Wednesday, June 10, 2026 14:47:10

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110a34q$b2kq$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:

```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}

void
foo(void)
{
printf("never called\n");
}
```

The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.

[...]

That doesn't make sense to me. Do you have a citation to this incident,

Yes: https://godbolt.org/z/d1WP4KP99

There was such an outcry when this was discovered that the C++
standard was modified to add a note explicitly allowing,
"trivial infinite loops, which cannot be removed or reordered." https://eel.is/c++draft/intro.progress

That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

So the reason the behavior was conforming was that the behavior of
the infinite loop is undefined. I dislike the way the C++ standard
expresses this. It says "The implementation *may assume* that any
thread will eventually do one of the following" (emphasis added).
More on that later in the context of the similar C rule.

and is it relevant to C?

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),

The program above meets the criteria in sec 6.8.6.1 para 4 that
allows an implementation to assume that the loop terminates.
Godbolt link: https://godbolt.org/z/q46o5cYGM

Right. ("for (;;);" in the original program does not.)

Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.

The rule in C is (6.8.6.1p4):

An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
- input/output operations
- accessing a volatile object
- synchronization or atomic operations.

`for (;;)` is treated as having a constant controlling expression.

This covers more cases than the C++ rule.

I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".

In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:

#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}

the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined behavior it could do anything.

A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.

but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.

Regehr alluded to this with his taxonomy of undefined functions.
For a function that is always undefined (a "Type 3" function), a
compiler is under no obligation to even produce a return
instruction for it, and the behavior of a call to such a
function is totally undefined. Nothing stops it from cascading
into whatever the linker happens to put after it.

Therefore, given UB, it is not necessary to have a reference to
some function in a program's source text in order for it to be
executed.

Of course. Given UB, anything can happen. There's nothing special
about a function that's never called in that context. It just
happens to be the way it showed up in the C++ incident.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Keith Thompson@3:633/10 to All on Wednesday, June 10, 2026 14:55:00

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}
```

[SNIP]

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

I agree that the program is strictly conforming.

I don't know the details, but I think "monstartup" is a special name,
and that the program would behave as expected if a different name
were used. Since "monstartup" is not reserved, an implementation
that visibly treats it specially is not conforming.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Tim Rentsch@3:633/10 to All on Wednesday, June 10, 2026 15:11:46

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]

This is comp.lang.c. My comments were only about C, and not
about C++. But of course you already knew that.


From Dan Cross@3:633/10 to All on Wednesday, June 10, 2026 22:44:26

In article <86ldcm82ql.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]

The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.

Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]

This is comp.lang.c. My comments were only about C, and not
about C++. But of course you already knew that.

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

- Dan C.


From Keith Thompson@3:633/10 to All on Wednesday, June 10, 2026 16:19:34

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.1p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

Of course since the behavior is undefined, *anything* could happen.
I don't know what happened inside clang (or the minds of its
maintainers) that caused it to generate code that executes a
statement in the body of a function that's never called, but that's
just one of the infinitely many allowed behaviors. A quick look at the generated code indicates that there's no x86-64 "retq" instruction
for either main() or hello(), and apparently control falls through
from the end of main() to the body of hello(). That seems weird.

It might just be a bug (but not one that, as far as I can tell,
violates the C standard).

A function whose body contains a construct that would have undefined
behavior if the function were called (not the case here) does not
cause undefined behavior if there are no calls to the function.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From Dan Cross@3:633/10 to All on Wednesday, June 10, 2026 23:32:47

In article <110cmfk$116qm$3@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

Replying to myself here, but...this is another example of weird
behavior:

```
term% cat boo.c
#include <limits.h>

int
monstartup(void)
{
return INT_MAX + 1;
}

int
main(void)
{
return 0;
}
```

[SNIP]

(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)

I agree that the program is strictly conforming.

I don't know the details, but I think "monstartup" is a special name,
and that the program would behave as expected if a different name
were used. Since "monstartup" is not reserved, an implementation
that visibly treats it specially is not conforming.

That's why it's cheating: `monstartup` is a function called from
the C runtime when using the `gprof` profiler, before `main` is
called, and I just happen to know that the csu code will call a
function by that name if compiled with profiling enabled. Thus,
this program can tickle the UB in `monstartup` in some weird
configurations. This is outside of the domain of strictly
defined C, but it is the sort of thing that happens in the real
world. Caveat emptor.

- Dan C.


From David Brown@3:633/10 to All on Thursday, June 11, 2026 08:56:25

On 10/06/2026 23:47, Keith Thompson wrote:

Right. ("for (;;);" in the original program does not.)

Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.

The rule in C is (6.8.6.1p4):

An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
- input/output operations
- accessing a volatile object
- synchronization or atomic operations.

`for (;;)` is treated as having a constant controlling expression.

This covers more cases than the C++ rule.

I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".

In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:

#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}

the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.

A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

Equally, I don't think it is likely that compilers will often be able to
use this rule to improve code generation - it would only help in a
situation where the loop's controlling expression is too complicated for
the compiler to be sure that it will terminate, but where the loop body
ends up effectively empty. I doubt if that turns up often in real code either.

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.


From David Brown@3:633/10 to All on Thursday, June 11, 2026 09:10:29

On 10/06/2026 23:08, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2). You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".

[...]

In an expression statement, the expression is "evaluated as a void
expression for its side effects". I think that's equivalent to
converting (not casting!) it to void, but the standard doesn't describe
it that way.

Agreed (I also agree on the correction of terminology).

6.3.2.2: "If an expression of any other type [other than void]
is evaluated as a void expression, its value or designator is
discarded."

But statements have no type.

Correct.

I did not mean to suggest that statements in C actually have a type, and
that their type is "void". It was a philosophical wandering - I was not trying to stay true to the grammar and terminology of either the C or
Pascal language standards.

What I meant was that if you were to think that statements /did/ have
type void, the resulting language would be basically the same. It gives
a way to think about C and Pascal that shows that though they appear to
have a different model of statements and expressions, they are
fundamentally similar - the distinction being that C has an implicit
conversion to void when non-void expressions are used in a statement
context.


From Dan Cross@3:633/10 to All on Thursday, June 11, 2026 11:38:35

In article <110dm6p$17r3s$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

Right. ("for (;;);" in the original program does not.)

Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.

The rule in C is (6.8.6.1p4):

An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
- input/output operations
- accessing a volatile object
- synchronization or atomic operations.

`for (;;)` is treated as having a constant controlling expression.

This covers more cases than the C++ rule.

I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".

In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.

A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

Equally, I don't think it is likely that compilers will often be able to
use this rule to improve code generation - it would only help in a
situation where the loop's controlling expression is too complicated for
the compiler to be sure that it will terminate, but where the loop body
ends up effectively empty. I doubt if that turns up often in real code
either.

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Thursday, June 11, 2026 11:50:04

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Of course since the behavior is undefined, *anything* could happen.
I don't know what happened inside clang (or the minds of its
maintainers) that caused it to generate code that executes a
statement in the body of a function that's never called, but that's
just one of the infinitely many allowed behaviors. A quick look at the
generated code indicates that there's no x86-64 "retq" instruction
for either main() or hello(), and apparently control falls through
from the end of main() to the body of hello(). That seems weird.

Here's a slightly better version of `what.c` (that removes the
annoying "loop has empty body, move the semicolon to the next line"
warning):

```
#include <stdio.h>
int main(void) { unsigned int k = 0; while (k != 1) k += 2; return 0; }
void hello(void) { printf("Hello, World!\n"); }
```

I think the reasoning goes something like this: in optimization
phase $n$, the compiler determines that `k` can never be 1, and
thus the loop does not terminate, and therefore, `return 0;` is
inaccessible, so it's removed. Then, in phase $n + k$, for
$k > 0$, it applies the rules of sec 6.8.6 para 4, assumes that
the loop must terminate, and therefore can be removed, and
removes it. The `return` is already gone. So what you're left
with is a label that just cascades into whatever is next in
object code; that just happens to be `hello`.
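
The "`k` can never be 1" step can be checked by hand on a scaled-down model. The sketch below is only an illustration, not anything clang does: it assumes an 8-bit modulus (via `& 0xFFu`) in place of the real `UINT_MAX + 1`, and `cycle_length_step2` is a made-up name. Stepping by 2 from 0 only ever visits even values before wrapping back to 0, so the `k != 1` test is never false:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Walks k = 0, 2, 4, ... in arithmetic modulo 256 until k returns to 0.
   Returns the number of steps in the cycle; sets *saw_one if k was ever 1. */
static unsigned cycle_length_step2(bool *saw_one)
{
    unsigned k = 0;
    unsigned steps = 0;

    assert(saw_one != NULL);
    *saw_one = false;
    do {
        if (k == 1) {
            *saw_one = true;
        }
        k = (k + 2u) & 0xFFu;   /* model of unsigned wraparound, with modulus 256 */
        steps++;
    } while (k != 0);

    return steps;
}
```

The cycle has 128 steps and never passes through 1. The same parity argument holds for any power-of-two modulus, which is why the original loop has no exit.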

It might just be a bug (but not one that, as far as I can tell,
violates the C standard).

It's known. It was known when first reported a couple of years
ago in the C++ context, and I suspect they know about it now. I
can ask someone who works on LLVM. I suspect the reasoning will
be that this is important to guarantee forward progress, and
that they can't solve the halting problem, therefore such loops
can be removed. If that causes your program to do something
weird, then, well, don't do that.

A function whose body contains a construct that would have undefined
behavior if the function were called (not the case here) does not
cause undefined behavior if there are no calls to the function.

True, but irrelevant to the point I was making, which is that UB
can induce a "call" to a function, even without a reference to
it appearing in the source text.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Thursday, June 11, 2026 14:05:28

On 11/06/2026 13:38, Dan Cross wrote:

In article <110dm6p$17r3s$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

Right. ("for (;;);" in the original program does not.)

Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.

The rule in C is (6.8.6.1p4):

An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
- input/output operations
- accessing a volatile object
- synchronization or atomic operations.

`for (;;)` is treated as having a constant controlling expression.

This covers more cases than the C++ rule.

I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".

In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.

A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I don't know about "original intent" - I was quoting a footnote in the C
standard, but I have not done any research like reading through the
rationale documents.

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

Equally, I don't think it is likely that compilers will often be able to
use this rule to improve code generation - it would only help in a
situation where the loop's controlling expression is too complicated for
the compiler to be sure that it will terminate, but where the loop body
ends up effectively empty. I doubt if that turns up often in real code
either.

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

- Dan C.

I agree there. It is up to programmers to write useful programs - I
don't think it makes sense for a language standard to say that programs
have to either do something observable, or get out of the way and don't
block something else from being useful. But I have difficulty seeing
that this rule in the C standards would make much real-world difference
one way or the other.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 16:49:09

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal specifications.

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Thursday, June 11, 2026 15:20:01

In article <110eht5$1naub$5@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.

One hopes that a formal specification (that's a term of art, and
implies something that's mathematically precise) would be
accompanied by a commentary for more casual reading. However,
the truly precise, formal specification would be considered
definitive.

I think the odds of this ever happening for C are slim to none,
but it would be useful.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 17:34:35

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of a programmer ("C" or else). - Semantics should be well defined, and then
clear to the programmer.

In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

[...]

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.

Languages shall be sensibly and clearly defined. For bad designs (or
bad standards) the language or standard should be blamed, and not the
critics badly and inappropriately despised as ''"modern compilers are
evil" crowd''. - Programmers are at the final end of the "food chain".
And there's a lot of horrible pits in the C-language where programmers
"made the mistake" to fall in; don't blame them, neither the ones who
silently suffer nor the ones who shout out.

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 17:45:31

On 2026-06-10 16:37, Dan Cross wrote:

[...]

Here's a C version with the same behavior:

```
term% cat weird.c
#include <stdio.h>

int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}

void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```

Wow, that's really fascinating! (In a bad sense.)

And (in clang) just an effect of the '-O1' (as I notice).

I may have missed the "programming language design" wisdom of the
past decades. Back then we had the conception that "optimization"
is a method to transform a program to a _functionally equivalent_
code (one that is faster, requires less memory, or some such).

Janis

[...]

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 18:08:39

On 2026-06-11 17:20, Dan Cross wrote:

In article <110eht5$1naub$5@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.

One hopes that a formal specification (that's a term of art, and
implies something that's mathematically precise) would be
accompanied by a commentary for more casual reading.

Commentaries generally make sense, and they are one possibility
to serve the needs also of programmers. But a more formal text
would also help the authors of textbooks to provide a clearer
description for those programmers that are repelled by standards
papers.

However,
the truly precise, formal specification would be considered
definitive.

Yes. (That's what I intended to express.)

I think the odds of this ever happening for C are slim to none,
but it would be useful.

I agree. (And I don't wait for that; I'm taking "C" as it is.)

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 20:12:32

On 2026-06-10 00:34, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?

[...]

Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.

Right.

What I'm not sure about is the predominance of "these" or "those"
languages. - Is that clear distinction of procedures and function
the typical case, or are the "C-derived" languages predominant and
languages with a clear distinction (meanwhile?) just outliers?

There's of course also other languages that distinguish procedures
from functions "only" by the 'void' "return type", but are anyway
able to diagnose the appropriate context and emit error messages
when inappropriately used.

In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Janis

[...]

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 20:29:16

On 2026-06-10 09:04, David Brown wrote:

[...]

The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily be
done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

Here I cannot follow you. - The C-compiler can analyze code to do
optimizations and even (as so often stated) "assume" things about
the intent concerning UB and optimization but cannot value facts
about types and context? - If so, then it sounds rather arbitrary.

[...]

It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)

By "would not do anything" you probably mean that it would not have side-effects on/with relatively global entities in the program?

As far as I remember, Pascal does not make that distinction.

Pascal functions and procedures can affect and be affected by global
entities. Predefined functions and procedures can have side effects
also unrelated to global entities in the program (e.g. print effect).
A procedure/function not affecting the global (or surrounding stack) environment could likely be identified. But here we're anyway talking
about the (clean!) return-interface of functions (as opposed to the procedures).

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Thursday, June 11, 2026 20:52:30

On 2026-06-11 18:30, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.

You snipped most of what I wrote.

Yes, because I acknowledged it by my above on-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)

I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.

To elaborate only a bit more...
There's folks who have problems with "lawyer's speech" standards.
There's folks who have problems with formal mathematical standards.
But, as to my observation, there's *no* strict or natural hierarchy
that one would imply the other.

You said: "They already struggle with current standard text."
as if there would be a strict "one implies the other" fact; there
isn't one, or to be more cautious, "there isn't necessarily one".
(I used the wording "necessarily" already in my original comment.)

I certainly would prefer standard
that is less lawyerish and more mathematical, say written in similar
way to Pascal standard. But there is a _big_ gap between normal
mathematical text and a formal mathematical text (and let me note that
Pascal standard is less formal than normal mathematics).

I agree.

Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. Formal one has parts which
are there only because authors were not able to avoid
ambiguity in simpler way. And once things are written in a way
that is well fit to formalism they tend to be much less
understandable to uninitiated.

(I'll leave that uncommented. - I've said all I intended to say.)

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Thursday, June 11, 2026 15:13:09

On 2026-06-11 14:12, Janis Papanagnou wrote:

On 2026-06-10 00:34, Keith Thompson wrote:

...

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 13:29:10

David Brown <david.brown@hesbynett.no> writes:
[...]

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?

while (! something_really_bad_happened()) {
    sleep(1);
}
self_destruct();

A compiler could "assume" that the loop terminates, even if
something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.
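
Going by the rule as quoted earlier in the thread, the assumption is not available when the loop accesses a volatile object, so one hedge is to have the condition read a volatile flag. This is a sketch of my reading of that rule, not a guarantee about any particular compiler. `polls_remaining` and `wait_for_trouble` are made-up names, and the countdown exists only so the example terminates:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real condition: reports "bad" after a fixed number of polls. */
static volatile int polls_remaining = 3;

static bool something_really_bad_happened(void)
{
    if (polls_remaining > 0) {
        polls_remaining = polls_remaining - 1;   /* volatile read and write on every poll */
        return false;
    }
    return true;
}

/* Polls until the condition fires; returns how many times the loop body ran. */
static int wait_for_trouble(void)
{
    int polls = 0;
    while (!something_really_bad_happened()) {
        polls++;
    }
    assert(polls_remaining == 0);
    return polls;
}
```

With the countdown starting at 3, `wait_for_trouble()` returns 3. In real code the flag would be set by a signal handler or another thread, and an atomic object might be the better fit.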

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Friday, June 12, 2026 00:37:03

On 2026-06-11 21:13, James Kuyper wrote:

On 2026-06-11 14:12, Janis Papanagnou wrote:

On 2026-06-10 00:34, Keith Thompson wrote:

...

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?

Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.

I'd indeed also expected that, say, printing a string value with a '%d' specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.

Concerning the "guarantees" that you're asking for I sadly have to say
that I meanwhile expect nothing sensible at all any more from "C". ;-)

But to be more serious again...

The man-page is very unspecific on that; 'man 3 printf' says:
"If an output error is encountered, a negative value is returned."

Now of course an error can occur with that simple 'printf' above, for
example, by issuing an 'fclose (stdout);' before the 'printf (...);'
But what can I as a C-programmer derive from that; how would one act
on that. (That's just rhetorical.)

Obviously (because of that?) I've never seen anyone test such a call
by, say,

int rc = printf("Hello, world\n");
if (rc < 0) {
    /* umm.. */
}

Are you - plural, all CLC audience - writing such code with 'printf()', honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)

But yes, I'm aware of negative OS function or library function output.

Our rules (back in my C/C++ days) suggested to catch any sensible and
possible error indications to quickly localize any potential issues.
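
For what it's worth, a minimal sketch of what acting on those return values could look like. `say_hello_checked` is a made-up helper, and the choice to collapse every failure into -1 is just one possible policy:

```c
#include <assert.h>
#include <stdio.h>

/* Writes the greeting to `out` and flushes it.
   Returns the character count reported by fprintf, or -1 on any output error. */
static int say_hello_checked(FILE *out)
{
    assert(out != NULL);

    int rc = fprintf(out, "Hello, world\n");
    if (rc < 0) {
        return -1;              /* fprintf itself reported an output error */
    }
    if (fflush(out) == EOF) {
        return -1;              /* buffered data could not be written out */
    }
    return rc;
}
```

A caller would typically turn -1 into a message on stderr and a nonzero exit status. The flush matters because with a buffered stream a write error may only show up when the buffer is actually written.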

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 15:38:41

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
the loop, and a compiler can elide the entire loop anyway.

[...]

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

I agree.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 11, 2026 23:05:17

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-11 21:13, James Kuyper wrote:

On 2026-06-11 14:12, Janis Papanagnou wrote:

On 2026-06-10 00:34, Keith Thompson wrote:

...

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?

Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.

The manual page also notes for the cases where printf returns -1:

For the conditions under which [CX] [Option Start] dprintf(), [Option End] fprintf(),
and printf() fail and may fail, refer to fputc() or fputwc().

In addition, all forms of fprintf() shall fail if:

[EILSEQ]
[CX] [Option Start] A wide-character code that does not correspond to a valid character has been detected. [Option End]
[EOVERFLOW]
[CX] [Option Start] The value to be returned is greater than {INT_MAX}. [Option End]

[CX] [Option Start] The asprintf() function shall fail if:

[ENOMEM]
Insufficient storage space is available.

The dprintf() function may fail if:

[EBADF]
The fildes argument is not a valid file descriptor.

[Option End]

The [CX] [Option Start] dprintf(), [Option End] fprintf(), and printf() functions may fail if:

[ENOMEM]
[CX] [Option Start] Insufficient storage space is available. [Option End]

The fputc(3) errors:

ERRORS

The fputc() function shall fail if either the stream is unbuffered or the stream's buffer needs to be flushed, and:

[EAGAIN]
[CX] [Option Start] The O_NONBLOCK flag is set for the file descriptor underlying stream and the thread would be delayed in the write operation. [Option End]
[EBADF]
[CX] [Option Start] The file descriptor underlying stream is not a valid file descriptor open for writing. [Option End]
[EFBIG]
[CX] [Option Start] An attempt was made to write to a file that exceeds the maximum file size. [Option End]
[EFBIG]
[CX] [Option Start] An attempt was made to write to a file that exceeds the file size limit of the process.
[Option End] [XSI] [Option Start] A SIGXFSZ signal shall also be generated for the thread. [Option End]
[EFBIG]
[CX] [Option Start] The file is a regular file and an attempt was made to write at or beyond the offset maximum. [Option End]
[EINTR]
[CX] [Option Start] The write operation was terminated due to the receipt of a signal, and no data was transferred. [Option End]
[EIO]
[CX] [Option Start] A physical I/O error has occurred, or the process is a member of a background process group attempting to write to its controlling terminal, TOSTOP is set, the calling thread is not blocking SIGTTOU, the process is not ignoring SIGTTOU, and the process group of the process is orphaned. This error may also be returned under implementation-defined conditions. [Option End]
[ENOSPC]
[CX] [Option Start] There was no free space remaining on the device containing the file. [Option End]
[EPIPE]
[CX] [Option Start] An attempt is made to write to a pipe or FIFO that is not open for reading by any process. A SIGPIPE signal shall also be sent to the thread. [Option End]

The fputc() function may fail if:

[ENOMEM]
[CX] [Option Start] Insufficient storage space is available. [Option End]
[ENXIO]
[CX] [Option Start] A request was made of a nonexistent device, or the request was outside the capabilities of the device. [Option End]

The '[CX]' passages between '[Option Start]' and '[Option End]' mark POSIX extensions to the ISO C standard.

https://pubs.opengroup.org/onlinepubs/9799919799/functions/fputc.html
https://pubs.opengroup.org/onlinepubs/9799919799/functions/printf.html

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Thursday, June 11, 2026 23:07:00

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of

...unless 'i' or 'n' is modified in the body of

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Friday, June 12, 2026 01:18:17

On 2026-06-12 01:05, Scott Lurndal wrote:

The manual page also notes for the cases where printf returns -1:

The man page on my Linux doesn't. :-(

[snip error list]

Thanks for the error list.

[snip opengroup-links]

Yeah, and these links; always useful to look up these resources.

Janis

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 16:28:47

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

[SNIP]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Thursday, June 11, 2026 23:46:14

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 17:41:38

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 2026-06-11 21:13, James Kuyper wrote:

On 2026-06-11 14:12, Janis Papanagnou wrote:

On 2026-06-10 00:34, Keith Thompson wrote:

...

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?

Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.

Linux has a device called "/dev/full". It acts like it has no data
on input, and like it's full on output. You can redirect a program's
stdout to /dev/full. It's useful for testing, and much easier than
finding a writable filesystem with no remaining space. (/dev/null
accepts and discards as much input as you send to it.)

On my system, a small write to /dev/full will typically succeed, since
the output is buffered rather than being immediately sent to the
file. It fails with ENOSPC after about 4 kbytes.

If I use fopen() to open /dev/full, then write to it, then fclose()
it, the fclose() fails. Since files are implicitly closed when
main() finishes, this is likely to go undetected.
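
A minimal sketch of that fopen/write/fclose sequence with the fclose() result checked. The helper name `write_and_close` is invented for illustration and is not from the post; the behavior on /dev/full assumes a Linux system like the one described above.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to `path`, returning 0 only if the
 * open, the write, and the close all succeed; -1 otherwise. */
int write_and_close(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int failed = (fputs(text, f) == EOF);

    /* Buffered data may only reach the device here, so a full device
     * can show up as an fclose() failure rather than a write failure. */
    if (fclose(f) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```

On a system with /dev/full, `write_and_close("/dev/full", "hello\n")` would be expected to return -1, even though the fputs() call itself appears to succeed.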

A common pattern is "some_program > some_file", which redirects
stdout to a file but leaves stderr going to the default (typically
the tty).

I'd indeed also expected that, say, printing a string value with a '%d'
specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.

Calling printf with a mismatch between the format string and
an argument has undefined behavior. Some compilers will warn
about this in most cases, but in general the format string is not
necessarily known at compile time. No diagnostic or other error
indication is required.
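
Since no error indication is required, the practical defense is to keep each conversion specifier matched to its argument type, and to pass run-time text as data rather than as the format string. A small sketch; the helper name `format_item` is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: format "name: count" into buf.
 * Each conversion specifier matches its argument's type:
 * %s takes a const char *, %d takes an int.
 * Returns what snprintf returns: the length the full text needs. */
int format_item(char *buf, size_t size, const char *name, int count)
{
    /* `name` is passed as an argument for %s, never as the format
     * string itself, so any '%' characters in it are copied literally. */
    return snprintf(buf, size, "%s: %d", name, count);
}
```

With a literal format string like this one, compilers that check formats have what they need to diagnose a mismatch at compile time.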

Concerning the "guarantees" that you're asking for I sadly have to say
that I meanwhile expect nothing sensible at all any more from "C". ;-)

But to be more serious again...

The man-page is very unspecific on that; 'man 3 printf' says:
"If an output error is encountered, a negative value is returned."

Now of course an error can occur with that simple 'printf' above, for
example, by issuing an 'fclose (stdout);' before the 'printf (...);'
But what can I as a C-programmer derive from that; how would one act
on that. (That's just rhetorical.)

Obviously (because of that?) I've never seen anyone test such a call
by, say,

int rc = printf("Hello, world\n");
if (rc < 0) {
    /* umm.. */
}

Quick-and-dirty programs like the classic "hello, world" often don't
bother to check. The above could print an error message to stderr and
call exit(EXIT_FAILURE). Even if stdout and stderr both produce errors,
the caller should be able to detect the error status. (I've configured
my shell to print a message when a program dies with an error status.)
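
As a rough sketch of what filling in the "umm.." could look like: the helper name `say_hello` is hypothetical, and the explicit fflush() is an added assumption so that buffered output is also covered by the check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print the greeting to `out` and turn a failed
 * write into an exit status instead of ignoring the result. */
int say_hello(FILE *out)
{
    if (fprintf(out, "Hello, world\n") < 0 || fflush(out) == EOF) {
        /* stderr may be broken too; the returned status still lets
         * the caller detect the failure. */
        fputs("say_hello: write error\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

A main() could then simply be `return say_hello(stdout);`.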

But most production programs don't just blindly print stuff to stdout.

For example, GNU coreutils "cat" and "echo" both print "write error:
No space left on device" on stderr and exit with a status of 1 when
output is redirected to /dev/full -- if the output is big enough.
I haven't checked the source, but they must be explicitly checking
the result of both whatever output routine(s) they use and the
fclose(), or perhaps doing some fancy system-specific stuff that
has the same effect.

Are you - plural, all CLC audience - writing such code with 'printf()',
honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)

Write the error message to stderr, optionally log it somewhere,
and exit with an error code.

But yes, I'm aware of negative OS function or library function output.

Our rules (back in my C/C++ days) suggested to catch any sensible and
possible error indications to quickly localize any potential issues.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From James Kuyper@3:633/10 to All on Thursday, June 11, 2026 20:41:49

On 2026-06-11 18:37, Janis Papanagnou wrote:

On 2026-06-11 21:13, James Kuyper wrote:

On 2026-06-11 14:12, Janis Papanagnou wrote:

On 2026-06-10 00:34, Keith Thompson wrote:

...

For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).

Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)

Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?

Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.

Every I/O function has a way of reporting failure, because every one is
capable of failing. That's because, if nothing else, hardware problems
could prevent I/O from happening. How much attention you need to pay to
that possibility depends upon the context.

I'd indeed also expected that, say, printing a string value with a '%d'
specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.

On some systems I've used, it would try to interpret the pointer to the
string as an int, and print the result. On others, it would expect the
int to be stored in one register, whereas the pointer was stored in a
different register, and as a result it would print whatever value was
last stored in the first register. These were natural outcomes for those
implementations; had the C standard imposed any conflicting requirements
on the behavior, it would have complicated those implementations.

...

Now of course an error can occur with that simple 'printf' above, for
example, by issuing an 'fclose (stdout);' before the 'printf (...);'
But what can I as a C-programmer derive from that; how would one act
on that. (That's just rhetorical.)

Obviously (because of that?) I've never seen anyone test such a call
by, say,

int rc = printf("Hello, world\n");
if (rc < 0) {
    /* umm.. */
}

Are you - plural, all CLC audience - writing such code with 'printf()',
honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)

For most of the programs I ever wrote, a single check for ferror(file)
at the end of the program, resulting in exit(EXIT_FAILURE) being called,
would be acceptable. That approach relies on the fact that the error
flag is sticky. Because I made a habit of such checks, we caught a
problem when a disk overflowed before we'd wasted hours "writing" data
to nowhere. If I had sent a message to a log file, it would have been
blocked by the same problem, which is why I used the exit status to
report the problem.
But I was never involved in writing interactive programs, where I
suspect that would not be acceptable.
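
The single-check approach described above might look roughly like this. It is only a sketch: the function name `write_records` and the record format are invented, and the fflush() before the check is an added assumption so that errors on still-buffered data are seen too.

```c
#include <stdio.h>

/* Hypothetical batch job: write `n` records without checking each call,
 * then test the stream's sticky error indicator once at the end.
 * Returns 0 on success, -1 if any write or the final flush failed. */
int write_records(FILE *out, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "record %d\n", i);  /* result deliberately unchecked */

    /* The error indicator stays set once any write has failed, so one
     * check here covers every fprintf() call above. */
    if (fflush(out) == EOF || ferror(out))
        return -1;
    return 0;
}
```

A caller would report the failure through the exit status, for example `if (write_records(stdout, 1000) != 0) exit(EXIT_FAILURE);`, rather than through a log file that may sit on the same full disk.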

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 17:43:52

scott@slp53.sl.home (Scott Lurndal) writes:

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

[...]

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of

...unless 'i' or 'n' is modified in the body of

Touché.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Thursday, June 11, 2026 18:29:54

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

No, those are not the two choices. An assumption made by an
implementation is not behavior ("external appearance or action").
An implementation might invoke some behavior as a result of some
assumption.

If a loop doesn't terminate and the implementation assumes that
it does, the standard says nothing about the resulting behavior.
It doesn't provide two or more options for the actual behavior.
That's classic UB.

We've seen cases here where the actual behavior is falling through
into a function that's never called. That's certainly not a
possibility provided by the standard.

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Friday, June 12, 2026 01:54:09

In article <110fnem$1s3nm$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

No, those are not the two choices. An assumption made by an
implementation is not behavior ("external appearance or action").
An implementation might invoke some behavior as a result of some
assumption.

If a loop doesn't terminate and the implementation assumes that
it does, the standard says nothing about the resulting behavior.
It doesn't provide two or more options for the actual behavior.
That's classic UB.

We've seen cases here where the actual behavior is falling through
into a function that's never called. That's certainly not a
possibility provided by the standard.

Ok, fair point.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Friday, June 12, 2026 02:02:51

In article <110fddl$1pooi$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

[...]

I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

...
for (int i = 0; i < n; i++) {
    if (DOTHING) {
        // Something complex here...
    }
}

If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.

I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
the loop, and a compiler can elide the entire loop anyway.

Yes. Scott alluded to the rest; what if the actual body had set
the exit condition for the loop, and had been optimized away?

For example, given `DOTHING` as above:

for (int i = 0; i < n; ) {
    if (DOTHING) {
        // Something complex here...
        i++;
    }
}

Here, as before, the compiler is allowed to assume that the loop
_would_ terminate, and thus elide it, as before. Of course, it
is not forced to _guarantee_ that happens because it can't solve
the halting problem.

[...]

As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.

Personally, I don't think C should be in the business of doing
such things. But it is what it is.

I agree.

Yup.

It is one of the reasons C is no longer my favorite language.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Friday, June 12, 2026 02:08:45

In article <110f5qm$1nfih$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?

while (! something_really_bad_happened()) {
    sleep(1);
}
self_destruct();

A compiler could "assume" that the loop terminates, even if
something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.

[...]

I think, given the names, that this would _likely_ not meet the
criteria in 6.8.6 para 4. What would the criteria be for
`something_really_bad_happened` to return `true`? It would
almost certainly involve something that is listed as a requirement
for the compiler to prove could not happen in order to assume
the loop terminates; as written, the "assume it terminates"
pretty much only allows empty loop bodies, or bodies that just
do simple calculations. I guess it's possible, but I'm having a
hard time imagining that `something_really_bad_happened`
wouldn't do IO or access a volatile or do an atomic operation or
something.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Friday, June 12, 2026 10:58:07

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of a programmer ("C" or else). - Semantics should be well defined, and then
clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not be
known to all C programmers, but I think they are only relevant in a very
small number of cases.

In my experience, infinite loops are generally very clearly written -
either as "for (;;)" loops or "while (true)" loops - or they are the
result of bugs in the code that accidentally run forever. If the loop
is accidentally infinite, the programmer will already be expecting it
to run the code after the loop.

[...]

So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the
"modern compilers are evil" crowd, I really do not see it as an issue
in practice. There are many other mistakes programmers can make, or
UB that they hit accidentally - this is a drop in the ocean IMHO.

Languages shall be sensibly and clearly defined. For bad designs (or
bad standards) the language or standard should be blamed, and not the
critics badly and inappropriately despised as ''"modern compilers are
evil" crowd''. - Programmers are at the final end of the "food chain".
And there's a lot of horrible pits in the C-language where programmers
"made the mistake" to fall in; don't blame them, neither the ones who silently suffer nor the ones who shout out.

I agree that standards should be clear, and standards documents should
be held accountable if they are not. There's no doubt that the C
standards are not perfect (Keith's "42 is not an expression" is an
example of that).

But it is less obvious that the language should be blamed for bad
design. As a wise man here said, "C is what it is". The reasons for
design decisions might be lost to history, inappropriate for a modern language, or forced for compatibility reasons - but the language stands
with the rules it has. I don't know of anyone who uses a mainstream programming language for serious work and does not think at least some
of its design decisions are bad - "bad" is highly subjective, depending
on both the programmer and the type of work they do. Just like for any programming language, if you are programming in C, then you need to be
aware of the pitfalls of C or steer well clear of where pitfalls might be.

Ultimately, programming languages are subject to the equivalent of
market forces - the choice of language to use for a particular task is a matter of weighing up what you think are the good and bad points for
available alternatives. As the incumbent in many situations, C of
course has an unfair advantage - but with enough incentive, people move
to other languages with their own benefits, disadvantages, and "bad"
design decisions. This is a slow process, but it is the only way forward.

As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

It is always best if compilers are able to warn you about problems in
your code - such as UB - and avoid surprising results. But I don't
think it is practical to expect them to catch everything, and too many warnings will flood you with false positives. (gcc used to have a
warning for when code was elided - as the compiler got stronger and
gained more optimisations, the warning was dropped because eliding code happened far too often to warn about.)


From David Brown@3:633/10 to All on Friday, June 12, 2026 11:02:24

On 11/06/2026 22:29, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."

The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.

How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?

while (! something_really_bad_happened()) {
    sleep(1);
}
self_destruct();

A compiler could "assume" that the loop terminates, even if something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.

The compiler can only assume that if it knows that the controlling
expression - the call to "something_really_bad_happened()" - does not
contain any IO operations, volatile accesses or atomic operations.
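A minimal sketch of that condition, assuming the check reads a flag that something else (a signal handler, say) may set. The names `bad_thing_flag`, `report_bad_thing` and `wait_for_bad_thing` are hypothetical and not from the thread, and `max_polls` exists only so the sketch can be exercised without hanging. Because the flag is volatile, every evaluation of the controlling expression is a volatile access, so as I read the rule the termination assumption is not available for this loop:

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag, assumed to be set from a signal handler or similar. */
static volatile sig_atomic_t bad_thing_flag = 0;

/* Reads a volatile object, so every call performs a volatile access. */
bool something_really_bad_happened(void)
{
    return bad_thing_flag != 0;
}

/* Hypothetical helper that a handler would call to raise the flag. */
void report_bad_thing(void)
{
    bad_thing_flag = 1;
}

/* Polls until the flag is raised or max_polls is reached, and returns the
 * number of polls that found the flag still clear. */
unsigned wait_for_bad_thing(unsigned max_polls)
{
    unsigned polls = 0;
    while (!something_really_bad_happened() && polls < max_polls) {
        ++polls;
    }
    return polls;
}
```

With the flag clear, `wait_for_bad_thing(5)` runs out its five polls; once `report_bad_thing()` has been called it returns immediately.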


From David Brown@3:633/10 to All on Friday, June 12, 2026 11:37:39

On 12/06/2026 01:46, Dan Cross wrote:

In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:

In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]

I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:

```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
      |                                                          ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```

I see the same behavior.

The following largely repeats what I've written previously in
this thread.

Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:

An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...

means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.

I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).

Why do you think the behavior is unspecified rather than undefined?

Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)

What are the "two or more possibilities" in this case?

The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.

- Dan C.

I think perhaps there are both undefined and unspecified aspects here.

The implementation may assume the loop terminates - that means, to me,
that there are no requirements for what happens if the loop does not terminate. Not terminating would be UB.

However, I don't support clang's reasoning after that in this case. As
I see it, a compiler can reason that the loop terminates and then
executes "return 0;" because the non-terminating situation is UB and
cannot occur. Thus it can skip the loop and go straight to "return 0;".
Alternatively, it can reason that the non-terminating situation is UB
and we don't care what happens if it does not terminate - so "return 0;"
would be fine in that case too, simplifying the generated code.

But it seems that clang is reasoning that it can assume the loop
terminates, and it can prove that the loop does not terminate, and this contradiction means that anything is allowed (including skipping all
code generation). The code has two conflicting semantics - it is an
infinite loop, and it is a terminating loop. I think the standards say
that the compiler /may/ consider the terminating loop interpretation as correct, thus giving just "return 0;", or it may choose not to consider
that it terminates, and generate an infinite loop. Clang appears to
think that it can pick both options at once, which would give
contradictory behaviour, and therefore jump straight to UB.

I would say that the best behaviour for a compiler here would be to give
a warning, and then pick one or the other of the two defined behaviours.
(gcc picks the infinite loop, but does not give any warning.) I cannot
say for sure that clang's behaviour is incorrect - but it is certainly
very unhelpful and poor quality of implementation.

(I also think that it makes sense for compilers to use the "ud2" or
similar "undefined behaviour" trap instructions in cases where they know
an execution path is definitely UB and doing so does not affect the
efficiency of non-UB paths.)
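For reference, the loop in what.c cannot terminate because unsigned arithmetic wraps modulo a power of two, so a counter that starts at 0 and steps by 2 stays even forever and never equals 1. A minimal sketch of that argument, using a hypothetical 8-bit counter (the function name is an assumption for illustration) so that the whole cycle can be checked exhaustively:

```c
#include <stdbool.h>
#include <stdint.h>

/* Steps an 8-bit unsigned counter by 2 from 0, as a small stand-in for the
 * "unsigned int k" loop, and reports whether it ever equals 1 before it
 * wraps back around to 0. */
bool even_counter_reaches_one(void)
{
    uint8_t k = 0;
    do {
        if (k == 1) {
            return true;
        }
        k = (uint8_t)(k + 2);   /* wraps modulo 256, which keeps k even */
    } while (k != 0);
    return false;               /* full cycle completed without seeing 1 */
}
```

The same parity argument applies to `unsigned int`, only with a much longer cycle, which is why a compiler can prove that the original loop never exits.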


From David Brown@3:633/10 to All on Friday, June 12, 2026 12:55:36

On 11/06/2026 20:29, Janis Papanagnou wrote:

On 2026-06-10 09:04, David Brown wrote:

[...]

The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).

Here I cannot follow you. - The C-compiler can analyze code to do optimizations and even (as so often stated) "assume" things about
the intent concerning UB and optimization but cannot value facts
about types and context? - If so, then it sounds rather arbitrary.

I think this thread is getting difficult to follow - there is a lot of wandering and vagueness (mostly from me, I must admit). So I am not
sure if it is worth pursuing further.

However, what I am trying to say is that it is easy for a programming
language design to make a distinction between "things that result in a
value of a type like int" and "things that do not result in a value" -
and then the language could decide that the former are "expressions"
that cannot stand alone, and the latter are "statements". It is much
harder for a language description to say that /some/ "things that result
in a value of a type like int" can be used as "statements", while others
can be used only as "expressions" and not stand-alone statements.

[...]

It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)

By "would not do anything" you probably mean that it would not have side-effects on/with relatively global entities in the program?

Yes.

A "pure" function is one whose output depends entirely on its input parameters, and has no side-effects. (Some details of the definition
may be varied, such as the ability to read global data that never
changes after the first call. Perhaps memoizing might also be allowed.)
If you don't use the value of a call to a pure function - or if the
pure function does not return a value - then it can't do anything useful.

As far as I remember, Pascal does not make that distinction.

Pascal functions and procedures can affect and be affected by global entities. Predefined functions and procedures can have side effects
also unrelated to global entities in the program (e.g. print effect).
A procedure/function not affecting the global (or surrounding stack) environment could likely be identified. But here we're anyway talking
about the (clean!) return-interface of functions (as opposed to the procedures).

That all agrees with what I thought.
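The pure/impure distinction discussed above can be sketched in plain C. This is only an illustration: the names `square`, `square_and_count`, `call_count` and `calls_so_far` are hypothetical, and standard C does not enforce purity for ordinary functions like these.

```c
/* Pure: the result depends only on the argument and nothing else changes,
 * so a call whose value is discarded accomplishes nothing. */
int square(int x)
{
    return x * x;
}

/* Hypothetical global state, used only to show a side effect. */
static int call_count = 0;

/* Not pure: it modifies state outside the function, so a call matters
 * even when the returned value is thrown away. */
int square_and_count(int x)
{
    ++call_count;
    return x * x;
}

/* Reads the global counter so the side effect can be observed. */
int calls_so_far(void)
{
    return call_count;
}
```

Calling `square(3);` as a stand-alone statement can be dropped without changing the program, whereas dropping `square_and_count(3);` would change what `calls_so_far()` reports.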


From Keith Thompson@3:633/10 to All on Friday, June 12, 2026 12:27:00

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */


From David Brown@3:633/10 to All on Saturday, June 13, 2026 12:36:15

On 12/06/2026 21:27, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)

I agree on that last point. I doubt if any code would suffer if the
paragraph were removed entirely from the standard. And while I also
don't think much real-world code is at risk of problems from its
inclusion in the standard, as long as there is some risk of problems
with existing correct code, or some risk of confusion or
misunderstanding on the part of programmers reading the standard, then
it would be better if that paragraph had not been added to the standard
at all.


From Dan Cross@3:633/10 to All on Saturday, June 13, 2026 12:02:24

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

- Dan C.


From Dan Cross@3:633/10 to All on Saturday, June 13, 2026 12:03:47

In article <110hmi7$2e85g$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

David Brown <david.brown@hesbynett.no> writes:

On 11/06/2026 17:34, Janis Papanagnou wrote:

On 2026-06-11 08:56, David Brown wrote:

On 10/06/2026 23:47, Keith Thompson wrote:

[...]

#include <stdio.h>
int main(void) {
    bool keep_going = true;
    while (keep_going) {
        keep_going = true;
    }
    puts("never reached");
}

[...]

[...]

The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).

A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?

I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.

I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.

I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.

The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.

Another example of something that was previously well-defined
and is now UB, I guess. :-/

The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)

I'm not sure that's the rationale: rather, it's to guarantee
forward progress. Again, that's not really the language's
purview.

- Dan C.


From Dan Cross@3:633/10 to All on Saturday, June 13, 2026 12:44:29

In article <video-20260613131240@ram.dialup.fu-berlin.de>,
Stefan Ram <ram@zedat.fu-berlin.de> wrote:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and

Might like to have a look at the video

"Garbage In, Garbage Out, Arguing about Undefined Behavior
with Nasal Demons" (2016) by Chandler Carruth.

IIRC it essentially takes the point of your friend, but maybe adds
some explanations. At 15' in, it discusses the suggestion to
"define all the behavior". It's for C++, but I think some of it
might apply to C as well. At 24' come some examples.

I'm not a huge fan of Carruth.

- Dan C.


From Janis Papanagnou@3:633/10 to All on Saturday, June 13, 2026 14:57:52

On 2026-06-12 04:20, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-11 18:30, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 2026-06-09 03:25, Waldek Hebisch wrote:

[...]

Interesting views. - Thanks.

I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.

I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more
formal specifications.

You snipped most of what I wrote.

Yes, because I acknowledged it by my above one-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)

I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.

But this paragraph was closely linked to the text above. Dan Cross
wanted formal semantics and my paragraph was responding to this.
I think that lawyerish style of current C standard is mostly inertia,
and making standard more mathematical would improve it. But giving
formal semantic in the standard would mean significantly bigger
change.

Yes, you said that, and I had acknowledged that; meanwhile twice.

I'm not sure why you persistently insist on any relation to your
previous text when all that *I* wanted to comment on in your post
was just _one aspect_ in your last paragraph, which was:

I think biggest trouble is normal programmers.
They already struggle with current standard text.

And I expressed that I refute that view and I explained my view.

If you think your statement about "normal programmers" (whatever
you imply with "normal") is correct and my perception with people
is in any way wrong we can discuss that.

(On your other text I see nothing that we'd need to discuss.)

Janis

[...]

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Janis Papanagnou@3:633/10 to All on Saturday, June 13, 2026 15:01:38

On 2026-06-12 12:55, David Brown wrote:

On 11/06/2026 20:29, Janis Papanagnou wrote:

[...]

I think this thread is getting difficult to follow - there is a lot of
wandering and vagueness (mostly from me, I must admit). So I am not
sure if it is worth pursuing further.

I agree, and I appreciate your post to clarify some things. - Thanks.

Janis

[...]

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Saturday, June 13, 2026 18:32:24

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are pitfalls - don't go near them, and they don't matter.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of type-punning - perhaps you have a float in memory and you want to access
it as an uint32_t for some reason. Casting a float * to an uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not, depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
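
As a minimal sketch of the two non-UB options mentioned above (memcpy() and a type-punning union): the helper names float_bits and float_bits_union are hypothetical, and the code assumes float is 32 bits wide, which the C standard does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float occupies exactly 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical helper: copy the bytes of a float into a uint32_t.
 * No pointer of the wrong type is ever dereferenced. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: the same conversion through a union. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}

/* The variant described above as UB, shown only for contrast:
 *     uint32_t u = *(uint32_t *)&f;
 */
```

On an implementation where float is IEEE 754 binary32, float_bits(1.0f) gives 0x3F800000. Whether the memcpy() call is reduced to a single register move depends entirely on the compiler and its options, which is exactly the portability-versus-speed dilemma described above.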

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that. I am
happy knowing that I cannot divide by 0, or find the square root of a
negative number (in the real domain). I am happy knowing that I cannot
add two ints if their sum overflows the range of their type, and that I
cannot call a function with a different number or type of parameters
than its definition. I have a great deal of difficulty seeing how
things could be any different, other than in a managed language with significant overhead from run-time checks - and that goes against the "aggressively optimised" requirement.

Having "well-defined semantics" does not mean the language should accept anything that happens to fit the syntax and grammar rules, or that all functions and operations should give a defined result for all inputs.
It means that the set of valid inputs is clearly defined, along with the outputs and effects you get when the inputs are valid.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.
In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the toolchains themselves. Sometimes there are also implementation-specific features that change between versions (though that is less of an issue
these days).

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.
It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

- Dan C.

Agreed.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Dan Cross@3:633/10 to All on Sunday, June 14, 2026 14:33:33

In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

Yes.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

Even once it was a part of C, the concept was communicated
poorly.

Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.

Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) depends on, and now
you've got UB; locally, everything appears correct; but it's the
combination where the UB manifests.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as an uint32_t for some reason. Casting a float * to an uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.

This is kind of my point. If you need a fast way to convert

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

and that I cannot call a function with a different number or
type of parameters than its definition.

Yup. That should be a compile-time error.

I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.

Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.

I never said that it did.

It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

It's not just that it's nowhere close to being as well-defined
as it should be, it's because the language as defined permits
behavior that varies far too widely, specifically because of UB.

Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.
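
For concreteness, a minimal sketch of one of the well-known techniques for staying out of that territory: test the operands before adding, so the overflowing addition is never evaluated. The name checked_add is hypothetical and not part of the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + b in *out and return true only when the
 * mathematical sum fits in an int. When it does not fit, the addition is
 * never performed, so no undefined behaviour occurs. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* sum would fall below INT_MIN */
        return false;
    }
    *out = a + b;
    return true;
}
```

The range tests themselves cannot overflow, because INT_MAX - b is only evaluated for positive b and INT_MIN - b only for negative b. GCC and Clang also provide __builtin_add_overflow, and C23 adds ckd_add in <stdckdint.h>, for the same purpose.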

This is one reason the committee is trying to rein some of this
in.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?

Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

In fairness, I think the current members of the committee
recognize this.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.

Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.

As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.

I am surprised that more projects do not adopt canary CI builds
against newer toolchains.

In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

Fun fact: part of the reason Google got involved in clang and
LLVM development was because the vendor toolchain for a
particular microcontroller used in Android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.

Google could afford to do that; I recognize not many
organizations can.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.

See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.

It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` was known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
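
The scenario above can be sketched as follows. The typedef domain_t and both function names are hypothetical, and the sketch assumes an implementation where int is wider than short, so that unsigned short promotes to int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical typedef: imagine this was once `unsigned int` and was later
 * narrowed so that an enclosing structure fits in a cache line. */
typedef unsigned short domain_t;

/* After the narrowing, `a` is promoted to int before the multiplication,
 * so a * b is a signed multiplication and overflows (UB) for large values. */
static int scale_risky(domain_t a, int b)
{
    return a * b;
}

/* Converting both operands to unsigned int explicitly keeps the arithmetic
 * modular, as it was when the typedef was unsigned int, regardless of what
 * the typedef later becomes. */
static unsigned int scale_wrapping(domain_t a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

Nothing in scale_risky changes textually when the typedef changes, which is why this kind of regression tends to pass review. For small operands both functions agree; they differ only for inputs such as a = 65535 with a large b, where scale_risky has no defined result at all.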

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

- Dan C.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Sunday, June 14, 2026 22:02:50

On 14/06/2026 16:33, Dan Cross wrote:

In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

On 13/06/2026 14:02, Dan Cross wrote:

In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.

Eh...I think those people have a point.

Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.

I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.

Yes.

But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.

It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)

But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.

Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

Even once it was a part of C, the concept was communicated
poorly.

It is certainly the case that C code has been written for a long time.
And it is certainly the case that some C code was written long ago, and
is still used on systems today. But I think it is important to keep in
mind that the solid majority of C code is relatively recent. Very
little pre-C90 code is ever compiled with modern tools. Code that is
old and still in use is important code, but modern code and modern tools
should not be kept back because of it.

Maybe there is scope for compilers to have better options for handling
old code, other than the usual "Use -O0 to avoid optimising on UB"
solution. You could come a long way with a "treat all variables as
volatile" flag, for example.

Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.

Agreed.

Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) depends on, and now
you've got UB; locally, everything appears correct; but it's the
combination where the UB manifests.

That can certainly happen. But that's just bugs in the code. I don't
see why UB should be considered as something special here. People
making changes to existing code sometimes misunderstand things, or
accidentally break something that worked before. That's life as a
programmer, and there are techniques to reduce the risk - code reviews,
linters, testing regimes, etc. Nothing gives 100% guarantees, and
everything has to weigh risks, consequences, costs and resources. UB is
not special here.

Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.

I think Regehr has made some good points in his writings, but I do not
agree with him on everything.

As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.

This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen. If signed integer overflow were
defined as wrapping, then compilers could not put in traps to catch the
errors because as far as the language is concerned, they are not errors.
If they are defined as causing traps, then that's the semantics -
compilers could not optimise code assuming overflow does not happen,
unless it can prove there is no overflow.

And making it defined behaviour gives programmers the mistaken idea that
they don't need to avoid overflow because there is no UB.

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and it
allows tools to help programmers avoid these mistakes, and it allows
compilers to give programmers the most efficient results from known good
code rather than adding unnecessary run-time checks that are never
triggered.
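
To make the precondition view concrete, here is a minimal sketch of treating signed addition as an operation with a narrow domain and testing that domain explicitly where it matters. The function name `checked_add` is hypothetical and not from any post in this thread. The range test is written so that it cannot itself overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true when the
 * mathematical result fits in an int. Returns false and leaves *sum
 * untouched otherwise. Neither subtraction below can overflow, because
 * each is only evaluated when the sign of b makes it safe. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

Code that already knows its operands are in range can use plain `a + b` and pay nothing. As an illustration of the implementation-level choices discussed above, GCC and Clang accept `-fwrapv` (wrapping), `-ftrapv` (trapping) and `-fsanitize=signed-integer-overflow` (run-time diagnostics).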

It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.

I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.

However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as an uint32_t for some reason. Casting a float * to an uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
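
A minimal sketch of the `memcpy()` approach mentioned above, for reference. The function name `float_bits` is hypothetical, and the sketch assumes `float` is 32 bits wide. Whether the copy compiles down to a single register move depends on the compiler and its options, as the paragraph above notes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Returns the object representation of f as an unsigned 32-bit value.
 * Copying the bytes avoids accessing a float through a uint32_t
 * pointer, which would violate the aliasing rules. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an implementation using IEEE 754 single precision, `float_bits(1.0f)` yields `0x3F800000`.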

This is kind of my point. If you need a fast way to convery

(I think you missed a bit of your answer here?)

Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.

I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I want good definitions of things that should be defined. Things that
cannot have good definitions, are fine left undefined. A language
standard should not be trying to define the behaviour of /everything/.

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

For some programs, yes. For others, no.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

I agree that wrapping semantics should be something you have to ask for.
(As an aside, I think it is a mistake for languages to have types that
have wrapping semantics - it's the operations that should wrap, not the
types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for it.

and that I cannot call a function with a different number or
type of parameters than its definition.

Yup. That should be a compile-time error.

There I agree entirely. The build model of compiling units to separate
object files without any information beyond symbol names made sense 50
years ago - we should be doing far better now. (We /can/ do far better,
but it requires conventions in the way you write your C code and the
options used when compiling or linting the program.)

I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.

There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.

Certainly it is possible to eliminate a number of things that are UB in
C. UB that is not necessary, or not useful, is a bad thing in a language.

But I think it is equally bad to give things a definition simply to be
able to say there is no UB. It is, IMHO, entirely /wrong/ of a language
to define integer overflow as wrapping simply so that it is not UB. I
do not see a guaranteed incorrect result that likely has catastrophic
consequences in a program as being better than UB. (I believe Rust
defines integer overflow as trapping in "debug" mode and wrapping in
"release" mode, which I think is a horrendous idea.)

Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.

I never said that it did.

I didn't say you said it did :-)

It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

I am all in favour of compile-time checks and rejecting code with errors
(not just UB) as soon as possible. The "perfect" language is one where
you really can follow the old Ada saying - if you can make it compile,
it's ready to ship.

I don't live dangerously by not having run-time checks on integer
overflows. I make sure my code does not have them, so checks are
unnecessary. For some of my code, if it "panicked" somewhere in
calculations, that would be a disaster - when you have code controlling
power electronics, a sudden stop can mean short-circuits and components
releasing their magic grey smoke.

Thinking that run-time checks will save you from UB is wishful thinking.
How are you going to have run-time checks that a pointer parameter
points to a valid object of the right type? You can check for a
null-pointer, but that's about it. Some things that are potential UB in
C are inherent in the type of language - checking for such problems (at
compile-time or run-time) needs a language that has a different way of
handling objects and pointers so that you cannot have arbitrary pointers
to arbitrary objects.

C is not a language suitable for such run-time or compile-time checks -
it is a language for getting the highest efficiency because the
programmer takes responsibility for getting things right. You are
correct that large programs normally have bugs (of which UB is just one
class) - the risk of bugs goes up with the size of the code base. The
corollary is that C is not a language suitable for large programs.

Rust, I think, reduces the risk of some kinds of bugs. So does C++,
when used carefully. Most code, however, is best written in languages
where these issues cannot occur - or at least where checks can be done
without a measurable impact. For example, if you use Python, you never
have integer overflow, and you never have invalid pointers.

(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)

It's not just that it's nowhere close to being as well-defined
as it should be, it's because the language as defined permits
behavior that varies far too widely, specifically because of UB.

Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

I'd be happy for the C standard to say that signed integer overflow is a
bug, or that code is not allowed to overflow its integer arithmetic. I
would not be happy if it said compilers must trap on the bug or handle
it in some specific way - what happens when a bug is reached is still
UB. And if the wording of the standard were changed to call it a "bug"
rather than "UB", it would make absolutely zero difference to the way I
write my code.

This is one reason the committee is trying to rein some of this
in.

That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.

Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.

A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.

I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?

It was badly worded - I meant that programmers do not want mistakes that
they might make to lead to additional problems. We can all appreciate
and expect that if we make a mistake in code with an incorrect
calculation, that will give incorrect output, or perhaps a crash in the
program. But we hope that it will not lead to corruption of a
filesystem, or an exploitable security hole - something out of
proportion with the mistake.

Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.

In my field, people usually put a lot of effort into writing code simply
and clearly. You avoid mistakes not by being "clever", but by being
meticulous and careful. I don't think successful C programming requires
greater intellect, knowledge or experience compared to other programming
languages - but it /does/ require an appropriate attitude. You are
working with sharp knives - pay attention to what you are doing, and
you'll be fine.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

I was avoiding signed integer overflow long before I had read any C
standards or even knew about the term "UB". Programming in C does not
need a lawyer's knowledge of the language. It is just like programming in
any other programming language - use features that you know are correct,
and if you want to do something and don't know how to do so correctly,
look it up.

In fairness, I think the current members of the committee
recognize this.

I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.

The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.

I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.

Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.

As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.

I am surprised that more projects do not adopt canary CI builds
against newer toolchains.

In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).

Fun fact: part of the reason Google got involved in clang and
LLVM development was because the vendor toolchain for a
particular microcontroller used in Android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.

Buggy toolchains are always a pain. (So is buggy hardware -
microcontrollers and cpus have their errors too.)

Google could afford to do that; I recognize not many
organizations can.

Unfortunately that's true.

And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.

Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.

See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.

It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.

Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
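
The promotion effect described here can be observed without triggering any overflow. The sketch below uses C11 `_Generic` to name the type of the product before and after the typedef change. The names `narrow_t`, `PRODUCT_TYPE`, `product_type_before`, `product_type_after` and `product_unsigned` are hypothetical, and the sketch assumes the common case where `int` is wider than `unsigned short`.

```c
/* Hypothetical typedef, standing in for the type after the change. */
typedef unsigned short narrow_t;

/* Names the type of x * y. The controlling expression of _Generic is
 * not evaluated, so no multiplication (and no overflow) happens here. */
#define PRODUCT_TYPE(x, y) _Generic((x) * (y), \
    int: "int",                                \
    unsigned int: "unsigned int",              \
    default: "other")

/* Before the change: unsigned int * int is done in unsigned int. */
const char *product_type_before(void)
{
    unsigned int a = 1u;
    int b = 1;
    return PRODUCT_TYPE(a, b);
}

/* After the change: unsigned short promotes to int, so the product is
 * computed in signed int and can overflow. */
const char *product_type_after(void)
{
    narrow_t a = 1;
    int b = 1;
    return PRODUCT_TYPE(a, b);
}

/* One possible repair: force the arithmetic back to unsigned int, which
 * has defined modular behaviour. */
unsigned int product_unsigned(narrow_t a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

With 16-bit `short` and 32-bit `int`, the first function reports "unsigned int" and the second reports "int", although the expression `a*b` is spelled identically in both.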

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

There's no simple answer here.

But one thing is clear to me - "UB" is irrelevant here (and in many of
your points). It would not matter if everything had fully defined
behaviour. The point is that something is changed in one part of the
code that has unexpected consequences in another part of the code. Who
cares if there is UB or not? The issue is that the code does not work
as intended or expected. UB can provide situations where you have
unexpected bugs - but so can all sorts of other things.

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

Or the language, or the tools.

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Scott Lurndal@3:633/10 to All on Sunday, June 14, 2026 21:24:20

ram@zedat.fu-berlin.de (Stefan Ram) writes:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

I'm not a huge fan of Carruth.

(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)

(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)

Chandler talked about how narrow contracts allow optimizations.

| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
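
For readers who want the two contract styles side by side, here is a minimal sketch in C. Only `ERR_NULL_PTR` comes from the quoted text. The other names (`ERR_OK`, `ERR_RANGE`, `get_checked`, `get_unchecked`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { ERR_OK = 0, ERR_NULL_PTR = -1, ERR_RANGE = -2 };

/* Wide contract: every input is handled and reported via the return
 * code, so the caller never causes undefined behaviour here. */
int get_checked(const int *arr, size_t len, size_t idx, int *out)
{
    if (arr == NULL || out == NULL) {
        return ERR_NULL_PTR;
    }
    if (idx >= len) {
        return ERR_RANGE;
    }
    *out = arr[idx];
    return ERR_OK;
}

/* Narrow contract: the caller must guarantee arr != NULL and idx < len.
 * The assert is a debugging aid only and disappears when NDEBUG is
 * defined, leaving a bare array access with undefined behaviour if the
 * precondition is violated. */
int get_unchecked(const int *arr, size_t len, size_t idx)
{
    assert(arr != NULL && idx < len);
    return arr[idx];
}
```

The wide version costs two comparisons and a status code on every call. The narrow version costs nothing in release builds but moves the whole burden of proof to the caller.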

I have a recollection that a version of IBM's MVS operating
system did, indeed, validate input and output arguments to kernel
functions.

Indeed, Google says it was called MVS/SP and later MVS/XA (Extended
Architecture).

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From Keith Thompson@3:633/10 to All on Sunday, June 14, 2026 15:55:09

David Brown <david.brown@hesbynett.no> writes:
[...]

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.

No, it means that the implementation can make that choice (or allow you
to make that choice). A conforming compiler could generate code on the
assumption that signed overflow never happens, and not give the
programmer any options.

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious. And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
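
As an illustration of hoisting, here is a sketch of the transformation written out by hand. The function names are hypothetical. Both versions fill `out[i]` with `base + i`. The first tests every addition, while the second relies on `i` only increasing, so one test of the largest sum covers the whole loop.

```c
#include <limits.h>
#include <stdbool.h>

/* Per-iteration check: one comparison before every addition. */
bool fill_checked_each(int base, int n, int *out)
{
    for (int i = 0; i < n; i++) {
        if (base > INT_MAX - i) {
            return false;           /* base + i would overflow */
        }
        out[i] = base + i;
    }
    return true;
}

/* Hoisted check: base + (n - 1) is the largest sum in the loop, so a
 * single test before the loop is enough. */
bool fill_checked_once(int base, int n, int *out)
{
    if (n > 0 && base > INT_MAX - (n - 1)) {
        return false;
    }
    for (int i = 0; i < n; i++) {
        out[i] = base + i;
    }
    return true;
}
```

The two are not strictly equivalent: on failure the first version has already written some elements, the second has written none. A compiler doing this automatically under trapping semantics would have to account for that kind of observable difference, which is part of why only some checks can be removed or moved.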

Of course those kinds of checks are not in the "spirit of C".

[...]

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

What's the difference between these programs?

[...]

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.

I don't want to pay the price of checking for syntax errors when I know
my code is syntactically correct. But I never know that, because I'm
fallible.

I admit that's not a very strong argument. There are real differences
between compile-time and run-time checks.

[...]

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

--- PyGate Linux v1.5.16
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Monday, June 15, 2026 10:09:56

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:
[...]

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.

No, it means that the implementation can make that choice (or allow you
to make that choice). A conforming compiler could generate code on the
assumption that signed overflow never happens, and not give the
programmer any options.

Sure. But if it were not UB, then a conforming implementation could not
make such choices or give me such choices. UB does not mean that I
definitely have such choices (as my poor wording implies), but that
implementations are able to give me the choice.

If the standards had said integer overflow was IB, then that puts limits
on what the compiler can do - and therefore on what it can do to help
the programmer. Exactly what options it had would depend on the wording
of the standard, such as whether it required an "implementation-defined
value" or, like narrowing conversions to signed integer types, "either
the result is implementation-defined or an implementation-defined signal
is raised". However, even in that latter case I think it would be more
confusing for a lot of programmers - many programmers, quite reasonably,
have an intuition that "UB" means "don't do this" or "this is not legal
in C". They also have the intuition that "IB" means "this works
according to the underlying hardware". If the standards had said
integer overflow was IB, most programmers would immediately assume that
meant wrapping behaviour.

More interesting, I think, is the possible future "erroneous behaviour"
marker. My understanding is that it lets the compiler have traps or
other run-time detection, or provide unspecified values, while making it
clear that erroneous behaviour is a result of software bugs.

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
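
To make the strict versus non-strict distinction concrete, here is a minimal sketch in C. The helper names (checked_add, checked_sub, strict_a_plus_b_minus_a) are made up for illustration and do not come from any library. With strict checking every intermediate step is checked, so "a + b - a" has to report overflow whenever "a + b" does not fit, even though the final answer "b" is always representable. A compiler that simplified the expression to "b" would silently drop that report.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
   false (leaving *out untouched) if the sum would not fit in an int. */
static bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical helper: same idea for a - b. */
static bool checked_sub(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a < INT_MIN + b) || (b < 0 && a > INT_MAX + b))
        return false;
    *out = a - b;
    return true;
}

/* Strict, naive bottom-up evaluation of a + b - a: fails as soon as any
   intermediate result overflows. */
static bool strict_a_plus_b_minus_a(int a, int b, int *out)
{
    int t;
    return checked_add(a, b, &t) && checked_sub(t, a, out);
}
```

With a == INT_MAX and b == 1 the strict version reports failure, while the simplified expression would just yield 1. Whether that lost report matters is exactly the specification question above.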

Of course those kinds of checks are not in the "spirit of C".

Indeed.

And if we want to move away from the "spirit of C", then I think we
should move away from the /language/ of C. In C, people do not expect exceptions or sudden jumps from their code - they expect that if there
is checking for errors, it is explicit in the code. In many other
languages, there is a much clearer understanding that lots of things can
fail and cause immediate exits from the function - and code is
(hopefully!) written to handle that.

[...]

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

What's the difference between these programs?

There are disadvantages in having a trap. It can (depending on
hardware) mean extra code to detect the zero - usually that run-time
cost is negligible, but sometimes it is not. It will mean extra code to handle the exception - again, often but not always negligible. Those
costs apply even if the programmer has made sure that division by zero
never occurs. And if a trap is thrown, what then? I think that a
programmer that is careful enough to see that a division expression
might throw, and handle the trap or exception appropriately, is going to
be careful enough to avoid the problem in the first place. So the trap
is going to be unexpected and handled badly. A badly handled division
by zero exception left the USS Yorktown dead in the water for three hours.

Is it better /not/ to trap? There is no general rule. If you have
tried to divide by zero, something has gone wrong before the division,
and there are no good answers to what will go wrong afterwards.
Sometimes it is possible to do damage limitation - sometimes not.

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
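
As a small illustration of handling the zero where it is created rather than at the division, here is a sketch in C. The struct and function names are hypothetical. The setter rejects a zero period at the point where the value enters the program, so the later division relies on that invariant instead of on a trap.

```c
#include <assert.h>
#include <stdbool.h>

struct sampler {
    unsigned period_ms;   /* invariant: never zero once set */
};

/* The only place a period enters the program; zero is rejected here,
   where the caller can still do something sensible about it. */
static bool sampler_set_period(struct sampler *s, unsigned period_ms)
{
    if (period_ms == 0)
        return false;
    s->period_ms = period_ms;
    return true;
}

/* Relies on the invariant established above. */
static unsigned sampler_count(const struct sampler *s, unsigned window_ms)
{
    assert(s->period_ms != 0);   /* documents the invariant; compiled out with NDEBUG */
    return window_ms / s->period_ms;
}
```

Unsigned types are used here so that the INT_MIN / -1 case does not arise either.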

(There are other ways of handling such things, like the use of NaNs in
floating point, or extending your integers with some kind of "invalid"
indicators.)

[...]

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.

I don't want to pay the price of checking for syntax errors when I know
my code is syntactically correct. But I never know that, because I'm fallible.

Checking for syntax errors is cheap - PC computing power is, in this
context, pretty much free and unlimited. If I am using a target
environment where run-time resources are plentiful, I would not be using
C in the first place.

I admit that's not a very strong argument. There are real differences between compile-time and run-time checks.

Perhaps I work in a field where that difference is more extreme than for
many programmers, and I thus feel it more than most.


From David Brown@3:633/10 to All on Monday, June 15, 2026 16:01:32

On 15/06/2026 12:43, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.

IMO a reasonable and easy definition is: a computation either delivers
the mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
More formally, an optimization is not allowed to introduce a
stronger precondition, but may weaken it.

It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.

In C, the integer addition operation "c = a + b;" has a precondition :

(a + b) <= INT_MAX, (a + b) >= INT_MIN

It has the postcondition :

c == a + b

Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
It means it is no longer true that the result of an addition operation
is the sum of the operands. Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.
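
A rough sketch of the two contracts in C, in case it helps. The function names are invented for the example, and it assumes long long is wider than int (true on common platforms, not guaranteed by the standard). add_narrow states the precondition and postcondition above as assertions that vanish with NDEBUG. add_wide accepts any operands, but its result is no longer simply "the sum": the caller also has to inspect a status.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* The precondition of c = a + b: the mathematical sum fits in an int.
   Assumes long long is wider than int. */
static bool sum_fits_int(int a, int b)
{
    long long sum = (long long)a + (long long)b;
    return sum >= INT_MIN && sum <= INT_MAX;
}

/* Narrow contract: the caller guarantees the precondition, and the
   postcondition is simply c == a + b. */
static int add_narrow(int a, int b)
{
    assert(sum_fits_int(a, b));                           /* precondition  */
    int c = a + b;
    assert((long long)c == (long long)a + (long long)b);  /* postcondition */
    return c;
}

/* Wide contract: any a and b are accepted, but the postcondition now has
   two cases, and every caller needs a code path for the failure case. */
static bool add_wide(int a, int b, int *c)
{
    if (!sum_fits_int(a, b))
        return false;
    *c = a + b;
    return true;
}
```

A trapping addition is the same wide contract with the failure path hidden in a handler somewhere else, rather than visible at the call site.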

If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.

Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable. That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.

What is the value of the certification required for some software? If
the programmer did a good job then the program will work correctly.

Yes.

Traps give assurance that the programmer did indeed handle the
tricky problem correctly.

No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the programmer did /not/ handle the problem correctly. And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error message" handler), then he/she would be able to avoid the problem in the
first place.

Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.

And once you know that computation works
according to math rules other forms of verification are easier.

You also seem to have a bias towards real-time control: if you need
a value at a given moment, then it is hard to do something
reasonable. But at least in some control areas there is the
notion of a "safe state"; for example, a working heavy machine
is dangerous, while a stopped one is usually considered safe. If
there is a safe state, then anything not expected by the program
should trigger a transition to the safe state.

I think if you are /not/ concerned with high efficiency in the code,
then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for integer overflow is to make more use of bigger types.
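
As one example of what "bigger types" can look like, here is a sketch of a scaling calculation done with a 64-bit intermediate. The function name and the choice of int32_t/int64_t are mine, purely for illustration. The product of two 32-bit values always fits in 64 bits, so the multiplication cannot overflow; only the final narrowing has a precondition left to think about.

```c
#include <assert.h>
#include <stdint.h>

/* Computes x * num / den using a 64-bit intermediate.
   Preconditions (the caller's responsibility): den != 0, and the final
   quotient fits in int32_t. The assertions only document them. */
static int32_t scale_i32(int32_t x, int32_t num, int32_t den)
{
    assert(den != 0);
    int64_t wide = (int64_t)x * num;   /* magnitude at most 2^62: cannot overflow */
    int64_t q = wide / den;
    assert(q >= INT32_MIN && q <= INT32_MAX);
    return (int32_t)q;
}
```

For instance, scaling 2000000000 by 3/4 overflows if the multiplication is done in 32 bits, but is unremarkable here.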

In general computation, if you need a correct value and have some
time, there are options which may involve re-doing the computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data; traps allow identification of
such data (doing it in another way may be computationally quite
expensive).


From Dan Cross@3:633/10 to All on Monday, June 15, 2026 17:52:09

In article <8_EXR.112952$Mm3.81340@fx33.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:

ram@zedat.fu-berlin.de (Stefan Ram) writes:

cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:

I'm not a huge fan of Carruth.

(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)

(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)

Chandler talked about how narrow contracts allow optimizations.

| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
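
For illustration only, the two contract styles quoted above could look like this in C. ERR_NULL_PTR comes from the quoted text; its numeric value, the ERR_OK name, and both function names are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

enum status { ERR_OK = 0, ERR_NULL_PTR = 1 };   /* values are assumptions */

/* Wide contract: every input is handled, and failure is reported
   to the caller as an error code. */
static enum status read_wide(const int *p, int *out)
{
    if (p == NULL || out == NULL)
        return ERR_NULL_PTR;
    *out = *p;
    return ERR_OK;
}

/* Narrow contract: p must point to a valid int. If the caller breaks
   that precondition, the behaviour is undefined. */
static int read_narrow(const int *p)
{
    assert(p != NULL);   /* debugging aid only; compiled out with NDEBUG */
    return *p;
}
```

The narrow version has no error path for the caller to forget, and no check to pay for in a release build; the wide version costs a comparison and a branch at every call.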

I have a recollection that a version of IBM's MVS operating
system did, indeed, validate input and output arguments to kernel
functions.

Indeed, Google says it was called MVS/SP and later MVS/XA (Extended
Architecture).

The Midori folks at Microsoft added bounds checking to all array
accesses in M# (the safe language they wrote Midori in). They
expected performance to be awful; when they provided it, the
overhead was pretty much undetectable: the cost was in the
noise.

- Dan C.


From Dan Cross@3:633/10 to All on Monday, June 15, 2026 19:26:16

Prefatory: I think we're largely in agreement; I'll just add a
few notes, but snip most of the rest.

In article <110n1db$3sbck$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:

[snip]
Maybe there is scope for compilers to have better options for handling
old code, other than the usual "Use -O0 to avoid optimising on UB"
solution. You could come a long way with a "treat all variables as
volatile" flag, for example.

The problem is, the language doesn't make any guarantees here,
and the compilers get to decide. If you're lucky, the compiler
gives you some control via flags or pragmas or something, but
if you're not lucky, it doesn't and the guarantees you can rely
on are just too weak.

[snip]
That can certainly happen. But that's just bugs in the code. I don't
see why UB should be considered as something special here.

Because unlike many bugs, which are clearly bugs, UB is just the
absence of defined behavior. So the output of a program can
change in subtle ways with no changes to the code, only changes
to the compiler or how it is invoked.

People
making changes to existing code sometimes misunderstand things, or
accidentally break something that worked before. That's life as a
programmer, and there are techniques to reduce the risk - code reviews,
linters, testing regimes, etc. Nothing gives 100% guarantees, and
everything has to weigh risks, consequences, costs and resources. UB is
not special here.

Yes. My point with this line is that UB doesn't show up because
programmers are just careless, and "just write code more
carefully" doesn't scale any better than, "have you tried just
writing code without bugs?"

UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen. If signed integer overflow were
defined as wrapping, then compilers could not put in traps to catch the
errors because as far as the language is concerned, they are not errors.

If "you" means the compiler, then sure. If "you" means the
programmer, then you are lucky if you get to choose that, but
it is not guaranteed that you will have that kind of flexibility
available.

If they are defined as causing traps, then that's the semantics -
compilers could not optimise code assuming overflow does not happen,
unless it can prove there is no overflow.

And making it defined behaviour gives programmers the mistaken idea that
they don't need to avoid overflow because there is no UB.

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and it
allows tools to help programmers avoid these mistakes, and it allows
compilers to give programmers the most efficient results from known good
code rather than adding unnecessary run-time checks that are never
triggered.

But it doesn't say that. It says, "no guarantees; whatever
happens happens."

This is the thing: the correct answer is whatever the language
defines it to be. The language could say, "this is an error" or
it could say, "we do whatever the hardware does." But making it
UB isn't a statement of anything. UB is a refusal to make a
statement.

[snip]
(I think you missed a bit of your answer here?)

(I did, but I was just going to say something about memcpy; it
wasn't that interesting. :-/)

[snip]
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.

I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.

UB is literally the opposite of well-defined.

I want good definitions of things that should be defined. Things that
cannot have good definitions, are fine left undefined. A language
standard should not be trying to define the behaviour of /everything/.

I accept that there will be some number of things that one
cannot reasonably define when creating a programming language.
But that set should be small;

I am happy knowing that I cannot divide by 0,

Yup. That should be a trap.

For some programs, yes. For others, no.

No. I don't accept that division by zero is ever acceptable in
a real program. What purpose would be served by _not_ trapping?
Most hardware will do it anyway.

or find the square root of a negative number (in the real
domain).

Yup. That should be a trap.

For some programs, yes. For others, no.

Same as above. If you want a NaN to be a possibility, you should
use an operation that lets you get that, `unchecked_sqrt()` or
something.

I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,

Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).

I agree that wrapping semantics should be something you have to ask for.
(As an aside, I think it is a mistake for languages to have types that
have wrapping semantics - it's the operations that should wrap, not the
types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

Yes. Rust has this as well, in `.wrapping_add()` et al.
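
C has no such operator, but "wrapping as an operation you ask for" can be approximated with a helper. This is only a sketch; the name wrapping_add_i32 is borrowed from the Rust spelling and is not a standard C function. It does the addition in unsigned arithmetic, which wraps by definition, and then copies the bits back, which is well defined because int32_t is required to be two's complement with no padding bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(int32_t) == sizeof(uint32_t),
              "the bit copy below needs equal sizes");

/* Explicitly requested wrapping addition; never undefined behaviour. */
static int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* reduced modulo 2^32 */
    int32_t result;
    memcpy(&result, &sum, sizeof result);       /* reinterpret the bits as signed */
    return result;
}
```

Plain "a + b" keeps its usual meaning, so a reader can see at the call site which additions are expected to wrap.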

I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for it.

Then the language should give you the ability to explicitly ask
for the unchecked versions of those operations.

But I think it is equally bad to give things a definition simply to be
able to say there is no UB.

I'm not suggesting that one should do that. What I'm saying is
that it is possible to conceive of a language that lets you
write robust, complex programs with strong guarantees about the
behavior of code, without UB. That doesn't mean that the
language is devoid of all notions of undefined behavior, but
rather that unless you ask for it, using UB is an error.

It is, IMHO, entirely /wrong/ of a language
to define integer overflow as wrapping simply so that it is not UB. I
do not see a guaranteed incorrect result that likely has catastrophic
consequences in a program as being better than UB.

We've discussed this before, and I understand your perspective
on it, but I feel it necessary to reiterate that I do not share
that perspective.

Defining arithmetic to be modular is perfectly acceptable. It
is not "wrong". Defining arithmetic on explicitly sized types
to use 2's complement semantics similarly. C defined arithmetic
overflow for signed types to be UB because when it was
standardized, machines existed that had different behavior and
representations for signed types. Why didn't they make it IB?
I don't know.

The world is different now.

(I believe Rust
defines integer overflow as trapping in "debug" mode and wrapping in
"release" mode, which I think is a horrendous idea.)

I agree that's kind of a wart. It's basically what you get with
UB in C.

In my opinion, the right call is providing an `unchecked_add`
and forcing the caller to wrap that in an `unsafe` block, while
normal `+` is always checked unless the compiler can deduce that
overflow cannot happen.

https://doc.rust-lang.org/std/primitive.u32.html#method.unchecked_add

So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.

I am all in favour of compile-time checks and rejecting code with errors
(not just UB) as soon as possible. The "perfect" language is one where
you really can follow the old Ada saying - if you can make it compile,
it's ready to ship.

I don't live dangerously by not having run-time checks on integer
overflows. I make sure my code does not have them, so checks are
unnecessary. For some of my code, if it "panicked" somewhere in
calculations, that would be a disaster - when you have code controlling
power electronics, a sudden stop can mean short-circuits and components
releasing their magic grey smoke.

This doesn't follow. If you have validated that the code cannot
overflow, and you are confident in that, then the code won't
panic due to overflow. So arguing against the validation seems
superfluous.

And of course if the compiler can validate that your code is
free of overflow (perhaps by examining your checks) then it
needn't insert the checks, so there is no runtime overhead.

Thinking that run-time checks will save you from UB is wishful thinking.
How are you going to have run-time checks that a pointer parameter
points to a valid object of the right type?

In strongly-typed languages with non-nullable references and
lifetimes as a first-class property of an object, the compiler
does that for you, statically, at compile-time.

You can check for a
null-pointer, but that's about it. Some things that are potential UB in
C are inherent in the type of language - checking for such problems (at
compile-time or run-time) needs a language that has a different way of
handling objects and pointers so that you cannot have arbitrary pointers
to arbitrary objects.

C is not a language suitable for such run-time or compile-time checks -

I agree.

it is a language for getting the highest efficiency because the
programmer takes responsibility for getting things right.

Paradoxically, this is not true. Consider pointers: because
they can be invalid, they have to be checked before dereference.
Contrast to non-nullable references in e.g. Rust; since their
mere existence implies that they refer to a valid object, they
do not need to be checked for nullity, misalignment, etc. Thus,
the better-defined language with stronger guarantees can afford
opportunities for optimization that don't exist in the
lower-level language riddled with UB.

You are
correct that large programs normally have bugs (of which UB is just one
class) - the risk of bugs goes up with the size of the code base. The
corollary is that C is not a language suitable for large programs.

Sadly, I now agree.

Rust, I think, reduces the risk of some kinds of bugs. So does C++,
when used carefully. Most code, however, is best written in languages
where these issues cannot occur - or at least where checks can be done
without a measurable impact. For example, if you use Python, you never
have integer overflow, and you never have invalid pointers.

If you use Rust, and restrict yourself as far as practical to
the safe subset, you never have invalid pointers, either. Nor
do you have uninitialized variables, or double-frees, or data
races. Entire categories of problems --- and their expensive
runtime checks --- are simply eliminated.

[snip]
Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.

That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.

I'd be happy for the C standard to say that signed integer overflow is a
bug, or that code is not allowed to overflow its integer arithmetic. I
would not be happy if it said compilers must trap on the bug or handle
it in some specific way - what happens when a bug is reached is still
UB. And if the wording of the standard were changed to call it a "bug"
rather than "UB", it would make absolutely zero difference to the way I
write my code.

This is an example of two people who are not sharing a
vocabulary around UB. I have no real commentary on that; I just
think it is interesting.

[snip]
In my field, people usually put a lot of effort into writing code simply
and clearly. You avoid mistakes not by being "clever", but by being
meticulous and careful. I don't think successful C programming requires
greater intellect, knowledge or experience compared to other programming
languages - but it /does/ require an appropriate attitude. You are
working with sharp knives - pay attention to what you are doing, and
you'll be fine.

50 years of experience shows us that that simply isn't true.
"Pay attention" and "be careful" just don't work.

My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.

I was avoiding signed integer overflow long before I had read any C
standards or even knew about the term "UB". Programming in C does not
need a lawyer knowledge of the language. It is just like programming in
any other programming language - use features that you know are correct,
and if you want to do something and don't know how to do so correctly,
look it up.

Right. But the issue is that the source of truth, the standard,
is ambiguous in places and opaque in others. Sussing out the
true semantics of a thing can mean cross-referencing half a dozen
different places, and this newsgroup sees cases where people who
are clearly intelligent, and who have an aptitude for
programming in C, can disagree on the specific meaning of things
in the standard.

Frankly, I think much of that is a waste of time. Let's have
better definitions, and more rigorous exposition.

[snip]
Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.

_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` was known to be `unsigned int`
when written, but `b` is a `signed int`. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int`, and
there exist values of `a` and `b` where `a*b` overflows: UB.
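
Here is a sketch of that scenario in C11, using `_Generic` to ask the compiler what type the product has. The typedef and function names are hypothetical, and the sketch assumes `int` is wider than `short`, as on most current platforms. With the original typedef, `b` is converted to `unsigned int` and the product wraps; after the change, `a` is promoted to `int` and the same expression becomes a signed multiplication.

```c
#include <assert.h>

static_assert(sizeof(int) > sizeof(short),
              "this example assumes int is wider than short");

/* Yields 1 if expr has type int, 0 if unsigned int, -1 otherwise. */
#define IS_SIGNED_INT(expr) \
    _Generic((expr), int: 1, unsigned int: 0, default: -1)

typedef unsigned int   factor_v1;   /* hypothetical: the typedef as first written */
typedef unsigned short factor_v2;   /* hypothetical: after the cache-line change  */

static int product_is_signed_v1(void)
{
    factor_v1 a = 0;
    int b = 0;
    return IS_SIGNED_INT(a * b);    /* b converts to unsigned int: wraps, no UB */
}

static int product_is_signed_v2(void)
{
    factor_v2 a = 0;
    int b = 0;
    return IS_SIGNED_INT(a * b);    /* a promotes to int: overflow is now UB */
}

/* One possible repair: request unsigned arithmetic explicitly, so the
   result no longer depends on what the typedef happens to be. */
static unsigned int product_wrapping(factor_v2 a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

Nothing at the multiplication site changed between the two versions, which is what makes this kind of breakage hard to spot in review.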

The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.

Whose fault is that?

There's no simple answer here.

But one thing is clear to me - "UB" is irrelevant here (and in many of
your points). It would not matter if everything had fully defined
behaviour. The point is that something is changed in one part of the
code that has unexpected consequences in another part of the code. Who
cares if there is UB or not? The issue is that the code does not work
as intended or expected. UB can provide situations where you have
unexpected bugs - but so can all sorts of other things.

UB is the essential characteristic here. With a better defined
language, these issues are either compile-time failures, or they
become immediately apparent during testing. In the face of
C-style UB, however, they become spooky action at a distance;
the realized effect of the change may not manifest as a bug for
many years.

And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.

As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.

Agreed.

...but be careful blaming the programmer.

Or the language, or the tools.

I push back on both of these.

There's an old saw that goes, "a good craftsman never blames his
tools." (I dislike it, but that's how it usually goes.)

But there's an unstated corollary: a good craftsman also
maintains and carefully selects the tools for the job at hand.
You don't smooth a rough-cut board with a screwdriver, nor do
you turn a bolt with a hammer. And you don't use a chainsaw
without a guard.

- Dan C.


From James Kuyper@3:633/10 to All on Monday, June 15, 2026 21:59:14

On 14/06/2026 16:33, Dan Cross wrote:
...

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

"undefined behavior", defined as "behavior ... for which this
international standard imposes no requirements" was introduced by the
first standard. However, before there was a standard there was K&R C,
the closest thing they had to a standard. And though the phrase
"undefined behavior" was not in use, there was "behavior for which K&R C imposes no requirements". In fact, there was a great deal more of it,
since K&R C was not written as carefully and precisely as the first
standard, so it left a great deal more behavior that was "undefined by
omission of any relevant definition" than there was in the first standard.


From Dan Cross@3:633/10 to All on Tuesday, June 16, 2026 04:59:38

In article <110qali$3q27m$1@dont-email.me>,
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

On 14/06/2026 16:33, Dan Cross wrote:
...

Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.

"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.

"undefined behavior", defined as "behavior ... for which this
international standard imposes no requirements", was introduced by the
first standard. However, before there was a standard there was K&R C,
the closest thing they had to a standard. And though the phrase
"undefined behavior" was not in use, there was "behavior for which K&R C
imposes no requirements". In fact, there was a great deal more of it,
since K&R C was not written as carefully and precisely as the first
standard, so it left a great deal more behavior that was "undefined by
omission of any relevant definition" than there was in the first standard.

I am guessing that there was supposed to be a point in there
somewhere, but I can't find it.

- Dan C.

--- PyGate Linux v1.5.17
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)

From David Brown@3:633/10 to All on Tuesday, June 16, 2026 10:10:21

On 15/06/2026 19:57, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 12:43, Waldek Hebisch wrote:

David Brown <david.brown@hesbynett.no> wrote:

On 15/06/2026 00:55, Keith Thompson wrote:

David Brown <david.brown@hesbynett.no> writes:

<snip>

[...]

Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.

Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.

It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.

Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.

And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.

Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
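
To make the "a + b - a" case concrete, here is a minimal sketch in C. The helper names are hypothetical, and a false return value stands in for a trap, since standard C has no portable way to express one. Strict bottom-up checking has to report the overflow in the intermediate "a + b", whereas the simplified form just returns "b" with no check at all:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b and returns true, or returns false
   if the mathematical sum does not fit in an int. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical helper: same idea for a - b. */
static bool checked_sub(int a, int b, int *out)
{
    if ((b > 0 && a < INT_MIN + b) || (b < 0 && a > INT_MAX + b))
        return false;
    *out = a - b;
    return true;
}

/* Strict checking: every intermediate result is checked in source order. */
static bool strict_a_plus_b_minus_a(int a, int b, int *out)
{
    int t;
    return checked_add(a, b, &t) && checked_sub(t, a, out);
}

/* Non-strict: the expression is simplified algebraically to b. */
static int simplified_a_plus_b_minus_a(int a, int b)
{
    (void)a; /* a cancels out */
    return b;
}
```

For a == INT_MAX and b == 1 the strict version reports a failure while the simplified one returns 1, which is the specification difficulty being described.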

IMO reasonable and easy definition is: computation either delivers
mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
In more formal way optimization is not allowed to introduce
stronger precondition, but may weaken it.

It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker
precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.

In C, the integer addition operation "c = a + b;" has a precondition :

(a + b) <= INT_MAX, (a + b) >= INT_MIN

It has the postcondition :

c == a + b

Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
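
As a rough illustration of the two specifications being compared, here is a sketch in C. The names add_precondition_holds, add_plain and add_or_report are made up for this example, and a boolean return stands in for a trap:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Precondition of "c = a + b": the mathematical sum fits in an int.
   Tested without ever computing an overflowing sum. */
static bool add_precondition_holds(int a, int b)
{
    if (b > 0) return a <= INT_MAX - b;
    if (b < 0) return a >= INT_MIN - b;
    return true;
}

/* Specification 1: narrow precondition, simple postcondition
   (result == a + b). The caller is responsible for the precondition;
   the assert only documents it and checks it in debug builds. */
static int add_plain(int a, int b)
{
    assert(add_precondition_holds(a, b));
    return a + b;
}

/* Specification 2: accepts any a and b, but the postcondition has two
   cases: returns true with *sum == a + b, or returns false with *sum
   left untouched. */
static bool add_or_report(int a, int b, int *sum)
{
    if (!add_precondition_holds(a, b))
        return false;
    *sum = a + b;
    return true;
}
```

Every caller of add_or_report has to deal with the second case, which is the extra complexity in the postcondition.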

No. Precondition is the same. Postcondition has additional term
"computation finished with no traps".

That's back where we started, with no defined behaviour if "a + b" is
too big - that is the specification for normal C addition. When you say
that addition should either deliver the correct result for suitable "a"
and "b", and trap for other values, you now have an operation that
accepts any "a" and "b", and has a postcondition that includes traps.
You have changed the function, and changed its specification,
pre-conditions and post-conditions.

It means it is no longer true that the result of an addition operation
is the sum of the operands.

Opposite of that: no traps means that regardless of precondition
the result of an addition operation is the sum of the operands.

Your change means that either the result is no traps and a correct sum,
/or/ it is a trap and no valid sum (what you get returned as the "sum"
will depend on how you define all this).

I think, perhaps, what you mean here is that if you do something like
"x = a + b;", and the execution makes it through the addition and does
the assignment, then "x" is guaranteed to be equal to the sum of "a" and
"b". That is fair enough - without such guarantees, traps, exceptions,
etc., would be a completely useless concept.

Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.

The difference is that without traps programmers do not know if
arithmetic operations give correct result.

They do know - if the code is written correctly. They know the result
is correct because they know they have fulfilled the pre-conditions. It
is the caller code that has the responsibility to make sure the
pre-conditions hold.

If the programmer does not know if the pre-conditions will hold before
the call, then they don't know what their code will do. And that is not
a good situation to be in - the possibility of some unknown jump to
somewhere else in the code does not make it better.

Note that all of this is different from run-time failures that might
occur in the normal course of the program, outside of the knowledge or
control of the calling code. C++ exceptions, or C error return codes,
are fine for things like a "read file" function in the case when the
file does not exist. That is not the result of a bug in the code.
(Well, it might be, but it doesn't have to be.) It is an expected
situation that can be handled.

Traps on UB are unexpected situations resulting from bugs in code. They
can be helpful for fault-finding, and may have some uses in damage
limitation.

With traps they do
not know if program will successfully finish, but if it
finishes they know that arithmetic gave correct results.

This is achievable in a controlled manner, without traps.

If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.

One needs to use correct formulas, there is no way around that.
Without traps programmer must analyse ranges of all intermediate
expressions. That is tedious and error prone.

Then do a better job of it - or find ways that are not as tedious.

The main reasons for getting integer overflow are :

1. Using unsanitised input.

2. Using types that are too small.

3. Not having a clear idea of what kinds of values you are dealing with,
and what you are doing with them.

The way to avoid 1 is obvious. The way to avoid 2 is obvious (except in
the very rare situations where 64-bit integers are not big enough). The
way to avoid 3 is obvious. (Sometimes the details of implementing these
fixes are not minor, but the principle is clear.)
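
A minimal sketch of points 1 and 2 in combination, assuming "int" is at most 32 bits wide so that the product of two ints always fits in int64_t. The function name scale_percent and the 0 to 100 range are invented for the example:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

_Static_assert(INT_MAX <= INT32_MAX, "sketch assumes int is at most 32 bits");

/* Computes value * percent / 100 with no possibility of overflow.
   Returns false if percent is outside the accepted range. */
static bool scale_percent(int value, int percent, int *out)
{
    /* 1. Sanitise the input: reject anything outside the documented range. */
    if (percent < 0 || percent > 100)
        return false;

    /* 2. Use a type that is big enough for the intermediate product. */
    int64_t wide = (int64_t)value * percent;

    /* With 0 <= percent <= 100 the quotient is no larger in magnitude
       than value itself, so converting back to int is safe. */
    *out = (int)(wide / 100);
    return true;
}
```

Point 3 shows up as the comment on the final conversion: the range of the result is known, so no run-time check is needed there.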

People work
around that by activating traps during testing, but it is
quite hard to find worst case values, so errors may be
easily missed during testing. Having traps active during
production runs means that you may discover problem. You
apparently think that ignoring possible problems at
runtime is good thing.

No, ignoring problems is never a good thing. Writing code that doesn't
run the risk of problems is a good thing.

And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically remove
the risk of overflows, and it is better to crash out reliably than risk
running on with faulty data. It is, however, also the case that
sometimes traps will cause far more problems than incorrect data would.
(Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would do
that.)

For simple programs you may analyze
it well enough to be sure that nothing bad happens at
runtime, but in general computing we use a lot of "interesting"
programs which are too complex to analyse. We hope that
they will run OK, but have no proof. Sometimes hope is
based on statistical tests and on low probability input
program may fail. Traps are useful to make sure that
wrong results will not propagate further.

This is why you break your code down into manageable and understandable
parts - functions, classes (for some languages), modules / translation
units, files, directories, libraries. Yes, there can be interactions
that can be very difficult to test well - testing is not easy.

Code over a certain size is likely to contain bugs - programmers are
rarely infallible, and even when they are ( :-) ), the customer
specifying the program is not.

But we are talking here about a specific class of bugs - UB that can be
detected by trap options in code generation or cpu hardware, which
basically means integer overflows, divide by 0, dereferencing null
pointers, and shift by inappropriate amounts. Those bugs are avoidable
- I really do not see them as a concern. Trapping won't help all the
other bugs - buffer overflows, unterminated strings, index out of range,
misunderstanding the specifications, mixing up parameter order in
function calls, data races, logical errors, memory resource ownership
mixups, and everything else.

So your traps on arithmetic overflow are crippling the efficiency of
calculations (and efficiency of calculations is a big reason for picking
C in the first place) to give unexpected crashes when easily preventable
mistakes occur - while doing nothing to aid the big risks.

Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path
choices which are either real and almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable.

You get extra code paths only if you attempt to handle traps.

Unhandled traps are also a code path.

Trapping of overflows gives you assurance that in computation that
you did and which finished with no traps there were no errors of
certain kind (that is wrong results due to overflow). That is
really not different than insistence on static types.

They are not remotely the same - the distinction between compile-time
and runtime is critical.

Neither
assures you of no bugs, but each tells you that some bugs
did not happen. Of course, trapping at runtime is less
satisfactory than compile time checking, but tight a priori
bounds on ranges are notoriously hard to obtain, so trapping
is the best we can have for high performance software with
current state of art.

That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.

<snip>

The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
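
A small sketch of "handle it where the zero is created". The names parse_sample_count and mean_of, and the 1 to 1000 range, are hypothetical. The count is validated once, at the point where it enters the program, so the later division needs neither a trap nor a post-mortem:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Validate at the source: accept only decimal counts in 1..1000. */
static bool parse_sample_count(const char *text, long *count)
{
    char *end;
    errno = 0;
    long n = strtol(text, &end, 10);

    /* Reject range errors, empty input and trailing junk. */
    if (errno != 0 || end == text || *end != '\0')
        return false;
    /* Reject zero (and anything else outside the accepted range) here,
       where the problem originates. */
    if (n < 1 || n > 1000)
        return false;

    *count = n;
    return true;
}

/* Precondition: count was produced by parse_sample_count, so count >= 1.
   The division cannot divide by zero, and LONG_MIN / -1 cannot occur. */
static long mean_of(long total, long count)
{
    return total / count;
}
```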

What is value of certification required for some software? If
programmer did good job then program will work correctly.

Yes.

Traps give assurance that programmer indeed correctly handled
tricky problem.

No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the
programmer did /not/ handle the problem correctly.

Yes.

And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error
message" handler), then he/she would be able to avoid the problem in the
first place.

Rather non-specific trap handler could work as "redo the computation
in arbitrary precision". If problem (like division by zero) persists,
then there is logic bug, otherwise it means that precision was
inadequate and problem is resolved.

If you are talking here about using traps as a testing and debugging
aid, helping the developer spot problems and improve their code, then I
agree - that's a good thing.

If you are talking about some kind of automatic handling, then that is
totally out of scope for a language like C. It would be much more
appropriate to use a higher level managed language and higher level
arithmetic (like support for arbitrary precision integers) in the first
place.

However, you should think about such traps similarly to parity error
which can be signaled by some hardware. There is low but nonzero
probability that such error can occur. Parity check gives you
reasonable chance to detect it.

That's not an unreasonable comparison. Parity checks used to be popular
- they are almost non-existent in communication protocols now. You
either have something that you know works correctly, or you use much
better methods - multiple ECC bits, CRCs, FEC, or whatever, according to
the balance of cost, error rates, consequences of data loss, etc.

Handling is at least as problematic
as with overflow. Absence of traps gives you less info: no
overflow traps mean no overflow, no parity traps means that
parity was correct, but intent of parity check is to discover bit
error and they are possible even with correct parity. So, do you
think that parity check inside MCU-s are useless?

Yes, for the most part. A parity check is almost always either
unnecessary, or not nearly enough.

Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.

And once you know that computation works
according to math rules other forms of verification are easier.

You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.

I think if you are /not/ concerned with high efficiency in the code,

Well, if efficiency does not matter traps can be implemented as
a software layer above the language. Or one can use arbitrary
precision arithmetic. Traps matter when efficiency matters,
so they should be implemented in place giving best efficiency,
at best in CPU and if that is not possible then in optimizing
compiler.

then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for
integer overflow is to make more use of bigger types.

Which may be best choice if efficiency is not important. But
some calculations require surprisingly large accuracy to avoid
overflow. Worse, in vast majority of cases lower accuracy
may be adequate, so there is pressure to use "sufficient"
accuracy overlooking special cases.

In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).

--- PyGate Linux v1.5.17
* Origin: Dragon's Lair, PyGate NNTP<>Fido Gate (3:633/10)
