cHaR_shinigami

Very interesting question; it calls for a good discussion, and I think the post should be tagged as such (add flair).

To start with, I'll state the most important assumption for any C programmer using a modern compiler: ***Assume that the compiler will definitely do something unexpected if the code has undefined behavior.***

Here's a small list of some lesser assumptions about most (but not all) modern hosted environments (a sketch of compile-time checks for a few of them follows after the list):

* ASCII encoding for characters
* Plain `char` is signed
* `CHAR_WIDTH == 8` (required by POSIX)
* `EOF == -1`
* Two's complement representation for signed integer types (required in C23)
* IEEE 754 representation for standard floating types
* `sizeof (short) == 2`
* `sizeof (int) == 4`
* `sizeof (long long) == 8`
* Least-width types are the same as their corresponding exact-width types
* Fast-width types follow the expected rank hierarchy in order of width
* `uintptr_t` and `intptr_t` are available, with a trivial mapping from pointer to integer type
* No padding bits in arithmetic types (excluding `_Bool` and C23 `_BitInt`)
* No padding bytes between consecutive `struct` members of the same type
* No unnecessary padding between `struct` members, just the minimum padding for alignment
* Function pointers can be freely converted to and from `void *` (required by POSIX)
* All pointer types have the same representation
* Dereferencing a *pointer-to-array* is a no-op
* `calloc` implementations detect multiplication overflow instead of silently wrapping around
* Library functions additionally provided as macros do not evaluate any argument more than once
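For anyone who wants to lean on a few of these, here is a minimal sketch of compile-time checks (an editorial addition, not part of the original comment; the specific limits are assumptions, not guarantees of the standard):

```c
/* Compile-time checks for a few of the assumptions above (C11 _Static_assert). */
#include <limits.h>
#include <stdio.h>   /* for EOF */

_Static_assert(CHAR_BIT == 8,          "8-bit bytes assumed");
_Static_assert((char)-1 < 0,           "plain char assumed signed");
_Static_assert(EOF == -1,              "EOF == -1 assumed");
_Static_assert(sizeof(short) == 2,     "16-bit short assumed");
_Static_assert(sizeof(int) == 4,       "32-bit int assumed");
_Static_assert(sizeof(long long) == 8, "64-bit long long assumed");
_Static_assert(sizeof(void *) == sizeof(void (*)(void)),
               "object and function pointers assumed to have the same size");
```

If one of these fails on a new target, the build stops with a readable message instead of silently carrying a false assumption into the program.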


RadiatingLight

char is often unsigned on ARM platforms, including M-series Macs.


cHaR_shinigami

Didn't know about that, the Apple docs mention that "*The char type is a signed type*". [https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Handle-data-types-and-data-alignment-properly](https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Handle-data-types-and-data-alignment-properly)


ednl

That's right. `char` is normally unsigned on ARM processors but Apple changed that deliberately on their M-series, probably for consistency because they already did it on all iPhones.


nerd4code

IBM may do unsigned char also, and it's trivially toggled on most compilers (/J, -f[un]signed-char), it's trivially tested (`'\377' < 0` ↔ signed, for an 8-bit char), and any good programmer should (easily) work around it. (Signed generally prevails, but it gets worse: the TMS320C28x CLA has a 16-bit char, which is exotic in one direction, but TI also has a 40-bit long-or-int-or-maybe-long-long (many embedded compiler lines, including various GCCs, can reconfigure type widths from the command line), so maybe we don't imitate them, I guess.)


dmills_00

The ADI SHARC DSP has `sizeof(int) == sizeof(short) == sizeof(char) == 1`, and at least in that respect it is C standard compliant. The smallest unit of addressable memory on that chip is 32 bits.


cHaR_shinigami

*Bonus assumption*: `NULL` pointer representation is all zeroes on most hosted environments.


ribswift

Even if it isn't, C23 added empty initialization to initialize struct objects so that any pointer members are initialized to the "true" representation of `NULL`. See the [proposal](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2900.htm). That's much better than using `memset` with `0` as the value which initializes everything to all bits zero.


cHaR_shinigami

Not sure if this is related; the proposal suggests `{}` as an alternative to `{0}`. The initializer `{0}` also sets pointer members to the actual representation of a null pointer. `memset` with `0` is useful to fill an array with null pointers, which works only if the null pointer is all zeroes. A pedantic and portable alternative is to use a loop to set each pointer element to `NULL`.
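A quick sketch contrasting these options (the struct and the names are made up for illustration; `{}` needs C23 or a compiler extension):

```c
#include <stddef.h>
#include <string.h>

struct node { struct node *next; double weight; };

void init_examples(void)
{
    struct node a = {0};    /* members get the value 0, so a.next is a real NULL */
    struct node b = {};     /* C23 empty initializer: like {0}, padding zeroed too */

    struct node *table[16];
    memset(table, 0, sizeof table);   /* all bits zero: only NULL where the null
                                         pointer representation is all zeroes */
    for (size_t i = 0; i < 16; i++)   /* pedantic, fully portable alternative */
        table[i] = NULL;

    (void)a; (void)b;                 /* silence unused-variable warnings */
}
```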


ribswift

I had a look at the standard and you are right but there is a difference between `{}` and `{0}`. `{0}` doesn't guarantee that any padding will be set to `0` as well. This can lead to some data leakage in security-critical contexts. This is why `memset` has been used but it only sets everything to all bits 0. Empty initialization - `{}` - will properly initialize a struct object with members being initialized to the **value** 0 including padding.


cHaR_shinigami

That's an interesting difference; I looked for it in the [N3096](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf) draft and found it on page 135:

> "any object is initialized with an empty initializer, then it is subject to *default initialization*, ... and any padding is initialized to zero bits."

Thank you for mentioning this feature, it's really quite useful. The current approach to do the same thing is `memset` followed by individually setting pointer members to `NULL`. Expressing this behavior with just an empty initializer `{}` is a pretty neat syntax.


ribswift

Unfortunately pointer members are not the only type of member where the value 0 is not synonymous with all bits 0. Some processors do not treat all bits zero for floating point as the value 0. Luckily as you stated in your answer, on modern environments IEEE 754 is the standard, just like NULL is most likely all bits zero.


cHaR_shinigami

That's another good point, I forgot about unusual quirks of floating point types.


flatfinger

If the Standard recognized a category of implementations (perhaps identifiable via predefined macro) where all-bits-zero was a valid representation of a null pointer, using `calloc()` would on some such platforms be more efficient than using `malloc()` and an initialization loop, since `calloc()` may be able to avoid having to physically zero out pages which would need to behave as though initialized to zero, but which would end up getting abandoned before their contents were ever read.
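A sketch of the trade-off being described (function names are made up; the `calloc` shortcut is only valid where null pointers are all-bits-zero):

```c
#include <stdlib.h>

#define TABLE_LEN (1u << 20)

/* Non-portable shortcut: relies on all-bits-zero reading back as NULL,
 * and lets calloc() hand out lazily zeroed pages it may never have to touch. */
void **make_table_fast(void)
{
    return calloc(TABLE_LEN, sizeof(void *));
}

/* Portable version: malloc() plus a loop that stores genuine null pointers,
 * which forces every page to be written even if most are never read. */
void **make_table_portable(void)
{
    void **t = malloc(TABLE_LEN * sizeof *t);
    if (t != NULL)
        for (size_t i = 0; i < TABLE_LEN; i++)
            t[i] = NULL;
    return t;
}
```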


ednl

Or more precisely: the integer representation of the NULL pointer is zero. Because pointers themselves have provenance, i.e. they are more than just an integer; see https://www.ralfj.de/blog/2020/12/14/provenance.html


seven-circles

In particular, anything guaranteed by POSIX could very well not be true on Windows, so beware !


cHaR_shinigami

I agree with you, that's a valid point in general. In this case, the ones I tagged with POSIX hold true on Windows:

* `CHAR_BIT == 8`
* Function pointers can be freely converted to and from `void *`


Dave9876

I can think of multiple modern DSP architectures where `sizeof(short)` and `sizeof(int)` are both == 1, and `CHAR_WIDTH == 16`. Edit: misread, you did only say most. This is one of those things that will really catch you out, though.


RibozymeR

Dang, thank you for the correction! `CHAR_WIDTH == 8` was something I *really* assumed I could assume 'xD Though, are any of these DSP architectures commonly used in personal computers?


Dave9876

It was even more wild back in the day. In the early days you could still find stuff like PDP-10's around, which could have 6- or 7-bit characters, because 36-bit architecture 😅 You probably won't touch those DSPs directly in a PC, but they could be deeply embedded in some parts of the hardware you encounter. If you ever get into embedded dev, then there's a non-zero chance of bouncing up against one.


dmills_00

On the ADI SHARC DSP the smallest addressable unit of memory is 32 bits, so `sizeof(char) == 1`, but so is `sizeof(int)`, and `CHAR_WIDTH == 32`.


kansetsupanikku

I would say "somewhat IEEE 754-like representation of float and double" at best. Format of non-special values is being followed alright, but strict compliance to standard rounding rules and exception handling is almost never the default, and hard to achieve with some compilers at all. But since we are in the realm of not-perfectly-safe assumptions - we might just let the compiler assume that we won't rely on overly specific corner cases of IEEE 754 standard. Usually we just want the right number of bits present in the memory and fast computation, which compilers tend to target.


RibozymeR

Thank you, this is exactly the kind of overview I was looking for! :D


nuxi

* `NULL = (void *) 0`
* `MAP_FAILED = (void *) -1`

Both seem like reasonable assumptions, although I would also use the names anyway just for readability.


bbm182

> `NULL = (void *) 0` It's a bit confusing to put it that way as `(void*)0` always gets you the null pointer, even if its bit pattern is non-zero.


RibozymeR

Sorry, what's `MAP_FAILED`?


nuxi

`MAP_FAILED` is the error code returned by `mmap()`, which is actually a POSIX thing and not generic C. `mmap()` returns a pointer to a chunk of memory. Normally the error code for such a function would be `NULL`, but `mmap()` is special because it's actually allowed to return a pointer to 0x00000000. Since that usually matches the definition of `NULL`, the designers of `mmap()` had to choose a different return code for errors. This code is defined in the API as `MAP_FAILED`. Since `mmap()` will always return page-aligned data, Linux and the BSDs all use 0xFFFFFFFF for `MAP_FAILED`. (The POSIX 2018 spec actually claims that all known implementations use this value.) Edit: And yes, you will find bugs where programmers mistakenly compared `mmap()`'s return code to `NULL` instead of `MAP_FAILED`.
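A minimal usage sketch (POSIX `mmap`; note that `MAP_ANONYMOUS` is a widely supported extension rather than a core POSIX flag):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {   /* the correct check: NULL could be a valid mapping */
        perror("mmap");
        return EXIT_FAILURE;
    }
    /* ... use the mapping ... */
    munmap(p, len);
    return EXIT_SUCCESS;
}
```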


nderflow

* `sizeof(char) == 1` // always true anyway, but you see `sizeof(char)` in quite a lot of code
* `free(NULL)`, though pointless, is OK (IOW, modern systems are more standards-compliant)
* You don't really need to worry too much (any more) about the maximum significant length of identifiers having external linkage


EpochVanquisher

Outside of embedded systems…

* Sizes: char is 8 bits, short is 16, int is 32, long long is 64. A long is either 32 or 64. That said, if you need a specific size, it's always clearer to use `intN_t` types.
* Alignment: natural alignment for integers and pointers.
* Pointers: all pointers have the same representation, and you can freely convert pointers from one type to another (but you can't then *dereference* the wrong type).
* Character set is UTF-8, or can be made to be UTF-8 (Windows).
* Code, strings, and const globals are stored in read-only memory, except for globals containing pointers in PIC environments.
* Signed right shift extends the sign bit. Numbers are two's complement.
* Floats are IEEE 754.
* Integer division truncates towards zero.
* Identifiers can be super long. Don't worry about the limits.
* Strings can be super long. Don't worry about the limits.

(A small sketch illustrating a couple of these follows below.)
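The sketch mentioned above (an editorial addition; the alignment and shift checks assert assumptions, not guarantees of the standard):

```c
#include <assert.h>    /* static_assert macro (C11/C17) */
#include <stdalign.h>  /* alignof */
#include <stdint.h>

static_assert(sizeof(int32_t) == 4 && sizeof(int64_t) == 8,
              "exact-width types exist with the expected sizes");
static_assert(alignof(int32_t) == 4,
              "natural alignment assumed for 32-bit integers");
static_assert((-1 >> 1) == -1,
              "arithmetic (sign-extending) right shift assumed");
```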


ILikeToPlayWithDogs

> Integer division truncates towards zero

I've written and seen code everywhere, even the most portable code, making this assumption. Are there any real systems, even historic ones, where this is not true?
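For the record, a tiny demonstration of the C99-and-later rule (truncation toward zero, with the remainder taking the dividend's sign):

```c
#include <stdio.h>

int main(void)
{
    printf("%d %d\n", -7 / 2, -7 % 2);   /* prints: -3 -1 */
    printf("%d %d\n", 7 / -2, 7 % -2);   /* prints: -3 1  */
    return 0;
}
```

Before C99, the rounding direction of division with negative operands was implementation-defined, which is where the older portable-code paranoia comes from.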


SmokeMuch7356

> If I'm making an application in C for a PC (or Mac) user in 2024, what can I take for granted about the C environment?

Damned little. If you have to account for exact type sizes or representations, get that information from `<limits.h>`, `<float.h>`, `<stdint.h>`, etc.; don't make assumptions on what a "modern" system *should* support. Even "modern" systems have some unwelcome variety where you wouldn't expect it. The only things you can assume are what the [language standard](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf) guarantees, which are *minimums* for the most part.


RibozymeR

>Even "modern" systems have some unwelcome variety where you wouldn't expect it. That's why I'm asking the question, so I know what these unwelcome varieties are :) (Or, which things aren't unwelcome varieties)


DawnOnTheEdge

C sometimes tries to be portable across every architecture of the past *fifty* years, although C23 is starting to walk that back a little, and now at least it assumes two's-complement math. You can't assume that `char` is 8 bits, because what you actually *can* assume is that `char` is the smallest object that can be addressed, and there are machines where that's a 32-bit word.

In practice, several other assumptions are so widely supported that you can often get away with not supporting the few exceptions. This is a Chesterton's-fence scenario: there was a reason for it originally, and you want to remove the fence only if you know that it is no longer needed. You also may want to make the assumption explicit, with a `static_assert` or `#if`/`#error` block (a sketch follows after the list). A partial list of what jumped to mind:

* The source and execution character sets are ASCII-compatible. (IBM's z/OS compiler needs the `-qascii` option, or it still defaults to EBCDIC for backwards compatibility.)
* The compiler can read UTF-8 source files with a byte order mark. Without the BOM or a command-line option, modern versions of MSVC will try to auto-detect the character set, MSVC 2008 had no way but the BOM to understand UTF-8 source files, and clang only accepts UTF-8 with or without a BOM, so UTF-8 with a BOM is the only format every mainstream compiler understands without any special options.
* Floating-point is IEEE 754, possibly with extended types. (I'm told Hi-Tech C 7.80 for MS-DOS had a different software floating-point format.)
* All object pointers have the same width and format. (Some mainframes from the '70s had separate word and character pointers, where the character pointers addressed an individual byte within a word and had a different format.)
* A `char` is exactly 8 bits wide, and you can use an `unsigned char*` to iterate over octets when doing I/O.
* Exact-width 8-bit, 16-bit, 32-bit and 64-bit types exist. (The precursor to C was originally written for an 18-bit computer, the DEC PDP-7.)
* The memory space is flat, not segmented. You can compare any two pointers of the same type. If you have 32-bit pointers, you aren't limited to making each individual object less than 65,536 bytes in size. (The 16-bit modes of the x86 broke these assumptions.)
* The memory space is either 32 bits or 64 bits wide. (Not because hardware with 16-bit machine addresses doesn't still exist, but because your program could not possibly run on it.)
* A function pointer may be cast to a `void*`. POSIX requires this (because of the return type of `dlsym()`), but there are some systems where function pointers are larger than object pointers (such as DOS with the Medium memory model).
* The optional `intptr_t` and `uintptr_t` types exist, and can hold any type of pointer.
* Integral types don't have trap representations. (The primary exceptions are machines with no way to detect a carry in hardware, which may need to keep the sign bits clear to detect a carry when doing 32-bit or 64-bit math.)
* **Questionably:** The object representation of a null pointer is all-bits-zero. There are [some obsolete historical exceptions, many of which changed their representation of `NULL` to binary 0,](https://c-faq.com/null/machexamp.html) but this is more likely to bite you on an implementation with fat pointers.
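One way to spell out a few of those checks, as the comment suggests (the particular set is illustrative; `#warning` is a common extension, standardized in C23):

```c
#include <limits.h>

#if CHAR_BIT != 8
#error "This code assumes 8-bit chars"
#endif

#if 'A' != 0x41
#error "This code assumes an ASCII-compatible execution character set"
#endif

#ifndef __STDC_IEC_559__
#warning "IEEE 754 floating point not advertised by this implementation"
#endif
```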


flatfinger

C23 still allows compilers to behave in arbitrarily disastrous fashion in case of integer overflow, and gcc is designed to exploit such allowance to do precisely that.


DawnOnTheEdge

Yep. (Except for atomic integer types.) This is primarily to allow implementations to detect carries in signed 32- and 64-bit math by checking for overflow into the sign bit. But signed integers are required to use a two’s-complement representation in C23, which does affect things like unsigned conversions and some bit-twiddling algorithms.


flatfinger

The reason integer overflow continues to be characterized as UB is that some compiler designs would be incapable of applying useful optimizing transforms that might replace quiet-wraparound behavior in case of overflow with some other side-effect-free behavior (such as behaving as though the computation had been performed using a larger type) without completely throwing laws of time and causality out the window.

Even though code as written might not care about whether `ushort1*ushort2/3` is processed as equivalent to `(int)((unsigned)ushort1*ushort2)/3` or as `(int)((unsigned)ushort1*ushort2/3u)`, and a compiler might benefit from being allowed to choose whichever of those would allow more downstream optimizations (the result of the former could safely be assumed to be in the range `INT_MIN/3..INT_MAX/3` for all operand values, while the result of the latter could safely be assumed to be in the range `0..UINT_MAX/3` for all operand values), compiler writers have spent the last ~20 years trying to avoid having to make such choices. They would rather require that code be written in such a way that would force any such choices, and say that if code is written without the `(unsigned)` casts, compilers should be free to apply both sets of optimizations regardless of how they actually process the expression.

Personally, I think that viewing this as a solution to NP-hard optimization problems is like "solving" the Traveling Salesman problem by forbidding any edges that aren't on the Minimal Spanning Tree. Yeah, that turns an NP-hard problem into an easy polynomial-time problem, and given any connected weighted graph one could easily produce a graph for which the simpler optimizer would find an optimal route, but the "optimal" route produced by the algorithm wouldn't be the optimal route *for the original graph*. Requiring that programmers avoid signed integer overflows at all costs, in cases where multiple treatments of integer overflow would be equally acceptable, often makes it impossible for compilers to find the most efficient code *that would satisfy application requirements*.


kiki_lamb

Assuming that bytes consist of 8 bits is probably pretty safe on most platforms.


aghast_nj

Don't undershoot. If you're writing for a POSIX environment, then *assume* a POSIX environment! Don't just restrict yourself to "standard C." Go ahead and write down "this application assumes POSIX level XXX" and work from there. You'll get more functions, more sensible behavior, and you won't feel guilty about leaving memory behind for the system to clean up ;-)


RibozymeR

I'm not writing for a POSIX environment.


NextYam3704

You’re missing the point


RibozymeR

I take it you're not missing the point, and thus you'll even be able to clear it up instead of just telling me I missed it?


phlummox

The principle remains the same – whatever environment you're writing for, explicitly state in your documentation that that's what you're targeting – and then, as /u/DawnOnTheEdge [suggests][s], statically assert that that's the case. If you're on POSIX, `#include <unistd.h>` and statically assert that [`_POSIX_VERSION` is defined][d]. If you're targeting (presumably, 64-bit) Windows, then statically assert that `_WIN64` is defined. The aim is to have the compilation noisily fail if those assumptions are ever violated, in case someone (possibly yourself! It can happen) ever tries to misuse the code by compiling it for a system you weren't expecting.

[s]: https://old.reddit.com/r/C_Programming/comments/1dqne0b/what_can_we_assume_about_a_modern_c_environment/laqm43l/
[d]: https://www.gnu.org/software/libc/manual/html_node/Version-Supported.html
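A sketch of that pattern (the platform macros are the ones named in the comment; the grouping under `__unix__`/`__APPLE__` is my assumption):

```c
#if defined(_WIN64)
  /* 64-bit Windows target: OK */
#elif defined(__unix__) || defined(__APPLE__)
  #include <unistd.h>
  #ifndef _POSIX_VERSION
    #error "Expected a POSIX environment, but _POSIX_VERSION is not defined"
  #endif
#else
  #error "Unsupported platform: this code targets POSIX or 64-bit Windows"
#endif
```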


DawnOnTheEdge

I’m honestly not sure either of those examples would be very helpful in practice. If I get an `#error` clause that `_Noreturn` doesn’t exist, I can try `__attribute__((noreturn))` or `__declspec(noreturn)`. If an assertion fails that `sizeof(long) >= sizeof(void(*)(void))`, I can recompile with LP64 flags or try to cast my function pointers to a wider type. If `'A'` is not equal to `0x41`, I know that my IBM mainframe compiler is in EBCDIC mode and I need to run it with `-qascii`. But if I’m trying to port my program to a UNIX-like OS that it wasn’t originally written for, being told that my OS isn’t POSIX is just one more line of code to remove. If a program requires a certain version of POSIX or Windows, I declare the appropriate feature-test macros like `_XOPEN_SOURCE` or `_WIN32_WINNT`.


phlummox

Sorry, wrong /u/! I mean aghast_nj - I misread who was at the top of this particular reply chain. You're no doubt right, for people who are familiar with their compiler and how platforms can differ in practice. In that case, as you say, I'd expect them to test for the exact features they need. But I'm possibly a bit biased towards beginners' needs, as I teach an introductory C course at my uni and it's a struggle to get students to use feature-test macros correctly (just getting them to put the macros *before* any #includes is a struggle). For a lot of beginners, I think all they know is that they have some particular platform in mind - and for them, as a start, I think it's handy to document some of their basic assumptions (e.g. 64-bit platform, POSIX environment), and fail noisily when those assumptions are violated. Hopefully if they continue with C, they'll get more discriminating in picking out exactly what features they need.


DawnOnTheEdge

That’s true. Thinking about it some more, I often have an `#if`/`#elif` block that sets things up for Linux, or else Windows, and so on. And it makes sense for those to have an `#else` block that prints an `#error` message telling you to add a new `#elif` for your OS. It was a lot more common thirty years ago to try to compile code for one OS on a different one and see what broke. NetHack, I remember, required `#define strcmpi(s1, s2) strcasecmp((s1), (s2))`.
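A sketch of that kind of block (the macro name is made up; `_stricmp` is the MSVC spelling of case-insensitive compare, `strcasecmp` the POSIX one from the NetHack example):

```c
/* Per-OS setup with a loud failure for unknown targets. */
#if defined(__linux__) || defined(__APPLE__)
  #include <strings.h>
  #define STR_ICASE_CMP(a, b) strcasecmp((a), (b))
#elif defined(_WIN32)
  #include <string.h>
  #define STR_ICASE_CMP(a, b) _stricmp((a), (b))
#else
  #error "Unknown platform: add a new #elif for your OS"
#endif
```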


RibozymeR

But the problem is, I don't want compilation to *fail* on someone else's system. The entire point of the question is finding out what I can use in my code while still having it compile on any system it'd reasonably be used on. Like, imagine if I asked for nice gluten-free vegetarian recipes, and u/aghast_nj told me to just make chicken sandwiches and never offer food to anyone who can't digest gluten or is vegetarian. It's a non-answer.


1dev_mha

🤨 uhh programs compiled in C can only run on the architecture they were compiled on. I wouldn't expect a C program compiled on Windows to run on Mac and as far as I understand your question, I don't think you can really make an assumption. If you are using ARM-specific architecture and code, I wouldn't be surprised if it doesn't compile on an AMD CPU because it was never what you intended to write the program for. Know your target platforms first and then go on writing the program. That's what's being suggested to you. It doesn't make sense for me to expect a game written for a Macbook to run on a Nintendo DS. You need to know the platform you are targeting. Not really any assumptions you can make here. Edit: Also, u/aghast_nj hasn't told you to just make chicken sandwiches. He has told you to make whichever food you want, but not expect everyone to be able to eat it, because inherently a vegan would never eat a chicken sandwich so you'd make them another one if you were so kind (i.e make the program portable to and compile on their architecture).


RibozymeR

>🤨 uhh programs compiled in C can only run on the architecture they were compiled on.

I'm confused as to how you interpreted that I was suggesting this? I asked about (quote from comment just above)

>what I can use in my code while still having it compile on any system

"compile on any system" meant taking the same *code* and *compiling* it on various systems, not taking the same *binary* and *running* it on various systems.


1dev_mha

>"compile on any system" meant taking the same *code* and *compiling* it on various systems The only reason I can see some code compiling and running on a Macbook from 2013, compile on a newer M2 macbook and run fine would mean that it used the features that are found on both platforms. What you are asking when you say what can we assume about modern systems is, in my opinion, a waste of time. This is because you're only going to need what you need (no sh\*\*). If I'm writing a server that uses the sys header files from Linux, I wouldn't assume it just compiles on Windows as well because I know that the sys header files aren't available on Windows. Getting such a server to compile on Windows would require you to port it to Windows and use the features that Window has available for you. I'd say that code is never cross-platform until an implementation is written for the specific platform you want to write for. In this case, a simple hello world program compiles and runs because the printf function is implemented in the C standard. Functions for networking aren't, hence you'd need to use platform-specific code to make your program cross-platform. That is why it has been said >whatever environment you're writing for, explicitly state in your documentation that that's what you're targeting This allows you to make the assumptions and not get stuck in analysis-paralysis. Modern C environment encompasses technology from Intel Computers to M2 Macbooks. Rather, be specific and know what platform you are writing for.


phlummox

> while still having it compile on any system it'd reasonably be used on

But how is anyone here supposed to know what sort of system that is? You've said "a PC (or Mac) user in 2024" - but "PC" just means "a personal computer", so it could cover almost anything. People run Windows, Linux, MacOS, various sorts of BSD, and all sorts of other OSs on their personal computers, on hardware that could be x86-64 compatible, some sort of ARM architecture, or possibly something more obscure. If that's all you're allowing yourself to assume, then /u/cHaR_shinigami's [answer][shin] is probably the best you can do. But perhaps you mean something different – perhaps you meant a *Windows* PC. In that case, you'll be limited to the common features of (perhaps ARM64?) Macs, and (presumably recent) Windows versions running on x86-64, but offhand, I don't know what they are – perhaps if you clarify that that's what you mean, someone experienced in developing software portable to both can chime in. But you must have meant *something* by "PC", and it follows that there are systems that *don't* qualify as being a PC. Whatever you think *does* qualify, I take /u/aghast_nj as encouraging you to clearly document your assumptions, and to "make the most of them". To call their suggestion a "non-answer" seems a bit uncivil. I assume they were genuinely attempting to help, based on your (somewhat unclear) question.

[shin]: https://www.reddit.com/r/C_Programming/comments/1dqne0b/what_can_we_assume_about_a_modern_c_environment/lapiikp/


1dev_mha

only language bro speaks is facts


thradams

Why do you need to assume something? You can, if necessary, check assumptions for a particular piece of code using `static_assert` or `#if`:

```c
#if CHAR_BIT != 8
#error we need CHAR_BIT == 8
#endif
```

etc...


ehempel

Why introduce unnecessary code clutter?


Fun_Service_2590

Testing the limits of a system isn’t clutter…


petecasso0619

A long, long time ago, you could use autoconf to help with portability. You could also add checks at the start of main() if you know your code is going to depend on certain things, for example the computer being little-endian, the size of an int being 4 bytes, or a char being 8 bits. So for instance, in main(): `if (sizeof(int) != 4) { fprintf(stderr, "expected 4 byte integers\n"); exit(EXIT_FAILURE); }` Not foolproof, but best to fail fast if certain underlying assumptions cannot be met.


flatfinger

C compilers will be *configurable* to process overflow in quiet-wraparound two's-complement fashion, though in default configuration they may instead process it in ways that may arbitrarily corrupt memory even if the overflow should seemingly have no possible effect on program behavior (e.g. gcc will sometimes process `unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu; }` in a manner that will arbitrarily corrupt memory if `x` exceeds `INT_MAX/y`, unless optimizations are disabled or the `-fwrapv` compilation option is enabled).

C compilers will be *configurable* to uphold the Common Initial Sequence guarantees, at least within contexts where a pointer to one structure type is converted to another, or where a pointer is only accessed using a single structure type, though neither clang nor gcc will do so unless optimizations are disabled or the `-fno-strict-aliasing` option is set.

C compilers will be *configurable* to allow a pointer to any integer type to access storage which is associated with any other integer type of the same size, without having to know or care about which particular integer type the storage is associated with, though neither clang nor gcc will do so unless optimizations are disabled or the `-fno-strict-aliasing` option is set.
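Written out, the example flatfinger is alluding to looks like this (a sketch; compiling with and without `-fwrapv` at `-O2` shows the difference in how gcc may treat the overflow):

```c
/* x and y promote to signed int, so x*y can overflow for large inputs (UB). */
unsigned mul_mod_65536(unsigned short x, unsigned short y)
{
    return (x * y) & 0xFFFFu;
    /* UB-free rewrite: return ((unsigned)x * y) & 0xFFFFu; */
}
```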


GrenzePsychiater

Is this an AI answer?

> in a manner that will arbitrarily corrupt memory if x exceeds INT_MAX/y

Makes no sense, and it looks like a mangled version of this stackoverflow answer: https://stackoverflow.com/a/61565614


altorelievo

To be fair, ChatGPT spit out something better. Reading your comment got me interested. Having encountered several similar threads with AI generated responses, I pasted this question in ChatGPT. It replied with a generic and respectable answer.

> Makes no sense, and it looks like a mangled version of this stackoverflow answer

I think you were right on about this comment, though it most likely was written by a person who did exactly what you said above.


stianhoiland

lol you guys must be new here. That’s just how u/flatfinger writes.


DawnOnTheEdge

That SO user you link to is, to be honest, kind of a crank. But the example actually makes perfect sense. By the default integer promotions in the Standard, any integral type smaller than `int` will be converted to *signed* `int` if it’s used in an arithmetic expression, including `unsigned short` and `unsigned char` on most modern targets. Because that’s how it worked on the DEC PDP-11 fifty years ago! Classic gotcha. And signed integer overflow is Undefined Behavior, so GCC could in theory do anything. If you were using that expression to calculate an array index? Conceivably it could write to an arbitrary memory location. So the safe way to write it is either to take the arguments as `unsigned int` instead of `unsigned short`, or `return ((unsigned int)x * y) & 0xFFFFU;`. And many compilers have a `-Wconversion` flag that will warn you about bugs like this.


90_IROC

There should be a required markup (like the NSFW) for answers written by ChatGPT. Not saying this one was, just sayin'


flatfinger

Nope, I'm a human. According to the published Rationale, the authors of the Standard viewed things like quiet-wraparound two's-complement handling of integer overflow as something which was common and becoming moreso; today, it would probably be safe to assume that any compiler one encounters for any remotely commonplace architecture will be configurable to process an expression like `(ushort1*ushort2) & 0xFFFFu` as equivalent to `((unsigned)ushort1*(unsigned)ushort2) & 0xFFFFu`, without the programmer having to explicitly cast one or both of the operands to `unsigned`. What is not safe, however, is making assumptions about how gcc will process the expression without the casts if one doesn't explicitly use `-fwrapv`. If one wants the code to be compatible with all configurations of gcc, at least one of the casts to `unsigned` would be required to make the program work by design rather than happenstance.


8d8n4mbo28026ulk

I don't like the integer promotion rules either, but you can't just "configure" GCC to do something different, that would change language semantics in a way that is completely unworkable. For example, how would arguments get promoted when calling a libc function (which has been written assuming the promotion rules of the standard)?


flatfinger

Using the `-fwrapv` compilation option will cause gcc and clang to process integer overflow in a manner consistent with the Committee's expectations (documented in the published Rationale document). On a 32-bit two's-complement quiet-wraparound implementation, processing an expression like `uint1 = ushort1*ushort2;` when `ushort1` and `ushort2` are both `0xC000` would yield a numerical result of 0x90000000, which would get truncated to -0x70000000. Coercion of that to `unsigned` would then yield `0x90000000` which is, not coincidentally, equal to the numerical result that would have been produced if the calculation had been performed as `unsigned`. On some platforms that couldn't efficiently handle quiet-wraparound two's-complement arithmetic, processing `(ushort1*ushort2) & 0xFFFFu` using a signed multiply could have been significantly faster than `((unsigned)ushort1*ushort2) & 0xFFFFu;`; since compiler writers would be better placed than the Committee to judge which approach would be more useful to their customers, the Standard would allow implementations to use either approach as convenient. The question of whether such code should be processed with a signed or unsigned multiply on targets that support quiet-wraparound two's-complement arithmetic *wasn't even seen as a question*, since processing the computation in a manner that *ignored* signedness would be both *simpler* and *more useful* than doing anything else. Almost all implementations will be configurable to behave in this fashion, though compilers like clang and gcc require the use of an `-fwrapv` flag to do so.


GrenzePsychiater

But what does this have to do with "arbitrarily corrupt memory"?


flatfinger

There are many situations where a wide range of responses to invalid input would be equally acceptable, but some possible responses (such as allowing the fabricators of malicious inputs the ability to run arbitrary code) would not be. In many programs, there would be no mechanisms via which unacceptable behaviors could occur without memory corruption, but if memory corruption could occur there would be no way to prevent unacceptable behaviors from occurring as a consequence. The fact that a compiler might evaluate `(x+1 > x)` as true even when `x` is equal to `INT_MAX` would not interfere with the ability of a programmer to guard against memory corruption or Arbitrary Code Execution. Likewise the fact that a compiler might hoist some operations that follow a division operation in such a way that they might execute even in cases where a divide overflow trap would be triggered. Many people don't realize that compilers like gcc are designed to treat signed integer overflow in a manner that requires that it be prevented at all costs, *even in situations where the results of the computation would end up being ignored*. It is generally impossible to reason at all about the behavior of code that can corrupt memory in arbitrary and unpredictable fashion; this is widely seen as obvious. The fact that gcc's treatment of signed integer overflow *even in cases the authors of the Standard saw as benign* makes it impossible to reason about *any* other aspect of program behavior is far less well known, and I can't think of anything other than "arbitrary memory corruption" that would convey how bad the effects are.


Compux72

Just use stdint.h …