Dennis Ritchie on alias analysis in C (1988)

_dh54 · on March 20, 2021

I had a hard time following Ritchie’s argument. I guess I’m missing a lot from the proposals he’s critiquing. One part stood out that was not clear:

> memcpy(noalias void *s1, const noalias void *s2, size_t n);

> what information can one glean from it? Some committee members apparently believe that it conveys either to the reader or to the compiler that the routine is safe, provided that the strings do not overlap. They are mistaken.

I don’t follow. It seems that noalias means exactly that the two pointers can never point to the same memory, and thus the arrays do not overlap. So why would it be wrong to assume that memcpy() with noalias arguments is safe if called with non-overlapping memory?

————

Separately, I agree with him that “noalias”, unlike “const”, is not a property of the object being pointed to but rather a property of the access being done. It creates an inconsistency where “char * noalias” makes sense but “char noalias” doesn’t. Perhaps he would have supported a “noalias dereference” operator or compiler built-in instead.

pm215 · on March 20, 2021

I think this is the ANSI draft Ritchie was commenting on: http://www.3kranger.com/LabNotes/ANSI-C-X3J1188-Draft.pdf -- section 3.35.3 talks about 'noalias'. It does not define 'noalias' in terms of whether arguments do or do not overlap -- it defines concepts of 'actual objects' which lvalues are handles to, and 'virtual objects' which lvalues with the noalias attribute are handles to, and rules about the contents of the virtual object being synchronized with the actual object at various points. Presumably the intention was to define abstract semantics corresponding to "OK if you don't overlap arguments" but it's definitely not clear to me that this hard-to-understand virtual-and-actual-objects business is in fact doing that.

I think Ritchie may be saying that the expectation is that if you have an array x[1024], then memcpy(x, x + 512, 10) should be OK, as it's copying between two non-overlapping slices of the array; but the noalias annotation is overbroad and acts on the entirety of the x[] array, making the call undefined behaviour. But I'm not sure :-)
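To make the slice example concrete, here is a small C sketch. `noalias` never made it into any standard, so this uses C99's `restrict` as the closest available stand-in, and the names `copy_bytes` and `slice_copy_demo` are made up for illustration. Under `restrict` rules, what matters is the bytes actually accessed through each pointer, so copying between two disjoint slices of one array is fine; whether the 1988 draft's `noalias` wording allowed the same thing is exactly what is in doubt here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memcpy. C99 `restrict` is used only because
 * `noalias` was never adopted; the two qualifiers are not equivalent. */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Fills the lower half of x with 'A' and the upper half with 'B', then
 * copies 10 bytes from x + 512 to x. The ranges [0, 10) and [512, 522)
 * are disjoint, so the restrict contract holds.
 * Returns 1 if exactly the first 10 bytes became 'B'. */
int slice_copy_demo(void)
{
    unsigned char x[1024];
    memset(x, 'A', 512);
    memset(x + 512, 'B', 512);
    copy_bytes(x, x + 512, 10);
    for (size_t i = 0; i < 10; i++)
        if (x[i] != 'B')
            return 0;
    return x[10] == 'A';
}
```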

Denvercoder9 · on March 20, 2021

> It seems that noalias exactly means that the two arrays can never point to the same memory, thus they do not overlap.

I think that `noalias` would enforce that `s1` and `s2` aren't equal, but that doesn't mean that `s2` doesn't partially overlap with `s1` (i.e. it starts at an offset of `s1`).
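The difference between "the pointers are not equal" and "the ranges do not overlap" can be sketched in C. The helper `ranges_overlap` is hypothetical, not a standard function, and it compares addresses through `uintptr_t`, which assumes a flat address space rather than anything the standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the n-byte ranges starting at a and b
 * share at least one byte. Assumes a flat address space. */
int ranges_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    if (n == 0)
        return 0;
    return (pa < pb) ? (pb - pa < n) : (pa - pb < n);
}

/* s2 starts 4 bytes into s1's buffer: the pointers differ, yet an
 * 8-byte copy would read bytes that it also writes.
 * Returns 1 when both conditions hold. */
int unequal_but_overlapping(void)
{
    static unsigned char buf[16];
    const unsigned char *s1 = buf;
    const unsigned char *s2 = buf + 4;
    return s1 != s2 && ranges_overlap(s1, s2, 8);
}
```

A signature that only rules out `s1 == s2` would accept this pair, which is why the length `n` has to be part of any useful overlap rule.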

_dh54 · on March 20, 2021

I see now. The signature itself would only require that the values passed in are not equal, not that the ranges don’t overlap.

If that indeed was his criticism, that’s easily fixable. The standard could simply add language that says “noalias” must be true for the whole range of the array. Maybe that would prevent other practical use cases.

I think his consistency argument is more powerful, if maybe less practical.

dooglius · on March 20, 2021

blogspam for https://groups.google.com/g/comp.lang.c/c/K0Cz2s9il3E/m/YDyo...

dundarious · on March 20, 2021

I was grateful for this context in the introduction, even if I was already familiar with Torvalds and Regehr's comments. The introduction is extremely light, but I don't think it amounts to "blogspam", and if the implication is that your direct link should be substituted, I disagree.

> For further reading try Linus Torvalds[1] (pre-nice) note, John Regehr’s paper[2] on alias and this proposal[3] for the C 2x standard.

[1] https://www.yodaiken.com/2018/06/07/torvalds-on-aliasing/

[2] https://blog.regehr.org/archives/1307

[3] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf

pjmlp · on March 20, 2021

Then, 10 years later, restrict was born with many of the same caveats that Dennis Ritchie attributed to noalias.
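For reference, C99 declares `void *memcpy(void * restrict s1, const void * restrict s2, size_t n);`. A minimal sketch of what `restrict` asks of the caller follows; `add_arrays` and `restrict_demo` are illustrative names, not anything from the standard library.

```c
#include <stddef.h>

/* `restrict` is a promise from the caller: during the call, memory
 * written through dst is not also accessed through src. The compiler
 * does not check this; breaking the promise is undefined behaviour. */
void add_arrays(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Returns a[3] after adding b into a element by element. */
int restrict_demo(void)
{
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 20, 30, 40};
    add_arrays(a, b, 4);         /* fine: two distinct arrays */
    /* add_arrays(a, a + 1, 3);     would break the promise: a[1] and a[2]
                                    are written via dst and read via src */
    return a[3];
}
```

The commented-out call compiles without complaint, which is the caveat: the qualifier shifts the burden to the caller and the signature alone cannot tell you whether a given call is safe.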

A good example of what happens when a language's authors no longer control its direction and the standards body does.

Blikkentrekker · on March 20, 2021

Because languages won't allow themselves to be controlled.

A language author could produce his own standard, and watch as all the big vendors adopt the standards body's standard instead. Why? Because the standards body listens to what the big vendors want; they typically send their emissaries to the body to reach a compromise they can all live with and are willing to implement.

A language author may try to dictate what he thinks is best, but if the big compiler vendors disagree with regard to their own interests and those of the users they serve, they will not implement it.

kps · on March 20, 2021

“a license for the compiler to undertake aggressive optimizations that are completely legal by the committee's rules, but make hash of apparently safe programs”

That phrase, describing ‘noalias’, proves that nobody at the time realized or intended what “undefined behavior” would turn out to mean.

xiphias2 · on March 20, 2021

I think Dennis would love what Rust provides; it took just 45 years to fix noalias with a new language.

pjmlp · on March 20, 2021

I advise reading the manual of NEWP, released in 1961.

https://en.wikipedia.org/wiki/NEWP

And also the C history, written by Dennis.

https://csapp.cs.cmu.edu/3e/docs/chistory.html

C took over the world thanks to UNIX, while Dennis and Thompson decided to ignore what came the decade before.

> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.

And they were aware of some caveats

> To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions.

The problem is that, unlike for Dennis and Thompson, static analysis is a foreign concept to most C devs (current surveys place its use at around 11%).