Using a borrow checker to track mutable refs in a GCed FP language?

This post has been de-listed

It is no longer included in search results and normal feeds (front page, hot posts, subreddit posts, etc). It remains visible only via the author's post history.

Post Body

I'm designing a statically typed, compile-to-JS language. It'll be an FP/OO language, inspired by languages like Scala and TypeScript, but hopefully way simpler for the programmer to wrap their head around how the type system and type inference works, while still having it be powerful and expressive. Rust is also an inspiration to me, for using static type system features to solve what would otherwise be inefficiencies in compiling high-level features to lower-level code. I care about performance, though obviously once you accept GC, your performance concerns are at a slightly different level. I've performance-optimized a lot of Java and JavaScript code in my career, and a lot of it comes down to avoiding unnecessary allocations (even though generational GC is supposed to be fast for short-lived objects, and engines like V8 perform some escape analysis to eliminate allocations, etc), as well as choosing data representations that pack data as tightly in memory as possible. It also really irks me that nice features in JavaScript like iterating over an array using for-of instead of for(let i = 0; (and so forth) carry significant performance consequences. It's like the opposite of zero-cost abstraction; they add nice syntax, that seems like it could even be faster, but actually involves memory allocations or lots of extra steps under the hood. So I'd like to remedy that in my language.

Anyway, in terms of programming patterns that I want to embrace in the language, I like the pattern of preferring immutable data, i.e. trees of records and lists/arrays (or maps, sets, etc.). Treating objects as immutable by convention is a common pattern in JavaScript, already, though immutability is not enforced at compile-time (even with TypeScript) or runtime (you can "freeze" objects but it has a performance impact and it just makes writes fail silently, so no one does it). Clojure is an example of an FP language that preaches that everything should be immutable, all the time, except for "atoms" that hold time-varying values. If you want to change one value deep in some data structure (for example, changing the value of x.y.z[3] where x is the value of some atom), you path-copy the whole tree, and set the atom's value to the new tree, if that makes sense. It's very inefficient.

In my language, I'd like to support mutating records, maps, sets, and lists in place, and even lists where you can efficiently perform splices in the middle (e.g. backed by B trees), but I'd also like to support passing around immutable data (not just read-only but actually constant, will-never-change values). I came up with a system I like for blending mutable and immutable tree structures that involves keeping a "mutable?" flag on each node, where the nodes are internal to a tree data structure. Nodes are initially mutable, and changes to the data structure are performed in place. If you ask for a constant "snapshot" of the data structure, or a clone that you can independently modify, what happens is it marks all the nodes as immutable and shares them. Only when you then go to mutate one of the trees are nodes copied (copy-on-write). This works well. Immutable nodes only have immutable children. Mutable nodes are never shared. To make a change to a node, the code starts at the top of the tree and walks down, and if it encounters an immutable node, it copies it, and the copy—being a fresh and unshared node—is marked mutable. With this scheme, you get mutable data structures AND structure sharing, and it is cheap to build structures mutably and then use them immutably. Only if you need to snapshot or clone your structure and then make changes (to the original structure or the clone) is there any copying.

This is great, but it is not a static, type-level way to track mutability or immutability! I think I may still use the above system in the implementation of some things, and there are cases in application development where this kind of structure sharing and snapshotting is really useful—not just for debugging, but for implementing application-level functionality like snapshots, undo/redo, and collaboration—but a lot of times code does not need an immutable snapshot or "copy" of a data structure, it just needs to be able to read from it and treat it as constant. At the type level, what I think would lead to a great mix of performance and correctness is if a function could take a "const" argument that is considered "deeply" const—accesses are referentially transparent, the value will never be observed to change—but not necessarily forever, just for the lifetime of the function. Of course, the function must not save references to these values, because the underlying objects might change later. This suggests something like what Rust does as far as tracking read-only and mutable references. Suppose you are writing an application, and all the state is in this big compound data structure, stored in an "atom" (a box that holds one big mutable value). The various bits of your UI rendering code are functions that take in pieces of this data structure, and they want to treat these values as constant for the lifetime of the function. You'll also have "action" functions that mutate bits of the data structure, but it should be statically impossible for such a mutation to affect any read-only or mutable ref that is "alive," either to change its value or invalidate it (by changing a parent object).

I haven't worked out the details, but this seems doable, by analogy to what Rust does. You'd have types "ref" and "mutable ref" that "point into" atoms, and internally keep track of the "path" into the atom that they are pointing to. You'd have strict rules in the compiler that refs have to stay on the stack and not be aliased. If you had a mutable ref to A, which has a member B, you could get a mutable ref A.B, but then you can't mutate A for the lifetime of the sub-ref A.B, because it could potentially invalidate it (especially if B is an element of a list or a value of a map). You could pass the ref to A.B into a function, and it could read and write to it if it was written to take a mutable ref. The ability of functions to perform mutations in general would be very limited, maybe to mutable refs that they take as arguments. I'm not trying to track ALL side effects in my language, but I guess I would have to be careful to not give arbitrary functions the power to go and set some global atom.

One way that some "persistent immutable data structures" libraries solve the problem that every little mutation requires multiple new objects to be allocated is to support "batches" of mutations, where in the middle of a batch, objects are mutated in place. I think this amounts to the same things I described above, like being able to turn an immutable structure into a mutable one and back with as little copying as possible, and then if you allow arbitrary functions as "batch" operations, ideally it would be enforced that the function is pure except for the mutations it performs, which I think are typically easiest to express imperatively.

So my questions are, does this connect with anything you have thought about, and do you know of any languages or papers I should take a look at along these lines?

Thanks.

Author

Account Strength

100%

Account Age

16 years

Verified Email

Yes

Verified Flair

Total Karma

11,627

Link Karma

1,087

Comment Karma

10,205

Profile updated: 1 week ago

Posts updated: 2 months ago

dgreensp

Subreddit

r/ProgrammingLanguages

Post Details

We try to extract some basic information from the post title. This is not always successful or accurate, please use your best judgement and compare these values to the post title and body for confirmation.

Posted: 1 year ago
Reddit URL: View post on reddit.com
External URL: reddit.com/r/Programming...