This is something I've been thinking about for a while, and I want to
solicit feedback and hopefully consensus from our wonderful programming
community at large on the matter. I want to consider what programming
language(s) we should be moving towards in new code in TB. The key here
is "new code"; it is infeasible, and probably ill-advised, for us to try
a massive rush to rewrite all of our code into whatever target language
we want just for the sake of rewriting code in a language, but we should
think carefully about whether it is worth migrating code to a better
language as we improve or modernize it.
Within the Mozilla build system, we essentially have a choice among
four languages. I'll summarize their advantages and disadvantages
here:
JavaScript
*Advantages*
- Dynamically typed
- We can adopt new language features very aggressively, due to our
  reliance on tip-of-trunk SpiderMonkey
- No need to recompile after editing source files
- Good tooling for debugging, testing, benchmarking--if our code can
  run in the environments that those tools want
*Disadvantages*
- Dynamically typed
- Workers have a cumbersome model, which makes it hard to move to
  multithreaded code
- System APIs (e.g., filesystem, networking) are generally unavailable
  unless there's an XPIDL or DOM interface exposing them
- A lot of third-party packages tend to assume a Node.js backing,
  which requires shims to use them outside of Node.js
- A lot of dumb programmer errors (e.g., fat-fingered a name) can only
  be caught at runtime
- Support for binary parsing is kind of sucky, and support for
  quasi-binary is really poor
- Performance can be dicey or unpredictable
XPIDL-based C++
*Advantages*
- Our code is already written in this format
- XPIDL is the most flexible FFI system we have at the moment. The
  only missing path is calling XPIDL from JS workers
*Disadvantages*
- The API style is outdated, and it requires lots of macros and other
  magic incantations to get stuff done
- Mozilla's long-term commitment to XPIDL is questionable (but XPIDL
  is basically a way of enforcing a common ABI--with the exception of
  xpconnect, it's pretty trivial to maintain ourselves)
- Mozilla considers XPIDL to at the very least be soft-deprecated
- Promise-style async API has pretty much no support whatsoever here
Modern C++
*Advantages*
- Modern C++ is actually quite an ergonomic language, at least if
  you're not attempting to wrap everything into a std::move and
  &&-based logic
- Mozilla's done a fairly decent job at providing a useful library of
  standard ADT and some system API bits in MFBT and de-COM'd xpcom code
- Compared to XPIDL, being able to chain method calls or not have to
  assume that every function can potentially fail is a big win
*Disadvantages*
- Using lambdas for callbacks can easily create use-after-free bugs
- Exposing this to anything else is generally difficult unless there's
  already a bindings framework to use.
- Mozilla generally prohibits the use of the STL
- There's a lag between new features being added to the standard and
  our ability to pick them up
- No consistent API for handling error propagation, either in
  non-Mozilla projects or in Mozilla code
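The callback lifetime hazard mentioned in the list above (a lambda that outlives what it captured) can be contrasted with a small Rust sketch. This is only an illustration under assumptions: `make_logger` is a hypothetical helper, not anything from the Thunderbird or Mozilla code. In Rust the closure has to take ownership of what it captures before it can be returned, so the equivalent dangling capture does not compile.

```rust
/// Hypothetical helper: builds a callback that may be invoked long after
/// the caller's stack frame is gone. The `move` keyword transfers
/// ownership of `prefix` into the closure, so the captured data lives
/// exactly as long as the callback itself.
fn make_logger(prefix: String) -> Box<dyn Fn(&str) -> String> {
    // Without `move`, the closure would borrow `prefix`, and the compiler
    // would reject returning it, because `prefix` is dropped when this
    // function returns.
    Box::new(move |msg: &str| format!("{}: {}", prefix, msg))
}
```

A caller can let the original string go out of scope and still invoke the callback later, since the closure owns its own copy of the data.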
Rust
*Advantages*
- Rust's handling of strings-versus-binary-versus-ASCII-but-maybe-not
  is the best of the set of possible languages
- The error handling and propagation is pretty sane, safe, and
  ergonomic. Least likely to have errors get dropped on the floor with
  no one noticing that an error ever happened
- The borrow checker allows for enforcing quite a few invariants in
  the type system
- Rust can easily compile to WebAssembly, which allows an extra way to
  call from JS code for computation-heavy code
- Cargo is probably the friendliest package system I've tried dealing with
*Disadvantages*
- As a newer language, we're less likely to see knowledgeable contributors
- Assuaging the borrow checker can be challenging for novices
- Rust<->JS calls are particularly challenging
- All of the vendoring of crates happens in mozilla-central--it could
  be a challenge if we start using libraries that m-c doesn't use
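To make the points about bytes versus strings and about error propagation a little more concrete, here is a minimal sketch using only the Rust standard library. The `HeaderError` type and the `parse_header` and `parse_headers` functions are hypothetical names invented for this illustration, and the parsing is far simpler than what real message headers need (no folding, no encoded words).

```rust
/// Hypothetical error type: every way parsing can fail is spelled out.
#[derive(Debug, PartialEq)]
enum HeaderError {
    MissingColon,
    BadName,
    BadValue,
}

/// Splits one raw header line into (name, value).
/// The input is bytes; it only becomes `&str` after explicit validation.
fn parse_header(line: &[u8]) -> Result<(&str, &str), HeaderError> {
    let colon = line
        .iter()
        .position(|&b| b == b':')
        .ok_or(HeaderError::MissingColon)?;
    let (name, rest) = line.split_at(colon);
    // Header names are required to be ASCII in this sketch.
    if name.is_empty() || !name.is_ascii() {
        return Err(HeaderError::BadName);
    }
    let name = std::str::from_utf8(name).map_err(|_| HeaderError::BadName)?;
    // Skip the ':' itself, then require the value to be valid UTF-8.
    let value = std::str::from_utf8(&rest[1..]).map_err(|_| HeaderError::BadValue)?;
    Ok((name, value.trim()))
}

/// Parses a block of header lines. The `?` operator passes the first
/// failure up to the caller; ignoring it would take deliberate effort.
fn parse_headers(block: &[u8]) -> Result<Vec<(&str, &str)>, HeaderError> {
    let mut headers = Vec::new();
    for line in block.split(|&b| b == b'\n') {
        // Strip a trailing carriage return left over from CRLF endings.
        let line = match line.last() {
            Some(&b'\r') => &line[..line.len() - 1],
            _ => line,
        };
        if line.is_empty() {
            continue;
        }
        headers.push(parse_header(line)?);
    }
    Ok(headers)
}
```

The return type forces callers to decide what to do with a failure, which is the "errors don't get dropped on the floor" property described in the list.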
I don't think there is a great benefit for enforcing that we have to
pick one language to implement the entirety of TB in. The reality is
that we don't have the bandwidth to rewrite everything. Even if we could
magically wave that away, the reality of system integration is that
systems require us to have support for native languages--which include
C++, Objective-C, even Java for Android--to implement necessary
features. Furthermore, I don't see any realistic way of cutting loose our
dependency on the Mozilla stack, and I think that people are too
optimistic when looking at the challenges of shimming all of the system
APIs if we were to try to support multiple stacks. At the end of the
day, I don't see multiple languages as the biggest barrier, or even one
of the biggest barriers, to development. From that perspective, then, it
makes sense to break down our components into smaller pieces to figure
out which languages ought to be used. Here are the components as I see them:
UI and frontend
This will be implemented in JS. There are a few pieces right now which
are not (nsMsgDBTreeView anybody?), but I expect that even these would
likely eventually move to JS. And I doubt there's any room for
discussion on the matter.
Protocols, including formats such as MIME, TNEF or PGP
As I've discussed in my last thread, this is the sort of stuff where JS
suffers the most. It's also where Rust tends to shine the brightest. Of
course, there are complications. For complex things, particularly IMAP,
you generally want a procedural structure for the code; this suggests
off-main-thread synchronous I/O or an async/await implementation. C++
and Rust are both /getting/ coroutine support in some form, but neither
of them have it yet: Rust could probably see async/await stabilized by
the end of this year, and C++20 officially merged coroutine support only
a month ago (so wait at least three or four years for minimum-supported
compilers to get it).
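As an illustration of the "off-main-thread synchronous I/O" option, the sketch below runs a blocking, top-to-bottom exchange on a worker thread and hands the result back over a channel. It uses only the standard library. Everything named here (`ScriptedStream`, `read_reply`, `greet_and_login`, `login_off_main_thread`) is hypothetical, the exchange is a toy loosely modelled on a POP3-style greeting and USER command, and the in-memory stream merely stands in for a real socket.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Write};
use std::sync::mpsc;
use std::thread;

/// In-memory stand-in for a socket: replays canned server replies and
/// records what the client wrote. Hypothetical scaffolding only.
struct ScriptedStream {
    replies: Cursor<Vec<u8>>,
    sent: Vec<u8>,
}

impl Read for ScriptedStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.replies.read(buf)
    }
}

impl Write for ScriptedStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.sent.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Reads one line and strips the trailing line ending.
fn read_reply<S: Read>(reader: &mut BufReader<S>) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

/// The exchange reads top to bottom because every step is allowed to block.
fn greet_and_login<S: Read + Write>(stream: S, user: &str) -> io::Result<String> {
    let mut reader = BufReader::new(stream);
    let greeting = read_reply(&mut reader)?;
    if !greeting.starts_with("+OK") {
        return Err(io::Error::new(io::ErrorKind::Other, greeting));
    }
    write!(reader.get_mut(), "USER {}\r\n", user)?;
    read_reply(&mut reader)
}

/// Runs the blocking exchange on a worker thread. A real client would
/// not block on `recv` like this; it would dispatch the result to its
/// event loop instead. Blocking here just keeps the sketch short.
fn login_off_main_thread(stream: ScriptedStream, user: String) -> io::Result<String> {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        let _ = tx.send(greet_and_login(stream, &user));
    });
    let result = rx.recv().expect("worker dropped the channel");
    worker.join().expect("worker panicked");
    result
}
```

The design trade-off is the one the paragraph describes: the protocol logic stays procedural, at the cost of needing a thread per in-flight conversation until coroutine-style support is available.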
My personal opinion here is that we should stick with the status quo for
now, but we should explore the viability of Rust implementations. If
Rust doesn't pan out, then I would suggest looking for a modern C++
implementation instead. I don't think we should choose to implement this
sort of stuff in JS.
Database (which includes both the msgdb and the mailbox store)
It would be daft of us to try to implement a database (as in something
like Mork or SQLite) ourselves (and yes, I'm aware of the irony that we
actually do exactly this right now). Building high-performance, durable
databases is a specialized skillset that is rather orthogonal to the
main challenges of building an email client, and I don't think we have
any community members with adequate expertise in that skillset. It's
much easier just to reuse an off-the-shelf implementation. Of all the
components I discuss, this is the one that most needs invasive
modifications, and the necessary modifications are going to have to
amount to a complete rewrite before long: the present API forces us
into doing lots of synchronous, on-main-thread disk access, which is
a recipe for performance issues, and ones we have already observed.
The database API obviously is going to be most heavily used by the UI.
The connections to the protocol implementations--particularly if we
rewrite them to separate the protocol from the consumer logic--are
much weaker and can be encapsulated in a few smaller details (basically,
there's a method to convert a MIME message to a header object, some
potentially-batched add operations, flag toggles, message deletion by
key, and a list-keys operation, and I think that's it, although I
haven't audited IMAP code in great detail). I don't have a strong
opinion here; it depends on what database implementation best suits our
needs.
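The narrow surface that protocol code would need from the database, as listed above, could be sketched as a small Rust trait. This is only a guess at a shape, not an existing Thunderbird interface: `MessageStore`, `MemoryStore`, `Header`, `MsgKey` and `FLAG_READ` are all hypothetical, the in-memory implementation exists just to make the sketch testable, and a real API would need to be asynchronous rather than blocking.

```rust
use std::collections::BTreeMap;

/// Hypothetical message key and flag definitions.
type MsgKey = u32;
const FLAG_READ: u32 = 1;

/// Hypothetical, heavily reduced header object.
#[derive(Debug, Clone, PartialEq)]
struct Header {
    subject: String,
    flags: u32,
}

/// The handful of operations named in the text.
trait MessageStore {
    fn header_from_mime(&self, raw: &[u8]) -> Header;
    fn add_batch(&mut self, headers: Vec<Header>) -> Vec<MsgKey>;
    fn toggle_flag(&mut self, key: MsgKey, flag: u32) -> bool;
    fn delete(&mut self, key: MsgKey) -> bool;
    fn list_keys(&self) -> Vec<MsgKey>;
}

/// Toy in-memory implementation, standing in for a real database.
struct MemoryStore {
    next_key: MsgKey,
    messages: BTreeMap<MsgKey, Header>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { next_key: 1, messages: BTreeMap::new() }
    }
}

impl MessageStore for MemoryStore {
    fn header_from_mime(&self, raw: &[u8]) -> Header {
        // Naive: only looks for a case-sensitive "Subject:" line before
        // the first blank line; real MIME parsing is much more involved.
        let text = String::from_utf8_lossy(raw);
        let subject = text
            .lines()
            .take_while(|line| !line.is_empty())
            .find(|line| line.starts_with("Subject:"))
            .map(|line| line["Subject:".len()..].trim().to_string())
            .unwrap_or_default();
        Header { subject, flags: 0 }
    }

    fn add_batch(&mut self, headers: Vec<Header>) -> Vec<MsgKey> {
        let mut keys = Vec::new();
        for header in headers {
            let key = self.next_key;
            self.next_key += 1;
            self.messages.insert(key, header);
            keys.push(key);
        }
        keys
    }

    fn toggle_flag(&mut self, key: MsgKey, flag: u32) -> bool {
        match self.messages.get_mut(&key) {
            Some(header) => {
                header.flags ^= flag;
                true
            }
            None => false,
        }
    }

    fn delete(&mut self, key: MsgKey) -> bool {
        self.messages.remove(&key).is_some()
    }

    fn list_keys(&self) -> Vec<MsgKey> {
        self.messages.keys().cloned().collect()
    }
}
```

Keeping the trait this small is what would let a protocol implementation in one language talk to a database chosen on other grounds.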
The smorgasbord of everything else
What I called out above is the code that is likely to be performance
sensitive. Everything else that falls into this category of "other" is
likely not to be (search is the possible exception, but if the database
exposes a search API, most of the critical performance pieces are going
to be in the database bits rather than the wrapping search code). The
code that deals with allowing for multiple backends--that's for both
mail protocols and address books--is going to face the reality that some
implementations will want to be JS and some will want to be native (see
my earlier note that system integration is going to invariably involve
native code somewhere). But the orchestration code itself could
reasonably be in any language, so long as the cross-language bindings
are possible.
Thoughts/questions/comments/concerns?
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
Just a couple of comments to this. Not sure what my comments are worth
as they are just observations but here goes anyway.
I should add that, regrettably, I've never contributed to Thunderbird
but I am working on a cross-platform project with some broad similarities.
On 19/03/2019 02:54, Joshua Cranmer 🐧 wrote:
Modern C++ is actually quite an ergonomic language, at least if you're
not attempting to wrap everything into a std::move and &&-based logic
I'm not a C++ programmer but isn't one of the key points of modern C++
to use reference semantics and all that cool stuff? :-)
Mozilla generally prohibits the use of the STL
That's rather a shame. I recently made an effort to learn some C++
(coming largely from C#) and it seems to me that STL is where modern C++
really shines.
[Rust] As a newer language, we're less likely to see knowledgeable
contributors
This is true but, on the other hand, it seems to me that you are perhaps
more likely to see contributors in general. The reason I say this is
because Rust is the latest cool thing. Enthusiastic coders are a
valuable commodity. It seems to me that Rust is a better language
overall for application development than C++, too.
Even if we could magically wave that away, the reality of system
integration is that systems require us to have support for native
languages--which include C++, Objective-C, even Java for Android--to
implement necessary features.
This is an interesting statement. Can you suggest some of the things
you're thinking of?
The cross-platform project I mentioned above is being coded in C# on top
of .NET Core and I am not sure that this locks us out of anything. (I'm
not suggesting C#/.NET for Thunderbird, of course, only that system
integration is potentially available to languages other than native
ones). As an aside, if we were to begin the project now then either Go
or Rust might be alternatives to C#.
My personal opinion here is that we should stick with the status quo
for now, but we should explore the viability of Rust implementations.
If Rust doesn't pan out, then I would suggest looking for a modern C++
implementation instead. I don't think we should choose to implement
this sort of stuff in JS.
I'd be interested to see what current Thunderbird contributors think of
this. It seems to me that most people who are current contributors to
Thunderbird are quite likely to be C++ aficionados. What are their views
of using Rust instead in this area?
--
Mark Rousell
On 19-03-2019 18:17, Mark Rousell wrote:
I'd be interested to see what current Thunderbird contributors think
of this. It seems to me that most people who are current contributors
to Thunderbird are quite likely to be C++ aficionados.
Actually, we have very few contributors who want to dive deeply into
C++. I think it would be fair to say, most people touch the C++ code
only out of necessity. Can't think of any actual feature development
being done there during the last 10 years (excluding things related to
improved IMAP support). The absolute majority of contributors are more
comfortable in JavaScript.
-Magnus
Interesting, thanks for the write up. If I squint I can clearly see
that Rust has the best ratio of advantages to disadvantages in terms of
the raw amount of text in each column, so there's that.
Joking aside, as a C++ beginner who is most familiar with JS, I find
Rust very appealing. The lack of foot guns, the borrow checker, the
smaller surface area, the modern language features, etc. all make it
much more accessible for those of us accustomed to working in higher
level languages. I'd feel much more confident writing / editing /
debugging Rust as compared with C++. That's worth something, and maybe
quite a bit. We're fortunate to have it as an option on the Mozilla menu.
A thought on protocol code: as the Rust community builds out their
ecosystem, I would think they would be interested in having libraries
(crates) for working with common protocols. That might be a good
opportunity for collaboration, and for sharing the load of development
and maintenance.
As I've discussed in my last thread, this is the sort of stuff where
JS suffers the most. It's also where Rust tends to shine the brightest.
Can you say more and/or can someone point me to your last thread? I'm
just curious about the reasons and I wasn't able to find it when I looked.
UI and frontend
This will be implemented in JS.
TypeScript would be worth considering here, unless it's ruled out as
infeasible within the constraints of the Mozilla build system.
That said, while looking into TypeScript again, I found another option
that might be even better for Thunderbird: using TypeScript to type
check vanilla JS. I'll start another thread about that.
-Paul
On 3/20/2019 8:37 AM, Paul Morris wrote:
Interesting, thanks for the write up. If I squint I can clearly see
that Rust has the best ratio of advantages to disadvantages in terms
of the raw amount of text in each column, so there's that.
Joking aside, as a C++ beginner who is most familiar with JS, I find
Rust very appealing. The lack of foot guns, the borrow checker, the
smaller surface area, the modern language features, etc. all make it
much more accessible for those of us accustomed to working in higher
level languages. I'd feel much more confident writing / editing /
debugging Rust as compared with C++. That's worth something, and
maybe quite a bit. We're fortunate to have it as an option on the
Mozilla menu.
A thought on protocol code: as the Rust community builds out their
ecosystem, I would think they would be interested in having libraries
(crates) for working with common protocols. That might be a good
opportunity for collaboration, and for sharing the load of development
and maintenance.
As I've discussed in my last thread, this is the sort of stuff where
JS suffers the most. It's also where Rust tends to shine the brightest.
Can you say more and/or can someone point me to your last thread? I'm
just curious about the reasons and I wasn't able to find it when I
looked.
UI and frontend
This will be implemented in JS.
TypeScript would be worth considering here, unless it's ruled out as
infeasible within the constraints of the Mozilla build system.
That said, while looking into TypeScript again, I found another option
that might be even better for Thunderbird: using TypeScript to type
check vanilla JS. I'll start another thread about that.
-Paul
Perhaps some Thunderbird experiments should be done in Rust, both by
those who have used it and those who have not, and flesh out the issues
and draw some useful conclusions?
On 20/03/2019 07:45, Magnus Melin wrote:
Actually, we have very few contributors who want to dive deeply into
C++. I think it would be fair to say, most people touch the C++ code
only out of necessity. Can't think of any actual feature development
being done there during the last 10 years (excluding things related to
improved IMAP support). The absolute majority of contributors are more
comfortable in JavaScript.
That's very interesting. Thanks.
If approachability of C++ is an issue then I'd have thought that Rust
would be something of an improvement (albeit still a significant
learning curve).
--
Mark Rousell
On 20/03/2019 12:37, Paul Morris wrote:
A thought on protocol code: as the Rust community builds out their
ecosystem, I would think they would be interested in having libraries
(crates) for working with common protocols. That might be a good
opportunity for collaboration, and for sharing the load of development
and maintenance.
In the "wouldn't it be nice if" department, I would have thought that the
Rust ecosystem could really do with an equivalent of Jeffrey Stedfast's
MailKit and MimeKit.
--
Mark Rousell
On Wed, Mar 20, 2019 at 15:35 +0000, Mark Rousell wrote:
On 20/03/2019 12:37, Paul Morris wrote:
A thought on protocol code: as the Rust community builds out their
ecosystem, I would think they would be interested in having libraries
(crates) for working with common protocols. That might be a good
opportunity for collaboration, and for sharing the load of development
and maintenance.
In the "wouldn't it be nice if" department, I would have thought that the
Rust ecosystem could really do with an equivalent of Jeffrey Stedfast's
MailKit and MimeKit.
FWIW we are heading towards porting https://delta.chat (a chat e-mail client)
core dependencies from C to Rust. We are starting by replacing Delta's
current netpgp dependency with https://github.com/dignifiedquire/rpgp
and have also applied for Security code review funds.
Apart from OpenPGP, we are thinking to integrate Rust SMTP and IMAP libraries
and thus substitute libetpan and cyrusasl/openssl with Rust equivalents. To fully
get rid of libetpan we also need to add mime-parsing and generation in Rust.
However, Delta Chat probably has far fewer API requirements than Thunderbird,
so even if we succeed, it doesn't mean it's directly integrable with TB.
But it's probably still useful to know that some folks in the e-mail space
are heading for this C->Rust strategy, and there is potential synergy.
holger
On 2019-03-20 12:45 a.m., Magnus Melin wrote:
Actually, we have very few contributors who want to dive deeply into
C++. I think it would be fair to say, most people touch the C++ code
only out of necessity. Can't think of any actual feature development
being done there during the last 10 years (excluding things related to
improved IMAP support). The absolute majority of contributors are more
comfortable in JavaScript.
Speaking only for myself as a very light contributor with a couple of
Javascript patches submitted, learning and working on Thunderbird in
Rust appeals to me, but C++ doesn't at all. That's because I think if
I'm going to learn something new I'd rather it be in Rust, which seems
more interesting and exciting, has a lot of growth and research ahead,
and is still in a relatively early stage.
On 3/20/2019 8:37 AM, Paul Morris wrote:
Interesting, thanks for the write up. If I squint I can clearly see
that Rust has the best ratio of advantages to disadvantages in terms
of the raw amount of text in each column, so there's that.
Joking aside, as a C++ beginner who is most familiar with JS, I find
Rust very appealing. The lack of foot guns, the borrow checker, the
smaller surface area, the modern language features, etc. all make it
much more accessible for those of us accustomed to working in higher
level languages. I'd feel much more confident writing / editing /
debugging Rust as compared with C++. That's worth something, and
maybe quite a bit. We're fortunate to have it as an option on the
Mozilla menu.
A thought on protocol code: as the Rust community builds out their
ecosystem, I would think they would be interested in having libraries
(crates) for working with common protocols. That might be a good
opportunity for collaboration, and for sharing the load of development
and maintenance.
In practice, every library I've come across just bolts its own I/O
framework into the system. This is independent of language; you don't
really see many bring-your-own-I/O packages out there. This is
challenging for us because we do need to reuse some of the Mozilla
infrastructure for I/O (particularly around proxy resolution and SSL
certificate policy). There's another issue in that most simple packages
also tend to export a sync-only interface, and building an async API on
top of that is challenging. SASL support is also mildly challenging,
particularly because Mozilla does some dynamic loading for support for
GSSAPI and NTLM SSO-style SASL support.
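For readers unfamiliar with the term, a "bring-your-own-I/O" package keeps the protocol logic as a pure state machine: the caller feeds it lines received from whatever transport it uses and sends whatever bytes the state machine hands back. The sketch below shows the idea for the opening of an SMTP session (a 220 greeting, then EHLO, then a possibly multi-line 250 reply). The `Handshake` and `State` names are hypothetical, and the handling is deliberately incomplete (for example, multi-line greetings are not handled).

```rust
/// Hypothetical states for the opening of an SMTP session.
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    AwaitGreeting,
    AwaitEhloReply,
    Ready,
    Failed,
}

/// Protocol logic with no sockets, threads or async runtime inside it.
struct Handshake {
    state: State,
    client_name: String,
}

impl Handshake {
    fn new(client_name: &str) -> Self {
        Handshake {
            state: State::AwaitGreeting,
            client_name: client_name.to_string(),
        }
    }

    /// Feed one reply line from the server. Returns the bytes the caller
    /// should write to its own transport, if any.
    fn on_line(&mut self, line: &str) -> Option<Vec<u8>> {
        let (next, output) = match self.state {
            State::AwaitGreeting if line.starts_with("220 ") => (
                State::AwaitEhloReply,
                Some(format!("EHLO {}\r\n", self.client_name).into_bytes()),
            ),
            // "250-" marks a continuation line of a multi-line reply.
            State::AwaitEhloReply if line.starts_with("250-") => {
                (State::AwaitEhloReply, None)
            }
            // "250 " (with a space) marks the final line of the reply.
            State::AwaitEhloReply if line.starts_with("250 ") => (State::Ready, None),
            State::Ready => (State::Ready, None),
            // Anything unexpected ends the handshake in this sketch.
            _ => (State::Failed, None),
        };
        self.state = next;
        output
    }
}
```

Because the state machine never touches a socket, the caller stays free to route traffic through its own networking layer, including whatever proxy resolution and certificate policy that layer enforces.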
There's another complication in that most of the protocols are generally
too simple; you can describe a high-level interface to SMTP and POP with
one function (with a bajillion parameters, in SMTP's case), and NNTP and
IMAP start to get a lot more in the mode of "we're synchronizing our
local database with a remote database," which tends to bind you tightly
to your internal architecture. MIME is mostly stuck in the
everyone-implements-me-sync category, which makes async
implementation difficult. It's very much the case that there are no
existing libraries we can borrow [1], and it's not clear to me how much
value there is in actually ensuring that we can release widely usable
packages for this infrastructure.
As I've discussed in my last thread, this is the sort of stuff where
JS suffers the most. It's also where Rust tends to shine the brightest.
Can you say more and/or can someone point me to your last thread? I'm
just curious about the reasons and I wasn't able to find it when I looked.
The thread in question is at
nntp://news.mozilla.org/mozilla.dev.apps.thunderbird/16018.
UI and frontend
This will be implemented in JS.
TypeScript would be worth considering here, unless it's ruled out as
infeasible within the constraints of the Mozilla build system.
We cannot use TypeScript in the existing build environment, which is why
I excluded it. I don't know how hard it would be to integrate, but it
does take away some of the advantages of JS (e.g., you lose the
aggressive adoption of new language features, and you have to recompile
after editing source files).
[1] Not that I've sampled the space too thoroughly. But my prior
searches lead me to suggest that most of the available packages end up
being hobby projects rather than attempts to build robust
implementations. SMTP may be the biggest exception, as some of the SMTP
clients at first glance do implement a thorough number of features. For
IMAP in particular, though, I saw a comment on Hacker News stating that
most of the packages didn't even implement it correctly.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist