https://bugzilla.mozilla.org/show_bug.cgi?id=1463289
Before commenting, please read this post in full. I know what some of
you will respond, and I tried to anticipate the responses.
libmime is written in object orientated C, using C syntax to define and
call classes, but it's compiled as C++. We should change the class
definitions to proper C++. This project should be kept minimal in scope,
to reduce the risk. It merely changes the syntax and not the
implementation, so the risk for regressions should be minimal. The code
changes are straight-forward and mechanical. That's what makes it
feasible. More ambitious changes can be done later and are out of scope.
Background
libmime is our core backend library that processes all incoming email
and converts it for display and other purposes. It lets us understand
emails, and display them. It know all our details about decoding email
formats, and implements all the details about the message body display.
So, it's a core library.
It's been designed and written by jwz, who founded mozilla.org, together
with others, and who was the first technical lead of mozilla.org, before
Brendan Eich. libmime was actually created long before mozilla.org, but
somewhere in the Netscape 2.0 or 3.0 time frames.
Back then, he decided to write this backend library in C, not C++. Back
then, C++ was a language used for GUI implementation, go figure.
However, he saw the value of object orientation and a class hierarchy.
So, he designed and implemented his own object system, in pure C. He's
not the only one to do that, GTK+ also chose to use C, but with OO.
However, GTK+ is passing around a lof ot null pointers and then casting
them. Not so libmime. It has to use casts, due to the C language but it
casts from higher level object types, so there are no (or hardly any)
null pointers in the system, so it's actually type safe. It's the only
OO system in C that I've ever seen that manages to avoid null pointers.
It also implements all 3 object orientation properties, of
encapsulation, inheritance, and polymorphs. All in C. That's kind-of
ingenious, really. More details are in mimei.h
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.h.
I highly recommend reading the large comment at the top, it's very
educating and enlightening.
Also, libmime has a very nicely designed class hierarchy. It's in fact
the clearest class hierarchy in all of Thunderbird, and the best
documented one, too. Again, top of mimei.h and mimei.cpp.
mimei stands for libmime interface and contains the "switchboard" that
decides which libmime implementation class to use for parsing which MIME
part.
This project
But it's called mimei.cpp, not mimei.c, so isn't it already C++? Yes and
no. It's compiled as C++ code, and we do already use C++ Mozilla classes
like nsString in libmime. But the libmime classes themselves are still
defined in C. That is a left-over and no longer necessary. A lot of the
prejudice against libmime comes from the fact that people don't
understand that this is actually C code.
I think it would make sense to convert libmime classes to real, modern
C++ classes. It would make libmime a lot more readable to today's
developers. At the same time, it's a very feasible project, because
libmime is already completely object orientated, and it's already
compiled as C++, so the transition is a pure syntactical one. At the
same time, we could clean up the code style to be in line with Mozilla
Style Guide for new code.
It would be a very straight-forward, simple transition, a simple mapping
of syntax. None of the logic changes, at all. We don't even have to
change the string and buffer use at all, and we will not, to avoid
breaking existing code. That is the only thing that makes this project
viable. It reduces the risk of regressions to almost zero, because it
only changes language syntax and not any implementation. Otherwise, I
would not attempt this.
FAQ
In this project here, we don't even have to touch the function
implementations, only change the syntax of function calls and function
signatures. Many of the function implementation lines won't even change.
It would not only be a syntax change, but a re-implementation. It's a
much larger change, and not one I'm ready or able to make.
In other words, changing to JS would probably be 10 or 100 times more
work than what I propose here.
This has been attempted a few years ago, and they found that JSMime
cannot yet replace libmime. I don't know JSMime, nor am I familiar with
the reasons in detail, but they were severe enough that this project was
abandonned and JSMime parsing used only for header parsing, not content
parsing. JSMime was written by jcranmer, and he's a highly competent
engineer, and I respect him. If he didn't manage to get it to production
quality for 25 million users, I won't dare to attempt that. The risk is
huge.
libmime and its support libraries have features that JSMime does not
have. For example, the plaintext conversion code received a lot of love
and is very finely tuned. That includes quote recognition, struct and
smiley recognition, and URL recognition, which is the most difficult.
All this would have to be re-implemented and ported, to match the
existing code, to avoid regressions. This part alone is a year-long
project to fine-tune, and that's just one part of libmime, but there are
many others.
In other words, changing to JSMime would be between 10 and 100 times
more work than what I propose here.
Scope
What is the goal?
Convert libmime class definitions from C syntax to C++ syntax. Same
for function definitions.
Adapt the function calls, remove the casts and change |myClass| et al
to |this|.
Use a C++ name space and remove the pseudo name space in the class names.
Change the class method names to the Mozilla Style Guide
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style.
Only the method names, not the parameters.
What is out of scope?
Atomic commits
https://en.wikipedia.org/wiki/Atomic_commit#Atomic_commit_convention:
"Greater understandability comes from the small size and focused nature
of the commit. It is much easier to understand what is changed and
reasoning behind the changes if you are only looking for one kind of
change. This becomes especially important when making format changes to
the source code. If format and functional changes are combined it
becomes very difficult to identify useful changes"
Exceptions. Error handling stays with integer result codes for now. We
will even stay with libmime's own result code, not nsresult. This limits
the scope and risk of this change here. This could be changed later in a
followup commit to either nsresult or exceptions, if desired. Changing
error handling changes mostly the function implementations, while this
project changes mostly the class definitions. It's reasonable to leave
this out of scope here, because this change would be mostly orthogonal.
It can be left out here, so it should, to limit scope and risk and
regressions.
Change the code to the Mozilla Style Guide
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style.
I've started doing that for 2-3 classes, and I think it's needed, but I
noticed that it's too much work for me to chew all at once. So, to make
this class conversion feasible, I'd prefer to first only convert to C++
classes, and then as a second step, we can together adapt the code
style. This should also keep most of the controversial opinionated
discussion out of this purely technical conversion. That said, I do
think this code should be converted to Mozilla Code Style. Just not here
in this bug in this first step.
On 22/05/2018 03:10, Ben Bucksch wrote:
https://bugzilla.mozilla.org/show_bug.cgi?id=1463289
Before commenting, please read this post in full. I know what some of
you will respond, and I tried to anticipate the responses.
libmime is written in object orientated C, using C syntax to define
and call classes, but it's compiled as C++. We should change the class
definitions to proper C++. This project should be kept minimal in
scope, to reduce the risk. It merely changes the syntax and not the
implementation, so the risk for regressions should be minimal. The
code changes are straight-forward and mechanical. That's what makes it
feasible. More ambitious changes can be done later and are out of scope.
[etc etc]
I'm not qualified to comment on your proposal (although it seems like a
worthy rationalisation to me) but thank you for posting what is a rather
interesting historical note.
--
Mark Rousell
On 5/21/2018 10:10 PM, Ben Bucksch wrote:
There's at least two other bugs reported requesting this ;-)
I think it would make sense to convert libmime classes to real, modern
C++ classes. It would make libmime a lot more readable to today's
developers. At the same time, it's a very feasible project, because
libmime is already completely object orientated, and it's already
compiled as C++, so the transition is a pure syntactical one. At the
same time, we could clean up the code style to be in line with Mozilla
Style Guide for new code.
What's the point?
Yes, the transition would be easy. You could probably write an automated
refactoring in a weekend to do it. I actually did start one once just to
see how bad it would be... and this was before Clang! But the transition
would be easy because the code so well adheres to its internal model,
and it's no worse C code than, say, the VFS layer of the Linux kernel,
and I don't hear much enthusiasm for making that be C++.
More to the point: libmime has several flaws. Its object-oriented C
nature is only a superficial flaw. Fundamentally, libmime exposes the
wrong interface. It's singularly focused on one goal (making the single
HTML document for display) that it's defeated attempts to reuse the mime
parsing for other purposes (say, full-text body search). The nastiest
bits of code aren't even in the C-in-C++ files (mimemalt may have some
nasty-look cases, but try figuring out how you drive the system via
nsStreamConverter some day. For bonus points, identify the paths that
are dead).
What the MIME code most needs is it needs alternative APIs that are
much, much easier for other people to try to use. After that, it needs
cleaning up of all the dead cruft. And then maybe we can start on some
of the big longstanding MIME bugs that are impossible to fix in the
current architecture.
I don't know if the current implementation can be easily coaxed into a
cleaner model. My instinct says that it cannot, based on some attempts
to do this on other crufty parts on the codebase (try working out how to
do it for the composition code some day, that really makes you weep for
how wrong it is). What's stopped me from attempting any forward progress
on this code in particular is the strategic indecision about what to do
with paying off technical debt.
Developer time is scarce. I don't see any short-term benefits to
converting libmime to C++, and there's strong reason to doubt the
existence of long-term benefits if we're going to throw all C++ code
away to move to a pure-JS build. If it enabled some major MIME project
to go forward, maybe it might be valuable, but I just don't see any
projects that would benefit from it. I'd much rather see time go into
something more certain to provide benefits, even if it's just making
concrete plans on what the future should look like.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
Joshua Cranmer 🐧 wrote on 22.05.18 06:47:
It's singularly focused on one goal (making the single HTML document
for display)
Right. It's a MIME -> HTML converter.
But... you can get the message structure from the part numbers. Not
pretty, but read on.
What the MIME code most needs is it needs alternative APIs
There's an inherent problem: Say you want to look for all messages with
an attachment of type .doc. If you wanted to use libmime or JSMime or
any other MIME library for that, you would need to fetch and stream and
parse all messages in the mail store. That's not a good idea.
So, for most cases where you need meta data about a message and its
parts and not the actual content for display or download, a MIME library
is not the thing to use. Instead, you want to cache that information in
a database somewhere.
You're right, you do need a MIME library to build that database once.
But I don't see why libmime cannot do that. Or at least it could be made
to be helpful there.
libmime currently builds a hierarchy of MIME parts for each message, and
assigns hierarchical IDs of the form "1.3.2" for them that reflect that
hierarchy. Internally, it represents each MIME part by a MimeObject.
A MimeObject has a parent. I don't see an easy accessor to get all
children, maybe I'm missing it (the transition to C++ should help), but
that should be trivial to add. A MimeObject already has a parent. If you
have a parent, you are somebody's child. So, it's just a matter of
keeping that list of children. You might say that libmime is a bad
mother, for not keeping /tabs/ of its children. But then again, it's C
code and C developers hate /tabs/ :).
I don't know if the current implementation can be easily coaxed into a
cleaner model.
I think we'll see that better once the APIs are cleaner and clearly OO.
I see no reason why not.
if we're going to throw all C++ code away to move to a pure-JS build.
... I'd much rather see time go into something more certain to provide
benefits, even if it's just making concrete plans on what the future
should look like.
It's kind of funny. Hello? I'm Ben. I spent the entire last year
planning TB:Next Generation, and trying to convince people to get behind
that. Unsuccessfully.
Now you ask me why I work on that old crufty code base?
You're right, this isn't TB:NG. libmime needed some love, and I gave it
some.
Ben
Ben Bucksch wrote on 22.05.18 16:47:
Joshua Cranmer 🐧 wrote on 22.05.18 06:47:
It's singularly focused on one goal (making the single HTML document
for display)
What the MIME code most needs is it needs alternative APIs
There's an inherent problem: Say you want to look for all messages
with an attachment of type .doc. If you wanted to use libmime or
JSMime or any other MIME library for that, you would need to fetch and
stream and parse all messages in the mail store. That's not a good idea.
So, for most cases where you need meta data about a message and its
parts and not the actual content for display or download, a MIME
library is not the thing to use. Instead, you want to cache that
information in a database somewhere.
You're right, you do need a MIME library to build that database once.
But I don't see why libmime cannot do that. Or at least it could be
made to be helpful there.
libmime currently builds a hierarchy of MIME parts for each message,
and assigns hierarchical IDs of the form "1.3.2" for them that reflect
that hierarchy. Internally, it represents each MIME part by a MimeObject.
A MimeObject has a parent. I don't see an easy accessor to get all
children, maybe I'm missing it (the transition to C++ should help),
but that should be trivial to add. A MimeObject already has a parent.
If you have a parent, you are somebody's child. So, it's just a matter
of keeping that list of children. You might say that libmime is a bad
mother, for not keeping /tabs/ of its children. But then again, it's C
code and C developers hate /tabs/ :).
OK, I took a closer look.
What follows is my new C++ API. In theory, the same exists in old
libmime, but it's much harder to find.
You actually have all you need:
The most abstract type is Part, a one part of a MIME message. It has:
Headers has:
If you have a message with attachments, your Part will be a Container,
which also has:
So, in fact, you have the entire message structure, nicely as objects,
with properties. You can know whether this is a body or a plaintext part
or a container. If it's a container, you can get its children.
Containers can contain further Containers. There are more specific
Container types for multipart/mixed (attachments), for
multipart/alternative etc., and you can get at all that.
For each part, you can find out its relative position in the message
structure, its libmime class, its MIME type, and the [file] name of the
part.
Each part, you can ask to be rendered in HTML separately, or you could
get the stream for it.
So, you have access to entire message structure, as objects, and can
navigate it and get its properties.
You mentioned full text search, so you're building a full text search
index and you don't like HTML and prefer plaintext, you can plug it
together with a HTML -> plaintext converter. In fact, we have code that
does that.
Looks quite flexible to me.
Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the existing
APIs in a ways that's easy to understand and find for developers.
Ben
On 5/22/2018 11:34 AM, Ben Bucksch wrote:
Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the
existing APIs in a ways that's easy to understand and find for developers.
You fell precisely into the same seductive trap I did 7 years ago. I
looked at libmime, thought hard about what kind of API I should design,
and implemented in JSMime. If you look at the design I made for JSMime,
you have basically the same similar interface to what you proposed.
There are some differences--I tried to renumber part numbering to align
more to IMAP part numbering than libmime's slightly different idea of
numbering, and I made the interface too focused on streaming rather than
collecting information into a single output structure (for which there
is a separate adapter you can use to get that). And then I went further
and replaced every consumer that could use that interface with using
JSMime instead of whatever hacked up trick it was using.
You'll note, of course, that there is very little use of JSMime. That is
because, it turns out, that almost no one /wants/ a MIME tree. There's a
few places where people only want to parse headers, but almost everybody
just wants to know one thing: what's the body? MIME presents the view
of messages as a tree of parts, but the email world has settled on a
model of "body, attachments, and a few special snowflakes" (mostly
text/calendar, but S/MIME and PGP can make for some weird interactions).
Of course, there are some complications: bodies can be synthesized out
of multiple parts, for example. Since libmime was no longer an effective
guide for how to design this, I waffled on the API without making
progress and instead moved to the simpler task of designing the MIME API
for message composition. That ran into a combination of life events
intervening (the cause of my withdrawal from active development) as well
as the sheer idiocy of the internal compose APIs.
To return to my original point: what is your goal? If we still want to
replace all of our C++ code with JS, then effort on mime is best spent
on achieving that goal [1]. If that is not the goal, then we should
still have a conversation on how to replace the ~5-ish MIME parsers we
have down to one, be it libmime or JSMime or something else that is not
yet written. I've thought about playing with MIME and message parsing in
Rust as another alternative idea, although I haven't attempted anything
because there's not much point trying to build such a library if no one
will use it. It's a discussion that's long overdue; perhaps the outcome
is that libmime just needs another coat of paint, but if it does, I
think we can wait a short while to ensure that we're not painting a
house with a bulldozer bearing down on it. :-)
[1] Fixing JSMime may not necessarily be the best path to a JS mime
library. I made several wrong bets on the evolution of JS when I
started, so it would not be surprising if it were easier to start from a
clean slate.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
On 23.05.18 03:10, Joshua Cranmer 🐧 wrote:
On 5/22/2018 11:34 AM, Ben Bucksch wrote:
Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the
existing APIs in a ways that's easy to understand and find for developers.
You fell precisely into the same seductive trap I did 7 years ago. I
looked at libmime, thought hard about what kind of API I should design,
and implemented in JSMime. If you look at the design I made for JSMime,
you have basically the same similar interface to what you proposed.
I think you miss the point. This is not about replacing libmime, it's
about making it easier to maintain. We still have libmime - whether we
like it or not.
I did a little work on libmime myself, so I know what I'm talking about.
The entry barrier for new developers is very high, particularly
because of the the uncommon C-based object structure, and maintenance is
not really easy either. Only a very small number of developers
understand enough of libmime to be able to work on it.
By providing an API that is easier to understand, we could make it
easier for new developers to understand how libmime works, and we could
benefit from easier maintenance in the future. I therefore think that we
should follow Ben's proposal.
-Patrick
On Tue, May 22, 2018 at 7:11 PM Joshua Cranmer 🐧 pidgeot18@gmail.com
wrote:
That is because, it turns out, that almost no one wants a MIME tree.
There's a few places where people only want to parse headers, but almost
everybody just wants to know one thing: what's the body?
Even if that is the case, it seems like it would make sense to implement an
API that answers "what is the body" on top of one that generates a mime
tree.
On 23/05/2018 08:39, Patrick Brunschwig wrote:
Only a very small number of developers
understand enough of libmime to be able to work on it.
I'd deny that.
If I exclude Ben and maybe Joshua, who might understand libmime better
than all the rest of us, I'd say the following:
No active developer understands libmime to be able to work on it. Some
developers, recently mostly myself, were forced to work on libmime to
fix some annoying bugs (incl. long-standing crashes) no one dared to
touch. When we make a change, we address the problem and prey that the
whole thing won't come crashing down. See:
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemalt.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemrel.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemoz2.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimedrft.cpp
Particularly bad were bug 1016524, bug 1026989 and bug 1251783 :-(
So anything that will make libmime a little less brittle with its
notorious function pointers and free'ing strategies (one crash was a
double free) should help us before we replace the whole thing.
That said, I hope that replacing the doubtful free'ing strategies won't
introduce a whole new lot of crashes.
Jörg.
Tom Prince wrote on 23.05.18 08:58:
Even if that is the case, it seems like it would make sense to
implement an API that answers "what is the body" on top of one that
generates a mime tree.