Empathy List Archives

maildev@lists.thunderbird.net

Thunderbird email developers

Convert libmime from C to C++

Ben Bucksch

Tue, May 22, 2018 2:10 AM

https://bugzilla.mozilla.org/show_bug.cgi?id=1463289

Before commenting, please read this post in full. I know what some of
you will respond, and I tried to anticipate the responses.

libmime is written in object orientated C, using C syntax to define and
call classes, but it's compiled as C++. We should change the class
definitions to proper C++. This project should be kept minimal in scope,
to reduce the risk. It merely changes the syntax and not the
implementation, so the risk for regressions should be minimal. The code
changes are straight-forward and mechanical. That's what makes it
feasible. More ambitious changes can be done later and are out of scope.

Background

libmime is our core backend library that processes all incoming email
and converts it for display and other purposes. It lets us understand
emails, and display them. It know all our details about decoding email
formats, and implements all the details about the message body display.
So, it's a core library.

It's been designed and written by jwz, who founded mozilla.org, together
with others, and who was the first technical lead of mozilla.org, before
Brendan Eich. libmime was actually created long before mozilla.org, but
somewhere in the Netscape 2.0 or 3.0 time frames.

Back then, he decided to write this backend library in C, not C++. Back
then, C++ was a language used for GUI implementation, go figure.
However, he saw the value of object orientation and a class hierarchy.
So, he designed and implemented his own object system, in pure C. He's
not the only one to do that, GTK+ also chose to use C, but with OO.
However, GTK+ is passing around a lof ot null pointers and then casting
them. Not so libmime. It has to use casts, due to the C language but it
casts from higher level object types, so there are no (or hardly any)
null pointers in the system, so it's actually type safe. It's the only
OO system in C that I've ever seen that manages to avoid null pointers.
It also implements all 3 object orientation properties, of
encapsulation, inheritance, and polymorphs. All in C. That's kind-of
ingenious, really. More details are in mimei.h
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.h.
I highly recommend reading the large comment at the top, it's very
educating and enlightening.

Also, libmime has a very nicely designed class hierarchy. It's in fact
the clearest class hierarchy in all of Thunderbird, and the best
documented one, too. Again, top of mimei.h and mimei.cpp.

mimei stands for libmime interface and contains the "switchboard" that
decides which libmime implementation class to use for parsing which MIME
part.

This project

But it's called mimei.cpp, not mimei.c, so isn't it already C++? Yes and
no. It's compiled as C++ code, and we do already use C++ Mozilla classes
like nsString in libmime. But the libmime classes themselves are still
defined in C. That is a left-over and no longer necessary. A lot of the
prejudice against libmime comes from the fact that people don't
understand that this is actually C code.

I think it would make sense to convert libmime classes to real, modern
C++ classes. It would make libmime a lot more readable to today's
developers. At the same time, it's a very feasible project, because
libmime is already completely object orientated, and it's already
compiled as C++, so the transition is a pure syntactical one. At the
same time, we could clean up the code style to be in line with Mozilla
Style Guide for new code.

It would be a very straight-forward, simple transition, a simple mapping
of syntax. None of the logic changes, at all. We don't even have to
change the string and buffer use at all, and we will not, to avoid
breaking existing code. That is the only thing that makes this project
viable. It reduces the risk of regressions to almost zero, because it
only changes language syntax and not any implementation. Otherwise, I
would not attempt this.

FAQ

Why not change it to JavaScript directly?

libmime uses a lot of low-level XPCOM features, like byte sequences, C
strings, and Unicode strings, and converts between them. It uses
low-level XPCOM charset conversion functions extensively to do its work

after all, that's part of its primary purpose. Changing that to JS
Unicode strings and octet streams and the like would change the string
handling massively, would completely change the logic, and risk many
regressions.

In this project here, we don't even have to touch the function
implementations, only change the syntax of function calls and function
signatures. Many of the function implementation lines won't even change.
It would not only be a syntax change, but a re-implementation. It's a
much larger change, and not one I'm ready or able to make.
In other words, changing to JS would probably be 10 or 100 times more
work than what I propose here.

Why not use JSMime instead?

This has been attempted a few years ago, and they found that JSMime
cannot yet replace libmime. I don't know JSMime, nor am I familiar with
the reasons in detail, but they were severe enough that this project was
abandonned and JSMime parsing used only for header parsing, not content
parsing. JSMime was written by jcranmer, and he's a highly competent
engineer, and I respect him. If he didn't manage to get it to production
quality for 25 million users, I won't dare to attempt that. The risk is
huge.
libmime and its support libraries have features that JSMime does not
have. For example, the plaintext conversion code received a lot of love
and is very finely tuned. That includes quote recognition, struct and
smiley recognition, and URL recognition, which is the most difficult.
All this would have to be re-implemented and ported, to match the
existing code, to avoid regressions. This part alone is a year-long
project to fine-tune, and that's just one part of libmime, but there are
many others.
In other words, changing to JSMime would be between 10 and 100 times
more work than what I propose here.

Scope

What is the goal?

Convert libmime class definitions from C syntax to C++ syntax. Same
for function definitions.
Adapt the function calls, remove the casts and change |myClass| et al
to |this|.
Use a C++ name space and remove the pseudo name space in the class names.
Change the class method names to the Mozilla Style Guide
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style.
Only the method names, not the parameters.

What is out of scope?

Any functional changes. The logic should not change in any way. The
output should not change in any way. If we find bugs during the course
of this project, they will be filed as separate bug reports and fixed
after the conversion is over.

Atomic commits
https://en.wikipedia.org/wiki/Atomic_commit#Atomic_commit_convention:
"Greater understandability comes from the small size and focused nature
of the commit. It is much easier to understand what is changed and
reasoning behind the changes if you are only looking for one kind of
change. This becomes especially important when making format changes to
the source code. If format and functional changes are combined it
becomes very difficult to identify useful changes"

Exceptions. Error handling stays with integer result codes for now. We
will even stay with libmime's own result code, not nsresult. This limits
the scope and risk of this change here. This could be changed later in a
followup commit to either nsresult or exceptions, if desired. Changing
error handling changes mostly the function implementations, while this
project changes mostly the class definitions. It's reasonable to leave
this out of scope here, because this change would be mostly orthogonal.
It can be left out here, so it should, to limit scope and risk and
regressions.
Change the code to the Mozilla Style Guide
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style.
I've started doing that for 2-3 classes, and I think it's needed, but I
noticed that it's too much work for me to chew all at once. So, to make
this class conversion feasible, I'd prefer to first only convert to C++
classes, and then as a second step, we can together adapt the code
style. This should also keep most of the controversial opinionated
discussion out of this purely technical conversion. That said, I do
think this code should be converted to Mozilla Code Style. Just not here
in this bug in this first step.

https://bugzilla.mozilla.org/show_bug.cgi?id=1463289 Before commenting, please read this post in full. I know what some of you will respond, and I tried to anticipate the responses. ------------------------------------------------------------------------ libmime is written in object orientated C, using C syntax to define and call classes, but it's compiled as C++. We should change the class definitions to proper C++. This project should be kept minimal in scope, to reduce the risk. It merely changes the syntax and not the implementation, so the risk for regressions should be minimal. The code changes are straight-forward and mechanical. That's what makes it feasible. More ambitious changes can be done later and are out of scope. ------------------------------------------------------------------------ Background libmime is our core backend library that processes all incoming email and converts it for display and other purposes. It lets us understand emails, and display them. It know all our details about decoding email formats, and implements all the details about the message body display. So, it's a core library. It's been designed and written by jwz, who founded mozilla.org, together with others, and who was the first technical lead of mozilla.org, before Brendan Eich. libmime was actually created long before mozilla.org, but somewhere in the Netscape 2.0 or 3.0 time frames. Back then, he decided to write this backend library in C, not C++. Back then, C++ was a language used for GUI implementation, go figure. However, he saw the value of object orientation and a class hierarchy. So, he designed and implemented his own object system, in pure C. He's not the only one to do that, GTK+ also chose to use C, but with OO. However, GTK+ is passing around a lof ot null pointers and then casting them. Not so libmime. It has to use casts, due to the C language but it casts from higher level object types, so there are no (or hardly any) null pointers in the system, so it's actually type safe. It's the only OO system in C that I've ever seen that manages to avoid null pointers. It also implements all 3 object orientation properties, of encapsulation, inheritance, and polymorphs. All in C. That's kind-of ingenious, really. More details are in mimei.h <https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.h>. I highly recommend reading the large comment at the top, it's very educating and enlightening. Also, libmime has a very nicely designed class hierarchy. It's in fact the clearest class hierarchy in all of Thunderbird, and the best documented one, too. Again, top of mimei.h and mimei.cpp. mimei stands for libmime interface and contains the "switchboard" that decides which libmime implementation class to use for parsing which MIME part. ------------------------------------------------------------------------ This project But it's called mimei.cpp, not mimei.c, so isn't it already C++? Yes and no. It's compiled as C++ code, and we do already use C++ Mozilla classes like nsString in libmime. But the libmime classes themselves are still defined in C. That is a left-over and no longer necessary. A lot of the prejudice against libmime comes from the fact that people don't understand that this is actually C code. I think it would make sense to convert libmime classes to real, modern C++ classes. It would make libmime a lot more readable to today's developers. At the same time, it's a very feasible project, because libmime is already completely object orientated, and it's already compiled as C++, so the transition is a pure syntactical one. At the same time, we could clean up the code style to be in line with Mozilla Style Guide for new code. It would be a very straight-forward, simple transition, a simple mapping of syntax. None of the logic changes, at all. We don't even have to change the string and buffer use at all, and we will not, to avoid breaking existing code. That is the only thing that makes this project viable. It reduces the risk of regressions to almost zero, because it only changes language syntax and not any implementation. Otherwise, I would not attempt this. ------------------------------------------------------------------------ FAQ 1. Why not change it to JavaScript directly? * libmime uses a lot of low-level XPCOM features, like byte sequences, C strings, and Unicode strings, and converts between them. It uses low-level XPCOM charset conversion functions extensively to do its work - after all, that's part of its primary purpose. Changing that to JS Unicode strings and octet streams and the like would change the string handling massively, would completely change the logic, and risk many regressions. * In this project here, we don't even have to touch the function implementations, only change the syntax of function calls and function signatures. Many of the function implementation lines won't even change. * It would not only be a syntax change, but a re-implementation. It's a much larger change, and not one I'm ready or able to make. * In other words, changing to JS would probably be 10 or 100 times more work than what I propose here. 2. Why not use JSMime instead? * This has been attempted a few years ago, and they found that JSMime cannot yet replace libmime. I don't know JSMime, nor am I familiar with the reasons in detail, but they were severe enough that this project was abandonned and JSMime parsing used only for header parsing, not content parsing. JSMime was written by jcranmer, and he's a highly competent engineer, and I respect him. If he didn't manage to get it to production quality for 25 million users, I won't dare to attempt that. The risk is huge. * libmime and its support libraries have features that JSMime does not have. For example, the plaintext conversion code received a lot of love and is very finely tuned. That includes quote recognition, struct and smiley recognition, and URL recognition, which is the most difficult. All this would have to be re-implemented and ported, to match the existing code, to avoid regressions. This part alone is a year-long project to fine-tune, and that's just one part of libmime, but there are many others. * In other words, changing to JSMime would be between 10 and 100 times more work than what I propose here. ------------------------------------------------------------------------ Scope What is the goal? * Convert libmime class definitions from C syntax to C++ syntax. Same for function definitions. * Adapt the function calls, remove the casts and change |myClass| et al to |this|. * Use a C++ name space and remove the pseudo name space in the class names. * Change the class method names to the Mozilla Style Guide <https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style>. Only the method names, not the parameters. What is out of scope? * Any functional changes. The logic should not change in any way. The output should not change in any way. If we find bugs during the course of this project, they will be filed as separate bug reports and fixed after the conversion is over. Atomic commits <https://en.wikipedia.org/wiki/Atomic_commit#Atomic_commit_convention>: "Greater understandability comes from the small size and focused nature of the commit. It is much easier to understand what is changed and reasoning behind the changes if you are only looking for one kind of change. This becomes especially important when making format changes to the source code. If format and functional changes are combined it becomes very difficult to identify useful changes" * Exceptions. Error handling stays with integer result codes for now. We will even stay with libmime's own result code, not nsresult. This limits the scope and risk of this change here. This could be changed later in a followup commit to either nsresult or exceptions, if desired. Changing error handling changes mostly the function implementations, while this project changes mostly the class definitions. It's reasonable to leave this out of scope here, because this change would be mostly orthogonal. It can be left out here, so it should, to limit scope and risk and regressions. * Change the code to the Mozilla Style Guide <https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style>. I've started doing that for 2-3 classes, and I think it's needed, but I noticed that it's too much work for me to chew all at once. So, to make this class conversion feasible, I'd prefer to first only convert to C++ classes, and then as a second step, we can together adapt the code style. This should also keep most of the controversial opinionated discussion out of this purely technical conversion. That said, I do think this code should be converted to Mozilla Code Style. Just not here in this bug in this first step.

Mark Rousell

Tue, May 22, 2018 3:09 AM

On 22/05/2018 03:10, Ben Bucksch wrote:

https://bugzilla.mozilla.org/show_bug.cgi?id=1463289

Before commenting, please read this post in full. I know what some of
you will respond, and I tried to anticipate the responses.

libmime is written in object orientated C, using C syntax to define
and call classes, but it's compiled as C++. We should change the class
definitions to proper C++. This project should be kept minimal in
scope, to reduce the risk. It merely changes the syntax and not the
implementation, so the risk for regressions should be minimal. The
code changes are straight-forward and mechanical. That's what makes it
feasible. More ambitious changes can be done later and are out of scope.

[etc etc]

I'm not qualified to comment on your proposal (although it seems like a
worthy rationalisation to me) but thank you for posting what is a rather
interesting historical note.

--
Mark Rousell

On 22/05/2018 03:10, Ben Bucksch wrote: > > https://bugzilla.mozilla.org/show_bug.cgi?id=1463289 > > Before commenting, please read this post in full. I know what some of > you will respond, and I tried to anticipate the responses. > > > ------------------------------------------------------------------------ > > libmime is written in object orientated C, using C syntax to define > and call classes, but it's compiled as C++. We should change the class > definitions to proper C++. This project should be kept minimal in > scope, to reduce the risk. It merely changes the syntax and not the > implementation, so the risk for regressions should be minimal. The > code changes are straight-forward and mechanical. That's what makes it > feasible. More ambitious changes can be done later and are out of scope. > > ------------------------------------------------------------------------ > > [etc etc] > I'm not qualified to comment on your proposal (although it seems like a worthy rationalisation to me) but thank you for posting what is a rather interesting historical note. -- Mark Rousell

Joshua Cranmer 🐧

Tue, May 22, 2018 4:47 AM

On 5/21/2018 10:10 PM, Ben Bucksch wrote:

https://bugzilla.mozilla.org/show_bug.cgi?id=1463289

There's at least two other bugs reported requesting this ;-)

What's the point?

Yes, the transition would be easy. You could probably write an automated
refactoring in a weekend to do it. I actually did start one once just to
see how bad it would be... and this was before Clang! But the transition
would be easy because the code so well adheres to its internal model,
and it's no worse C code than, say, the VFS layer of the Linux kernel,
and I don't hear much enthusiasm for making that be C++.

More to the point: libmime has several flaws. Its object-oriented C
nature is only a superficial flaw. Fundamentally, libmime exposes the
wrong interface. It's singularly focused on one goal (making the single
HTML document for display) that it's defeated attempts to reuse the mime
parsing for other purposes (say, full-text body search). The nastiest
bits of code aren't even in the C-in-C++ files (mimemalt may have some
nasty-look cases, but try figuring out how you drive the system via
nsStreamConverter some day. For bonus points, identify the paths that
are dead).

What the MIME code most needs is it needs alternative APIs that are
much, much easier for other people to try to use. After that, it needs
cleaning up of all the dead cruft. And then maybe we can start on some
of the big longstanding MIME bugs that are impossible to fix in the
current architecture.

I don't know if the current implementation can be easily coaxed into a
cleaner model. My instinct says that it cannot, based on some attempts
to do this on other crufty parts on the codebase (try working out how to
do it for the composition code some day, that really makes you weep for
how wrong it is). What's stopped me from attempting any forward progress
on this code in particular is the strategic indecision about what to do
with paying off technical debt.

Developer time is scarce. I don't see any short-term benefits to
converting libmime to C++, and there's strong reason to doubt the
existence of long-term benefits if we're going to throw all C++ code
away to move to a pure-JS build. If it enabled some major MIME project
to go forward, maybe it might be valuable, but I just don't see any
projects that would benefit from it. I'd much rather see time go into
something more certain to provide benefits, even if it's just making
concrete plans on what the future should look like.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

On 5/21/2018 10:10 PM, Ben Bucksch wrote: > > https://bugzilla.mozilla.org/show_bug.cgi?id=1463289 > There's at least two other bugs reported requesting this ;-) > I think it would make sense to convert libmime classes to real, modern > C++ classes. It would make libmime a lot more readable to today's > developers. At the same time, it's a very feasible project, because > libmime is already completely object orientated, and it's already > compiled as C++, so the transition is a pure syntactical one. At the > same time, we could clean up the code style to be in line with Mozilla > Style Guide for new code. What's the point? Yes, the transition would be easy. You could probably write an automated refactoring in a weekend to do it. I actually did start one once just to see how bad it would be... and this was before Clang! But the transition would be easy because the code so well adheres to its internal model, and it's no worse C code than, say, the VFS layer of the Linux kernel, and I don't hear much enthusiasm for making that be C++. More to the point: libmime has several flaws. Its object-oriented C nature is only a superficial flaw. Fundamentally, libmime exposes the wrong interface. It's singularly focused on one goal (making the single HTML document for display) that it's defeated attempts to reuse the mime parsing for other purposes (say, full-text body search). The nastiest bits of code aren't even in the C-in-C++ files (mimemalt may have some nasty-look cases, but try figuring out how you drive the system via nsStreamConverter some day. For bonus points, identify the paths that are dead). What the MIME code most needs is it needs alternative APIs that are much, much easier for other people to try to use. After that, it needs cleaning up of all the dead cruft. And then maybe we can start on some of the big longstanding MIME bugs that are impossible to fix in the current architecture. I don't know if the current implementation can be easily coaxed into a cleaner model. My instinct says that it cannot, based on some attempts to do this on other crufty parts on the codebase (try working out how to do it for the composition code some day, that really makes you weep for how wrong it is). What's stopped me from attempting any forward progress on this code in particular is the strategic indecision about what to do with paying off technical debt. Developer time is scarce. I don't see any short-term benefits to converting libmime to C++, and there's strong reason to doubt the existence of long-term benefits if we're going to throw all C++ code away to move to a pure-JS build. If it enabled some major MIME project to go forward, maybe it might be valuable, but I just don't see any projects that would benefit from it. I'd much rather see time go into something more certain to provide benefits, even if it's just making concrete plans on what the future should look like. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist

Ben Bucksch

Tue, May 22, 2018 2:47 PM

Joshua Cranmer 🐧 wrote on 22.05.18 06:47:

It's singularly focused on one goal (making the single HTML document
for display)

Right. It's a MIME -> HTML converter.

But... you can get the message structure from the part numbers. Not
pretty, but read on.

What the MIME code most needs is it needs alternative APIs

There's an inherent problem: Say you want to look for all messages with
an attachment of type .doc. If you wanted to use libmime or JSMime or
any other MIME library for that, you would need to fetch and stream and
parse all messages in the mail store. That's not a good idea.

So, for most cases where you need meta data about a message and its
parts and not the actual content for display or download, a MIME library
is not the thing to use. Instead, you want to cache that information in
a database somewhere.

You're right, you do need a MIME library to build that database once.
But I don't see why libmime cannot do that. Or at least it could be made
to be helpful there.

libmime currently builds a hierarchy of MIME parts for each message, and
assigns hierarchical IDs of the form "1.3.2" for them that reflect that
hierarchy. Internally, it represents each MIME part by a MimeObject.

A MimeObject has a parent. I don't see an easy accessor to get all
children, maybe I'm missing it (the transition to C++ should help), but
that should be trivial to add. A MimeObject already has a parent. If you
have a parent, you are somebody's child. So, it's just a matter of
keeping that list of children. You might say that libmime is a bad
mother, for not keeping /tabs/ of its children. But then again, it's C
code and C developers hate /tabs/ :).

I don't know if the current implementation can be easily coaxed into a
cleaner model.

I think we'll see that better once the APIs are cleaner and clearly OO.
I see no reason why not.

if we're going to throw all C++ code away to move to a pure-JS build.
... I'd much rather see time go into something more certain to provide
benefits, even if it's just making concrete plans on what the future
should look like.

It's kind of funny. Hello? I'm Ben. I spent the entire last year
planning TB:Next Generation, and trying to convince people to get behind
that. Unsuccessfully.

Now you ask me why I work on that old crufty code base?

You're right, this isn't TB:NG. libmime needed some love, and I gave it
some.

Ben

Joshua Cranmer 🐧 wrote on 22.05.18 06:47: > It's singularly focused on one goal (making the single HTML document > for display) Right. It's a MIME -> HTML converter. But... you can get the message structure from the part numbers. Not pretty, but read on. > What the MIME code most needs is it needs alternative APIs There's an inherent problem: Say you want to look for all messages with an attachment of type .doc. If you wanted to use libmime or JSMime or any other MIME library for that, you would need to fetch and stream and parse all messages in the mail store. That's not a good idea. So, for most cases where you need meta data about a message and its parts and not the actual content for display or download, a MIME library is not the thing to use. Instead, you want to cache that information in a database somewhere. You're right, you do need a MIME library to build that database once. But I don't see why libmime cannot do that. Or at least it could be made to be helpful there. libmime currently builds a hierarchy of MIME parts for each message, and assigns hierarchical IDs of the form "1.3.2" for them that reflect that hierarchy. Internally, it represents each MIME part by a MimeObject. A MimeObject has a parent. I don't see an easy accessor to get all children, maybe I'm missing it (the transition to C++ should help), but that should be trivial to add. A MimeObject already has a parent. If you have a parent, you are somebody's child. So, it's just a matter of keeping that list of children. You might say that libmime is a bad mother, for not keeping /tabs/ of its children. But then again, it's C code and C developers hate /tabs/ :). > I don't know if the current implementation can be easily coaxed into a > cleaner model. > I think we'll see that better once the APIs are cleaner and clearly OO. I see no reason why not. > if we're going to throw all C++ code away to move to a pure-JS build. > ... I'd much rather see time go into something more certain to provide > benefits, even if it's just making concrete plans on what the future > should look like. It's kind of funny. Hello? I'm Ben. I spent the entire last year planning TB:Next Generation, and trying to convince people to get behind that. Unsuccessfully. Now you ask me why I work on that old crufty code base? You're right, this isn't TB:NG. libmime needed some love, and I gave it some. Ben

Ben Bucksch

Tue, May 22, 2018 3:34 PM

Ben Bucksch wrote on 22.05.18 16:47:

Joshua Cranmer 🐧 wrote on 22.05.18 06:47:

It's singularly focused on one goal (making the single HTML document
for display)

What the MIME code most needs is it needs alternative APIs

There's an inherent problem: Say you want to look for all messages
with an attachment of type .doc. If you wanted to use libmime or
JSMime or any other MIME library for that, you would need to fetch and
stream and parse all messages in the mail store. That's not a good idea.

So, for most cases where you need meta data about a message and its
parts and not the actual content for display or download, a MIME
library is not the thing to use. Instead, you want to cache that
information in a database somewhere.

You're right, you do need a MIME library to build that database once.
But I don't see why libmime cannot do that. Or at least it could be
made to be helpful there.

libmime currently builds a hierarchy of MIME parts for each message,
and assigns hierarchical IDs of the form "1.3.2" for them that reflect
that hierarchy. Internally, it represents each MIME part by a MimeObject.

A MimeObject has a parent. I don't see an easy accessor to get all
children, maybe I'm missing it (the transition to C++ should help),
but that should be trivial to add. A MimeObject already has a parent.
If you have a parent, you are somebody's child. So, it's just a matter
of keeping that list of children. You might say that libmime is a bad
mother, for not keeping /tabs/ of its children. But then again, it's C
code and C developers hate /tabs/ :).

OK, I took a closer look.

What follows is my new C++ API. In theory, the same exists in old
libmime, but it's much harder to find.

You actually have all you need:

The most abstract type is Part, a one part of a MIME message. It has:

Part* parent;
char* PartAddress(); // ID of this part, e.g. 1.4.2, represents the
email structure
Part* GetObjectForPartAddress(const char* part); // Get subpart by ID
char* ExternalAttachmentURL(); // Download URL
bool IsMessageBody();
char* content_type; // MIME type of this part
PartClass* clazz; // Type in terms of libmime class
Headers* headers;
...

Headers has:

char* Get(const char* header_name,...
chat* GetParameter(const char* header_value, const char* parm_name, ...
char* GetFilename(DisplayOptions* opt);
...

If you have a message with attachments, your Part will be a Container,
which also has:

Part **children

So, in fact, you have the entire message structure, nicely as objects,
with properties. You can know whether this is a body or a plaintext part
or a container. If it's a container, you can get its children.
Containers can contain further Containers. There are more specific
Container types for multipart/mixed (attachments), for
multipart/alternative etc., and you can get at all that.

For each part, you can find out its relative position in the message
structure, its libmime class, its MIME type, and the [file] name of the
part.

Each part, you can ask to be rendered in HTML separately, or you could
get the stream for it.

So, you have access to entire message structure, as objects, and can
navigate it and get its properties.

You mentioned full text search, so you're building a full text search
index and you don't like HTML and prefer plaintext, you can plug it
together with a HTML -> plaintext converter. In fact, we have code that
does that.

Looks quite flexible to me.

Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the existing
APIs in a ways that's easy to understand and find for developers.

Ben

Ben Bucksch wrote on 22.05.18 16:47: > Joshua Cranmer 🐧 wrote on 22.05.18 06:47: >> It's singularly focused on one goal (making the single HTML document >> for display) > >> What the MIME code most needs is it needs alternative APIs > > > There's an inherent problem: Say you want to look for all messages > with an attachment of type .doc. If you wanted to use libmime or > JSMime or any other MIME library for that, you would need to fetch and > stream and parse all messages in the mail store. That's not a good idea. > > So, for most cases where you need meta data about a message and its > parts and not the actual content for display or download, a MIME > library is not the thing to use. Instead, you want to cache that > information in a database somewhere. > > You're right, you do need a MIME library to build that database once. > But I don't see why libmime cannot do that. Or at least it could be > made to be helpful there. > > libmime currently builds a hierarchy of MIME parts for each message, > and assigns hierarchical IDs of the form "1.3.2" for them that reflect > that hierarchy. Internally, it represents each MIME part by a MimeObject. > > A MimeObject has a parent. I don't see an easy accessor to get all > children, maybe I'm missing it (the transition to C++ should help), > but that should be trivial to add. A MimeObject already has a parent. > If you have a parent, you are somebody's child. So, it's just a matter > of keeping that list of children. You might say that libmime is a bad > mother, for not keeping /tabs/ of its children. But then again, it's C > code and C developers hate /tabs/ :). > OK, I took a closer look. What follows is my new C++ API. In theory, the same exists in old libmime, but it's much harder to find. You actually have all you need: The most abstract type is Part, a one part of a MIME message. It has: * Part* parent; * char* PartAddress(); // ID of this part, e.g. 1.4.2, represents the email structure * Part* GetObjectForPartAddress(const char* part); // Get subpart by ID * char* ExternalAttachmentURL(); // Download URL * bool IsMessageBody(); * char* content_type; // MIME type of this part * PartClass* clazz; // Type in terms of libmime class * Headers* headers; * ... Headers has: * char* Get(const char* header_name,... * chat* GetParameter(const char* header_value, const char* parm_name, ... * char* GetFilename(DisplayOptions* opt); * ... If you have a message with attachments, your Part will be a Container, which also has: * Part **children So, in fact, you have the entire message structure, nicely as objects, with properties. You can know whether this is a body or a plaintext part or a container. If it's a container, you can get its children. Containers can contain further Containers. There are more specific Container types for multipart/mixed (attachments), for multipart/alternative etc., and you can get at all that. For each part, you can find out its relative position in the message structure, its libmime class, its MIME type, and the [file] name of the part. Each part, you can ask to be rendered in HTML separately, or you could get the stream for it. So, you have access to entire message structure, as objects, and can navigate it and get its properties. You mentioned full text search, so you're building a full text search index and you don't like HTML and prefer plaintext, you can plug it together with a HTML -> plaintext converter. In fact, we have code that does that. Looks quite flexible to me. Frankly, I didn't know, either, that libmime had all these APIs. But that's where my little project comes in hand: It organizes the existing APIs in a ways that's easy to understand and find for developers. Ben

Joshua Cranmer 🐧

Wed, May 23, 2018 1:10 AM

On 5/22/2018 11:34 AM, Ben Bucksch wrote:

Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the
existing APIs in a ways that's easy to understand and find for developers.

You fell precisely into the same seductive trap I did 7 years ago. I
looked at libmime, thought hard about what kind of API I should design,
and implemented in JSMime. If you look at the design I made for JSMime,
you have basically the same similar interface to what you proposed.
There are some differences--I tried to renumber part numbering to align
more to IMAP part numbering than libmime's slightly different idea of
numbering, and I made the interface too focused on streaming rather than
collecting information into a single output structure (for which there
is a separate adapter you can use to get that). And then I went further
and replaced every consumer that could use that interface with using
JSMime instead of whatever hacked up trick it was using.

You'll note, of course, that there is very little use of JSMime. That is
because, it turns out, that almost no one /wants/ a MIME tree. There's a
few places where people only want to parse headers, but almost everybody
just wants to know one thing: what's the body? MIME presents the view
of messages as a tree of parts, but the email world has settled on a
model of "body, attachments, and a few special snowflakes" (mostly
text/calendar, but S/MIME and PGP can make for some weird interactions).
Of course, there are some complications: bodies can be synthesized out
of multiple parts, for example. Since libmime was no longer an effective
guide for how to design this, I waffled on the API without making
progress and instead moved to the simpler task of designing the MIME API
for message composition. That ran into a combination of life events
intervening (the cause of my withdrawal from active development) as well
as the sheer idiocy of the internal compose APIs.

To return to my original point: what is your goal? If we still want to
replace all of our C++ code with JS, then effort on mime is best spent
on achieving that goal [1]. If that is not the goal, then we should
still have a conversation on how to replace the ~5-ish MIME parsers we
have down to one, be it libmime or JSMime or something else that is not
yet written. I've thought about playing with MIME and message parsing in
Rust as another alternative idea, although I haven't attempted anything
because there's not much point trying to build such a library if no one
will use it. It's a discussion that's long overdue; perhaps the outcome
is that libmime just needs another coat of paint, but if it does, I
think we can wait a short while to ensure that we're not painting a
house with a bulldozer bearing down on it. :-)

[1] Fixing JSMime may not necessarily be the best path to a JS mime
library. I made several wrong bets on the evolution of JS when I
started, so it would not be surprising if it were easier to start from a
clean slate.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

On 5/22/2018 11:34 AM, Ben Bucksch wrote: > > Frankly, I didn't know, either, that libmime had all these APIs. But > that's where my little project comes in hand: It organizes the > existing APIs in a ways that's easy to understand and find for developers. > You fell precisely into the same seductive trap I did 7 years ago. I looked at libmime, thought hard about what kind of API I should design, and implemented in JSMime. If you look at the design I made for JSMime, you have basically the same similar interface to what you proposed. There are some differences--I tried to renumber part numbering to align more to IMAP part numbering than libmime's slightly different idea of numbering, and I made the interface too focused on streaming rather than collecting information into a single output structure (for which there is a separate adapter you can use to get that). And then I went further and replaced every consumer that could use that interface with using JSMime instead of whatever hacked up trick it was using. You'll note, of course, that there is very little use of JSMime. That is because, it turns out, that almost no one /wants/ a MIME tree. There's a few places where people only want to parse headers, but almost everybody just wants to know one thing: *what's the body*? MIME presents the view of messages as a tree of parts, but the email world has settled on a model of "body, attachments, and a few special snowflakes" (mostly text/calendar, but S/MIME and PGP can make for some weird interactions). Of course, there are some complications: bodies can be synthesized out of multiple parts, for example. Since libmime was no longer an effective guide for how to design this, I waffled on the API without making progress and instead moved to the simpler task of designing the MIME API for message composition. That ran into a combination of life events intervening (the cause of my withdrawal from active development) as well as the sheer idiocy of the internal compose APIs. To return to my original point: what is your goal? If we still want to replace all of our C++ code with JS, then effort on mime is best spent on achieving that goal [1]. If that is not the goal, then we should still have a conversation on how to replace the ~5-ish MIME parsers we have down to one, be it libmime or JSMime or something else that is not yet written. I've thought about playing with MIME and message parsing in Rust as another alternative idea, although I haven't attempted anything because there's not much point trying to build such a library if no one will use it. It's a discussion that's long overdue; perhaps the outcome is that libmime just needs another coat of paint, but if it does, I think we can wait a short while to ensure that we're not painting a house with a bulldozer bearing down on it. :-) [1] Fixing JSMime may not necessarily be the best path to a JS mime library. I made several wrong bets on the evolution of JS when I started, so it would not be surprising if it were easier to start from a clean slate. -- Joshua Cranmer Thunderbird and DXR developer Source code archæologist

Patrick Brunschwig

Wed, May 23, 2018 6:39 AM

On 23.05.18 03:10, Joshua Cranmer 🐧 wrote:

On 5/22/2018 11:34 AM, Ben Bucksch wrote:

Frankly, I didn't know, either, that libmime had all these APIs. But
that's where my little project comes in hand: It organizes the
existing APIs in a ways that's easy to understand and find for developers.

I think you miss the point. This is not about replacing libmime, it's
about making it easier to maintain. We still have libmime - whether we
like it or not.

I did a little work on libmime myself, so I know what I'm talking about.
The entry barrier for new developers is very high, particularly
because of the the uncommon C-based object structure, and maintenance is
not really easy either. Only a very small number of developers
understand enough of libmime to be able to work on it.

By providing an API that is easier to understand, we could make it
easier for new developers to understand how libmime works, and we could
benefit from easier maintenance in the future. I therefore think that we
should follow Ben's proposal.

-Patrick

On 23.05.18 03:10, Joshua Cranmer 🐧 wrote: > On 5/22/2018 11:34 AM, Ben Bucksch wrote: >> >> Frankly, I didn't know, either, that libmime had all these APIs. But >> that's where my little project comes in hand: It organizes the >> existing APIs in a ways that's easy to understand and find for developers. >> > You fell precisely into the same seductive trap I did 7 years ago. I > looked at libmime, thought hard about what kind of API I should design, > and implemented in JSMime. If you look at the design I made for JSMime, > you have basically the same similar interface to what you proposed. I think you miss the point. This is not about replacing libmime, it's about making it easier to maintain. We still have libmime - whether we like it or not. I did a little work on libmime myself, so I know what I'm talking about. The entry barrier for new developers is _very_ high, particularly because of the the uncommon C-based object structure, and maintenance is not really easy either. Only a very small number of developers understand enough of libmime to be able to work on it. By providing an API that is easier to understand, we could make it easier for new developers to understand how libmime works, and we could benefit from easier maintenance in the future. I therefore think that we should follow Ben's proposal. -Patrick

Tom Prince

Wed, May 23, 2018 6:58 AM

On Tue, May 22, 2018 at 7:11 PM Joshua Cranmer 🐧 pidgeot18@gmail.com
wrote:

That is because, it turns out, that almost no one wants a MIME tree.
There's a few places where people only want to parse headers, but almost
everybody just wants to know one thing: what's the body?

Even if that is the case, it seems like it would make sense to implement an
API that answers "what is the body" on top of one that generates a mime
tree.

On Tue, May 22, 2018 at 7:11 PM Joshua Cranmer 🐧 <pidgeot18@gmail.com> wrote: > That is because, it turns out, that almost no one *wants* a MIME tree. > There's a few places where people only want to parse headers, but almost > everybody just wants to know one thing: *what's the body*? Even if that is the case, it seems like it would make sense to implement an API that answers "what is the body" on top of one that generates a mime tree.

Jörg Knobloch

Wed, May 23, 2018 7:31 AM

On 23/05/2018 08:39, Patrick Brunschwig wrote:

Only a very small number of developers
understand enough of libmime to be able to work on it.

I'd deny that.

If I exclude Ben and maybe Joshua, who might understand libmime better
than all the rest of us, I'd say the following:

No active developer understands libmime to be able to work on it. Some
developers, recently mostly myself, were forced to work on libmime to
fix some annoying bugs (incl. long-standing crashes) no one dared to
touch. When we make a change, we address the problem and prey that the
whole thing won't come crashing down. See:

https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemalt.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemrel.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemoz2.cpp
https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimedrft.cpp
Particularly bad were bug 1016524, bug 1026989 and bug 1251783 :-(

So anything that will make libmime a little less brittle with its
notorious function pointers and free'ing strategies (one crash was a
double free) should help us before we replace the whole thing.

That said, I hope that replacing the doubtful free'ing strategies won't
introduce a whole new lot of crashes.

Jörg.

On 23/05/2018 08:39, Patrick Brunschwig wrote: > Only a very small number of developers > understand enough of libmime to be able to work on it. I'd deny that. If I exclude Ben and maybe Joshua, who might understand libmime better than all the rest of us, I'd say the following: No active developer understands libmime to be able to work on it. Some developers, recently mostly myself, were *forced* to work on libmime to fix some annoying bugs (incl. long-standing crashes) no one dared to touch. When we make a change, we address the problem and prey that the whole thing won't come crashing down. See: https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemalt.cpp https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemrel.cpp https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimemoz2.cpp https://hg.mozilla.org/comm-central/log/tip/mailnews/mime/src/mimedrft.cpp Particularly bad were bug 1016524, bug 1026989 and bug 1251783 :-( So anything that will make libmime a little less brittle with its notorious function pointers and free'ing strategies (one crash was a double free) should help us before we replace the whole thing. That said, I hope that replacing the doubtful free'ing strategies won't introduce a whole new lot of crashes. Jörg.

Ben Bucksch

Wed, May 23, 2018 8:22 PM

Tom Prince wrote on 23.05.18 08:58:

Even if that is the case, it seems like it would make sense to
implement an API that answers "what is the body" on top of one that
generates a mime tree.

bool IsMessageBody();

https://phabricator.services.mozilla.com/D1344#change-toxB7gkcC6hQ

;-)

Tom Prince wrote on 23.05.18 08:58: > Even if that is the case, it seems like it would make sense to > implement an API that answers "what is the body" on top of one that > generates a mime tree. bool IsMessageBody(); https://phabricator.services.mozilla.com/D1344#change-toxB7gkcC6hQ ;-)