Jörg has asked me many times to share some thoughts on the projects
needed within the backend Thunderbird core code to solve some key
issues, if we plan to make significant improvements to the existing
codebase. I should add that these issues are mostly orthogonal to
another huge effort: maintaining Thunderbird functionality against
changes in the underlying Firefox code, with major Gecko changes
planned that will have huge impacts on Thunderbird.
Here are some thoughts on the technical debt independent of Gecko changes:
Part I: Specific subsystems
1. Replace Mork with a pluggable, async database.
Joshua and I had some conversations about this a while back, and
generally the idea is to have an async call to a database that would
generate an immutable, read-only nsIMsgDBHdr-like object that would
represent a message. That object should be thread-safe. Any changes
would need to go through another async call.
As to the underlying database, previously we discussed using IndexedDB
for the sole reason that it was the only database then supported in Web
Workers. That needs to be rethought. SQLite would be an obvious
candidate, though I'm not sure whether it could be made compatible with
Web Workers.
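To make the access pattern in item 1 concrete, here is a minimal sketch.
It is only an illustration under stated assumptions: MsgHdr,
MessageStore, get_header and update_header are hypothetical names (not
Thunderbird APIs), a dict stands in for the real database, and Python's
asyncio is used purely to show the shape of the calls.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class MsgHdr:
    """Hypothetical immutable, read-only snapshot of one message header."""
    key: int
    subject: str
    is_read: bool = False


class MessageStore:
    """Hypothetical async store; a dict stands in for the real database."""

    def __init__(self) -> None:
        self._rows: Dict[int, MsgHdr] = {}
        self._lock = asyncio.Lock()

    async def add(self, hdr: MsgHdr) -> None:
        async with self._lock:
            self._rows[hdr.key] = hdr

    async def get_header(self, key: int) -> MsgHdr:
        # Readers receive a frozen snapshot that can be shared freely.
        async with self._lock:
            return self._rows[key]

    async def update_header(self, key: int, **changes) -> MsgHdr:
        # A change never mutates an existing snapshot; it stores a new one.
        async with self._lock:
            updated = replace(self._rows[key], **changes)
            self._rows[key] = updated
            return updated


async def demo() -> tuple:
    store = MessageStore()
    await store.add(MsgHdr(key=1, subject="Hello"))
    before = await store.get_header(1)
    after = await store.update_header(1, is_read=True)
    return before, after


if __name__ == "__main__":
    before, after = asyncio.run(demo())
    print(before.is_read, after.is_read)
```

The snapshot held by a caller stays unchanged after the update, so any
consumer that wants the new state has to ask the store again.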
2. Rethink the view interfaces, nsMsgDBView.cpp and related subclasses.
When I have tried to review changes there, I get the impression that we
have no idea anymore of how that was supposed to work originally, so
knowing the proper level at which to make a change is really hard. Also,
dbViewWrapper.js seems to add another layer of fixes on top of the C++
code. At the same time, we have talked about changing the underlying
thread view from XUL to HTML, providing the capability of multi-line
displays of message summaries like most other email clients.
This whole area should probably be rethought and rewritten.
3. MIME processing.
Joshua did JSMime a while back, focusing on processing of email headers.
We need a similar rewrite of the body MIME processing. Everyone is
afraid of the libmime code, as it consists of very ancient C code
rewritten as C++. Also, while the interfaces are essentially async, the
code is single-threaded and likely the source of major performance issues.
4. Mail filters.
The underlying mail filter implementation is all done with sync,
main-thread processing. That has serious performance limitations, and
it makes it difficult to implement multi-step filter operations
that involve async steps or steps that should be async (like MIME
processing, printing, body searches, or even the most fundamental filter
action, the move). The underlying architecture needs to be async, and
then critical features need to be added or fixed: adding pre-filter MIME
processing so that we can filter on attachment properties, fixing body
searches, and allowing reliable chaining of processing that might
include async steps such as printing or moves.
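A rough sketch of what an async filter chain could look like, under
loud assumptions: apply_filter, parse_mime and move_to_folder are
hypothetical names, messages are plain dicts, and the "MIME processing"
is a trivial string check standing in for real parsing. The point is
only that each step is awaited before the next one starts.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, object]
Step = Callable[[Message], Awaitable[Message]]


async def parse_mime(msg: Message) -> Message:
    # Stand-in for pre-filter MIME processing (hypothetical, not libmime).
    await asyncio.sleep(0)
    raw = str(msg["raw"])
    return {**msg, "has_attachment": "Content-Disposition: attachment" in raw}


async def move_to_folder(msg: Message) -> Message:
    # Stand-in for an async move; it only records the destination folder.
    await asyncio.sleep(0)
    return {**msg, "folder": "Attachments"}


async def apply_filter(
    msg: Message,
    prepare: List[Step],
    matches: Callable[[Message], bool],
    actions: List[Step],
) -> Message:
    # Async preparation runs first so the condition can see its results.
    for step in prepare:
        msg = await step(msg)
    if not matches(msg):
        return msg
    # Actions are chained: each one completes before the next begins.
    for action in actions:
        msg = await action(msg)
    return msg


async def demo() -> List[Message]:
    inbox: List[Message] = [
        {"raw": "Subject: a\nContent-Disposition: attachment", "folder": "Inbox"},
        {"raw": "Subject: b\n\nplain text", "folder": "Inbox"},
    ]
    results = []
    for msg in inbox:
        filtered = await apply_filter(
            msg,
            prepare=[parse_mime],
            matches=lambda m: bool(m["has_attachment"]),
            actions=[move_to_folder],
        )
        results.append(filtered)
    return results


if __name__ == "__main__":
    for result in asyncio.run(demo()):
        print(result["folder"])
```

Because the condition is evaluated only after the awaited preparation
steps, filtering on attachment properties falls out of the same
mechanism as chaining a move after another async action.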
5. New message status
Thunderbird has a somewhat unique concept of a "new" message.
Unfortunately the whole concept is not consistent, and "new" is
conflated with many other things and used in inappropriate ways. Needs
cleanup.
6. Folder database and folder-level variables
The database with folder-level information is currently combined with
the database on message-level information, but because that is slow to
open there is a parallel database panacea.dat that is also used for
performance reasons. Many of those same things (total or unread message
count in a folder for example) are also stored in the folder C++ as well
as in the database code. Try figuring out how you are really supposed to
update, for example, the unread count in a folder. Add to that the
concept of pending changes, and it is a real mess.
7. Two competing folder listeners
There are two competing folder listener interfaces,
nsIMsgFolderListener.idl and nsIFolderListener.idl. I believe the
intention was to replace nsIFolderListener with nsIMsgFolderListener but
this was never finished.
8. Finish maildir implementation.
Part II: Meta issues.
Without planning and disciplines added to the development process, any
rework that is done is likely to deteriorate fairly rapidly. We need
policies and plans on:
A. Multi-processor usage.
Bienvenu used threading on IMAP, but with so much that had to be done
on the main thread, the code is full of proxy steps that make it
difficult. We really need to understand what it will take to make more
of our code multi-threaded to take advantage of modern processors,
including what that looks like in C++ and in JavaScript.
B. Documentation
What do we expect for documentation for 1) core developers, 2) add-on
developers, and 3) users? There is no documentation
anywhere of the design intent of the code, so it is difficult for
developers to maintain any consistency. There is no expectation of user
documentation. Are we happy with that? It seems to me that quality
software these days has user documentation in-tree that is updated as
part of any changes that are user-facing.
C. Approach to usage of XPCOM
Much Mozilla code has undergone so-called deCOMtamination to remove
unnecessary XPCOM methods where direct C++ calls make sense. Personally
I don't quite understand the relationship between XPCOM and possible
multi-threaded operations. Do we intend to switch our C++ to use less
XPCOM? (Doing so would make add-ons like ExQuilla impossible, but if it
enables more widespread multi-threaded operation it would be useful.)
D. Approach to WebIDL
I've never done a WebIDL interface - should that be our approach for new
code? Is it even possible for us to add WebIDL interfaces?
E. Approach to XUL
Are we moving from XUL to HTML?
:rkent
On 14-03-2018 20:02, R Kent James wrote:
As to the underlying database, previously we discussed using IndexedDB
for the sole reason that it was the only database then supported in
Web Workers. That needs to be rethought. SQLite would be an obvious
candidate, though I'm not sure whether it could be made compatible
with Web Workers.
Why do you think it needs to be rethought? To me the question isn't
whether SQLite can be tricked into working with web workers or not,
because even assuming it's possible the need would have to be pretty
great not to choose the blessed, standard solution (IndexedDB). If we go
out of our way to use technologies in ways they are not officially
supported, it's just a recipe for disaster. We must leverage the
platform, not work around it - unless really needed.
Thanks for the list! I'll add one more: finish the de-RDF work (which is
not that far from complete), since RDF is high up on the cutting list.
-Magnus
On 3/14/2018 1:49 PM, Magnus Melin wrote:
On 14-03-2018 20:02, R Kent James wrote:
As to the underlying database, previously we discussed using
IndexedDB for the sole reason that it was the only database then
supported in Web Workers. That needs to be rethought....
Why do you think it needs to be rethought?
I'm not saying don't use IndexedDB, I was just pointing out that anyone
redoing the database should not just assume that IndexedDB is the answer
without a thorough review. We need to make sure that it has the
capability we need, and that it is performant.
Two asides.
First, the message database. The use is pretty straightforward except
for threading. The message db uses some obscure Mork features to support
threading that may need some adaptation to work with non-Mork databases.
Second, gloda. I have some experience recently using ElasticSearch as a
database, and the performance of that for full-text searches is pretty
amazing. In an ideal world we would not have two independent databases
with message content (Mork and gloda's sqlite), but if performance for
gloda was a goal, using a database like ElasticSearch (or the underlying
Lucene database) that is optimized for full-text search might make a
huge performance gain (if we could put up with requiring Java).
:rkent
No, please please please don't require Java for Thunderbird. Please. I'm
begging here. OMG please. :cries:
On 3/14/2018 2:02 PM, R Kent James wrote:
Joshua and I had some conversations about this a while back, and
generally the idea is to have an async call to a database that would
generate an immutable, read-only nsIMsgDBHdr-like object that would
represent a message. That object should be thread-safe. Any changes
would need to go through another async call.
As to the underlying database, previously we discussed using IndexedDB
for the sole reason that it was the only database then supported in
Web Workers. That needs to be rethought. SQLite would be an obvious
candidate, though I'm not sure whether it could be made compatible
with Web Workers.
2. Rethink the view interfaces, nsMsgDBView.cpp and related
subclasses.
When I have tried to review changes there, I get the impression that
we have no idea anymore of how that was supposed to work originally,
so knowing the proper level at which to make a change is really hard. Also,
dbViewWrapper.js seems to add another layer of fixes on top of the C++
code. At the same time, we have talked about changing the underlying
thread view from XUL to HTML, providing the capability of multi-line
displays of message summaries like most other email clients.
This whole area should probably be rethought and rewritten.
These two pieces are sort of related: the thread pane (aka nsMsgDBView)
is the most performance-sensitive access to the message database, so the
database API is going to be driven in large part by what is needed to
make the DB view performant. BenB did ask me to write up my previous
thoughts on the database, and since I have a large block of
uninterrupted time next Thursday (unless the next nor'easter decides it
wants to hit then), I'm planning on doing the write-up then. As you said
in your other message, the tricky part is message threading. Other than
that, the parameters of the database more or less fall out naturally.
7. Two competing folder listeners
There are two competing folder listener interfaces,
nsIMsgFolderListener.idl and nsIFolderListener.idl. I believe the
intention was to replace nsIFolderListener with nsIMsgFolderListener
but this was never finished.
And nsIURLListener, from time to time.
A. Multi-processor usage.
Bienvenu used threading on IMAP, but with so much that had to be done
on the main thread, the code is full of proxy steps that make it
difficult. We really need to understand what it will take to make more
of our code multi-threaded to take advantage of modern processors,
including what that looks like in C++ and in JavaScript.
One important comment to make: async code is not a replacement for
multithreading. There's compelling evidence that we need to be truly
multithreaded, not just rely on long-running operations triggering async
steps on the main thread.
C. Approach to usage of XPCOM
Much Mozilla code has undergone so-called deCOMtamination to remove
unnecessary XPCOM methods where direct C++ calls make sense.
Personally I don't quite understand the relationship between XPCOM and
possible multi-threaded operations. Do we intend to switch our C++ to
use less XPCOM? (Doing so would make add-ons like ExQuilla impossible,
but if it enables more widespread multi-threaded operation it would be
useful.)
Where XPCOM is most useful is where you have one interface that may be
implemented by many different consumers, which describes a rather large
chunk of mailnews code and is generally not the case in Gecko code.
Unfortunately, XPIDL's C++ ABI is not idiomatic modern C++, not to
mention that it lacks the capabilities to express a few useful concepts.
D. Approach to WebIDL
I've never done a WebIDL interface - should that be our approach for
new code? Is it even possible for us to add WebIDL interfaces?
When I last broached the subject, admittedly several years ago, the
reception from Gecko people was that they'd be willing to help fix
issues needed to let Thunderbird use WebIDL.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
On 14.03.2018 21:49, Magnus Melin wrote:
Why do you think it needs to be rethought? To me the question isn't
whether SQLite can be tricked into working with web workers or not,
because even assuming it's possible the need would have to be pretty
great not to choose the blessed, standard solution (IndexedDB). If we go
out of our way to use technologies in ways they are not officially
supported, it's just a recipe for disaster. We must leverage the
platform, not work around it - unless really needed.
Please also consider the operational perspective: having access to the
data outside the TB application is important for rescue.
For example, quite recently I had trouble with a destroyed IMAP mailbox,
but I could still retrieve the mails from TB's local copies (which are
fortunately in an mbox-like format). It took a lot of work to sort
everything and filter out some garbage, but at least I still had my mails.
--mtx
--
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287
On 14.03.2018 22:56, R Kent James wrote:
I'm not saying don't use IndexedDB, I was just pointing out that anyone
redoing the database should not just assume that IndexedDB is the
answer without a thorough review. We need to make sure that it has the
capability we need, and that it is performant.
ACK
First, the message database. The use is pretty straightforward except
for threading. The message db uses some obscure Mork features to support
threading that may need some adaptation to work with non-Mork databases.
You're talking about the message index, correct?
Tree structures can also be done in SQL - nested sets. Not the very
best performance (insert is a bit expensive, but fetch is pretty cheap),
but it should work reasonably. Of course, a graph db would offer better
performance here.
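For readers unfamiliar with nested sets, here is a small sketch using
Python's built-in sqlite3 module. The msg table and its lft/rgt columns
are made up for illustration and are not the schema of Mork, gloda or
any Thunderbird database. Each message stores a left and right bound; a
reply is inserted inside its parent's bounds, and a whole thread is
fetched with one range query.

```python
import sqlite3


def create_schema(db: sqlite3.Connection) -> None:
    db.execute(
        "CREATE TABLE msg ("
        "id INTEGER PRIMARY KEY, subject TEXT, lft INTEGER, rgt INTEGER)"
    )


def add_root(db: sqlite3.Connection, subject: str) -> int:
    # A new thread root is placed after everything already stored.
    (max_rgt,) = db.execute("SELECT COALESCE(MAX(rgt), 0) FROM msg").fetchone()
    cur = db.execute(
        "INSERT INTO msg (subject, lft, rgt) VALUES (?, ?, ?)",
        (subject, max_rgt + 1, max_rgt + 2),
    )
    return cur.lastrowid


def add_reply(db: sqlite3.Connection, parent_id: int, subject: str) -> int:
    # The expensive part: bounds at or right of the insertion point shift by 2.
    (parent_rgt,) = db.execute(
        "SELECT rgt FROM msg WHERE id = ?", (parent_id,)
    ).fetchone()
    db.execute("UPDATE msg SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,))
    db.execute("UPDATE msg SET lft = lft + 2 WHERE lft > ?", (parent_rgt,))
    cur = db.execute(
        "INSERT INTO msg (subject, lft, rgt) VALUES (?, ?, ?)",
        (subject, parent_rgt, parent_rgt + 1),
    )
    return cur.lastrowid


def fetch_thread(db: sqlite3.Connection, root_id: int) -> list:
    # The cheap part: one range query returns the subtree in thread order.
    rows = db.execute(
        "SELECT child.subject FROM msg AS child JOIN msg AS root "
        "ON child.lft BETWEEN root.lft AND root.rgt "
        "WHERE root.id = ? ORDER BY child.lft",
        (root_id,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> tuple:
    db = sqlite3.connect(":memory:")
    create_schema(db)
    a = add_root(db, "A")
    re_a = add_reply(db, a, "Re: A")
    add_reply(db, re_a, "Re: Re: A")
    b = add_root(db, "B")
    add_reply(db, a, "Re: A (2)")
    return fetch_thread(db, a), fetch_thread(db, re_a), fetch_thread(db, b)


if __name__ == "__main__":
    for thread in demo():
        print(thread)
```

The two UPDATE statements in add_reply touch every row to the right of
the insertion point, which is where the insert cost comes from; the
fetch is a single indexed range scan if lft is indexed.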
Second, gloda. I have some experience recently using ElasticSearch as a
database, and the performance of that for full-text searches is pretty
amazing. In an ideal world we would not have two independent databases
with message content (Mork and gloda's sqlite), but if performance for
gloda was a goal, using a database like ElasticSearch (or the underlying
Lucene database) that is optimized for full-text search might make a
huge performance gain (if we could put up with requiring Java).
Full-text search indeed is a performance problem. I'm observing long
delays/lockups (several seconds) while typing into the filter box.
--mtx
On 14.03.2018 23:01, Jonathan Kamens wrote:
No, please please please don't require Java for Thunderbird. Please. I'm
begging here. OMG please. :cries:
ACK. Back in the '90s, Java looked very promising, but what they
have made of it in the last decades is monstrous.
--mtx
Jonathan,
can you please explain what the problem is with requiring Java? In my
opinion this would be a great way to speed up Thunderbird development.
I see only advantages here, so can you please elaborate on what is so
bad about it, except that it is the most popular truly
platform-independent and fast programming language worldwide?
regards,
Tito
On 14.03.2018 23:01, Jonathan Kamens wrote:
No, please please please don't require Java for Thunderbird. Please. I'm
begging here. OMG please. :cries:
On 14.03.2018 22:56, R Kent James wrote:
Second, gloda. I have some experience recently using ElasticSearch as a
database, and the performance of that for full-text searches is pretty
amazing. In an ideal world we would not have two independent databases
with message content (Mork and gloda's sqlite), but if performance for
gloda was a goal, using a database like ElasticSearch (or the underlying
Lucene database) that is optimized for full-text search might make a
huge performance gain (if we could put up with requiring Java).
:rkent
+1
I think that is a great idea. If the JRE becomes part of the
environment, some miracles will happen in TB in the sense of
functionality. In fact, it is my opinion that this will have very
positive effects not only on the database technology but also on the
add-ons that could be a part of TB.
Tito