On 16 Aug 2019 16:29, Mirko Brodesser wrote:
mPreFormattedMail was just one example; there's for instance also
OutputFormatFlowed, which seems to be used only by Thunderbird.
Hi again,
just a quick answer to this snippet. Yes, this is very important for TB.
Looking at
https://searchfox.org/mozilla-central/search?q=OutputFormatFlowed&case=false&regexp=false&path=,
at least the code in
dom/webbrowserpersist/WebBrowserPersistLocalDocument.cpp might also make
use of it if someone passes ENCODE_FLAGS_FORMAT_FLOWED.
Jörg.
FWIW, this code is core and very central to Thunderbird. It is this
class that creates the outgoing emails, especially the plain-text part.
Plain text is still very relevant today: many mail servers/firewalls
remove HTML and allow only plain text, for security reasons. As far as I
know, most "spear phishing" happens via targeted HTML emails, and e.g.
the German parliament and other high-profile places were hacked this way.
format=flowed allows plain-text mails to be re-wrapped and makes quotes
unambiguous, etc. It is what makes plain-text mails usable. All
plain-text mail that Thunderbird sends is format=flowed.
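Mechanically, format=flowed (defined in RFC 3676) marks a "soft" line break with a trailing space at the end of the line, so a receiving client can tell which breaks it may undo when re-wrapping. The C++ sketch below illustrates the generating side for a single paragraph. `EncodeFlowed` is a hypothetical helper written for this illustration, not code from nsPlainTextSerializer, and it ignores DelSp, signature handling and CRLF line endings.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Sketch of RFC 3676 (format=flowed) generation for a single paragraph.
// Every line except the last ends with a space: that trailing space is a
// "soft" line break that a flowed-aware reader may remove when re-wrapping.
std::vector<std::string> EncodeFlowed(const std::string& paragraph,
                                      std::size_t quoteDepth,
                                      std::size_t width) {
  // Quoted lines start with one '>' per level plus a (stuffed) space.
  std::string prefix(quoteDepth, '>');
  if (quoteDepth > 0) prefix += ' ';
  assert(width > prefix.size());

  std::istringstream words(paragraph);  // note: collapses runs of whitespace
  std::vector<std::string> lines;
  std::string current, word;
  while (words >> word) {
    // Start a new line if this word would push the line past `width`
    // (the trailing space and any stuffed space are not counted).
    if (!current.empty() &&
        prefix.size() + current.size() + 1 + word.size() > width) {
      lines.push_back(current + ' ');  // trailing space = soft break
      current.clear();
    }
    if (!current.empty()) current += ' ';
    current += word;
  }
  lines.push_back(current);  // last line has no trailing space: hard break

  for (std::string& line : lines) {
    // Space-stuffing: unquoted lines starting with ' ', '>' or "From " get
    // an extra leading space so the receiver does not misread them.
    bool stuff = quoteDepth == 0 && !line.empty() &&
                 (line[0] == ' ' || line[0] == '>' ||
                  line.rfind("From ", 0) == 0);
    line = prefix + (stuff ? " " : "") + line;
  }
  return lines;
}
```

For example, `EncodeFlowed("From the start, format=flowed lets readers re-wrap text.", 0, 30)` yields `" From the start, format=flowed "` (space-stuffed because it begins with "From ", soft break at the end) followed by `"lets readers re-wrap text."` (hard break).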
The nsPlainTextSerializer is the code that creates the plain text that
we send. The reason why it lives in Gecko content/ instead of in
Thunderbird is that the Gecko architecture did not allow any other way.
This code is highly involved, and contains many important corner cases.
Any tiny
change in there
will produce visible regressions. For example, we once had a bug where no
newline was
inserted after a <blockquote cite="">. This little change already
visibly messed up
all mails,
even though it was a 1-line change. There are many such cases. The HTML
to plaintext conversion
is very complex.
I cannot even imagine what happens when format=flowed doesn't work
correctly. We'd
be back to "comb line breaking" like I am showing here. format=flowed solves
very real problems.
For example, adapting to different window sizes, correct quoting, and
others.
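The "comb" effect can be reproduced with a few lines of C++. This is a hypothetical illustration (the helpers `Wrap`, `QuoteFixedLines` and `QuoteParagraph` are made up for it, and real clients wrap more carefully): quoting fixed-width text and re-wrapping each line separately yields alternating long and short lines, whereas flowed text can be rejoined and re-wrapped as a whole.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap to at most `width` columns (an over-long word stays on
// a line of its own).
std::vector<std::string> Wrap(const std::string& text, std::size_t width) {
  std::istringstream words(text);
  std::vector<std::string> lines;
  std::string current, word;
  while (words >> word) {
    if (!current.empty() && current.size() + 1 + word.size() > width) {
      lines.push_back(current);
      current.clear();
    }
    if (!current.empty()) current += ' ';
    current += word;
  }
  if (!current.empty()) lines.push_back(current);
  return lines;
}

// Fixed (non-flowed) text: each hard-wrapped line is quoted and wrapped again
// on its own, so a line pushed over the limit by "> " spills a short fragment
// (which here even loses its quote marker): the long/short "comb" pattern.
std::vector<std::string> QuoteFixedLines(const std::vector<std::string>& lines,
                                         std::size_t width) {
  std::vector<std::string> out;
  for (const std::string& line : lines) {
    for (const std::string& piece : Wrap("> " + line, width)) {
      out.push_back(piece);
    }
  }
  return out;
}

// Flowed text: the reader knows which breaks were soft, so it can rejoin the
// paragraph and re-wrap it for the new width, prefixing every line.
std::vector<std::string> QuoteParagraph(const std::string& paragraph,
                                        std::size_t width) {
  assert(width > 2);
  std::vector<std::string> out;
  for (const std::string& piece : Wrap(paragraph, width - 2)) {
    out.push_back("> " + piece);
  }
  return out;
}
```

With a 20-column limit, the two hard-wrapped lines of `"aaaa bbbb cccc dddd eeee ffff gggg hhhh"` turn into four quoted lines that alternate between long lines and single-word fragments (`"> aaaa bbbb cccc"`, `"dddd"`, ...), while `QuoteParagraph` produces three evenly filled lines that all carry the `> ` prefix.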
There are years of knowledge, feedback and bug fixes in this code.
All this would have to be re-written. This is a complex project and
needs to be done by somebody who understands all these complications and
corner cases. This is very central to Thunderbird, and mistakes here
will be obvious not only to our users, but also to their recipients.
Proposals:
* Leave the status quo. Just because you guys don't understand the code
  doesn't mean that it has to be destroyed. There have been numerous
  technical changes (like string API changes, Parser API changes etc.) to
  the code in the past, by people who didn't understand the subject
  matter. You just have to preserve the logic.
* Move the class into Thunderbird. Gecko would need to create the
  infrastructure for such classes to live outside. The class would need to
  stay exactly as it is, at least its logic.
* Re-implement it on the Thunderbird side. This is a very complex
  project with high chances of very visible regressions that impact users
  in a major way. Realistically, this can be done only by the original
  authors of the code (Daniel, Akkana Peck and me). Everybody else
  wouldn't even know what to pay attention to.
I recommend option 1 (leave the status quo). Please don't break us.
Ben
Jörg Knobloch wrote on 16.08.19 23:19:
[snip]
Maildev mailing list
Maildev@lists.thunderbird.net
http://lists.thunderbird.net/mailman/listinfo/maildev_lists.thunderbird.net
While Thunderbird uses format=flowed, we're pretty much the only widely
used client that has any support for it, with none of the big webmails,
not Outlook, not Mail on Mac... so the usefulness of it is fairly
questionable. Especially as, once a non-supporting client joins the
email thread, it doesn't really work out anymore.
I do think general plain text email is fairly important, but claims that
"many servers/firewalls remove HTML and allow only plain text" would
have to come with some references. Over the years, I can't recall coming
across even one case of that. Perhaps something like that can be enforced
for very internal traffic, but it couldn't work for general email usage.
-Magnus
On 18-08-2019 17:25, Ben Bucksch wrote:
[snip]
Magnus Melin wrote on 19.08.19 11:22:
I do think general plain text email is fairly important, but claims
that "many servers/firewalls remove HTML and allow only plain text"
would have to come with some references. Over the years, I can't
recall coming across even one case of that. Perhaps something like that
can be enforced for very internal traffic, but it couldn't work for
general email usage.
I've seen this often in my personal communications with some
organizations, e.g. most banks, many government offices, etc.
While Thunderbird uses format=flowed, we're pretty much the only widely
used client that has any support for it, with none of the big webmails,
not Outlook, not Mail on Mac... so the usefulness of it is fairly
questionable. Especially as, once a non-supporting client joins
the email thread, it doesn't really work out anymore.
Even if Thunderbird is the only email client that supports it, it's
still useful between Thunderbird users. It allows me to read your
messages properly, and vice versa. The same is true for organizations
that use Thunderbird. We have some very large organizations that use
Thunderbird, e.g. the French police, and half of the French ministries
use Thunderbird internally.
format=flowed isn't even the big problem here. Even if you drop
format=flowed, you still have to solve the same problems. Plaintext
seems simple at first glance, but has many problems and pitfalls that
become apparent with use. format=flowed actually solves a lot of very
real problems with plaintext, in a uniform way, instead of having to
hack around them. Please believe me, I've torn out a lot of hair over
normal plaintext.
format=flowed is essentially "plaintext done right".
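The "uniform" handling works because the receiving side needs only a few rules. Below is a minimal C++ sketch of RFC 3676 decoding, assuming DelSp=no and input already split into lines without CRLF; `DecodeFlowed` and `FlowedParagraph` are hypothetical names, not Thunderbird code. A line ending in a space continues on the next line of the same quote depth, the quote depth is the number of leading `>` characters, one stuffed space is removed, and the signature separator `-- ` is never flowed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One logical paragraph recovered from format=flowed text.
struct FlowedParagraph {
  std::size_t quoteDepth;  // number of leading '>' characters
  std::string text;        // soft-broken lines joined back together
};

// Sketch of RFC 3676 decoding with DelSp=no: a line ending in a space is
// joined with the following line if that line has the same quote depth.
std::vector<FlowedParagraph> DecodeFlowed(const std::vector<std::string>& lines) {
  std::vector<FlowedParagraph> out;
  bool previousWasSoft = false;
  for (std::string line : lines) {
    std::size_t depth = 0;
    while (depth < line.size() && line[depth] == '>') ++depth;
    line.erase(0, depth);
    if (!line.empty() && line[0] == ' ') line.erase(0, 1);  // undo stuffing

    // "-- " (signature separator) is never flowed and always starts fresh.
    const bool isSigSeparator = (line == "-- ");
    const bool isSoft = !isSigSeparator && !line.empty() && line.back() == ' ';

    if (previousWasSoft && !isSigSeparator && !out.empty() &&
        out.back().quoteDepth == depth) {
      out.back().text += line;  // continuation of a soft-broken line
    } else {
      out.push_back({depth, line});  // hard break or quote-depth change
    }
    previousWasSoft = isSoft;
  }
  return out;
}
```

The recovered paragraphs can then be wrapped to whatever width the reader's window has, and re-quoted at the correct depth when replying.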
Ben Bucksch wrote on 18.08.19 16:25:
I've been thinking about this, and if we want to go this route (moving
the class into Thunderbird):
Currently, what happens is: The Gecko content/ parser parses the HTML
for us. Then, it basically takes the internal document structure as the
parser sees it, and calls the Serializer interface on it, which is
basically a sequence of ElementStarted(element), TextNode(text),
ElementStopped(element) etc. functions. The Serializer implements these
functions and uses the calls to output the document in the form that it
needs. In case of the nsPlainTextSerializer, it outputs plaintext,
including all the special stuff like quotes that are needed for
plaintext email.
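This description maps onto a simple callback (visitor-style) interface. The C++ sketch below is only an illustration under assumptions: the method names follow the shorthand used here rather than Gecko's real serializer interface, and `ToyPlainTextSerializer` is a hypothetical class that handles just `<p>` and `<blockquote>`, nowhere near the corner cases nsPlainTextSerializer covers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical callback interface using the names from this thread
// (ElementStarted / TextNode / ElementStopped); Gecko's actual serializer
// interface uses different names and signatures.
class Serializer {
 public:
  virtual ~Serializer() = default;
  virtual void ElementStarted(const std::string& tag) = 0;
  virtual void TextNode(const std::string& text) = 0;
  virtual void ElementStopped(const std::string& tag) = 0;
};

// Toy plain-text serializer: <p> and <blockquote> end the current line, and
// text inside <blockquote> gets one '>' per nesting level, as in mail quoting.
class ToyPlainTextSerializer : public Serializer {
 public:
  void ElementStarted(const std::string& tag) override {
    if (tag == "blockquote") {
      EndLine();
      ++mQuoteDepth;
    } else if (tag == "p") {
      EndLine();
    }
  }
  void TextNode(const std::string& text) override {
    if (mAtLineStart) {  // quote prefix before the first text on a line
      mOutput += std::string(mQuoteDepth, '>');
      if (mQuoteDepth > 0) mOutput += ' ';
      mAtLineStart = false;
    }
    mOutput += text;
  }
  void ElementStopped(const std::string& tag) override {
    if (tag == "blockquote") {
      EndLine();  // make sure quoted text is not glued to what follows
      assert(mQuoteDepth > 0);
      --mQuoteDepth;
    } else if (tag == "p") {
      EndLine();
    }
  }
  const std::string& Output() const { return mOutput; }

 private:
  void EndLine() {
    if (!mAtLineStart) {
      mOutput += '\n';
      mAtLineStart = true;
    }
  }
  std::string mOutput;
  std::size_t mQuoteDepth = 0;
  bool mAtLineStart = true;
};
```

Feeding it the calls for `<p>Ben wrote:</p><blockquote><p>Please don't break us.</p></blockquote><p>Agreed.</p>` produces `"Ben wrote:\n> Please don't break us.\nAgreed.\n"`. The `EndLine()` call when a blockquote closes is the kind of one-line detail that, if missing, would resemble the `<blockquote cite="">` regression described earlier in the thread.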
If we were to move this to Thunderbird, we could start with the DOM
interface. We implement a little class that walks the DOM, calls the
ElementStarted(element), TextNode(text), ElementStopped(element) etc.
functions for each node, and drives the nsPlainTextSerializer this
way. This shouldn't be hard; it's just programming legwork. Given that
the DOM interface is public, all this can live in Thunderbird.
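A sketch of such a "little class that walks the DOM", assuming a toy `Node` tree in place of the real public DOM interfaces; `Node`, `WalkAndSerialize` and `TraceSerializer` are hypothetical names for illustration. The walk is depth-first in document order, so the serializer receives start, text and stop calls in the sequence described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal stand-in for a DOM node; a real walker would use
// the public DOM interfaces instead.
struct Node {
  std::string tag;             // element name; empty for a text node
  std::string text;            // text content; only used by text nodes
  std::vector<Node> children;  // child nodes in document order
};

// Same hypothetical callback names as used in the thread.
class Serializer {
 public:
  virtual ~Serializer() = default;
  virtual void ElementStarted(const std::string& tag) = 0;
  virtual void TextNode(const std::string& text) = 0;
  virtual void ElementStopped(const std::string& tag) = 0;
};

// Depth-first walk in document order: start tag, children, end tag.
// Works with any Serializer, not just a plain-text one.
void WalkAndSerialize(const Node& node, Serializer& serializer) {
  if (node.tag.empty()) {
    assert(node.children.empty());  // text nodes have no children
    serializer.TextNode(node.text);
    return;
  }
  serializer.ElementStarted(node.tag);
  for (const Node& child : node.children) {
    WalkAndSerialize(child, serializer);
  }
  serializer.ElementStopped(node.tag);
}

// Serializer that just records the calls, useful for checking the order.
class TraceSerializer : public Serializer {
 public:
  std::vector<std::string> calls;
  void ElementStarted(const std::string& tag) override {
    calls.push_back("<" + tag);
  }
  void TextNode(const std::string& text) override {
    calls.push_back("\"" + text + "\"");
  }
  void ElementStopped(const std::string& tag) override {
    calls.push_back(tag + ">");
  }
};
```

Because `WalkAndSerialize` depends only on the abstract `Serializer`, the same walker could drive other serializers too, which matches the reuse idea raised later in the thread.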
Ben
On 19 Aug 2019 11:39, Ben Bucksch wrote:
Even if Thunderbird is the only email client that supports it, it's
still useful between Thunderbird users. It allows me to read your
messages properly, and vice versa. The same is true for organizations
that use Thunderbird. We have some very large organizations that use
Thunderbird, e.g. the French police, and half of the French ministries
use Thunderbird internally.
[snip]
format=flowed is essentially "plaintext done right".
I fully agree.
Let's not forget that TB converts HTML to plaintext by default if the
message doesn't really contain any formatting elements. If that were
converted to non-flowed, we'd really have to think hard about whether
that's still a viable option, since the resulting chopped-up plaintext
can no longer be interpreted decently.
Jörg.
Hi Ben, Jörg, Magnus, (+Cc Hsin-Yi)
thanks for your responses. All the information you've provided helped
increase my understanding of Thunderbird's relation to Gecko.
Having analyzed the code further, I've come up with smaller
refactoring ideas which are intended not to break any existing functionality.
I'll likely reach out to you again in the future, so that we can further
improve our collaboration.
Mirko
On Mon, Aug 19, 2019 at 11:50 AM Ben Bucksch <ben.bucksch@beonex.com> wrote:
[snip]
Hello Mirko,
thank you so much! This is deeply appreciated. This helps Thunderbird (and its users and their recipients) a lot. What a relief!
If in the future you want to de-couple the serializers, I think a possible approach would be to implement a class that walks the DOM, calls the ElementStarted(element), TextNode(text), ElementStopped(element) etc. functions for each node, and drives the Serializer. This shouldn't be hard to implement (unless I'm missing some technical details). Such a class might be useful for the other serializers as well, not just for Thunderbird.
Ben
Mirko Brodesser wrote on 20.08.19 11:01:
[snip]