maildev@lists.thunderbird.net

Thunderbird email developers


Re: [Maildev] Removing Thunderbird-code in nsPlainTextSerializer

JK
Jörg Knobloch
Fri, Aug 16, 2019 9:19 PM

On 16 Aug 2019 16:29, Mirko Brodesser wrote:

> mPreFormattedMail was just one example, there's for instance also
> OutputFormatFlowed which seems to be used only by Thunderbird.

Hi again,

just a reply to this snippet: yes, this is very important for TB.

Looking at
https://searchfox.org/mozilla-central/search?q=OutputFormatFlowed&case=false&regexp=false&path=,
at least the code in
dom/webbrowserpersist/WebBrowserPersistLocalDocument.cpp might have some
use if someone used ENCODE_FLAGS_FORMAT_FLOWED.

Jörg.

BB
Ben Bucksch
Sun, Aug 18, 2019 2:25 PM

FWIW, this code is very core and very central to Thunderbird. It is this
class that creates the outgoing emails, esp. the plain text part.

Plain text is still very relevant today, as many mail servers/firewalls
remove HTML and allow only plain text for security reasons. Most "spear
phishing" happens via targeted HTML emails, and, as far as I know, the
German parliament and other high-profile places were hacked this way.

"Format=Flowed" allows line wrapping in plain text mails, and
unambiguous quotes etc. It is what makes plain text mails usable. All
plain text mail that Thunderbird sends is format=flowed.
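
To make the line-wrapping and quoting rules concrete, here is a minimal sketch of a format=flowed encoder in Python. It follows RFC 3676 as I understand it: a trailing space marks a soft line break, leading ">" characters give the quote depth, and lines that would otherwise be misread are space-stuffed. The function name `encode_flowed` and its parameters are made up for this illustration; this is not Thunderbird's code.

```python
import textwrap

def encode_flowed(paragraph, quote_depth=0, width=72):
    """Encode one paragraph as format=flowed lines (illustrative sketch only)."""
    prefix = ">" * quote_depth
    # Leave room for the quote marks, one stuffing space and the trailing space.
    room = width - len(prefix) - 2
    chunks = textwrap.wrap(paragraph, room, break_long_words=False) or [""]
    lines = []
    for i, chunk in enumerate(chunks):
        # Space-stuffing: required for unquoted lines starting with " ", ">"
        # or "From "; applied to quoted lines too, purely for readability.
        if quote_depth or chunk.startswith((" ", ">", "From ")):
            chunk = " " + chunk
        # A trailing space means "this paragraph continues on the next line".
        soft = " " if i < len(chunks) - 1 else ""
        lines.append(prefix + chunk + soft)
    return lines

for line in encode_flowed("All plain text mail that Thunderbird sends is "
                          "format=flowed.", quote_depth=1, width=40):
    print(repr(line))
```

Every line except the last one of a paragraph ends in a space, so a receiving client can rejoin and re-wrap the paragraph at any width, while a client that ignores format=flowed still sees ordinary wrapped text.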

The nsPlainTextSerializer is the code that creates the plain text that
we send. The reason why it lives in Gecko content/ instead of in
Thunderbird is that the Gecko architecture did not allow any other way.

This code is highly involved, and contains many important corner cases.
Any tiny
change in there
will produce visible regressions. For example, we once had a bug where no
newline was
inserted after a <blockquote cite="">. This little change already
visibly messed up
all mails,
even though it was a 1-line change. There are many such cases. The HTML
to plaintext conversion
is very complex.
I cannot even imagine what happens when format=flowed doesn't work
correctly. We'd
be back to "comb line breaking" like I am showing here. format=flowed solves
very real problems.
For example, adapting to different window sizes, correct quoting, and
others.
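
The "comb" effect shown in the paragraph above can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not a description of any mail client: it hard-wraps a text at 72 columns, then displays it in a 60-column window twice, once keeping every received line break and once rejoining the lines first (which is what soft breaks permit). The helper names `display_fixed` and `display_flowed` are invented for the example.

```python
import textwrap

text = ("I cannot even imagine what happens when format=flowed doesn't work "
        "correctly. We'd be back to comb line breaking, which nobody wants to read.")

# The sender hard-wraps at 72 columns; nothing marks these breaks as soft.
sent = textwrap.wrap(text, 72)

def display_fixed(lines, width):
    """Viewer without flowed support: keeps every break and adds its own."""
    shown = []
    for line in lines:
        shown.extend(textwrap.wrap(line, width) or [""])
    return shown

def display_flowed(lines, width):
    """Viewer that knows the breaks were soft: rejoins, then wraps once."""
    return textwrap.wrap(" ".join(lines), width)

comb = display_fixed(sent, 60)
flowed = display_flowed(sent, 60)
print("\n".join(comb))
print("---")
print("\n".join(flowed))
```

In the fixed display, long and very short lines alternate like the teeth of a comb; in the flowed display the paragraph simply fills the narrower window.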

There are years of knowledge, feedback and bug fixes in this code.

All this would have to be re-written. This is a complex project and
needs to be done by somebody who understands all these complications and
corner cases. This is very central to Thunderbird, and mistakes here
will be obvious not only to our users, but also to their recipients.

Proposals:

  • Leave the status quo. Just because you guys don't understand the code
    doesn't mean that it has to be destroyed. There have been numerous
    technical changes (like string API changes, Parser API changes etc.) to
    the code in the past, by people who didn't understand the subject
    matter. You just have to preserve the logic.

  • Move the class into Thunderbird. Gecko would need to create the
    infrastructure for such classes to live outside. The class would need to
    stay exactly as it is, at least its logic.

  • Re-implement it on the Thunderbird side. This is a very complex
    project with high chances of very visible regressions that impact users
    in a major way. Realistically, this can be done only by the original
    authors of the code (Daniel, Akkana Peck and me). Everybody else
    wouldn't even know what to pay attention to.

I recommend option 1. Please don't break us.

Ben


Maildev mailing list
Maildev@lists.thunderbird.net
http://lists.thunderbird.net/mailman/listinfo/maildev_lists.thunderbird.net

MM
Magnus Melin
Mon, Aug 19, 2019 9:22 AM

While Thunderbird uses format=flowed, we're pretty much the only widely
used client that has any support for it: none of the big webmails, not
Outlook, not Mail on Mac... So its usefulness is fairly questionable,
especially as it no longer really works once a non-supporting client
joins the email thread.

I do think general plain text email is fairly important, but claims that
"many servers/firewalls remove HTML and allow only plain text" would
have to come with some references. Over the years, I can't recall even
one such case coming up. Perhaps something like that can be enforced
for very internal traffic, but it couldn't work for general email usage.

 -Magnus

BB
Ben Bucksch
Mon, Aug 19, 2019 9:39 AM

Magnus Melin wrote on 19.08.19 11:22:

> I do think general plain text email is fairly important, but claims
> that "many servers/firewalls remove HTML and allow only plain text"
> would have to come with some references. Over the years, I can't
> recall even one such case coming up. Perhaps something like that
> can be enforced for very internal traffic, but it couldn't work for
> general email usage.

I've seen this often in my personal communications with some
organizations, e.g. most banks, many government offices etc.

> While Thunderbird uses format=flowed, we're pretty much the only widely
> used client that has any support for it: none of the big webmails,
> not Outlook, not Mail on Mac... So its usefulness is fairly
> questionable, especially as it no longer really works once a
> non-supporting client joins the email thread.

Even if Thunderbird is the only email client that supports it, it's
still useful between Thunderbird users. It allows me to read your
messages properly, and vice versa. The same is true for organizations
that use Thunderbird. We have some very large organizations that use
Thunderbird, e.g. the French police, and half of the French ministries
use Thunderbird internally.

format=flowed isn't even the big problem here. Even if you drop
format=flowed, you still have to solve the same problems. Plaintext
seems simple at first glance, but has many problems and pitfalls that
become apparent with use. format=flowed solves a lot of very real
problems with plaintext, in a uniform way, instead of having to hack
around them. Please believe me, I've torn out a lot of hair over
normal plaintext.

format=flowed is essentially "plaintext done right".

BB
Ben Bucksch
Mon, Aug 19, 2019 9:48 AM

Ben Bucksch wrote on 18.08.19 16:25:

> • Move the class into Thunderbird. Gecko would need to create the
>   infrastructure for such classes to live outside. The class would need
>   to stay exactly as it is, at least its logic.

I've been thinking about this, and if we want to go this route, the
simplest approach would be:

The Gecko content/ parser parses the HTML for us. Then, it basically
takes the internal document structure as the parser sees it, and calls
the Serializer interface on it, which is basically a sequence of
ElementStarted(element), TextNode(text), ElementStopped(element) etc.
functions. The Serializer implements these functions and uses the calls
to output the document in the form that it needs. In the case of the
nsPlainTextSerializer, it outputs plaintext, including all the special
stuff like quotes that are needed for plaintext email.
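
The event-driven arrangement described above can be sketched roughly as follows, in Python. This is a toy under stated assumptions: Python's standard `html.parser.HTMLParser` stands in for the Gecko parser, and the `PlainTextSerializer` class with its `element_started`, `text_node` and `element_stopped` methods is hypothetical, named after the functions mentioned in the text. It handles only `<p>`, `<br>` and `<blockquote>` and is in no way the real nsPlainTextSerializer.

```python
from html.parser import HTMLParser

class PlainTextSerializer:
    """Hypothetical toy serializer: turns element/text events into plaintext."""
    def __init__(self):
        self.lines = []
        self.quote_depth = 0
        self.current = ""

    def _flush(self):
        # Emit the pending text as one line, prefixed with quote marks if needed.
        text = " ".join(self.current.split())
        if text:
            prefix = ">" * self.quote_depth
            self.lines.append(prefix + " " + text if prefix else text)
        self.current = ""

    def element_started(self, name):
        if name in ("p", "br", "blockquote"):
            self._flush()
        if name == "blockquote":
            self.quote_depth += 1

    def text_node(self, text):
        self.current += text

    def element_stopped(self, name):
        if name in ("p", "blockquote"):
            self._flush()
        if name == "blockquote":
            self.quote_depth -= 1

    def output(self):
        self._flush()
        return "\n".join(self.lines)

class ParserDriver(HTMLParser):
    """Forwards each parser event to the serializer, one call per event."""
    def __init__(self, serializer):
        super().__init__()
        self.serializer = serializer

    def handle_starttag(self, tag, attrs):
        self.serializer.element_started(tag)

    def handle_data(self, data):
        self.serializer.text_node(data)

    def handle_endtag(self, tag):
        self.serializer.element_stopped(tag)

def html_to_text(html):
    serializer = PlainTextSerializer()
    driver = ParserDriver(serializer)
    driver.feed(html)
    driver.close()
    return serializer.output()

print(html_to_text("<p>Hello</p><blockquote><p>quoted text</p></blockquote><p>Bye</p>"))
```

Even in this toy, where a line break is emitted relative to a `<blockquote>` is decided in two small methods, which hints at why a one-line change in the real class can visibly alter every outgoing mail.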

If we were to move this to Thunderbird, I would suggest that we start
with the DOM interface. We implement a little class that walks the DOM,
calls the ElementStarted(element), TextNode(text),
ElementStopped(element) etc. functions for each node, and drives the
nsPlainTextSerializer this way. This shouldn't be hard. It's just
programming legwork. Given that the DOM interface is public, all this
can live in Thunderbird.
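
A DOM walker of the kind proposed here could look roughly like the Python sketch below. The assumptions are loud ones: `xml.dom.minidom` stands in for the Gecko DOM, and `RecordingSerializer` is a hypothetical stand-in that only records the calls it receives, so that the call sequence is visible. Any object offering the same three methods could be driven the same way.

```python
from xml.dom import minidom, Node

class RecordingSerializer:
    """Hypothetical stand-in for a serializer; it only records its calls."""
    def __init__(self):
        self.calls = []

    def element_started(self, name):
        self.calls.append(("start", name))

    def text_node(self, text):
        self.calls.append(("text", text))

    def element_stopped(self, name):
        self.calls.append(("stop", name))

def walk(node, serializer):
    """Depth-first walk reporting each node to the serializer in document order."""
    if node.nodeType == Node.TEXT_NODE:
        serializer.text_node(node.data)
    elif node.nodeType == Node.ELEMENT_NODE:
        serializer.element_started(node.tagName)
        for child in node.childNodes:
            walk(child, serializer)
        serializer.element_stopped(node.tagName)

doc = minidom.parseString("<blockquote><p>quoted</p></blockquote>")
recorder = RecordingSerializer()
walk(doc.documentElement, recorder)
for call in recorder.calls:
    print(call)
```

The walker needs nothing beyond the public DOM interface (node type, tag name, children), which is the property the proposal relies on for moving the serializer out of Gecko.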

Ben

BB
Ben Bucksch
Mon, Aug 19, 2019 9:50 AM

Ben Bucksch wrote on 18.08.19 16:25:

> • Move the class into Thunderbird. Gecko would need to create the
>   infrastructure for such classes to live outside. The class would need
>   to stay exactly as it is, at least its logic.

I've been thinking about this, and if we want to go this route:

Currently, what happens is: The Gecko content/ parser parses the HTML
for us. Then, it basically takes the internal document structure as the
parser sees it, and calls the Serializer interface on it, which is
basically a sequence of ElementStarted(element), TextNode(text),
ElementStopped(element) etc. functions. The Serializer implements these
functions and uses the calls to output the document in the form that it
needs. In the case of the nsPlainTextSerializer, it outputs plaintext,
including all the special stuff like quotes that are needed for
plaintext email.

If we were to move this to Thunderbird, we could start with the DOM
interface. We implement a little class that walks the DOM, calls the
ElementStarted(element), TextNode(text), ElementStopped(element) etc.
functions for each node, and drives the nsPlainTextSerializer this
way. This shouldn't be hard. It's just programming legwork. Given that
the DOM interface is public, all this can live in Thunderbird.

Ben

JK
Jörg Knobloch
Mon, Aug 19, 2019 10:32 AM

On 19 Aug 2019 11:39, Ben Bucksch wrote:

> Even if Thunderbird is the only email client that supports it, it's
> still useful between Thunderbird users. It allows me to read your
> messages properly, and vice versa. The same is true for organizations
> that use Thunderbird. We have some very large organizations that use
> Thunderbird, e.g. the French police, and half of the French ministries
> use Thunderbird internally.
>
> [snip]
>
> format=flowed is essentially "plaintext done right".

I fully agree.

Let's not forget that TB converts HTML to plaintext by default if the
message doesn't really contain any formatting elements. If that were
converted to non-flowed, we'd really have to think hard whether that's
still a viable option since the resulting cut-up plaintext can't be
decently interpreted any more.
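
Why cut-up, non-flowed text cannot be put back together is easiest to see from the receiving side. Below is a minimal Python sketch of a format=flowed decoder, again following RFC 3676 as I understand it; the name `decode_flowed` is invented for the example, and real clients handle more cases (DelSp, for instance).

```python
def decode_flowed(lines):
    """Rebuild (quote_depth, paragraph) pairs from format=flowed lines (sketch)."""
    paragraphs = []
    pending = None  # (depth, text) of a paragraph that is still being continued
    for line in lines:
        depth = len(line) - len(line.lstrip(">"))
        body = line[depth:]
        if body.startswith(" "):
            body = body[1:]  # undo space-stuffing
        # A trailing space marks a soft break; the signature separator "-- "
        # is the one exception and is never flowed.
        soft = body.endswith(" ") and body != "-- "
        if pending is not None and pending[0] == depth:
            text = pending[1] + body
        else:
            if pending is not None:
                # Quote depth changed in mid-paragraph: close the old paragraph.
                paragraphs.append((pending[0], pending[1].rstrip()))
            text = body
        if soft:
            pending = (depth, text)
        else:
            paragraphs.append((depth, text))
            pending = None
    if pending is not None:
        paragraphs.append((pending[0], pending[1].rstrip()))
    return paragraphs

print(decode_flowed(["> one two ", "> three", "reply ", "continues here"]))
print(decode_flowed(["one two", "three"]))
```

With the trailing spaces present, the decoder recovers whole paragraphs and their quote depth. Without them, as in the second call, every line has to be treated as its own paragraph, because nothing distinguishes a wrap point from a line break the author intended.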

Jörg.

MB
Mirko Brodesser
Tue, Aug 20, 2019 9:01 AM

Hi Ben, Jörg, Magnus, (+Cc Hsin-Yi)

thanks for your responses. All the information you've provided helped
increase my understanding of Thunderbird's relation to Gecko.
Having analyzed the code further, I've come up with smaller
refactoring ideas which are intended not to break any existing functionality.

I'll likely reach out to you in the future again, so that we can further
improve our collaboration.

Mirko

BB
Ben Bucksch
Tue, Aug 20, 2019 2:22 PM

Hello Mirko,

thank you so much! This is deeply appreciated. This helps Thunderbird (and its users and their recipients) a lot. What a relief!

If in the future you want to decouple the serializers, I think one possible approach would be to implement a class that walks the DOM, calls the ElementStarted(element), TextNode(text), ElementStopped(element) etc. functions for each node, and drives the Serializer. This shouldn't be hard to implement (unless I'm missing some technical details). Such a class might be useful for the other serializers as well, not just Thunderbird.

Ben
