maildev@lists.thunderbird.net

Thunderbird email developers


Kent doesn't use body searches because they don't work reliably

Jörg Knobloch
Mon, Jan 1, 2018 11:17 PM

Hi all,

the subject summarises it: Body searches "traditionally" haven't worked
well in TB.

One aspect was that base64-encoded bodies weren't treated correctly and
many false positives were created by matching raw base64 data of body
parts, both text and attachments.
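To illustrate the false-positive problem, here is a minimal Python sketch (not the Thunderbird code; the function names and sample text are made up for illustration). It compares a search over the raw base64 data with a search over the decoded text.

```python
import base64

# Hypothetical sample body; a real message would carry this as a MIME part.
body_text = "Meeting moved to Thursday"
encoded = base64.b64encode(body_text.encode("utf-8")).decode("ascii")


def naive_search(raw_body: str, term: str) -> bool:
    """Match against the raw transfer-encoded data (the faulty approach)."""
    return term.lower() in raw_body.lower()


def decoded_search(raw_body: str, term: str) -> bool:
    """Decode the base64 data first, then match against the real text."""
    decoded = base64.b64decode(raw_body).decode("utf-8", errors="replace")
    return term.lower() in decoded.lower()


# A fragment of the base64 alphabet soup that is not part of the real text.
fragment = encoded[4:8]

print(naive_search(encoded, fragment))      # matches the raw data
print(decoded_search(encoded, fragment))    # does not match the real text
print(naive_search(encoded, "Thursday"))    # misses a word that is in the body
print(decoded_search(encoded, "Thursday"))  # finds it after decoding
```

The raw search both hits on junk and misses genuine text, which is why decoding has to happen before matching.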

Very recently I discovered that Joshua had already addressed this issue
back in bug 132340 (six digits, from 2002, fixed in 2009), but the
solution went only 60% of the way and didn't cover common scenarios like
nested multipart messages (text, HTML and attachment, or text and HTML
with embedded image).
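As a rough sketch of what "nested multipart" means for a body search, the following Python example uses the standard `email` package (not Thunderbird's own parser) to build a multipart/mixed message that contains a multipart/alternative part plus an attachment, and then searches only the decoded text parts. The helper names and message contents are assumptions for illustration.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_nested_message():
    """Build mixed(alternative(plain, html), attachment) as a test fixture."""
    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText("Quarterly report attached", "plain", "utf-8"))
    alternative.attach(
        MIMEText("<p>Quarterly <b>report</b> attached</p>", "html", "utf-8")
    )
    outer = MIMEMultipart("mixed")
    outer.attach(alternative)
    outer.attach(MIMEApplication(b"\x00\x01binary payload", Name="report.bin"))
    return outer


def body_contains(msg, term: str) -> bool:
    """Search every text/* leaf part, however deeply nested; skip the rest."""
    needle = term.lower()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # multipart containers and attachments are not searched
        payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
        charset = part.get_content_charset() or "utf-8"
        if needle in payload.decode(charset, errors="replace").lower():
            return True
    return False


msg = build_nested_message()
print(body_contains(msg, "quarterly"))
print(body_contains(msg, "binary payload"))
```

A handler that only understands one level of multipart would stop at the outer boundary and never reach the text parts inside the alternative container.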

Instead of waiting for the all-happy-making new MIME implementation, I
beat the code in nsMsgBodyHandler.cpp into shape and added 10 new
real-life test cases. That file has its own small MIME header parser and
does direct input from the mailbox/maildir files. Sadly, due to a logic
error, instead of searching through one message, it read many lines of
subsequent messages giving false positives ... and bad performance.
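The over-reading problem can be pictured with a small Python sketch. It assumes the caller knows each message's byte offset and size within the mailbox file. That bookkeeping, the function names and the sample data are hypothetical and not taken from nsMsgBodyHandler.cpp. The point is only that the reader must stop at the end of the current message.

```python
import io


def iter_message_lines(mailbox, offset: int, size: int):
    """Yield the lines of one message, never reading past offset + size."""
    mailbox.seek(offset)
    remaining = size
    while remaining > 0:
        line = mailbox.readline(remaining)  # read at most `remaining` bytes
        if not line:
            break  # end of file
        remaining -= len(line)
        yield line


def message_contains(mailbox, offset: int, size: int, term: bytes) -> bool:
    """Search a single message's byte range for `term`."""
    return any(term in line for line in iter_message_lines(mailbox, offset, size))


# Two made-up messages stored back to back, as in an mbox file.
first = b"From - Mon Jan 1 2018\nSubject: one\n\nalpha body\n"
second = b"From - Tue Jan 2 2018\nSubject: two\n\nbeta body\n"
mailbox = io.BytesIO(first + second)

print(message_contains(mailbox, 0, len(first), b"beta"))            # bounded read
print(message_contains(mailbox, len(first), len(second), b"beta"))  # own message
```

Without the `remaining` limit, a search of the first message would run on into the second one, producing both the false positive and the extra I/O.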

So all in all, I killed another of my pet hates and body search should
have just become a whole lot more reliable.

Jörg.

Jörg Knobloch
Wed, Jan 3, 2018 12:33 AM

On 02/01/2018 00:17, Jörg Knobloch wrote:

> So all in all, I killed another of my pet hates and body search should
> have just become a whole lot more reliable.

Another bug landed today that fixed the problem that in multipart
messages only ASCII text was ever found (unless the message part
encoding matched the default folder encoding) since charset information
of parts was generally ignored.
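A short Python sketch of why ignoring the per-part charset leaves only ASCII text findable. The helper name, the UTF-8 folder default and the sample text are assumptions for illustration, not the actual Thunderbird logic.

```python
def decode_part(raw: bytes, part_charset=None, folder_default: str = "utf-8") -> str:
    """Decode part bytes with the part's own charset, else the folder default."""
    charset = part_charset or folder_default
    return raw.decode(charset, errors="replace")


# A part whose header declares ISO-8859-1 while the folder default is UTF-8.
raw = "Grüße aus Köln".encode("iso-8859-1")

with_charset = decode_part(raw, "iso-8859-1")  # charset of the part is honoured
ignored = decode_part(raw, None)               # charset of the part is ignored

print("Grüße" in with_charset)
print("Grüße" in ignored)
print("aus" in ignored)
```

With the charset ignored, the non-ASCII words are mangled and can no longer match, while the pure ASCII word still does.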

So body search of local messages (including quick filter) and local
filter operations should now work in all cases.

Jörg.
