Empathy List Archives

maildev@lists.thunderbird.net

Thunderbird email developers

Our untested areas

Ben Bucksch

Thu, Sep 7, 2017 2:07 AM

Joshua Cranmer 🐧 wrote on 07.09.17 03:43:

> A more serious problem is your belief that repeating very basic tests
> on lots of servers is in any way good test coverage. The basic tests
> are so shallow that the diversity of servers is illusory: servers
> really don't act differently in basic scenarios, they act differently
> when things get hard. When you put message/rfc822 attachments in
> messages and base64-encode them. When you have two simultaneous
> connections to the same folder in IMAP and start deleting things with
> one connection and adding them in the other. When massive messages get
> packet boundaries that routinely sit in the middle of the CRLF
> endings. Quite frankly, for the test you've suggested, by the second
> real-world server, running any more tests isn't going to tell me
> anything interesting about diversity of implementation.

You're clearly thinking of a different sort of test than I am. I am not
saying that the kinds of tests you mention are unimportant. But
please don't tell me that the tests I propose are ridiculous, either.

Server implementations do react very differently, especially before and
after login. It starts with SSL and continues with capabilities (CAPS),
SASL, and account configurations. And those change fairly regularly.

It does happen that we suddenly break on very large ISPs.

Example 1: Some ISPs even decide to outsource email and change everything.

Example 2: Yahoo screwed up recently. Actually, we screwed up many years
ago, by sending SASL commands in lowercase although they are specced as
upper case and not case sensitive, but it didn't matter, because all
servers out there treated it as case insensitive. Until Yahoo changed
the implementation and insisted on the spec, and we broke. From one day
to the next. Of course it broke all users at once, in release, because
the server changed. Luckily, they rolled back, but only after a while.
We fixed it, too, but a few hundred thousand users couldn't get
mail for a long time.
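
To make the case issue concrete, here is a minimal sketch in Python, not the actual Thunderbird code. The function names `build_authenticate_command` and `server_auth_mechanisms` are made up for illustration. The idea is to be strict in what we send (SASL mechanism names are defined in upper case in RFC 4422) and lenient in what we accept from the server's capability list.

```python
def build_authenticate_command(tag: str, mechanism: str) -> bytes:
    """Build an IMAP AUTHENTICATE line (hypothetical helper).

    The mechanism name is always sent in upper case, whatever the caller passed.
    """
    return f"{tag} AUTHENTICATE {mechanism.upper()}\r\n".encode("ascii")


def server_auth_mechanisms(capability_line: str) -> set[str]:
    """Collect AUTH= mechanisms from a CAPABILITY line (hypothetical helper).

    Matching is case-insensitive, so a server that answers in lower case
    is still understood; names are normalized to upper case.
    """
    mechanisms = set()
    for token in capability_line.split():
        if token.upper().startswith("AUTH="):
            mechanisms.add(token[len("AUTH="):].upper())
    return mechanisms
```

A unit test can pin the outgoing bytes, but only a run against the live server would have shown that one particular ISP had started rejecting the lowercase form.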

Example 3: It could be a subtle SSL cert problem. If you saw the
SSL certs that are running on IMAP and SMTP servers of some large ISPs,
you would weep. You know that we use our own SSL library, and we're the
only email client to use that, and we might be stricter in the checks
than others.

Those are just a few examples. There are many more.

Yes, it's "only" about login and basic operations. But they are complex
and do break regularly, depending on which ISP you use. It's about
completely breaking users, all of them.

We're thinking of different scenarios and even different goals. The test
suite I'm proposing is meant to ensure quality and operation for end
users, not to make developers' refactoring lives easier.

Where we agree, and why I responded to your post, is that
fakeserver is not the answer to everything and that we would massively
increase test coverage by running against real servers. I think running
Dovecot and Cyrus in a container is a great idea, if you can set that up.
Yes, it will help us a lot. I just think that this is also incomplete,
and that checking against Yahoo and Gmail every 6 hours will be necessary
as additional tests, too.
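
Roughly, such a periodic check could look like the following Python sketch. It is an illustration under assumptions, not a proposal for the actual harness: `open_imap_tls` and `smoke_check` are hypothetical names, the credentials would have to come from dedicated test accounts, and Python's `ssl` module is a different TLS stack than ours, so it would not catch cert checks that only our library is strict about. A real version would have to drive Thunderbird's own network code.

```python
import imaplib
import ssl


def open_imap_tls(host, port):
    """Open an IMAP-over-TLS connection with certificate and host name checks."""
    context = ssl.create_default_context()
    return imaplib.IMAP4_SSL(host, port, ssl_context=context)


def smoke_check(host, port, user, password, connect=open_imap_tls):
    """Return (ok, detail) after trying TLS connect, CAPABILITY and LOGIN.

    `connect` is injectable so the check itself can be exercised offline.
    """
    try:
        conn = connect(host, port)
    except (OSError, imaplib.IMAP4.error) as exc:  # ssl.SSLError is an OSError
        return False, f"connect/TLS failed: {exc}"
    try:
        status, data = conn.capability()
        if status != "OK":
            return False, f"CAPABILITY failed: {data!r}"
        status, data = conn.login(user, password)
        if status != "OK":
            return False, f"LOGIN failed: {data!r}"
        return True, "login ok"
    except imaplib.IMAP4.error as exc:
        return False, f"IMAP error: {exc}"
    finally:
        try:
            conn.logout()
        except Exception:
            pass  # best effort; the connection may already be gone
```

A scheduler would call `smoke_check` for each provider every 6 hours and raise an alert on the first `False`, which is the point: we would hear about a breakage from the test run and not from the users.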

Ben
