Jörg Knobloch
Mon, Aug 28, 2017 8:08 AM
Hi all,
yesterday I had an interesting conversation with Ben, who wanted to know how I'm doing. We talked about bustage fixes in some detail.
According to the weekly reports there are up to 11 bustages per week (22nd August 2017: 10, 8th August 2017: 11), and it is my job to fix them.
Bustage fixes can be as trivial as adding a 0/nullptr parameter to a single C++ call, or as complicated as taking two weeks to implement (bug 1363281, port new Rust-based encoding to TB: 2017-05-30 to 2017-06-15). Luckily that change was announced beforehand, so it didn't hit us by surprise.
The worst bustage fixes are the ones where M-C merges 150 changesets, one of our tests stops working, and it is very hard to relate the test failure to any of the 150 commit messages. I have magic powers and a secret weapon for this, but it's stressful. That said, that sort of bustage is fortunately not too frequent, but much-hated.
So Ben suggested improving continuous integration:
- After every M-C merge (of, say, more than 10 changesets, since they also merge single changesets) trigger a new build of TB. Currently I'm monitoring M-C merges and triggering builds manually by pushing a patch or re-running jobs. I think it's not realistic to do a build after every single Mozilla commit, since commits go to two incoming queues and changes get merged to M-C in blocks of 20-150 changesets.
- Have an automated bisector that, once a problem is detected, can re-run builds to locate the failure. Ben suggested that I buy a fast Linux machine on which a build takes 10 minutes, but it would need to be pretty automated, since I can't nurture along a bisection.
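As a sketch of the first idea: which m-c pushes deserve a TB build could be decided from the hg pushlog. The filter below is a minimal Python sketch; `trigger_tb_build` is a hypothetical hook, and the real polling and trigger machinery would of course be more involved.

```python
import json
from urllib.request import urlopen  # only needed for the live poll

# Real endpoint; version=2 wraps the pushes in a {"pushes": {...}} object.
PUSHLOG = "https://hg.mozilla.org/mozilla-central/json-pushes?version=2"

def merge_pushes(pushes, threshold=10):
    """From a pushlog dict {push_id: {"changesets": [...], ...}}, keep
    the pushes with more than `threshold` changesets -- i.e. the ones
    that look like merges from the incoming queues rather than single
    direct commits."""
    return {pid: push for pid, push in pushes.items()
            if len(push.get("changesets", [])) > threshold}

# Live use would be roughly (untested sketch):
#   data = json.load(urlopen(PUSHLOG))
#   for pid, push in merge_pushes(data["pushes"]).items():
#       trigger_tb_build(push["changesets"][-1])  # hypothetical hook
```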
Thanks for your attention, I'm just trying to capture some ideas here.
Jörg.
R Kent James
Mon, Aug 28, 2017 4:51 PM
On 8/28/2017 1:08 AM, Jörg Knobloch wrote:
So Ben suggested to improve continuous integration:
- After every M-C merge (of, say, more than 10 changesets, since they
also merge single changesets) trigger a new build of TB. Currently
I'm monitoring M-C merges and I'm triggering builds manually by
pushing a patch or re-running jobs. I think it's not realistic to
do a build after every single Mozilla commit, since commits go to
two incoming queues and changes get merged to M-C in blocks of
20-150 changesets.
- Have an automated bisector that once a problem is detected can
re-run builds to locate the failure. Ben suggested for me to buy a
fast Linux machine where a build takes 10 minutes, but it would
need to be pretty automated, since I can't nurture along a bisection.
Thanks for your attention, I'm just trying to capture some ideas here.
It might be practical to pick one of our platforms (probably 64 bit
Linux) and automatically run a complete set of tests on each m-c
checkin. I doubt if an automatic bisection would be practical.
:rkent
Joshua Cranmer 🐧
Mon, Aug 28, 2017 5:12 PM
On 8/28/2017 11:51 AM, R Kent James wrote:
On 8/28/2017 1:08 AM, Jörg Knobloch wrote:
So Ben suggested to improve continuous integration:
- After every M-C merge (of, say, more than 10 changesets, since they
also merge single changesets) trigger a new build of TB.
Currently I'm monitoring M-C merges and I'm triggering builds
manually by pushing a patch or re-running jobs. I think it's not
realistic to do a build after every single Mozilla commit, since
commits go to two incoming queues and changes get merged to M-C
in blocks of 20-150 changesets.
- Have an automated bisector that once a problem is detected can
re-run builds to locate the failure. Ben suggested for me to buy
a fast Linux machine where a build takes 10 minutes, but it would
need to be pretty automated, since I can't nurture along a bisection.
Thanks for your attention, I'm just trying to capture some ideas here.
It might be practical to pick one of our platforms (probably 64 bit
Linux) and automatically run a complete set of tests on each m-c
checkin. I doubt if an automatic bisection would be practical.
It would be fairly easy, I think, to make a shared script for bisecting
m-c failures if the failing test is an xpcshell test (I think I have
written one before; lord knows where it went off to).
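The core of such a script is just a binary search over the merged changesets. A minimal Python sketch, with the `is_bad` predicate standing in for "update to the revision, build, run the failing xpcshell test":

```python
def first_bad(revs, is_bad):
    """Binary-search a list of changesets (oldest first) for the first
    one where is_bad(rev) is True.  Assumes the state before revs[0]
    was good, the last rev is bad, and the regression is monotonic
    (good ... good bad ... bad)."""
    lo, hi = 0, len(revs) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revs[mid]):   # build + test this revision
            hi = mid            # culprit is at mid or earlier
        else:
            lo = mid + 1        # still good; culprit is later
    return revs[lo]
```

For 150 changesets that is at most 8 build-and-test cycles. In real use, is_bad would shell out to `hg update`, the build, and the test runner; Mercurial's own `hg bisect` could drive the same loop.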
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
Jörg Knobloch
Mon, Aug 28, 2017 6:21 PM
On 28/08/2017 18:51, R Kent James wrote:
It might be practical to pick one of our platforms (probably 64 bit
Linux) and automatically run a complete set of tests on each m-c checkin.
Preferably a platform that can actually run tests, as all tests are
broken on Linux right now ;-)
But yes, we don't need to re-run all platforms. By "m-c checkin" do you
mean an "m-c merge" of 50+ changesets, or do you propose to do it more
granularly?
Jörg.
Jörg Knobloch
Mon, Aug 28, 2017 6:24 PM
On 28/08/2017 19:12, Joshua Cranmer 🐧 wrote:
It would be fairly easy, I think, to make a shared script for
bisecting m-c failures if the failing test is an xpcshell test (I have
written that before, I think; lord knows where that went off to).
Sometimes an Xpcshell test fails, sometimes a Mozmill test. JS and
database/mozStorage changes typically show up in Xpcshell tests; DOM,
security/content policy, editor, etc. more in Mozmill, but there is no rule.
Jörg.
R Kent James
Mon, Aug 28, 2017 6:30 PM
On 8/28/2017 11:21 AM, Jörg Knobloch wrote:
On 28/08/2017 18:51, R Kent James wrote:
It might be practical to pick one of our platforms (probably 64 bit
Linux) and automatically run a complete set of tests on each m-c
checkin.
Preferably a platform that can run tests as all tests are broken on
Linux right now ;-)
But yes, we don't need to re-run all platforms. By "m-c checkin" do you
mean an "m-c merge" of 50+ changesets, or do you propose to do it more
granularly?
It seems to me if we are going to do it at all, it would make sense to
do it per changeset on one platform.
:rkent
Jörg Knobloch
Mon, Aug 28, 2017 6:51 PM
On 28/08/2017 20:30, R Kent James wrote:
It seems to me if we are going to do it at all, it would make sense to
do it per changeset on one platform.
Hmm, yes, in principle. The problem is that some M-C bugs consist of
multiple changesets which might be interdependent.
My point is that if 150 changesets get merged, it would cost a lot of
resources to do 150 builds, but only about 10 compilations to bisect a
failure.
If we had a machine that took 10 minutes to build (Ben claims that his
not-quite-top-of-the-line Ryzen-based machine can do it in 12 minutes)
and assuming tests take 10+10 minutes, we'd have a result in five hours.
That's of course only for the non-obvious cases. As I said, many times
studying the commit messages carefully can identify the problem.
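The arithmetic behind that estimate, spelled out with the 10-minute build and 10+10-minute test figures from above (the five-hour figure leaves a little slack over the worst case):

```python
import math

changesets = 150               # size of a big m-c merge
cycle_minutes = 10 + 10 + 10   # build + Xpcshell tests + Mozmill tests

steps = math.ceil(math.log2(changesets))   # worst-case binary search steps
total_hours = steps * cycle_minutes / 60

print(steps, total_hours)   # -> 8 4.0
```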
Jörg.
Ben Bucksch
Mon, Aug 28, 2017 11:00 PM
Jörg Knobloch wrote on 28.08.2017 20:51:
On 28/08/2017 20:30, R Kent James wrote:
It seems to me if we are going to do it at all, it would make sense
to do it per changeset on one platform.
Hmm, yes, in principle. The problem is that some M-C bugs consist of
multiple changesets which might be interdependent.
My point is that if 150 changesets get merged, it would cost a lot of
resources to do 150 builds, but it would only cost 10 compilations to
bisect a failure.
If we had a machine that took 10 minutes to build (Ben claims that his
not top-of-the-line Ryzen-based machine can do it in 12 minutes) and
assuming tests take 10+10 minutes, we'd have a result in five hours.
That's of course only for non-obvious cases. As I said, many times,
studying the commit messages carefully can identify the problem.
Yes, if we can script the bisect, that would be great.
Given that we can run tests automatically and know their results, there
shouldn't be a problem bisecting both xpcshell and Mozmill test
failures.
Ben
Andrei Hajdukewycz
Mon, Aug 28, 2017 11:13 PM
On 2017-08-28 11:51 AM, Jörg Knobloch wrote:
If we had a machine that took 10 minutes to build (Ben claims that his
not top-of-the-line Ryzen-based machine can do it in 12 minutes) and
assuming tests take 10+10 minutes, we'd have a result in five hours.
I don't know how often this is needed, but if you need super-fast
builds like this, AWS has 20-core machines (m4.10xlarge) for about $2
per hour, and Linode has similar for $0.96/hour. There are even more
cores available on the AWS side if beneficial (the most seems to be
64 vCPUs, i.e. 32 hyperthreaded cores).
Given that the Mozilla build infrastructure runs on AWS, it seems like
it ought to be possible to do Taskcluster builds that do exactly this?
If not, any automation script you can write should be runnable on AWS or
Linode.
Ben Bucksch
Tue, Aug 29, 2017 12:27 PM
Andrei Hajdukewycz wrote on 29.08.2017 01:13:
On 2017-08-28 11:51 AM, Jörg Knobloch wrote:
If we had a machine that took 10 minutes to build (Ben claims that
his not top-of-the-line Ryzen-based machine can do it in 12 minutes)
and assuming tests take 10+10 minutes, we'd have a result in five hours.
I don't know how often this is needed, but if you need superfast
builds like this, AWS has 20 core machines (m4.10xlarge) for $2ish per
hour and linode has similar for $0.96/hour. There are even more
cores(the most seems to be 64vcpus which is 32 hyperthreaded cores)
available on the AWS side as well if beneficial.
Jörg said he gets 150 m-c commits (in a single merge) twice a day, so
that's 300 builds per day.
If such a machine needs 20 minutes to compile and test (but I am not
sure 10 minutes are sufficient for the tests), that means $0.32 per
build at the Linode rate, or about $35,000/year.
I think we're much cheaper off renting dedicated servers (e.g. 8-core
for EUR 50/month) at a hosting facility like hetzner.de. 10 such 8-core
machines would cost only EUR 6,000/year. I would hope that 10 machines
are plenty sufficient as a build farm. 300 commits/day means about 12
builds/hour, which is roughly 1 build per hour per 8-core machine.
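Checking those figures in a few lines (using the Linode rate quoted earlier in the thread; the 20-minute cycle is an assumption, as noted):

```python
builds_per_day = 300            # 2 merges/day x 150 changesets
cloud_rate = 0.96               # USD/hour, Linode figure from this thread
minutes_per_build = 20          # compile + test, as assumed above

per_build = cloud_rate * minutes_per_build / 60      # USD per build
cloud_per_year = per_build * builds_per_day * 365    # USD per year

dedicated_per_year = 10 * 50 * 12   # 10 machines at EUR 50/month
builds_per_hour = builds_per_day / 24

print(round(per_build, 2), round(cloud_per_year),
      dedicated_per_year, builds_per_hour)
# -> 0.32 35040 6000 12.5
```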
We wouldn't necessarily have to wait for the m-c merge to build. We
could build the commits on the incoming queues, and then build the m-c
merge as a single build. If the latter fails, we check the incoming
commit builds.
Ben