maildev@lists.thunderbird.net

Thunderbird email developers

View all threads

Running thunderbird within mochitest UNDER VALGRIND

I
ISHIKAWA,chiaki
Thu, Oct 15, 2020 9:20 AM

Has anyone been able to run thunderbird under mochitest under valgrind?

I have been struggling to run TB under valgrind when executing mochitest
to check for memory errors.

(I used to this under |make mozmill| test framework, and I could fix
maybe a few dozen memory errors.)

Unfortunately, after switching to mochitest framework since December
2019, I have not been able to
run TB under valgrind satisfactorily at all.

Recent problem is that I see successful invocation of TB (I have to use
a replacement binary for TB, that pretends to be thunderbird binary, but
actually executes a saved TB binary under valgrind because there is an
argument passing issue to |mach| when valgrind is invoked using
|--valgrind| option.)
after upping the number of threads under valgrind to 5000, I get timeout.
Timeout IS expected because TB runs slowly under valgrind, but I am
not sure where the
timeout occurs exactly.

runtests.py seems to wait for "browser" and then times out.

From the log:

 0:03.53 INFO Application command:
/NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -marionette
-foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner
 0:03.54 INFO runtests.py | Application pid: 6110
 0:03.54 Started process GECKO(6110)
 0:03.54 GECKO(6110) run-valgrind (masquerading as thunderbird binary)
 0:03.54 GECKO(6110) final command line is:
 0:03.55 GECKO(6110) valgrind --trace-children=yes --fair-sched=yes
--smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes
--vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes
--child-silent-after-fork=yes
--trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,/bin/certutil,/bin/pk12util,/bin/ssltunnel,/bin/uname,/bin/which,/bin/ps,/bin/grep,/bin/java,/fix-stacks,/firefox/firefox,/bin/firefox-esr,/bin/python,/bin/python2,/bin/python2.7,/bin/bash
--max-threads=5000  --max-stackframe=16000000 --num-transtab-sectors=24
--tool=memcheck --freelist-vol=500000000 --redzone-size=128
--px-default=allregs-at-mem-access
--px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5
--free-fill=0xC3 --num-callers=50
--suppressions=/home/ishikawa/Dropbox/myown.sup
--show-mismatched-frees=no --show-possibly-lost=no
--read-inline-info=yes
/KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette
-foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner
 0:03.56 GECKO(6110) ==6113== Memcheck, a memory error detector
 0:03.56 GECKO(6110) ==6113== Copyright (C) 2002-2017, and GNU GPL'd,
by Julian Seward et al.
 0:03.56 GECKO(6110) ==6113== Using
Valgrind-3.16.0.GIT-c48fb614a1-20200507X and LibVEX; rerun with -h for
copyright info
 0:03.56 GECKO(6110) ==6113== Command:
/KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette
-foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner
 0:03.56 GECKO(6110) ==6113==
 0:03.56 GECKO(6110) --6113-- Valgrind options:
 0:03.56 GECKO(6110) --6113--    --trace-children=yes
 0:03.56 GECKO(6110) --6113--    --fair-sched=yes
 0:03.56 GECKO(6110) --6113--    --smc-check=all-non-file
 0:03.56 GECKO(6110) --6113--    --gen-suppressions=all
 0:03.56 GECKO(6110) --6113--    -v
 0:03.56 GECKO(6110) --6113--    --trace-signals=yes
 0:03.56 GECKO(6110) --6113--
--vex-iropt-register-updates=allregs-at-mem-access
 0:03.56 GECKO(6110) --6113--    --track-origins=yes
 0:03.56 GECKO(6110) --6113--    --child-silent-after-fork=yes
 0:03.56 GECKO(6110) --6113--
--trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,
/bin/certutil,/bin/pk12util,/bin/ssltunnel,/bin/uname,/bin/which,/bin/ps,/bin/grep,/bin/java,/fix-stacks,/firefox/firefox,/bin/firefox-esr,/bin/python,/bin/python2,/bin/python2.7,/bin/bash
 0:03.56 GECKO(6110) --6113--    --max-threads=5000
 0:03.56 GECKO(6110) --6113--    --max-stackframe=16000000
 0:03.56 GECKO(6110) --6113--    --num-transtab-sectors=24
 0:03.56 GECKO(6110) --6113--    --tool=memcheck
 0:03.56 GECKO(6110) --6113--    --freelist-vol=500000000
 0:03.56 GECKO(6110) --6113--    --redzone-size=128
 0:03.56 GECKO(6110) --6113-- --px-default=allregs-at-mem-access
 0:03.56 GECKO(6110) --6113-- --px-file-backed=unwindregs-at-mem-access
 0:03.56 GECKO(6110) --6113--    --malloc-fill=0xA5
 0:03.56 GECKO(6110) --6113--    --free-fill=0xC3
 0:03.56 GECKO(6110) --6113--    --num-callers=50
 0:03.56 GECKO(6110) --6113--
--suppressions=/home/ishikawa/Dropbox/myown.sup
 0:03.56 GECKO(6110) --6113--    --show-mismatched-frees=no
 0:03.56 GECKO(6110) --6113--    --show-possibly-lost=no
 0:03.56 GECKO(6110) --6113--    --read-inline-info=yes
 0:03.56 GECKO(6110) --6113-- Contents of /proc/version:
 0:03.56 GECKO(6110) --6113--   Linux version 5.6.0-1-amd64
(debian-kernel@lists.debian.org) (gcc version 9.3.0 (Debian 9.3.0-11))
#1 SMP Debian 5.6.7-1 (2020-04-29)
 0:03.56 GECKO(6110) --6113--
 0:03.56 GECKO(6110) --6113-- Arch and hwcaps: AMD64, LittleEndian,
amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-rdrand
 0:03.56 GECKO(6110) --6113-- Page sizes: currently 4096, max supported
4096
 0:03.56 GECKO(6110) --6113-- Valgrind library directory:
/usr/local/lib/valgrind

The later on, I see the following line:

 2:03.56 INFO runtests.py | Waiting for browser...

Eventually,

20:06.17 INFO TEST-UNEXPECTED-TIMEOUT | automation.py | application
timed out after 2960 seconds with no output
20:06.46 ERROR Force-terminating active process(es).
20:06.74 INFO Determining child pids from psutil...
20:11.00 INFO [6111]
20:12.27 INFO ==> process 6113 launched child process 6147
20:12.31 INFO Found child pids: set([6147, 6111])
20:13.01 INFO Failed to get child procs
20:13.02 INFO Killing process: 6147
20:13.38 INFO Failed to retrieve MOZ_UPLOAD_DIR env var
20:13.46 INFO Can't trigger Breakpad, just killing process
20:13.56 INFO Error: Failed to kill process 6147: psutil.NoSuchProcess
no process found with pid 6147
20:13.56 INFO Killing process: 6111
20:13.56 INFO Not taking screenshot here: see the one that was
previously logged
20:13.56 INFO Can't trigger Breakpad, just killing process
20:44.39 INFO psutil found pid 6111 dead
20:44.39 INFO failed to kill pid 6119 after 30s
20:44.39 INFO failed to kill pid 6113 after 30s
20:44.39 INFO Killing process: 6110
20:44.39 INFO Not taking screenshot here: see the one that was
previously logged
20:44.39 INFO Can't trigger Breakpad, just killing process
20:44.49 INFO psutil found pid 6110 dead
20:44.49 INFO psutil found pid 6110 dead
23:25.34 WARNING runtests.py | Failed to get app exit code -
running/crashed?
23:25.40 Main app process: exit 0
23:25.52 INFO runtests.py | Application ran for: 0:23:21.973599
23:25.55 INFO zombiecheck | Reading PID log:
/COMM-CENTRAL/TMP-DIR/tmpeA04Wqpidlog
23:25.85 INFO ==> process 6113 launched child process 6147
23:26.18 INFO zombiecheck | Checking for orphan process with PID: 6147
Traceback (most recent call last):
  File
"/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py",
line 2993, in doTests
    e10s=options.e10s
  File
"/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py",
line 2456, in runApp
    six.reraise(exc, value, tb)
  File
"/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py",
line 2369, in runApp
    self.marionette.start_session()
  File
"/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/decorators.py",
line 36, in _
    m._handle_socket_failure()
  File
"/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py",
line 654, in _handle_socket_failure
    reraise(exc_cls, exc, tb)
  File
"/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/decorators.py",
line 26, in _
    return func(*args, **kwargs)
  File
"/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py",
line 1112, in start_session
    self.raise_for_port(timeout=timeout)
  File
"/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py",
line 573, in raise_for_port
    self.host, self.port))
timeout: Timed out waiting for connection on 127.0.0.1:2828!
23:42.19 ERROR Automation Error: Received unexpected exception while
running application
23:42.28 INFO Stopping web server
23:44.90 INFO Stopping web socket server
23:45.82 INFO Stopping ssltunnel

So nothing useful has been logged at all.
(Do I need to exclude a program that is supposed to service port 2828
from running under valgrind as in
"--trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,/bin/certutil,/bin/pk12util,/bin/ssltunnel,/bin/uname,/bin/which,/bin/ps,/bin/grep,/bin/java,/fix-stacks,/firefox/firefox,/bin/firefox-esr,/bin/python,/bin/python2,/bin/python2.7,*/bin/bash"
???)

With |make mozmill| after lengthening  timeout value setting, TB under
valgrind ran the tests more or less successfully.

Any thought or personal experience with valgrind with TB with mochitest?

Chiaki

Has anyone been able to run thunderbird under mochitest under valgrind? I have been struggling to run TB under valgrind when executing mochitest to check for memory errors. (I used to this under |make mozmill| test framework, and I could fix maybe a few dozen memory errors.) Unfortunately, after switching to mochitest framework since December 2019, I have not been able to run TB under valgrind satisfactorily at all. Recent problem is that I see successful invocation of TB (I have to use a replacement binary for TB, that pretends to be thunderbird binary, but actually executes a saved TB binary under valgrind because there is an argument passing issue to |mach| when valgrind is invoked using |--valgrind| option.) after upping the number of threads under valgrind to 5000, I get timeout. Timeout *IS* expected because TB runs slowly under valgrind, but I am not sure where the timeout occurs exactly. runtests.py seems to wait for "browser" and then times out. From the log:  0:03.53 INFO Application command: /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner  0:03.54 INFO runtests.py | Application pid: 6110  0:03.54 Started process `GECKO(6110)`  0:03.54 GECKO(6110) run-valgrind (masquerading as thunderbird binary)  0:03.54 GECKO(6110) final command line is:  0:03.55 GECKO(6110) valgrind --trace-children=yes --fair-sched=yes --smc-check=all-non-file --gen-suppressions=all -v --trace-signals=yes --vex-iropt-register-updates=allregs-at-mem-access --track-origins=yes --child-silent-after-fork=yes --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash --max-threads=5000  --max-stackframe=16000000 --num-transtab-sectors=24 --tool=memcheck --freelist-vol=500000000 --redzone-size=128 --px-default=allregs-at-mem-access --px-file-backed=unwindregs-at-mem-access --malloc-fill=0xA5 --free-fill=0xC3 --num-callers=50 --suppressions=/home/ishikawa/Dropbox/myown.sup --show-mismatched-frees=no --show-possibly-lost=no --read-inline-info=yes /KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner  0:03.56 GECKO(6110) ==6113== Memcheck, a memory error detector  0:03.56 GECKO(6110) ==6113== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.  0:03.56 GECKO(6110) ==6113== Using Valgrind-3.16.0.GIT-c48fb614a1-20200507X and LibVEX; rerun with -h for copyright info  0:03.56 GECKO(6110) ==6113== Command: /KERNEL-SRC/moz-obj-dir/objdir-tb3/dist/bin/thunderbird-bin -marionette -foreground -profile /COMM-CENTRAL/TMP-DIR/tmpbShgrj.mozrunner  0:03.56 GECKO(6110) ==6113==  0:03.56 GECKO(6110) --6113-- Valgrind options:  0:03.56 GECKO(6110) --6113--    --trace-children=yes  0:03.56 GECKO(6110) --6113--    --fair-sched=yes  0:03.56 GECKO(6110) --6113--    --smc-check=all-non-file  0:03.56 GECKO(6110) --6113--    --gen-suppressions=all  0:03.56 GECKO(6110) --6113--    -v  0:03.56 GECKO(6110) --6113--    --trace-signals=yes  0:03.56 GECKO(6110) --6113-- --vex-iropt-register-updates=allregs-at-mem-access  0:03.56 GECKO(6110) --6113--    --track-origins=yes  0:03.56 GECKO(6110) --6113--    --child-silent-after-fork=yes  0:03.56 GECKO(6110) --6113-- --trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash  0:03.56 GECKO(6110) --6113--    --max-threads=5000  0:03.56 GECKO(6110) --6113--    --max-stackframe=16000000  0:03.56 GECKO(6110) --6113--    --num-transtab-sectors=24  0:03.56 GECKO(6110) --6113--    --tool=memcheck  0:03.56 GECKO(6110) --6113--    --freelist-vol=500000000  0:03.56 GECKO(6110) --6113--    --redzone-size=128  0:03.56 GECKO(6110) --6113-- --px-default=allregs-at-mem-access  0:03.56 GECKO(6110) --6113-- --px-file-backed=unwindregs-at-mem-access  0:03.56 GECKO(6110) --6113--    --malloc-fill=0xA5  0:03.56 GECKO(6110) --6113--    --free-fill=0xC3  0:03.56 GECKO(6110) --6113--    --num-callers=50  0:03.56 GECKO(6110) --6113-- --suppressions=/home/ishikawa/Dropbox/myown.sup  0:03.56 GECKO(6110) --6113--    --show-mismatched-frees=no  0:03.56 GECKO(6110) --6113--    --show-possibly-lost=no  0:03.56 GECKO(6110) --6113--    --read-inline-info=yes  0:03.56 GECKO(6110) --6113-- Contents of /proc/version:  0:03.56 GECKO(6110) --6113--   Linux version 5.6.0-1-amd64 (debian-kernel@lists.debian.org) (gcc version 9.3.0 (Debian 9.3.0-11)) #1 SMP Debian 5.6.7-1 (2020-04-29)  0:03.56 GECKO(6110) --6113--  0:03.56 GECKO(6110) --6113-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-rdrand  0:03.56 GECKO(6110) --6113-- Page sizes: currently 4096, max supported 4096  0:03.56 GECKO(6110) --6113-- Valgrind library directory: /usr/local/lib/valgrind The later on, I see the following line:  2:03.56 INFO runtests.py | Waiting for browser... Eventually, 20:06.17 INFO TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 2960 seconds with no output 20:06.46 ERROR Force-terminating active process(es). 20:06.74 INFO Determining child pids from psutil... 20:11.00 INFO [6111] 20:12.27 INFO ==> process 6113 launched child process 6147 20:12.31 INFO Found child pids: set([6147, 6111]) 20:13.01 INFO Failed to get child procs 20:13.02 INFO Killing process: 6147 20:13.38 INFO Failed to retrieve MOZ_UPLOAD_DIR env var 20:13.46 INFO Can't trigger Breakpad, just killing process 20:13.56 INFO Error: Failed to kill process 6147: psutil.NoSuchProcess no process found with pid 6147 20:13.56 INFO Killing process: 6111 20:13.56 INFO Not taking screenshot here: see the one that was previously logged 20:13.56 INFO Can't trigger Breakpad, just killing process 20:44.39 INFO psutil found pid 6111 dead 20:44.39 INFO failed to kill pid 6119 after 30s 20:44.39 INFO failed to kill pid 6113 after 30s 20:44.39 INFO Killing process: 6110 20:44.39 INFO Not taking screenshot here: see the one that was previously logged 20:44.39 INFO Can't trigger Breakpad, just killing process 20:44.49 INFO psutil found pid 6110 dead 20:44.49 INFO psutil found pid 6110 dead 23:25.34 WARNING runtests.py | Failed to get app exit code - running/crashed? 23:25.40 Main app process: exit 0 23:25.52 INFO runtests.py | Application ran for: 0:23:21.973599 23:25.55 INFO zombiecheck | Reading PID log: /COMM-CENTRAL/TMP-DIR/tmpeA04Wqpidlog 23:25.85 INFO ==> process 6113 launched child process 6147 23:26.18 INFO zombiecheck | Checking for orphan process with PID: 6147 Traceback (most recent call last):   File "/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py", line 2993, in doTests     e10s=options.e10s   File "/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py", line 2456, in runApp     six.reraise(exc, value, tb)   File "/NEW-SSD/moz-obj-dir/objdir-tb3/_tests/testing/mochitest/runtests.py", line 2369, in runApp     self.marionette.start_session()   File "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/decorators.py", line 36, in _     m._handle_socket_failure()   File "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py", line 654, in _handle_socket_failure     reraise(exc_cls, exc, tb)   File "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/decorators.py", line 26, in _     return func(*args, **kwargs)   File "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py", line 1112, in start_session     self.raise_for_port(timeout=timeout)   File "/NEW-SSD/NREF-COMM-CENTRAL/mozilla/testing/marionette/client/marionette_driver/marionette.py", line 573, in raise_for_port     self.host, self.port)) timeout: Timed out waiting for connection on 127.0.0.1:2828! 23:42.19 ERROR Automation Error: Received unexpected exception while running application 23:42.28 INFO Stopping web server 23:44.90 INFO Stopping web socket server 23:45.82 INFO Stopping ssltunnel So nothing useful has been logged at all. (Do I need to exclude a program that is supposed to service port 2828 from running under valgrind as in "--trace-children-skip=/usr/bin/lsb_release,/usr/bin/hg,/bin/rm,*/bin/certutil,*/bin/pk12util,*/bin/ssltunnel,*/bin/uname,*/bin/which,*/bin/ps,*/bin/grep,*/bin/java,*/fix-stacks,*/firefox/firefox,*/bin/firefox-esr,*/bin/python,*/bin/python2,*/bin/python2.7,*/bin/bash" ???) With |make mozmill| after lengthening  timeout value setting, TB under valgrind ran the tests more or less successfully. Any thought or personal experience with valgrind with TB with mochitest? Chiaki