Empathy List Archives

maildev@lists.thunderbird.net

Thunderbird email developers

Fwd: [Bug 1423840] Rewrite the prefs parser

Wayne Mery

Thu, Dec 7, 2017 8:02 PM

FYI (not seen this yet on m.d. platform)

-------- Forwarded Message --------
Subject: [Bug 1423840] Rewrite the prefs parser
Date: Thu, 07 Dec 2017 10:35:58 +0000

*Comment # 1 https://bugzilla.mozilla.org/show_bug.cgi?id=1423840#c1
on Bug 1423840 https://bugzilla.mozilla.org/show_bug.cgi?id=1423840 at
2017-12-07 02:35:58 PST *

Createdattachment 8935310 <attachment.cgi?id=8935310> [details]
<attachment.cgi?id=8935310&action=edit>
Bug 1423840 <show_bug.cgi?id=1423840> - Rewrite the prefs parser.

The prefs parser has two significant problems.

It doesn't separate tokenizing from parsing.
It is implemented as a loop around a big switch on a "current state"
variable.

As a result, it is hard to understand and modify, slower than it could be, and
in obscure cases it fails to parse some valid input.

This patch replaces it with a recursive descent parser (albeit one without any
recursion!) that has separate tokenization. The new parser is easier to
understand and modify, more correct, and has better error messages (including
filenames). It also runs about 1.9x faster than the existing parser. (As
measured by parsing greprefs.js's contents from memory 1000 times in
succession, omitting the prefs hash table construction. If the table
construction is included, it's about 1.4x faster.)

The new parser is slightly stricter than the old parser in a few ways.

Disconcertingly, the old parser allowed arbitrary junk between prefs
(including at the start and end of the prefs file) so long as that junk
didn't include any of the following chars: '/', '#', 'u', 's', 'p'. I.e. a
line like this:

!foo@bar&pref("prefname", true);

would be treated the same as this:

pref("prefname", true);

The new parser disallows such junk because it isn't necessary and seems like
an unintentional botch by the old parser.
The old parser allowed character 0x1a (SUB) between tokens and treated it
like '\n'.

The new parser does not allow this character. SUB was used to indicate
end-of-file (not end-of-line) in some old operating systems such as MS-DOS,
but this doesn't seem necessary today.
The old parser tolerated (with a warning) invalid escape sequences within
string literals -- such as "\q" (not a valid escape) and "\x1" and "\u12"
(both of which have insufficient hex digits) -- accepting them literally.

The new parser does not tolerates invalid escape sequences because it doesn't
seem necessary and would complicate things.
The old parser tolerated character 0x00 (NUL) within string literals; this is
dangerous because C++ code that manipulates string values with embedded NULs
will almost certainly consider those chars as end-of-string markers.

The new parser treats NUL chars as end-of-file, to avoid this danger and
because it facilitates a significant optimization (described within the
code).
The old parser allowed integer literals to overflow, silently wrapping them.

The new parser treats integer overflow as a parse error. This seems better,
and it caught an existing overflow in testing/profiles/prefs_general.js, for
places.database.lastMaintenance.

The first of these changes meant that a couple of existing prefs with ";;" at
the end had to be changed.

The minor increase in strictness shouldn't be a problem for default pref files
such as greprefs.js within the application (which we can modify), nor for
app-written prefs files such as prefs.js. It could affect user-written prefs
files such as user.js; the experience above suggests that ";;" is the most
likely problem in practice. In my opinion, the risk here is acceptable.

The new parser should also do a better job of tracking line numbers because it
(a) treats "\r\n" sequences as a single end-of-line marker, and (a) pays
attention to end-of-line sequences within string literals.

The patch also adds a thorough test of valid syntax. (Note that the old parser
fails to correctly parse this test in a few places because it misimplemented
the grammar.) I'd like to also add tests for invalid syntax but the current
design makes this challenging, because parse errors trigger an error message on
the web console and a NS_ERROR_FILE_CORRUPTED return value from Parse().

Review commit:https://reviewboard.mozilla.org/r/206210/diff/#index_header
See other reviews:https://reviewboard.mozilla.org/r/206210/

Product/Component: Core :: Preferences: Backend

Tracking Flags:

status-firefox59:affected

You are receiving this mail because:

You are watching the component for the bug.

FYI (not seen this yet on m.d. platform) -------- Forwarded Message -------- Subject: [Bug 1423840] Rewrite the prefs parser Date: Thu, 07 Dec 2017 10:35:58 +0000 *Comment # 1 <https://bugzilla.mozilla.org/show_bug.cgi?id=1423840#c1> on Bug 1423840 <https://bugzilla.mozilla.org/show_bug.cgi?id=1423840> at 2017-12-07 02:35:58 PST * Createdattachment 8935310 <attachment.cgi?id=8935310> [details] <attachment.cgi?id=8935310&action=edit> Bug 1423840 <show_bug.cgi?id=1423840> - Rewrite the prefs parser. The prefs parser has two significant problems. - It doesn't separate tokenizing from parsing. - It is implemented as a loop around a big switch on a "current state" variable. As a result, it is hard to understand and modify, slower than it could be, and in obscure cases it fails to parse some valid input. This patch replaces it with a recursive descent parser (albeit one without any recursion!) that has separate tokenization. The new parser is easier to understand and modify, more correct, and has better error messages (including filenames). It also runs about 1.9x faster than the existing parser. (As measured by parsing greprefs.js's contents from memory 1000 times in succession, omitting the prefs hash table construction. If the table construction is included, it's about 1.4x faster.) The new parser is slightly stricter than the old parser in a few ways. - Disconcertingly, the old parser allowed arbitrary junk between prefs (including at the start and end of the prefs file) so long as that junk didn't include any of the following chars: '/', '#', 'u', 's', 'p'. I.e. a line like this: !foo@bar&pref("prefname", true); would be treated the same as this: pref("prefname", true); The new parser disallows such junk because it isn't necessary and seems like an unintentional botch by the old parser. - The old parser allowed character 0x1a (SUB) between tokens and treated it like '\n'. The new parser does not allow this character. SUB was used to indicate end-of-file (*not* end-of-line) in some old operating systems such as MS-DOS, but this doesn't seem necessary today. - The old parser tolerated (with a warning) invalid escape sequences within string literals -- such as "\q" (not a valid escape) and "\x1" and "\u12" (both of which have insufficient hex digits) -- accepting them literally. The new parser does not tolerates invalid escape sequences because it doesn't seem necessary and would complicate things. - The old parser tolerated character 0x00 (NUL) within string literals; this is dangerous because C++ code that manipulates string values with embedded NULs will almost certainly consider those chars as end-of-string markers. The new parser treats NUL chars as end-of-file, to avoid this danger and because it facilitates a significant optimization (described within the code). - The old parser allowed integer literals to overflow, silently wrapping them. The new parser treats integer overflow as a parse error. This seems better, and it caught an existing overflow in testing/profiles/prefs_general.js, for places.database.lastMaintenance. The first of these changes meant that a couple of existing prefs with ";;" at the end had to be changed. The minor increase in strictness shouldn't be a problem for default pref files such as greprefs.js within the application (which we can modify), nor for app-written prefs files such as prefs.js. It could affect user-written prefs files such as user.js; the experience above suggests that ";;" is the most likely problem in practice. In my opinion, the risk here is acceptable. The new parser should also do a better job of tracking line numbers because it (a) treats "\r\n" sequences as a single end-of-line marker, and (a) pays attention to end-of-line sequences within string literals. The patch also adds a thorough test of valid syntax. (Note that the old parser fails to correctly parse this test in a few places because it misimplemented the grammar.) I'd like to also add tests for invalid syntax but the current design makes this challenging, because parse errors trigger an error message on the web console and a NS_ERROR_FILE_CORRUPTED return value from Parse(). Review commit:https://reviewboard.mozilla.org/r/206210/diff/#index_header See other reviews:https://reviewboard.mozilla.org/r/206210/ ------------------------------------------------------------------------ Product/Component: Core :: Preferences: Backend ------------------------------------------------------------------------ *Tracking Flags:* * status-firefox59:affected ------------------------------------------------------------------------ *You are receiving this mail because:* * You are watching the component for the bug.