maildev@lists.thunderbird.net

Thunderbird email developers

View all threads

Reformatting the Thunderbird and Mailnews C++ code to Google style

JK
Jörg Knobloch
Sat, Jul 27, 2019 11:24 AM

Hello friends of Thunderbird and lovers of pretty and well-functioning code!

I always wanted to write up my experiences with the reformatting, but
never got around to it, so here goes on a rainy Saturday afternoon:

The reformatting went ahead and was completed before the end of the 68
cycle, that is, all ESR 68, beta and trunk code is reformatted.

In principle the reformatting worked simply at the "stroke of a pen" or
"push of a key", but the devil, like always, is in the detail.

Although splitting the reformatting into various directories, the
resulting patches ended up may megabytes big and up to 90.000 lines
long, IIRC. At first, I trusted the machine and pushed patches without
much checking, but later I realised that all reformatting needed careful
visual inspection.

In general, the automated reformatting works well on "normal" code, but
can mess up completely on specially laid out code or carefully crafted
comments or "ASCII art". Many comments had to be reshuffled manually,
especially inline comments which made lines over-long. And yes,
reformatting strictly enforces a maximum line length of 80 characters.

I added the directive // clang-format off on various occasions:

For example here,
https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/addrbook/src/nsVCard.cpp#221,
for special tables or here for specially aligned code:
https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/base/public/nsMsgLocalFolderHdrs.h#8

clang-reformatting has its own ideas of how to break up long lines, and
it times, I didn't agree with the results, so I added more directives,
for example here:
https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/imap/src/nsImapIncomingServer.cpp#2987

That's not ideal. The better way would have been to define more specific
penalties for breaking lines at certain positions as described here:
https://clang.llvm.org/docs/ClangFormatStyleOptions.html

However, M-C decided not to define any such detail in their
.clang-format, so I chose not to do it either.

Another "tragic" thing happened once the reformatting was half done. M-C
decided to change their options to enforce char* foo pointer style.
Before the change, they maintained the prevalent style in the source
file, so if there was more of char *foo, that was maintained.
Consequently M-C reformatted a second time to use "left pointer" style.
I decided to avoid further churn and to maintain the prevalence rule in
our options. comm-central code mostly uses "right pointer" style, so it
mostly would have had to be reformatted a second time. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some metrics.

Conclusions:

Although I checked all patches in the later part of the process visually
line be line, not much fun when they are tens of thousands of lines
long, and I rechecked some of the patches from the earlier part of the
process, it's possible that some "ugly" formatted comments or code are
now in the code base. If you see such a case, please let me know.

For future patches affecting C++ code it would be good to run the
formatting mach clang-format -p comm/dir/of/your/choice before
presenting the patch for review or checkin.

Any questions, please contact me.

Jörg.

Hello friends of Thunderbird and lovers of pretty and well-functioning code! I always wanted to write up my experiences with the reformatting, but never got around to it, so here goes on a rainy Saturday afternoon: The reformatting went ahead and was completed before the end of the 68 cycle, that is, all ESR 68, beta and trunk code is reformatted. In principle the reformatting worked simply at the "stroke of a pen" or "push of a key", but the devil, like always, is in the detail. Although splitting the reformatting into various directories, the resulting patches ended up may megabytes big and up to 90.000 lines long, IIRC. At first, I trusted the machine and pushed patches without much checking, but later I realised that all reformatting needed careful visual inspection. In general, the automated reformatting works well on "normal" code, but can mess up completely on specially laid out code or carefully crafted comments or "ASCII art". Many comments had to be reshuffled manually, especially inline comments which made lines over-long. And yes, reformatting strictly enforces a maximum line length of 80 characters. I added the directive `// clang-format off` on various occasions: For example here, https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/addrbook/src/nsVCard.cpp#221, for special tables or here for specially aligned code: https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/base/public/nsMsgLocalFolderHdrs.h#8 clang-reformatting has its own ideas of how to break up long lines, and it times, I didn't agree with the results, so I added more directives, for example here: https://searchfox.org/comm-central/rev/abf689ec0b31a95d13e03d44f1194bf87ef4a516/mailnews/imap/src/nsImapIncomingServer.cpp#2987 That's not ideal. The better way would have been to define more specific penalties for breaking lines at certain positions as described here: https://clang.llvm.org/docs/ClangFormatStyleOptions.html However, M-C decided not to define any such detail in their .clang-format, so I chose not to do it either. Another "tragic" thing happened once the reformatting was half done. M-C decided to change their options to enforce `char* foo` pointer style. Before the change, they maintained the prevalent style in the source file, so if there was more of `char *foo`, that was maintained. Consequently M-C reformatted a second time to use "left pointer" style. I decided to avoid further churn and to maintain the prevalence rule in our options. comm-central code mostly uses "right pointer" style, so it mostly would have had to be reformatted a second time. See https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some metrics. Conclusions: Although I checked all patches in the later part of the process visually line be line, not much fun when they are tens of thousands of lines long, and I rechecked some of the patches from the earlier part of the process, it's possible that some "ugly" formatted comments or code are now in the code base. If you see such a case, please let me know. For future patches affecting C++ code it would be good to run the formatting `mach clang-format -p comm/dir/of/your/choice` before presenting the patch for review or checkin. Any questions, please contact me. Jörg.
BC
Ben Campbell
Tue, Aug 20, 2019 10:52 PM

(old thread, I know, but I had a question)

On 27/07/19 23:24, Jörg Knobloch wrote:

I always wanted to write up my experiences with the reformatting, but
never got around to it, so here goes on a rainy Saturday afternoon:

First off - thanks for that writeup! Stuff like this always seems simple
on paper, but it's always the small details which cause the trouble.

[snip]

Another "tragic" thing happened once the reformatting was half done. M-C
decided to change their options to enforce char* foo pointer style.
Before the change, they maintained the prevalent style in the source
file, so if there was more of char *foo, that was maintained.
Consequently M-C reformatted a second time to use "left pointer" style.
I decided to avoid further churn and to maintain the prevalence rule in
our options. comm-central code mostly uses "right pointer" style, so it
mostly would have had to be reformatted a second time. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some metrics.

When I run mach clang-format -s -p comm/blahblah... it always comes
back with loads of Thingy *p -> Thingy* p changes, that is,
converting from the prevalent C-C convention to the M-C convention.
I don't suppose there's some C-C specific config somewhere I'm missing?

My personal preference [1] isn't important here, but if it gets us clean
clang-format passes then the case for migrating seems to gain a little
weight.

Ben.

[1] FWIW, I lean toward the M-C Thingy* p convention, and I avoid
Thingy* thing1, thing2 multi-declarations (even after 25 years of
C/C++, I still have to spend a minute to work out what type thing2 is :- )

(old thread, I know, but I had a question) On 27/07/19 23:24, Jörg Knobloch wrote: > I always wanted to write up my experiences with the reformatting, but > never got around to it, so here goes on a rainy Saturday afternoon: First off - thanks for that writeup! Stuff like this always seems simple on paper, but it's always the small details which cause the trouble. [snip] > Another "tragic" thing happened once the reformatting was half done. M-C > decided to change their options to enforce `char* foo` pointer style. > Before the change, they maintained the prevalent style in the source > file, so if there was more of `char *foo`, that was maintained. > Consequently M-C reformatted a second time to use "left pointer" style. > I decided to avoid further churn and to maintain the prevalence rule in > our options. comm-central code mostly uses "right pointer" style, so it > mostly would have had to be reformatted a second time. See > https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some metrics. When I run `mach clang-format -s -p comm/blahblah...` it always comes back with loads of `Thingy *p` -> `Thingy* p` changes, that is, converting from the prevalent C-C convention to the M-C convention. I don't suppose there's some C-C specific config somewhere I'm missing? My personal preference [1] isn't important here, but if it gets us clean clang-format passes then the case for migrating seems to gain a little weight. Ben. [1] FWIW, I lean toward the M-C `Thingy* p` convention, and I avoid `Thingy* thing1, thing2` multi-declarations (even after 25 years of C/C++, I _still_ have to spend a minute to work out what type thing2 is :- )
JK
Jörg Knobloch
Tue, Aug 20, 2019 10:59 PM

On 21 Aug 2019 00:52, Ben Campbell wrote:

When I run mach clang-format -s -p comm/blahblah... it always comes
back with loads of Thingy *p -> Thingy* p changes, that is,
converting from the prevalent C-C convention to the M-C convention.
I don't suppose there's some C-C specific config somewhere I'm missing?

I don't know what's happening there. Aceman reported the same.

It works fine on Windows, if I do |mach clang-format -p comm/mailnews|
there isn't a single change.

There is a .clang-format file in comm/ so that should be honoured for
all the stuff in comm/.

It would be good to solve this since it stops Linux people delivering
"correct" patches.

Jörg.

On 21 Aug 2019 00:52, Ben Campbell wrote: > When I run `mach clang-format -s -p comm/blahblah...` it always comes > back with loads of `Thingy *p` -> `Thingy* p` changes, that is, > converting from the prevalent C-C convention to the M-C convention. > I don't suppose there's some C-C specific config somewhere I'm missing? I don't know what's happening there. Aceman reported the same. It works fine on Windows, if I do |mach clang-format -p comm/mailnews| there isn't a single change. There is a .clang-format file in comm/ so that should be honoured for all the stuff in comm/. It would be good to solve this since it stops Linux people delivering "correct" patches. Jörg.
GL
Geoff Lankow
Tue, Aug 20, 2019 11:20 PM

Is it like mach eslint which gives different results depending on
whether you're in the comm directory or not?

GL

On 21/08/19 10:59, Jörg Knobloch wrote:

On 21 Aug 2019 00:52, Ben Campbell wrote:

When I run mach clang-format -s -p comm/blahblah... it always comes
back with loads of Thingy *p -> Thingy* p changes, that is,
converting from the prevalent C-C convention to the M-C convention.
I don't suppose there's some C-C specific config somewhere I'm missing?

I don't know what's happening there. Aceman reported the same.

It works fine on Windows, if I do |mach clang-format -p comm/mailnews|
there isn't a single change.

There is a .clang-format file in comm/ so that should be honoured for
all the stuff in comm/.

It would be good to solve this since it stops Linux people delivering
"correct" patches.

Jörg.


Maildev mailing list
Maildev@lists.thunderbird.net
http://lists.thunderbird.net/mailman/listinfo/maildev_lists.thunderbird.net

Is it like `mach eslint` which gives different results depending on whether you're in the comm directory or not? GL On 21/08/19 10:59, Jörg Knobloch wrote: > On 21 Aug 2019 00:52, Ben Campbell wrote: >> When I run `mach clang-format -s -p comm/blahblah...` it always comes >> back with loads of `Thingy *p` -> `Thingy* p` changes, that is, >> converting from the prevalent C-C convention to the M-C convention. >> I don't suppose there's some C-C specific config somewhere I'm missing? > > I don't know what's happening there. Aceman reported the same. > > It works fine on Windows, if I do |mach clang-format -p comm/mailnews| > there isn't a single change. > > There is a .clang-format file in comm/ so that should be honoured for > all the stuff in comm/. > > It would be good to solve this since it stops Linux people delivering > "correct" patches. > > Jörg. > > > _______________________________________________ > Maildev mailing list > Maildev@lists.thunderbird.net > http://lists.thunderbird.net/mailman/listinfo/maildev_lists.thunderbird.net >
BC
Ben Campbell
Tue, Aug 20, 2019 11:50 PM

On 21/08/19 11:20, Geoff Lankow wrote:

Is it like mach eslint which gives different results depending on
whether you're in the comm directory or not?

No, I tried that yesterday, but it applied the same M-C rules:

$ cd comm
$ ../mach clang-format -s -p comm/mailnews/foo/bar/wibble.cpp

(you still need the comm/ in the path otherwise it doesn't pick it up).

Anyway, glad it's not just me!
Seeing as it's a linux-only issue it's probably something for me to try
and track down

Ben.

On 21/08/19 11:20, Geoff Lankow wrote: > Is it like `mach eslint` which gives different results depending on > whether you're in the comm directory or not? No, I tried that yesterday, but it applied the same M-C rules: $ cd comm $ ../mach clang-format -s -p comm/mailnews/foo/bar/wibble.cpp (you still need the comm/ in the path otherwise it doesn't pick it up). Anyway, glad it's not just me! Seeing as it's a linux-only issue it's probably something for me to try and track down Ben.
BC
Ben Campbell
Wed, Aug 21, 2019 6:20 AM

On 21/08/19 10:59, Jörg Knobloch wrote:

There is a .clang-format file in comm/ so that should be honoured for
all the stuff in comm/.

Quick & dirty linux (and mac?) workaround do a:
$ cp comm/.clang-format obj-x86_64-pc-linux-gnu

(or whatever your obj-... dir is called) before running ./mach clang-format.

Explanation:

I think the problem occurs because the mach frontend to clang-format
first copies the files out into a tmp dir (under obj-... ) before
invoking clang-format upon it. And because the file is no longer under
comm/, our .clang-format is being ignored.

No idea why it's working on windows!

Bug filed:

https://bugzilla.mozilla.org/show_bug.cgi?id=1575449

Ben.

On 21/08/19 10:59, Jörg Knobloch wrote: > There is a .clang-format file in comm/ so that should be honoured for > all the stuff in comm/. Quick & dirty linux (and mac?) workaround do a: $ cp comm/.clang-format obj-x86_64-pc-linux-gnu (or whatever your obj-... dir is called) before running ./mach clang-format. Explanation: I think the problem occurs because the mach frontend to clang-format first copies the files out into a tmp dir (under obj-... ) before invoking clang-format upon it. And because the file is no longer under comm/, our .clang-format is being ignored. No idea why it's working on windows! Bug filed: https://bugzilla.mozilla.org/show_bug.cgi?id=1575449 Ben.
BB
Ben Bucksch
Wed, Aug 28, 2019 10:27 PM

Hi Jörg,

thanks for the report.

The code generally looks nice, but as you said, the problem is in the
details.

Jörg Knobloch wrote on 27.07.19 13:24:

I decided to avoid further churn and to maintain the prevalence rule
in our options. comm-central code mostly uses "right pointer" style,
so it mostly would have had to be reformatted a second time. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some
metrics.

FWIW, that came with a warning. I told you that "char* foo" is the right
style and mail code that used "char *foo" is wrong and should be
changed. The reason why we had that was that some of our code originates
around 1993/1994, and people still ate pure steel for breakfast and
pushed bytes in the memory with a knife, which they had left over from
their delivery pizza for dinner.

Today, we use a more sophisticated tool to push the * over to the left.

Ben

Hi Jörg, thanks for the report. The code generally looks nice, but as you said, the problem is in the details. Jörg Knobloch wrote on 27.07.19 13:24: > I decided to avoid further churn and to maintain the prevalence rule > in our options. comm-central code mostly uses "right pointer" style, > so it mostly would have had to be reformatted a second time. See > https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some > metrics. FWIW, that came with a warning. I told you that "char* foo" is the right style and mail code that used "char *foo" is wrong and should be changed. The reason why we had that was that some of our code originates around 1993/1994, and people still ate pure steel for breakfast and pushed bytes in the memory with a knife, which they had left over from their delivery pizza for dinner. Today, we use a more sophisticated tool to push the * over to the left. Ben
MM
Magnus Melin
Thu, Aug 29, 2019 7:09 AM

On 29-08-2019 01:27, Ben Bucksch wrote:

Hi Jörg,

thanks for the report.

The code generally looks nice, but as you said, the problem is in the
details.

Jörg Knobloch wrote on 27.07.19 13:24:

I decided to avoid further churn and to maintain the prevalence rule
in our options. comm-central code mostly uses "right pointer" style,
so it mostly would have had to be reformatted a second time. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some
metrics.

FWIW, that came with a warning. I told you that "char* foo" is the
right style and mail code that used "char *foo" is wrong and should be
changed. The reason why we had that was that some of our code
originates around 1993/1994, and people still ate pure steel for
breakfast and pushed bytes in the memory with a knife, which they had
left over from their delivery pizza for dinner.

Today, we use a more sophisticated tool to push the * over to the left.

I agree left style is the right style ;)

I do find the "maintain the prevalence rule" quite odd. That's just...
what? It doesn't give you consistency which is what the reformatting is
all about.

 -Magnus

On 29-08-2019 01:27, Ben Bucksch wrote: > Hi Jörg, > > thanks for the report. > > The code generally looks nice, but as you said, the problem is in the > details. > > Jörg Knobloch wrote on 27.07.19 13:24: >> I decided to avoid further churn and to maintain the prevalence rule >> in our options. comm-central code mostly uses "right pointer" style, >> so it mostly would have had to be reformatted a second time. See >> https://bugzilla.mozilla.org/show_bug.cgi?id=1546364#c30 for some >> metrics. > > > FWIW, that came with a warning. I told you that "char* foo" is the > right style and mail code that used "char *foo" is wrong and should be > changed. The reason why we had that was that some of our code > originates around 1993/1994, and people still ate pure steel for > breakfast and pushed bytes in the memory with a knife, which they had > left over from their delivery pizza for dinner. > > Today, we use a more sophisticated tool to push the * over to the left. I agree left style is the right style ;) I do find the "maintain the prevalence rule" quite odd. That's just... what? It doesn't give you consistency which is what the reformatting is all about.  -Magnus
JK
Jörg Knobloch
Thu, Aug 29, 2019 7:47 AM

On 29 Aug 2019 09:09, Magnus Melin wrote:

I do find the "maintain the prevalence rule" quite odd. That's just...
what? It doesn't give you consistency which is what the reformatting
is all about.

We can fix that at the end of the 76 cycle if people have such strong
feelings.

Prevalence was chosen as the default since it was the default in M-C at
the time of starting the formatting. That they changed it half way
through, was surprising. I wanted to avoid more churn in moving them
all. As I said: "Right style" is prevalent in C-C.

Are there other opinions? I prefer "right style" due to the |int *x, y;|
issue.

BTW, I've researched it a bit and there are differing opinions, so if
nothing else, we should vote on it.

Jörg.

On 29 Aug 2019 09:09, Magnus Melin wrote: > I do find the "maintain the prevalence rule" quite odd. That's just... > what? It doesn't give you consistency which is what the reformatting > is all about. We can fix that at the end of the 76 cycle if people have such strong feelings. Prevalence was chosen as the default since it was the default in M-C at the time of *starting* the formatting. That they changed it half way through, was surprising. I wanted to avoid more churn in moving them all. As I said: "Right style" is prevalent in C-C. Are there other opinions? I prefer "right style" due to the |int *x, y;| issue. BTW, I've researched it a bit and there are differing opinions, so if nothing else, we should vote on it. Jörg.