Hi Kent,
you recently grumbled about strings not passing the C++ to JS boundary well and giving you some problems with non-ASCII data in ExQuilla. Can you please give some details?
I'm a bit surprised since IDL allows AUTF8String that should be able to carry any UTF-8 string, and JS strings can even carry binary data, like here:
https://dxr.mozilla.org/comm-central/rev/902ced95970415d12786f23af1292c5fd3800971/mail/components/compose/content/MsgComposeCommands.js#5727 -- streamData += stream.readBytes(stream.available());
Joshua wrote on the issue recently:
On 30/12/2016 16:11, Joshua Cranmer 🐧 wrote:
> Parameters of type string or ACString are interpreted as bytestrings when called from JS, which is to say that each character is converted by dropping the high byte (i.e., ISO-8859-1). A parameter of type AUTF8String is converted to UTF-8, and wstring and AString retain UTF-16. The confusing part is that all of these parameters are treated as the same string type in JS, and ACString and AUTF8String have the same C++ representation.
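A minimal sketch of the two JS-to-C++ conversions described above. The helper names (asByteString, asUtf8, hex) are hypothetical and are not XPConnect APIs; they only imitate the behaviour in plain JavaScript so that the difference is visible.

```javascript
// Hypothetical helpers: they only imitate the conversions described above
// and are not part of XPConnect.

// string / ACString: keep the low byte of every UTF-16 code unit.
function asByteString(jsString) {
  const bytes = [];
  for (let i = 0; i < jsString.length; i++) {
    bytes.push(jsString.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// AUTF8String: encode the JS string as UTF-8.
function asUtf8(jsString) {
  return Array.from(new TextEncoder().encode(jsString));
}

// Format a byte array as space-separated hex for printing.
const hex = (bytes) =>
  bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");

console.log(hex(asByteString("\u00e4"))); // U+00E4 fits in one byte: e4
console.log(hex(asUtf8("\u00e4")));       // c3 a4
console.log(hex(asByteString("\u20ac"))); // U+20AC loses its high byte: ac
console.log(hex(asUtf8("\u20ac")));       // e2 82 ac
```

Characters up to U+00FF survive the bytestring path, which is why the problem tends to show up only with data outside Latin-1 or when the C++ side expects UTF-8.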
Jörg.
On 8/27/2017 1:06 PM, Jörg Knobloch wrote:
> Hi Kent,
> you recently grumbled about strings not passing the C++ to JS
> boundary well and giving you some problems with non-ASCII data in
> ExQuilla. Can you please give some details?
The specific issues that I had were with the folder name, which gets
confusing because of complex interactions between the displayed folder
name and folder URI, file folder name, and canonical folder URIs
("Trash", "Inbox") whose names may get translated by either us or by the
host server. Add to that the very strange way that the
nsIDBFolderInfo.idl interacts with nsIMsgFolderCacheElement and
panacea.dat, and it is amazingly difficult to even figure out how folder
names get set. This is a great example of a design that is out of control.
In any case, XPConnect does have types like AUTF8String that work, but
in the critical location nsIDBFolderInfo.idl we use ACString instead for
folderName. It is non-intuitive that ACString does not work for UTF-8
strings, but I know that now. That one is not my fault, but knowing what
I know today, this one in nsIMsgFolder is my fault (but fortunately is
rarely used):
    ACString getInheritedStringProperty(in string propertyName);
So changes need to be made both in nsIDBFolderInfo and
nsIMsgFolderCacheElement to convert ACString to AUTF8String to allow
proper handling of UTF-8 folder properties.
At some point I need to start updating ExQuilla for the upcoming TB 59,
and I'll try to address these issues at that point.
:rkent
R Kent James wrote on 28.08.2017 19:40:
> The specific issues that I had were with the folder name, which gets
> confusing because of complex interactions between the displayed folder
> name and folder URI, file folder name, and canonical folder URIs
> ("Trash", "Inbox") whose names may get translated by either us or by
> the host server. Add to that the very strange way that the
> nsIDBFolderInfo.idl interacts with nsIMsgFolderCacheElement and
> panacea.dat, and it is amazingly difficult to even figure out how
> folder names get set. This is a great example of a design that is out
> of control.
Well, that's just warts. Not necessarily bad design.
> In any case, XPConnect does have types like AUTF8String that work, but
> in the critical location nsIDBFolderInfo.idl we use ACString instead
> for folderName. It is non-intuitive that ACString does not work for
> UTF-8 strings, but I know that now. That one is not my fault, but
> knowing what I know today, this one in nsIMsgFolder is my fault (but
> fortunately is rarely used):
>     ACString getInheritedStringProperty(in string propertyName);
> So changes need to be made both in nsIDBFolderInfo and
> nsIMsgFolderCacheElement to convert ACString to AUTF8String to allow
> proper handling of UTF-8 folder properties.
Yeah, ACString is ASCII only, IIRC.
AUTF8String is Unicode, obviously.
Both are 8-bit character strings, but the encoding is different. With
ACString, the number of characters matches the number of bytes, which
allows easier allocation, access, comparison, etc. With UTF-8, that is
not the case.
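To make the character-count versus byte-count point concrete, here is a small sketch in plain JavaScript. The two sample names are purely illustrative and not taken from any real profile.

```javascript
// Illustrative names only: one pure ASCII, one with a non-ASCII letter.
const asciiName = "INBOX";
const folderName = "Entw\u00fcrfe"; // "Entwürfe"

// Number of UTF-16 code units in a JS string.
const units = (s) => s.length;

// Number of bytes once the same text is encoded as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

console.log(units(asciiName), utf8Bytes(asciiName));   // 5 5
console.log(units(folderName), utf8Bytes(folderName)); // 8 9
```

For pure ASCII the two counts agree, so byte-indexed access is safe; as soon as one non-ASCII character appears, the UTF-8 byte count grows past the character count.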
You want ACString only for things that are protocol codes (e.g. HTML
tags, IMAP commands, etc.). Anything user-visible should be UTF-8.
Given that JS strings are always Unicode, using AUTF8String there might
make things easier, too. Like you said :)
Ben
On 8/28/2017 6:58 PM, Ben Bucksch wrote:
> Yeah, ACString is ASCII only, IIRC.
ACString and string convert by basically zero-extending the bytes or
lopping off high bits. There used to be a check that warned in debug
builds if values in the range \x80-\xff were passed in, but I think that
was removed for one or both types. (The term that the JS engine uses for
these kinds of strings is Latin1String--that is, they can only store the
Latin-1 page of Unicode, U+0000-U+00FF).
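A rough sketch of the C++-to-JS direction, assuming the C++ side holds a folder name as UTF-8 bytes. The helpers byteStringToJs and utf8ToJs are hypothetical stand-ins for what happens with an ACString and an AUTF8String return value respectively; they are not real XPConnect functions, and the folder name is only an example.

```javascript
// Hypothetical helpers imitating the two conversions; not XPConnect APIs.

// ACString / string returned to JS: each byte is zero-extended
// into one UTF-16 code unit.
function byteStringToJs(bytes) {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// AUTF8String returned to JS: the bytes are decoded as UTF-8.
function utf8ToJs(bytes) {
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

// UTF-8 bytes of the illustrative folder name "Entwürfe".
const stored = [0x45, 0x6e, 0x74, 0x77, 0xc3, 0xbc, 0x72, 0x66, 0x65];

const viaACString = byteStringToJs(stored); // "Entw\u00c3\u00bcrfe" (mojibake)
const viaAUTF8String = utf8ToJs(stored);    // "Entw\u00fcrfe"

// Zero-extension is lossless, so the bytes can be recovered and
// decoded properly after the fact.
const recovered = utf8ToJs(Array.from(viaACString, (c) => c.charCodeAt(0)));

console.log(viaACString, viaAUTF8String, recovered);
```

The recovery step only works in this direction. Going from JS to C++ through ACString drops the high byte of anything above U+00FF, and that information cannot be reconstructed.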
> You want ACString only for things that are protocol codes (e.g. HTML
> tags, IMAP commands, etc.). Anything user-visible should be UTF-8.
> Given that JS strings are always Unicode, using AUTF8String there
> might make things easier, too. Like you said :)
I'd say everything should be AUTF8String--it can be hard to tell what
you might want to make non-ASCII eventually. There are some cases that
might be ACString or string, but if you have to ask the question, it
means you don't know, and the cases that want ACString or string are
really only ones where people should already know the answer.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist