Request #1293

From:
Account Type:
Free Account
Dreamwidth:
Account Name: [personal profile] marcmagus
Style: (S2) core: public, layout: public, theme: public, user: custom
Email confirmed? Yes
cluster: 10
data version: 10
scheme: gradation-vertical-local
Media storage used: 0.000 MB (0.0%)
Support category:
Time posted:
Mon, 11 May 2009 09:35:19 GMT (845 weeks ago)
Status:
closed (10 points to [personal profile] afuna)
Summary:
Can't edit entry - invalid text encoding
Original Request:
I've managed to create an entry which believes it has an encoding issue, such that when I try to use "Edit Entry", I receive the error message, "Client error: Invalid text encoding: Cannot display this post."

The entry itself, at http://marcmagus.dreamwidth.org/100771.html displays correctly, at least here in Firefox 3.0.7. It's posted using a jlj+vim+markdown toolchain in an en_US.utf8 locale. [Interestingly, the same toolchain posting to LiveJournal created an editable entry which clearly incorrectly encoded non-ASCII characters, whereas Dreamwidth seems to have encoded them in a way I can view, but left the entry uneditable, which prevents cross-posting.]

Please let me know if I can help track this down if it's a Dreamwidth error, or where to start looking for what's wrong if the error is on my end.
Diagnostics: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.7) Gecko/2009031314 Gentoo Firefox/3.0.7
zarhooie: Sheep growing on a stalk with Kat and SupportHelp (_support, springsheep) [personal profile] zarhooie - Cream Puff with Teeth
Answer (#6194)
Posted: Tue, 12 May 2009 15:37:24 GMT (844 weeks ago)
Hi marcmagus,

Sorry it's taken so long to get back to you.

I am not sure if this will work, but I'd like you to give it a try, if you're willing. Please go to the display tab on the Manage Settings page (http://www.dreamwidth.org/manage/settings/?cat=display). The fifth item down is a radio button to choose which editor you'd like to use for your default. Please select HTML, then scroll down and click "save". After you've done this, please try to edit your entry, and let me know if that works. If it doesn't, then I'll do some more digging.

Thank you again for your patience!

Best,
Kat
marcmagus: Me playing cribbage in regency attire (regency cards) [personal profile] marcmagus - Magus
Comment (#6202)
Posted: Tue, 12 May 2009 16:03:01 GMT (844 weeks ago)
My editor was already set to HTML. I switched to Rich Text and failed to open it there, then back to HTML, and it still didn't work. Then I played around a bit, and discovered that my other entries open for editing with the HTML editor regardless of whether my Entry Editor Default is set to Rich Text, HTML, or Last Used.

It seems this setting only affects which editor gets used for new entries, existing entries being opened in the "appropriate" editor based on some metadata in the entry?

Please let me know if there's anything else I can do to try to help track this down.
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (me, standing outside a broken phone booth) [staff profile] denise - Denise
Answer (#6540)
Posted: Thu, 14 May 2009 22:50:46 GMT (844 weeks ago)
Dear Magus,

We're still trying to figure out what's causing this problem, but as a bit of an assist, can you please try making another entry using the same method you used to make the entry that's giving you trouble? When you do, if you could also include accented characters (like you used in the original entry), and then try to edit the entry and see if you get the same encoding error, that may help us figure out what step in the chain is causing the problem.

If you can reproduce the problem in that fashion, it would also be helpful if you could try removing one step in your preprocessing with each test entry you post (first taking Markdown out of the equation, then vim if possible, etc), and let us know the results of each attempt to post. We're trying to figure out which particular bit is causing the problem, but we're having some trouble reproducing the error for troubleshooting purposes, which makes it a bit more difficult.

Best,
  Denise
marcmagus: Me playing cribbage in regency attire (regency cards) [personal profile] marcmagus - Magus
Comment (#7054)
Posted: Mon, 18 May 2009 02:56:12 GMT (844 weeks ago)
Denise,

Thanks for continuing to work on this. Sorry it took me a few days to get back to you, but I was out of town with minimal computer availability for a long weekend.

I've posted a set of test posts to attempt to narrow down the source of the problem. They can be viewed at http://marcmagus.dreamwidth.org/tag/encoding

All 7 tests exhibit the problem behavior as described in the original request. Other than attempting to establish whether a specific one of é or è is the problem, which I can do if you think it will be helpful, I'm at a loss as to any further tests I can offer.

It seems like the problem is either jlj or Dreamwidth (or both). The fact that Dreamwidth's posts appear differently in my browser than LiveJournal's might point to the area of the code where this is going on?

JLJ can be found at http://umlautllama.com/projects/perl/#jlj -- I am using 2.12, which is the latest version there.
afuna: Cat under a blanket. Text: "Cats are just little people with Fur and Fangs" (afuna, cats) [personal profile] afuna - afuna
Answer (#7143)
Posted: Mon, 18 May 2009 17:23:31 GMT (844 weeks ago)
Dear Magus,

Thank you for all the trouble you went to in making the test entries! Based on those, and a bit more investigation of our own, we've managed to figure out a few things:

Dreamwidth does handle encoding differently from LiveJournal. LiveJournal has to keep around a legacy mode, which detects Unicode entries posted from a non-Unicode-compliant client and flags them. Since Dreamwidth doesn't have to worry about non-Unicode data (no legacy data to adjust for, plus we've made some changes to how encoding is handled), we've gradually begun to rip out the legacy mode.

It's taking us a while to do this, since we have to be very careful and double-check each step to make sure we're not breaking any site functionality in the process.

Unfortunately, this left room for the edge case you ran into: if you post Unicode text from an older client that doesn't explicitly state it can handle Unicode, a flag is set on the entry. Dreamwidth then stores and displays the text correctly, but *believes* it can't, and prevents you from editing your entry.
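
As a rough illustration of that edge case, here is a toy model in Python. It is not Dreamwidth's actual code: the function name, the non-ASCII test, and the exact rule are assumptions made purely for illustration.

```python
def entry_gets_legacy_flag(body: bytes, client_ver: int) -> bool:
    """Toy model (hypothetical): should the server flag this entry?

    Assumption: an entry is flagged when it contains non-ASCII bytes
    and the client did not declare Unicode support (ver >= 1).
    """
    has_non_ascii = any(byte > 0x7F for byte in body)
    return client_ver < 1 and has_non_ascii
```

Under this model, valid UTF-8 text with accented characters is flagged only because the client stayed silent about its version, which matches the behavior you saw.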

The good news is that, since your toolchain is handling Unicode correctly, JLJ just needs to indicate to Dreamwidth that it's Unicode-compliant by setting a client version. This won't let you edit your existing entries (we'll work to get that fixed, as it's something we'll need to handle on our end), but will work for entries going forward.

If you're comfortable with Perl, you can do this by appending "&ver=1" to the $form that's posted to the server. If you're not comfortable enough with Perl to do so, it may be worthwhile to get in contact with the client author, and get them to update it for you.

(Incidentally, this will also fix the incorrect encoding for entries posted to LiveJournal)
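
The "&ver=1" change can be sketched as follows. This is Python rather than JLJ's Perl, and the helper name and the sample field names are assumptions for illustration; only the `ver=1` parameter comes from the explanation above.

```python
from urllib.parse import parse_qsl, urlencode


def with_unicode_version(form: str) -> str:
    """Return a URL-encoded form body with ver=1 set exactly once.

    `form` stands in for the already-encoded string the client posts
    (the role $form plays in JLJ); any existing `ver` field is replaced.
    """
    fields = [
        (key, value)
        for key, value in parse_qsl(form, keep_blank_values=True)
        if key != "ver"
    ]
    fields.append(("ver", "1"))
    # urlencode percent-encodes str values as UTF-8 by default.
    return urlencode(fields)
```

Rebuilding the field list instead of concatenating the string keeps the call safe to apply twice, at the cost of re-encoding the whole body.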

Regards,
Afuna
marcmagus: Me playing cribbage in regency attire (regency cards) [personal profile] marcmagus - Magus
Comment (#7160)
Posted: Mon, 18 May 2009 19:15:48 GMT (844 weeks ago)
Excellent, thank you!

My patch is naive (just adds "&ver=1" to the form), but I suspect that isn't actually correct...I plan to take a look at the lj/dw api docs and such to confirm my suspicion that it's only correct to do this if actually in a Unicode locale, and figure out the correct way to do this. Any pointers would be appreciated.
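
A minimal sketch of the check I have in mind, in Python rather than the client's Perl. The function names are hypothetical, and the premise that `ver=1` should only be sent for UTF-8 text is my own suspicion, not something confirmed by the API docs.

```python
import codecs
import locale


def is_utf8(encoding_name: str) -> bool:
    """True if the given encoding name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False


def locale_is_utf8() -> bool:
    """True if the current locale's preferred encoding is UTF-8."""
    return is_utf8(locale.getpreferredencoding(False))


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Transcode text from the locale's encoding to UTF-8 bytes."""
    return raw.decode(source_encoding).encode("utf-8")
```

One option is to send `ver=1` only when `locale_is_utf8()` holds; another is to always transcode with `to_utf8` first, so the declaration is true regardless of locale.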

[Once I have a "correct" patch I'll post it to my journal for others to use, as I don't believe the author is maintaining the client. I presume doing that and linking an FAQ entry to it would be appropriate, on the off chance someone else stumbles into this problem?]
