Talk:Main Page: Difference between revisions
Revision as of 15:43, 22 December 2005
Archive 1: 19:28, 28 Nov 2005 (PST)
Welcome to the discussion page for the Main Page! To start a new section, click on the "+" tab. To respond in an existing section, click on the "edit" link to the right of the section title. Please don't forget to sign your comments, using either three tildes (~~~) for your name, or four tildes (~~~~) for your name and a timestamp.
News discussion page?
I expect I'm not the only one who wants to discuss a few news posts on the main page, and I don't think we have a standardised page for that yet.
I'd appreciate if the source code part of the new SDK could be made available for the Linux Steam client, too. I'm currently funneling my efforts into making the HL2 code compile happily with GCC 4 without -fpermissive and with -Wall -Werror, and I find it a bit painful having to reboot to Windows, let Steam update, run the wizard to extract the code, reboot and transfer the stuff onto my Linux partition. I don't mind this being "restricted content" to e.g. only HL2 buyers; I have a registered copy of HL2 and the readiness to supply my username and password to the steam binary.
I might send an e-mail to [hlcoders] quite soon, asking for this explicitly... just wanted to know if any DevCommunity members have already tried and were denied the wish, or if anybody else thinks this would be a good idea. Thanks a lot. ~~Ravu 08:09, 14 Nov 2005 (PST)
Multibyte character (especially Japanese & Chinese) content corruption caused by the Wiki update
After the wiki update, all previous content in Japanese got corrupted, and parts of it are nearly impossible to read. This includes Greg's test content.
It seems to me it's a character encoding issue. It could be the database content itself (MySQL setting?), or the input/output charset conversions (PHP setting and MediaWiki script). Please fix it or we won't be able to continue our translation project - n-neko 00:34, 19 Nov 2005 (PST)
Also, I'm not 100% sure (I can read Chinese characters because they are also used in Japanese, but I can't read or speak Chinese), but the Chinese documents also look wrong compared with the original versions in the Google cache. Some characters have been changed, so the text makes less sense. - n-neko 01:24, 19 Nov 2005 (PST)
It was probably a wiki charset setting that changed. I saw this on another wiki, even though it was still English, things got messed up. You'll probably have to go rewrite the articles.. :( --AndrewNeo 08:26, 19 Nov 2005 (PST)
- That would be too much work... Well, I used my page to test Japanese contents addition. It looks ok.
- So I think something went wrong during the update process and the database content got corrupted. I feel really depressed... I really hope there is a working database content backup... - n-neko 08:47, 19 Nov 2005 (PST)
Update: Water Shader shows that the ” character also got corrupted. -n-neko 06:40, 21 Nov 2005 (PST)
- Don't get depressed yet. If Google cached everything, we can make an automated (or semi-automated) solution. —Maven (talk) 12:48, 21 Nov 2005 (PST)
- Wait... what update? —Maven (talk) 12:53, 21 Nov 2005 (PST)
- If I've got time tomorrow I'll have a closer look to see if there's actual data been lost, or if it's just got mangled into another form. I know a little about UTF-8 and other character encoding systems so if the information is still there, it should be possible to retrieve it in a semi-automated manner... --Cargo Cult (info, talk) 16:40, 21 Nov 2005 (PST)
 
Here's a sample of what's different, from GregCoomer's Talk page. (Google cache here)
(partial hex dump removed; see below instead)
That's for text  2005年8月6日「MMBBオンラインゲームフェスタ」開催! (bad version: 2005年8月6日「MMBBオンラインゲームフェスタ�?開催�?—notice the very end)
I haven't checked meta tags and Content-Type stuff yet. I don't know enough about the matter to be useful except perhaps to implement a solution that someone else suggests. —Maven (talk) 16:48, 21 Nov 2005 (PST)
Both versions have the same meta http-equiv="Content-Type" in the HTML header. As far as I know, that overrides the Content-Type given by the server (although I'm not sure about that). —Maven (talk) 16:58, 21 Nov 2005 (PST)
- Oops, the corrupted version has Content-Type "text/html; charset=utf-8", with "utf-8" in lowercase, whereas the cached version has it in all caps. Probably doesn't make a difference, but with browsers being as they are, one never knows. —Maven (talk) 17:02, 21 Nov 2005 (PST)
- Not sure how you're comparing UTF-16 strings - you can't copy-and-paste from a corrupted UTF-8 stream! ;-)
 
- I did some lower-level comparisons, and it looks like there's actually been information loss occurring. Here's a fragment of an original from Google's cache, separated into bytes and characters:
 
- (3c) (75) (6c) (3e) (3c) (6c) (69) (3e) (20) (e9,80,9a) (e6,b0,97) (e5,8f,a3) (e3,81,af) (e6,8e,a7) (e3,81,88) (e3,82,81) (e3,81, etc)
 
- Here's what the Wiki's currently sending. Different bytes are marked in bold.
 
- (3c) (75) (6c) (3e) (3c) (6c) (69) (3e) (20) (e9,80,9a) (e6,b0,97) (e5,3f,a3) (e3,3f,af) (e6,8e,a7) (e3,3f,88) (e3,82,3f) (e3,3f, etc)
 
- There's both an 81 and an 8f being turned into a 3f - there's definitely information gone missing. Coincidentally, 3f corresponds to a question mark in ASCII. I suspect something's tried converting to Windows-1252 because in my comparison paragraph, all the 'forbidden' bytes (marked in green on the chart) have been squished...
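For reference, a minimal Python sketch of the hypothesis in the post above. It assumes the damage comes from a pass through Windows-1252 in which the five byte values that code page leaves undefined (81, 8d, 8f, 90, 9d) are each replaced by 3f. The function name `simulate_damage` and the sample string are illustrative only; the sample is the text decoded from the cached byte dump above. This is not a claim about what the server actually did.

```python
# Byte values that Windows-1252 leaves undefined. Assumption: these are
# the "forbidden" bytes that were replaced by 0x3f ("?") during the update.
CP1252_UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def simulate_damage(utf8_bytes: bytes) -> bytes:
    """Replace every byte Windows-1252 cannot represent with 0x3f."""
    return bytes(0x3F if b in CP1252_UNDEFINED else b for b in utf8_bytes)


# Text decoded from the Google cache dump (e9 80 9a, e6 b0 97, ...).
original = "通気口は控えめ".encode("utf-8")
damaged = simulate_damage(original)

print(original.hex(" "))
print(damaged.hex(" "))
```

Under that assumption, the second line printed should match the damaged dump byte for byte, which would be consistent with the Windows-1252 theory, though it does not prove it.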
 
- Oh dear... I need to grab Google caches of all documents before it's too late... -n-neko 04:53, 22 Nov 2005 (PST)
 
 
- I had been wondering why what's supposed to be UTF-8 was coming out UCS-2. I figured that it was a case of things being intentionally mislabeled in order to workaround the faults of some non-compliant browser. Anyway, I didn't copy-paste—I saved both webpages and opened them in a hex editor. I figured that would preserve the Unicode, but I guess there was some conversion during the HTML save. —Maven (talk) 05:57, 22 Nov 2005 (PST)
 
 
Well, backups are the best solution. Failing that, the only possible solution I can think of is to use digram (and trigram, if available) frequency analysis to reconstruct the corrupted bytes. I have some old cryptanalysis code that can be adapted for that purpose with the addition of a UTF-8/UCS-2 converter. However, I certainly don't have digram data for Japanese, and I've never actually tried frequency analyses on such a large character set, so I don't know what theoretical reliability we'd get. If there are fewer Chinese-language articles (I think there are?) then they can be corrected by hand. In any case, this is a last resort. —Maven (talk) 06:30, 22 Nov 2005 (PST)
- Can't find the data I'd need, so I'd have to generate it. If we have to do this, I'll need some help from someone familiar with Japanese-language writing. —Maven (talk) 06:45, 22 Nov 2005 (PST)
- I do keep many(not all, though) of original translation data in raw text. Also I've grabbed uncorrupted google caches. They can be used as references when backup doesn't work... -n-neko 09:20, 22 Nov 2005 (PST)
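The digram idea discussed in this thread could look roughly like the Python sketch below. It is an illustration only, not the cryptanalysis code mentioned above. It assumes that every damaged byte was one of the five values Windows-1252 leaves undefined, that digram counts come from some uncorrupted reference text (for example, the saved Google caches), and that the damaged fragment is short, since the search grows exponentially with the number of question marks. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: only these byte values were lost (undefined in Windows-1252).
LOST_BYTES = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def bigram_counts(reference_text: str) -> Counter:
    """Count adjacent character pairs in uncorrupted reference text."""
    return Counter(zip(reference_text, reference_text[1:]))


def restorations(damaged: bytes):
    """Yield every valid UTF-8 string obtainable by filling in 0x3f bytes."""
    holes = [i for i, b in enumerate(damaged) if b == 0x3F]
    # Each hole is either a genuine "?" or one of the lost byte values.
    for choice in product((0x3F,) + LOST_BYTES, repeat=len(holes)):
        attempt = bytearray(damaged)
        for pos, value in zip(holes, choice):
            attempt[pos] = value
        try:
            yield attempt.decode("utf-8")
        except UnicodeDecodeError:
            continue  # this combination is not well-formed UTF-8


def best_restoration(damaged: bytes, counts: Counter) -> str:
    """Pick the candidate whose character pairs are most frequent."""
    def score(text: str) -> int:
        return sum(counts[pair] for pair in zip(text, text[1:]))
    # Raises ValueError if no candidate decodes as valid UTF-8.
    return max(restorations(damaged), key=score)


counts = bigram_counts("通気口は控えめ")
print(best_restoration(bytes.fromhex("e53fa3e33faf"), counts))
```

With a realistic amount of reference text the scores would be far less clear-cut than in this toy case, so any output would still need checking by someone who reads Japanese.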
 
- We have backups of the database before the upgrade, and one of our system administrators is analyzing the issue. My current understanding is that it's not a simple restore from backup, or it would already have been done. --JeffLane 08:46, 22 Nov 2005 (PST)
- Thanks for looking into this issue, Jeff. My fear is that the backup data itself could have got corrupted through database/php/whatever backup app's charset configuration. I hope this is not the case... -n-neko 09:20, 22 Nov 2005 (PST)
- Jeff, do you think you can fix this before Dec 2? On that day Robin Walker will speak at Ritsumei Univ, and the HL2 GOTY package will be released in Japan. If not, I'll start manually copying and pasting stub articles... The state of the documents right now is miserable... -n-neko 18:44, 24 Nov 2005 (PST)
 
 
Important update regarding multibyte language data
Unfortunately, what n-neko has suggested appears to be what has occurred. Due to recent changes in language parameter options in mysqldump between MySQL 4.0 and 4.1, the backups we have been creating also do not have the correct character conversions, so the data has been irrecoverably lost. You can read more about this issue with MySQL and MediaWiki databases here.
We are correcting our backup procedures to prevent this from happening again in the future.
For now, please hold off on correcting or adding the multibyte character articles. Multibyte languages should work correctly now, but we are going to take the server offline and perform some tests to make sure the data is being backed up correctly. This process should be completed during the downtime later today. We'll post another update here when it is all clear to post multibyte data again.
We apologize for any inconvenience this unfortunate loss of data has caused. --JeffLane 14:42, 25 Nov 2005 (PST)
- Hi, Jeff. Thanks for making the situation clear and preventing future corruptions.
- I decided to set up a mirror site for the Japanese documents on my server using the Google cache. Fortunately I could secure most of the articles (from before the data loss) in HTML, and the edits have gone well so far (I hope). So there is less inconvenience for Japanese readers...
- The site calls VDC for its images, so I should have asked your permission first... But please forgive me for setting it up for now as a temporary solution. As soon as I find an automated way of converting these secured HTML files back into wiki code (and posting them back), I'll stop the mirror if you don't want it to continue. -n-neko 02:41, 26 Nov 2005 (PST)
- I've converted two articles over using copy-and-paste, just to see how much work would be involved in doing it manually - it's a rather dull task, but doesn't need any knowledge of Japanese (fortunately!). I think something which converts the HTML backup to Wiki-format with a simple look-up-table might prove simpler - I'll have a go at writing such a thing if you like. —Cargo Cult (info, talk) 07:34, 26 Nov 2005 (PST)
 
- Great! Your Copy&Paste edits look perfect:) HTML backup to wiki converter will be really helpful. Thanks in advance -n-neko 09:00, 26 Nov 2005 (PST)
 
 
- Yes, that's fine with the images for now, good work. We'll let you know if that changes.
- This may be of assistance: http://diberri.dyndns.org/html2wiki.html --JeffLane 08:47, 26 Nov 2005 (PST)
 
 
- Thanks Jeff. Let me know when it becomes inappropriate, then I'll remove or replace image links. -n-neko 09:00, 26 Nov 2005 (PST)
 
 
 
All Clear
Maintenance on the database has been completed, and it should be safe to enter new multibyte data now. --JeffLane 17:29, 25 Nov 2005 (PST)
Repairing damaged articles
Apparently more than just multibyte character articles were messed up in the database upgrade. Searching Google for a damaged character comes up with many results. We should task-force fixing as many of these as we can. --AndrewNeo 18:37, 28 Nov 2005 (PST)
Cleaning Main Page Talk
This page now has a warning on the edit page. I recommend that we archive older discussion (at this point, probably everything before the multibyte issue) in this page once it reaches a certain length (like now). I recommend something like Talk:Main Page/Archive 1 .. Talk:Main Page/Archive 2 format for every time we archive the page. --AndrewNeo 18:37, 28 Nov 2005 (PST)