Ticket #590 (closed bug: fixed)

Opened 3 years ago

Last modified 2 years ago

Character problems with Cyrillic

Reported by: neoen Owned by: omry
Priority: Normal Milestone: 1.5
Component: FireStats Version: 1.4
Severity: Normal Keywords:
Cc:

Description

I have some examples where referrers are not displayed correctly in the FireStats hits list (IE7): the search term in the URL is missing.

1) http://go.mail.ru/search?&q=%CC%E0%F0%F3%F1%E5%E2+%E4%EC%E8%F2%F0%E8%E9&no_morph=n&num=10&sf=10

You can see it in the attached Cyrilics_problem.png.

2) http://www.yandex.ru/yandsearch?text=%ED%E0%E9%F2%E8+%F7%F2%EE+%F2%E0%EA%EE%E5+%EA%F0%EE%E2%ED%E0%FF+%EC%E5%F1%F2%FC

You can see it in the attached Cyrilics_problem_2.png.

3) http://www.rambler.ru/srch?words=%E2%EE%E9%ED%E0+%F0%EE%F1%F1%E8%E9%F1%EA%E8%E5+%F1%EE%EB%E4%E0%F2%FB&old_q=www.%F1%EE%EB%E4%E0%F2%FB+%F0%EE%F1%F1%E8%E9.ru&btnG=%CD%E0%E9%F2%E8%21

When these hits are added to the database, FireStats stops showing older hits: even when it is set to show 200 hits, only 15 are displayed, and the last one is one of the examples above.

Interestingly, the following URL, for example, works well: http://www.yandex.ru/yandsearch?&p=3&text=%D1%80%D0%B0%D1%81%D1%81%D1%82%D1%80%D0%B5%D0%BB%20%D1%84%D0%B5%D0%B4%D0%B5%D1%80%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D1%85%20%D1%81%D0%B8%D0%BB%20%D1%87%D0%B5%D1%87%D0%B5%D0%BD%D1%81%D0%BA%D0%B8%D0%BC%D0%B8%20%D0%B1%D0%BE%D0%B5%D0%B2%D0%B8%D0%BA%D0%B0%D0%BC%D0%B8

Attachments

Cyrilics_problem.PNG (4.2 kB) - added by neoen 3 years ago.
Cyrilics_problem_2.PNG (4.3 kB) - added by neoen 3 years ago.

Change History

Changed 3 years ago by neoen

Changed 3 years ago by neoen

Changed 2 years ago by Daemony

The same trouble occurs in Firefox too, so this is not a browser problem. Any ideas?

Changed 2 years ago by omry

  • milestone set to 1.5

Changed 2 years ago by omry

  • status changed from new to closed
  • resolution set to fixed

All the URLs except the last one are encoded in windows-1251; the last one is encoded in UTF-8. I fixed FireStats to convert windows-1251 to UTF-8 for those URLs, but this will break the last URL, which is already in UTF-8.

There is nothing more I can do. You might want to contact the search engine developers and ask them to make up their minds about which encoding they use.

Fixed in 1.5.
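
The conversion described in this comment can be illustrated with a minimal Python sketch. This is not the FireStats code (the searchengines.php file name suggests FireStats itself is PHP); the helper name `decode_query_value` is hypothetical, and only the standard-library `urllib.parse.unquote_plus` call is real. It shows why a single fixed encoding repairs the windows-1251 URLs but garbles one that is already UTF-8.

```python
from urllib.parse import unquote_plus


def decode_query_value(raw, encoding):
    """Percent-decode a query value and read the resulting bytes as `encoding`.

    Hypothetical helper for illustration; undecodable bytes become U+FFFD.
    """
    return unquote_plus(raw, encoding=encoding, errors="replace")


# First two words of the yandex.ru search term in the report (windows-1251 bytes).
cp1251_value = "%ED%E0%E9%F2%E8+%F7%F2%EE"
# First word of the referrer that already displayed correctly (UTF-8 bytes).
utf8_value = "%D1%80%D0%B0%D1%81%D1%81%D1%82%D1%80%D0%B5%D0%BB"

fixed = decode_query_value(cp1251_value, "windows-1251")  # readable Cyrillic
still_ok = decode_query_value(utf8_value, "utf-8")        # readable Cyrillic
broken = decode_query_value(utf8_value, "windows-1251")   # mojibake, not the original word
```

Decoding the UTF-8 value as windows-1251 does not raise an error, because almost every byte maps to some character; the result is simply the wrong text.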

Changed 2 years ago by omry

Note: converting to UTF-8 only works for URLs that are detected as search engine URLs. The encoding to use is defined in the search engine definition in searchengines.php. Currently only windows-1251 is supported.
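
A per-engine encoding definition of the kind described in this note might look like the following Python sketch. The table `SEARCH_ENGINES`, its layout, and the function `search_terms` are assumptions made for illustration; the actual format of searchengines.php is not shown in this ticket. The hosts and query parameter names are taken from the URLs in the report.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical per-engine definitions: host -> (query parameter, encoding).
SEARCH_ENGINES = {
    "go.mail.ru": ("q", "windows-1251"),
    "www.yandex.ru": ("text", "windows-1251"),
    "www.rambler.ru": ("words", "windows-1251"),
}


def search_terms(referrer):
    """Return the decoded search terms, or None if the host is not a known engine."""
    parts = urlsplit(referrer)
    definition = SEARCH_ENGINES.get(parts.hostname)
    if definition is None:
        return None  # not detected as a search engine URL: no conversion
    param, encoding = definition
    values = parse_qs(parts.query, encoding=encoding, errors="replace").get(param)
    return values[0] if values else None


# The rambler.ru referrer from the report, shortened to its "words" parameter.
rambler = (
    "http://www.rambler.ru/srch?words="
    "%E2%EE%E9%ED%E0+%F0%EE%F1%F1%E8%E9%F1%EA%E8%E5+%F1%EE%EB%E4%E0%F2%FB"
)
terms = search_terms(rambler)
```

With one fixed encoding per host, an engine that sends both windows-1251 and UTF-8 queries (as yandex.ru does in the report) cannot be described correctly by a single entry.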

Changed 2 years ago by anonymous

Why not just check whether the URL is really UTF-8 encoded and, if it is not, do the conversion?
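
The check suggested here could be sketched in Python as follows. This is only an illustration of the commenter's idea, not what FireStats does; the function name `decode_terms` and the choice of windows-1251 as the fallback are assumptions.

```python
from urllib.parse import unquote_to_bytes


def decode_terms(raw, fallback="windows-1251"):
    """Try strict UTF-8 first; if the bytes are not valid UTF-8, use `fallback`."""
    # In a query string "+" stands for a space; handle it before percent-decoding.
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")


from_cp1251 = decode_terms("%ED%E0%E9%F2%E8+%F7%F2%EE")
from_utf8 = decode_terms("%D1%80%D0%B0%D1%81%D1%81%D1%82%D1%80%D0%B5%D0%BB")
```

This remains a heuristic: some windows-1251 byte sequences are also valid UTF-8 (the two bytes D0 B0, for example), so a short term can be misread without any error being raised.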

Changed 2 years ago by omry

Because guessing encodings never works.

Changed 2 years ago by anonymous

Who said that? You? I fixed it for myself, and it works. You should also fix "Recalculate search engine terms": after pressing that, the Cyrillic terms break again, and so on.

Changed 2 years ago by omry

Joel Spolsky said that. Look for "No Such Thing As Plain Text" in the article.
