See my site - fantabulous.ru

The Cyrillic text is displayed as "???????".
How can I fix this bug?

Hi Axxo — Symphony definitely supports Cyrillic and Brahmic languages. Make sure that your MySQL server, the database, and all tables have character_set set to utf8 and collation set to utf8_general_ci.

It sounds like you have your database set up with another collation. I’ve found that sometimes MySQL bizarrely defaults to latin1/latin1_swedish_ci rather than utf8/utf8_general_ci.
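As a rough illustration of where the question marks come from (an analogy in PHP, not what MySQL runs internally): Cyrillic characters have no representation in Latin-1, so a conversion to a Latin-1 target has to substitute something for each of them. The sketch assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Illustration only: this mimics the symptom, it is not MySQL's own code path.
// Assumes the mbstring extension is available.
mb_substitute_character(0x3F); // use "?" for characters the target cannot hold

$utf8  = 'раздел';
$latin = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo $latin, "\n"; // ??????
```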

Isn’t collation just the way MySQL sorts data on retrieval?

If you say so, I have no idea. I always make sure both are set to be sure.

You definitely need character_set set to utf8. Setting collation accordingly seems at least useful.

@nickdunn: Yes, you are right. Thanks.

Шаша

Glad to see Russians using Symphony CMS.

But when I add a new category, “Новый раздел” (see fantabulous.ru), I get this error: loadXML(): Input is not proper UTF-8, indicate encoding ! Bytes: 0xD1 0x2D 0xD0 0xB0 in Entity, line: 18

The MySQL encoding is utf8_general_ci.

  <item handle="Новый-�-аздел" id="10">Новый раздел</item>

http://7image.ru/images/5644497.gif

http://7image.ru/images/5646850.gif

Your two GIFs did not come through; they are linked above for future reference. While I can clearly see “Новый раздел” in them, I don’t see any character_set settings. But we’ll take your word for it that it’s set to UTF-8.

What sort of input field are you using on the back end to enter “новый раздел” (trans. “New Section”)?

The text itself seems to come through to the node value, but the creation of the handle gets tripped up in the Symphony system.

http://www.youtube.com/watch?v=sMlPGfwysPg

@Axxx It seems to be a specific issue with the lowercase “р” (r in English) and the PHP process that takes the text that is entered and makes the “handles”. In your first example (Новый раздел), the “р” at the beginning of the second word (раздел) causes the errant handle. Second, in the YouTube video you entered “Супер Новый Раздел (Super New Section)” and the handle is rendered “Супе�-Новый-Раздел-super-new-section”.

Two Observations

  1. The capital Cyrillic letters are not being rendered in lowercase in the handle.

  2. In the second example @Axxx gives, the capital “Р” in Раздел no longer causes the problem but the lowercase “р” in Супер (Super) causes the issue. So it seems that maybe it’s just the particular lowercase “р” that’s causing the problem.

What’s Next?

I don’t know enough about the inner workings of Symphony at this point to be of much further use to you. Perhaps one of the development team, or someone more familiar with the inner workings of Symphony, can chime in and enlighten us. Or point me to the class/file that drives the creation of handles and I can try to troubleshoot there.

FWIW I brought in @Axxx’s two GIFs above, where we see screenshots of phpMyAdmin.

I think the error is in the formation of the handles.

http://www.youtube.com/watch?v=aAfU87OCXvU

Thank you, wjnielsen. Let’s wait for an answer from the developers.

Interestingly, I tried both Новый раздел and Супер Новый Раздел (Super New Section). Both handles were correctly rendered.

<title handle="Новый-раздел">Новый раздел</title>

and

<title handle="упер-Новый-Раздел-super-new-section">упер Новый Раздел (Super New Section)</title>

I have a suspicion it might be to do with the DB saving process. It appears that your handles get truncated, but it might be happening when saving into the database rather than the handle creation function of Symphony, however I cannot be sure.

The difference between my local test copy and your copy is my DB is using utf8_unicode_ci instead of utf8_general_ci but I don’t think that should matter.

I’m not sure whether it’s relevant, but we’re building a multilingual site at the moment and a colleague discovered that the string truncation functions (in PHP) assume that each character is represented by one byte, whereas the more complex characters use several bytes. He therefore had to modify the core, changing all strlen (and similar) calls to an alternative (which escapes me right now). Could be related?
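To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the mb_* functions are only one possible alternative and may not be the one my colleague used.

```php
<?php
// Assumes mbstring is available. Each Cyrillic letter takes 2 bytes in UTF-8.
$word = 'раздел';

$byteLength = strlen($word);                  // counts bytes
$charLength = mb_strlen($word, 'UTF-8');      // counts characters

$byteCut = substr($word, 0, 3);               // stops in the middle of the second letter
$charCut = mb_substr($word, 0, 3, 'UTF-8');   // keeps three whole characters

var_dump($byteLength);                           // int(12)
var_dump($charLength);                           // int(6)
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // bool(false)
var_dump($charCut);                              // string(6) "раз"
```

The byte-based cut leaves half a character behind, which is the kind of output an XML parser rejects as "not proper UTF-8".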

Okay, found the problem. My Lang::createHandle() function is different. Open up lib/toolkit/class.lang.php and replace the function createHandle() with the following:

public static function createHandle($string, $max_length=255, $delim='-', $uriencode=false, $apply_transliteration=true, $additional_rule_set=NULL){

    ## Use the transliteration table if provided
    if($apply_transliteration) $string = _t($string);

    $max_length = intval($max_length);

    ## Strip out any tag
    $string = strip_tags($string);

    ## Remove punctuation
    $string = preg_replace('/[\.\'"]+/', NULL, $string);

    ## Trim it
    if($max_length != NULL && is_numeric($max_length)) $string = General::limitWords($string, $max_length);

    ## Replace spaces (tab, newline etc) with the delimiter
    $string = preg_replace('/[\s]+/', $delim, $string);

    ## Find all legal characters
    preg_match_all('/[^<>?@:!-\/\[-`ëí;‘’]+/u', $string, $matches);

    ## Join only legal character with the $delim
    $string = implode($delim, $matches[0]);

    ## Allow for custom rules
    if(is_array($additional_rule_set) && !empty($additional_rule_set)){
        foreach($additional_rule_set as $rule => $replacement) $string = preg_replace($rule, $replacement, $string);
    }

    ## Remove leading or trailing delim characters
    $string = trim($string, $delim);

    ## Encode it for URI use
    if($uriencode) $string = urlencode($string);    

    ## Make it lowercase
    $string = strtolower($string);      

    return $string;

}

See if that helps. The main difference is the /u modifier on the regular expression, which makes it match multi-byte Unicode characters instead of individual bytes.
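Here is a small sketch of what I believe is going on, assuming the file is saved as UTF-8. Without /u the character class is read byte by byte, so the bytes that make up ë, í, ‘ and ’ (including 0x80) all count as illegal. The lowercase “р” is 0xD1 0x80, so its second byte is dropped and the delimiter lands in the middle of the character. That matches the bytes in the reported error (0xD1 0x2D 0xD0 0xB0).

```php
<?php
// Same character class as in createHandle(), run once without and once with /u.
$class = '[^<>?@:!-\/\[-`ëí;‘’]+';
$input = 'раздел';

preg_match_all('/' . $class . '/', $input, $byteMatches);   // byte mode
preg_match_all('/' . $class . '/u', $input, $utf8Matches);  // UTF-8 mode

$broken = implode('-', $byteMatches[0]);
$fixed  = implode('-', $utf8Matches[0]);

echo bin2hex(substr($broken, 0, 4)), "\n"; // d12dd0b0
echo $fixed, "\n";                         // раздел
```

Any other character whose UTF-8 encoding contains one of those bytes would presumably be split the same way.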

I can confirm that truncation in the backend (e.g. in an entry list) does not work correctly with multi-byte characters like German umlaut characters. Sometimes the effects are really strange…

Thanks, Alistair. Please edit this line:

## Remove punctuation
$string = preg_replace('/([\.\'"]++)/', '', $string);

That problem is solved. But I have found another problem with the SEO URLs. I added a new article with a Cyrillic name (for an example, see the latest articles on http://fantabulous.ru/):

<title handle="">Статья11 (123)ваафу?*(*?(*:%:%:%(*)__)+__)</title>

## Encode it for URI use
if($uriencode) $string = urlencode($string);

## Make it lowercase
$string = strtolower($string);

These functions are not Unicode-aware.

http://ru.php.net/strtolower

http://ru.php.net/urlencode

These functions are probably what causes the errors.

We should be using multi-byte string functions. However, mbstring is a non-default extension (http://au.php.net/manual/en/mbstring.installation.php), so requiring it would add another Symphony installation requirement.

A couple of approaches might be 1) use the mbstring functions if they are available, defaulting to the normal string functions if they are not, or 2) allow the createHandle() function to be overloaded somehow by extensions.
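Approach 1 could look something like the sketch below. The function name handle_to_lower() is hypothetical and not part of Symphony; it only shows the "use mbstring if present, otherwise fall back" idea for the lowercase step.

```php
<?php
// Hypothetical helper, not part of Symphony: lowercase a handle with mbstring
// when the extension is available, otherwise fall back to strtolower().
function handle_to_lower($string)
{
    if (function_exists('mb_strtolower')) {
        return mb_strtolower($string, 'UTF-8');
    }

    // Fallback: byte-based, so non-ASCII letters are not reliably lowercased.
    return strtolower($string);
}

echo handle_to_lower('Супер-Новый-Раздел-Super'), "\n";
```

With mbstring loaded, the Cyrillic capitals are lowercased as well; without it, the result depends on what strtolower() does with non-ASCII bytes on that server.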

Isn’t mbstring included in most PHP installations? So solution 1 sounds good to me. A remark about multibyte support requirements in the installation instructions would be cool, of course.
