Minor issue with unicode + text trim in backend
A bug in 2.1.1, submitted by Komb on 20 August 2010
Closed #394: Minor issue with unicode + text trim in backend
I have a feeling this is because Symphony uses standard string functions whereas multibyte string operations should ideally use the mbstring functions (http://php.net/manual/en/book.mbstring.php). The problem I think is that PHP needs to be specially compiled with mbstring functions (which aren’t included by default) so it can’t be relied on as standard.
Agreed. We have talked about mbstring functions more than once, and it seems that you cannot rely on them being available. So, to be honest, I don’t see any solution to this problem (which has always been there, and it is probably annoying in most languages, as it is in German).
While it may not be relied upon all the time, I think it’d be worth using function_exists so that if mbstring functions are available, we can provide the best experience to the user, and if not, fall back to our current, not so bulletproof implementation.
I came up with another solution the other day, where mbstring wasn’t available to me. Instead of trying to trim properly, I allow the incorrect trim to occur (thereby creating a broken character at the end of the string) but then remove any non UTF-8 characters from the string at the end, thereby removing anything broken:
$string = preg_replace('/[^(\x20-\x7F)]*/','', $string);
Could that be a quick fix?
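For comparison, here is a rough sketch of a different way to avoid the broken trailing character when mbstring is not available: back up to a character boundary before cutting, rather than stripping bytes afterwards. It assumes the input is valid UTF-8. The name utf8_safe_byte_cut is hypothetical (it is neither a Symphony nor a PHP function), and the limit is counted in bytes, not characters, so it is not a drop-in replacement for a character-based trim.

```php
<?php
// Hypothetical helper, not part of Symphony or PHP.
// Cuts a UTF-8 string to at most $maxBytes bytes without splitting
// a multibyte character. Assumes $string is valid UTF-8.
function utf8_safe_byte_cut($string, $maxBytes)
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($string) <= $maxBytes) {
        return $string;
    }

    $cut = $maxBytes;
    // $string[$cut] is the first byte that would be dropped. If it is a
    // continuation byte (bit pattern 10xxxxxx), the cut falls inside a
    // character, so step back until it sits on a lead byte or an ASCII byte.
    while ($cut > 0 && (ord($string[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }

    return substr($string, 0, $cut);
}
```

With the five-byte string "ab\xC4\xABc" (the two bytes \xC4\xAB encode one letter), a limit of 3 would fall in the middle of that letter, so the function backs up and returns "ab"; a limit of 4 keeps the whole letter.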
I had no such luck with that regex, Nick. It seemed to remove all Unicode characters and leave behind nothing.
I did however fix this particular issue (#394) by using mb_substr if available so that the Field->prepareTableValue would not cut off multibyte characters.
If mb_substr is not available/installed in PHP, the behaviour will be as it is described by this bug.
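The approach described above (use mb_substr when it exists, otherwise keep the old behaviour) might look roughly like the sketch below. This is not the actual Symphony change: the function name truncate_for_display and the default limit of 75 are assumptions made for illustration only.

```php
<?php
// Hypothetical helper; the name and the default limit are assumptions,
// not taken from Symphony's Field->prepareTableValue.
function truncate_for_display($value, $max = 75)
{
    if (function_exists('mb_substr')) {
        // mb_substr counts characters in the given encoding,
        // so a multibyte letter is never split.
        return mb_substr($value, 0, $max, 'UTF-8');
    }

    // Fallback: substr counts bytes and may cut a multibyte
    // character in half, which is the behaviour this bug describes.
    return substr($value, 0, $max);
}
```

The function_exists check is made at call time, so the same code runs on installations with and without the mbstring extension; only the quality of the trim differs.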
This issue is closed.
Hello.
Not a biggie, but I’ve noticed that when trimming occurs on a Unicode letter, it somehow gets corrupted. Observed this since 2.0.8 RC1; I don’t know if older versions had this.
Screenshot attached. The last letter should have been an ‘i’ with a lengthening dash.
This is only visual; the data itself remains intact, but it gets ugly as such lists grow.
Not a browser issue (tested) and not a server issue (it also happens on deployment servers running Linux/Apache). The localhost in the screenshot is Wamp 2.0 on Win7.