Minor issue with unicode + text trim in backend

A bug in 2.1.1, submitted by Komb on 20 August 2010

Closed #394: Minor issue with unicode + text trim in backend

- Komb
- Comment #1
- 20 Aug 10, 10:45 pm

Hello.

Not a biggie, but I’ve noticed that when the trim lands on a Unicode letter, that letter gets corrupted. I have observed this since 2.0.8 RC1; I don’t know whether older versions had it.

Screenshot attached. The last letter should have been an ‘i’ with a lengthening dash (a macron).

This is only visual. The data itself remains intact, but it gets ugly as such lists grow.

It is not a browser issue (tested) and not a server issue (it also happens on deployment servers running Linux/Apache). The localhost in the screenshot is WAMP 2.0 on Windows 7.

Attachments: sym_unicode.jpg

- nickdunn
- Comment #2
- 21 Aug 10, 8:06 am

I have a feeling this is because Symphony uses the standard string functions, which operate on bytes, whereas multibyte strings should ideally be handled with the mbstring functions (http://php.net/manual/en/book.mbstring.php). The problem, I think, is that mbstring is a non-default extension, so PHP has to be built with it enabled and it can’t be relied on as standard.

http://getsymphony.com/discuss/thread/29858/#position-19
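To illustrate the byte-versus-character point, here is a minimal sketch that reproduces the kind of corruption described in this issue without needing mbstring. The sample value "Rīga" is just an illustrative assumption, not taken from the screenshot.

```php
<?php
// "Rīga" is 4 characters but 5 bytes in UTF-8: "ī" is encoded as 0xC4 0xAB.
$value = "R\xC4\xABga";

// substr() counts bytes, so cutting after 2 bytes lands in the middle of "ī".
$cut = substr($value, 0, 2);

// preg_match() with the /u modifier fails on malformed UTF-8, which makes
// it a usable validity check when mbstring is not installed.
$valid_before = (preg_match('//u', $value) === 1);
$valid_after  = (preg_match('//u', $cut) === 1);

echo strlen($value), "\n";   // 5 (bytes, not characters)
echo bin2hex($cut), "\n";    // 52c4 ("R" plus a dangling lead byte)
var_dump($valid_before);     // bool(true)
var_dump($valid_after);      // bool(false)
```

The dangling 0xC4 byte is what a browser renders as a broken or replacement character at the end of the trimmed text.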

- michael-e
- Comment #3
- 28 Aug 10, 6:20 am

Agreed. We have talked about the mbstring functions more than once, and it seems that you cannot rely on them being available. So, to be honest, I don’t see any solution to this problem (which has always been there, and is probably as annoying in most languages as it is in German).

- brendo
- Comment #4
- 04 Sep 10, 10:19 am

While mbstring can’t be relied upon everywhere, I think it’d be worth checking with function_exists, so that if the mbstring functions are available we can provide the best experience to the user, and if not, fall back to our current, not-so-bulletproof implementation.
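A rough sketch of that idea follows. The helper name truncate_value is hypothetical and this is not Symphony's actual code; it only shows the function_exists check with a byte-based fallback, assuming the input is UTF-8.

```php
<?php
// Hypothetical helper for illustration only, not Symphony's implementation.
// Assumes $string is UTF-8 encoded.
function truncate_value($string, $length)
{
    if (function_exists('mb_substr')) {
        // mbstring is available: count characters, so no character is split.
        return mb_substr($string, 0, $length, 'UTF-8');
    }

    // Fallback: count bytes. This can still split a multibyte character.
    return substr($string, 0, $length);
}

// "Rīga" with the "ī" written as its UTF-8 bytes.
echo truncate_value("R\xC4\xABga", 2), "\n";
```

With mbstring installed this keeps the whole "ī"; without it, the result is the same broken output as before, so nothing gets worse on servers that lack the extension.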

- nickdunn
- Comment #5
- 21 Nov 10, 8:21 am

I came up with another solution the other day, when mbstring wasn’t available to me. Instead of trying to trim properly, I let the incorrect trim happen (which leaves a broken character at the end of the string) and then remove any non-UTF-8 characters from the string afterwards, which removes anything broken:

$string = preg_replace('/[^(\x20-\x7F)]*/','', $string);

Could that be a quick fix?
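A narrower variant of the same "trim first, clean up afterwards" idea is sketched below. It assumes the original string was valid UTF-8 and that only the cut point is damaged, so it touches nothing but the tail. The function name trim_broken_tail is hypothetical, and it relies on preg_match with the /u modifier failing on malformed UTF-8 rather than on mbstring.

```php
<?php
// Hypothetical helper for illustration only.
// Assumes the untrimmed string was valid UTF-8, so a byte-based cut can
// leave at most 3 stray bytes (the start of a 4-byte character) at the end.
function trim_broken_tail($string)
{
    for ($i = 0; $i < 3 && $string !== ''; $i++) {
        // With /u, preg_match() only succeeds on well-formed UTF-8.
        if (preg_match('//u', $string) === 1) {
            break;
        }
        // Drop one byte from the end and check again.
        $string = (string) substr($string, 0, -1);
    }

    return $string;
}

// Byte-based cut through the middle of "ī" (0xC4 0xAB), then cleanup.
$cut = substr("R\xC4\xABga", 0, 2);
echo trim_broken_tail($cut), "\n";   // R
```

The result can be one character shorter than requested, but valid characters elsewhere in the string are left alone.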

- brendo
- Comment #6
- 22 Dec 10, 4:25 pm

I had no luck with that regex, Nick. It seemed to remove all Unicode characters and leave nothing behind.

I did, however, fix this particular issue (#394) by using mb_substr when it is available, so that Field->prepareTableValue no longer cuts multibyte characters in half.

If mb_substr is not available/installed in PHP, the behaviour will remain as described in this bug.

This issue is closed.
