[solved] Combining characters in Hindi text

eamccullough · June 2012

I'm trying to display Hindi text in OpenSesame (0.26), and most of it is working, except for the "i" character. This is a combining vowel sign that is typed after (to the right of) a consonant, but is displayed to its left. It appears properly (to the left of the consonant) in the lists within OpenSesame, but when actually running the script and displaying stimuli, it appears to the right of the consonant. Is there some way to deal with this?

sebastiaan · June 2012

Hi Eamccullough,

You've already managed to get further than I have! Could you explain in more detail the steps that you have taken so far to show the Hindi text (i.e. how are you showing the text? What font are you using?).

Alternatively, a quick and dirty way to show the text is to create a bitmap for each sentence, although depending on the experiment this may not be a convenient solution.

Cheers,
Sebastiaan

eamccullough · June 2012

The bitmap route is my backup plan, but I figured I'd at least ask about this first!

I've followed the instructions given for non-western alphabets here:
http://osdoc.cogsci.nl/miscellaneous/non-western-alphabets
The Hindi text exists as entries in a list, and is shown on a sketchpad. I've tried four different fonts (mostly Mangal, but also Aparajita, Kokila, and Utsaah), and the same problem occurs with all of them. I have been populating the lists (where everything appears correctly) by pasting from an Excel spreadsheet (where everything appears correctly). Almost everything appears correctly when running the script, save for this one character.

sebastiaan · June 2012

I see. It took me a while to understand what was happening here, but the basic problem is that the i sign is not automatically moved in front of the consonant that precedes it. The Unicode notation for the i sign is U+093F, so

U+0915U+093F

should become

U+093FU+0915

This doesn't happen automatically, but if you open the inline_script and swap the two code points yourself, it should work. Does this make sense?
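The same swap can be sketched in plain Python on real Unicode strings (not OpenSesame's U+ notation). This is a minimal sketch, assuming the renderer draws code points strictly left to right without Indic shaping; the function name reorder_vowel_sign_i is hypothetical. Unlike a one-character swap, it also moves the sign in front of a whole conjunct (consonant + virama + consonant), since that is where the i sign is drawn.

```python
import re
import unicodedata

# A Devanagari consonant: KA..HA (0915-0939) or a precomposed nukta form
# QA..YYA (0958-095F), optionally followed by a separate nukta (093C).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"
VIRAMA = "\u094D"
VOWEL_SIGN_I = "\u093F"

# Zero or more "consonant + virama" pairs, then a final consonant,
# captured as group 1 and followed by the vowel sign i.
_CLUSTER_THEN_I = re.compile(
    "((?:" + CONSONANT + VIRAMA + ")*" + CONSONANT + ")" + VOWEL_SIGN_I
)


def reorder_vowel_sign_i(text: str) -> str:
    """Move each vowel sign i in front of the consonant (cluster) it follows.

    Hypothetical helper, only meant for renderers that do no Indic shaping:
    the result is visual order, no longer logical Unicode order.
    """
    return _CLUSTER_THEN_I.sub(lambda m: VOWEL_SIGN_I + m.group(1), text)


if __name__ == "__main__":
    # KA + vowel sign i, and the word "Hindi"
    for word in ["\u0915\u093F", "\u0939\u093F\u0902\u0926\u0940"]:
        print([unicodedata.name(c) for c in reorder_vowel_sign_i(word)])
```

Text that has been reordered this way will look wrong in any shaping-aware program (Excel, the OpenSesame list editor), so it is best applied only just before display.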

Alternatively, you can automatically fix the text with a regular expression. As far as I can tell, the code below flips the i's to make it work. But since I don't understand Hindi I can't be sure!

You need to insert this code into the prepare phase of an inline_script at the start of a trial. Adjust src_var to match the name of the variable that you want to fix.

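import re
src_var = 'text'
# Get the text to fix from the experiment variables
s = self.get(src_var)
# A horrible regular expression to circumvent the fact that inline
# scripts don't deal well with backslashes: chr(92) is a backslash.
# It matches any code point followed by the vowel sign i (093F)
# and swaps the two.
s = re.sub(r'U'+chr(92)+'+('+chr(92)+'w'+chr(92)+'w'+chr(92)+'w'+chr(92)+'w)U'+chr(92)+'+093F',
    'U+'+'093FU+'+chr(92)+'1', s)
# Store the fixed text back under the same variable name
exp.set(src_var, s)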
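For anyone working outside an inline_script, the chr(92) construction above expands to an ordinary raw-string regular expression. The sketch below (flip_i_notation is a hypothetical helper name) checks that equivalence and applies the substitution to text in the U+XXXX notation.

```python
import re

# The inline_script builds its pattern from chr(92), i.e. a backslash.
BS = chr(92)
PATTERN_CHR92 = "U" + BS + "+(" + BS + "w" + BS + "w" + BS + "w" + BS + "w)U" + BS + "+093F"

# The same pattern written as a raw string: a four-character code point
# (captured as group 1) followed by the vowel sign i.
PATTERN_RAW = r"U\+(\w\w\w\w)U\+093F"
# Put the vowel sign i first, then the captured code point.
REPLACEMENT = r"U+093FU+\1"


def flip_i_notation(s: str) -> str:
    """Swap every 'U+xxxx U+093F' pair in U+ notation (hypothetical helper)."""
    return re.sub(PATTERN_RAW, REPLACEMENT, s)


print(PATTERN_CHR92 == PATTERN_RAW)
print(flip_i_notation("U+0915U+093F"))
```

Like the inline_script version, this only moves the sign past a single code point, so conjuncts are not handled.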

Hope this helps!

eamccullough · June 2012

Thanks! I used the regular expression approach, and it worked quite well, in that the script runs without issue and everything displays properly. I do get an error message when I click on the inline_script object after creating it, though:

invalid literal for int() with base 16: "'+'0" (Edit the script to fix this)

I'm not sure what this means, but I'm sharing it mainly for posterity, as my immediate issue has been solved. Thanks again!

sebastiaan · June 2012

Ah, thanks for pointing that out! It's a bug in the way that the Unicode notation (U+1234) is parsed, but it should be easily fixed.
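The error message hints at what goes wrong: the parser presumably takes the four characters after every U+ and converts them with int(..., 16), so the literal 'U+'+'093F in the inline_script hands it "'+'0". A more forgiving parser would only treat U+ as notation when it is followed by exactly four hexadecimal digits. The functions below are a hypothetical sketch of both behaviours, not OpenSesame's actual code.

```python
import re

# Only "U+" followed by exactly four hexadecimal digits counts as a code point.
_NOTATION = re.compile(r"U\+([0-9A-Fa-f]{4})")


def decode_notation(s: str) -> str:
    """Replace valid U+XXXX sequences with their characters; leave the rest alone."""
    return _NOTATION.sub(lambda m: chr(int(m.group(1), 16)), s)


def naive_decode_first(s: str) -> str:
    """Convert the four characters after the first U+ unconditionally.

    Mimics the presumed buggy behaviour: raises ValueError on non-hex input.
    """
    i = s.index("U+")
    return chr(int(s[i + 2:i + 6], 16))


print(repr(decode_notation("'U+'+'093F")))  # left unchanged
try:
    naive_decode_first("'U+'+'093F")
except ValueError as err:
    print(err)
```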

Good to hear that the script is working for you.
