UnicodeEncodeError after using ü in loop variable
Hi
I stumbled upon a character-encoding problem when I concatenate strings in the following way inside a loop-sequence block:
sendMessage(slideChangeEvent('Word_' + var.word))
It results in the following error message:
Error while executing inline script
item-stack: experiment[run].word_loop[run].word_sequence[run].sendMessageWord[run]
exception type: UnicodeEncodeError
exception message: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
item: sendMessageWord
time: Mon Feb 01 15:24:05 2016
phase: run
Traceback:
File "dist\libopensesame\inline_script.py", line 102, in run
File "dist\libopensesame\python_workspace.py", line 159, in _exec
File "", line 2, in
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
These are the possible values of var.word:
- glücklich
- wütend
- traurig
- erstaunt
- angeekelt
Of course the problem only arises with "glücklich" and "wütend".
How can I solve this problem?
Thanks for your suggestions.
Cheers,
Stefan
Comments
Hi Stefan,
I suspect that the concatenation itself works fine and returns a unicode object (e.g. u'Word_glücklich'). You can try this by just doing the concatenation on its own, not embedded in a larger command.
So the question is: What is slideChangeEvent or sendMessage doing that triggers a UnicodeEncodeError? Without seeing the full code I cannot tell!
Cheers,
Sebastiaan
Check out SigmundAI.eu for our OpenSesame AI assistant!
Hi Sebastiaan
You're right of course, a simple print('word' + var.word) is working fine.
sendMessage() takes the string provided by slideChangeEvent() to send a message through a TCP socket to the external API of iMotions, where we capture people's facial responses together with markers sent from OpenSesame.
The relevant code is the following:
Edit: never mind my previous edit
I suspect the bad guy is here:
In your case, message is a unicode object, which will be automatically converted to a str object. This goes wrong because the default encoding is assumed to be ascii, which doesn't contain special characters. To make this unicode-safe, you need to explicitly say which character encoding you want to use. For example, sockExtAPI.send(message.encode('utf-8')) will send the message as a utf-8 encoded bytestring. Whether the receiver will take kindly to that is, of course, another matter.
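To make the difference concrete, here is a minimal sketch of the two conversions. It does not use the real socket or the iMotions API; the explicit .encode('ascii') call only stands in for the implicit conversion that Python 2 performs when a unicode object is handed to something that expects bytes. The variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch only: the explicit ascii encode below mimics the implicit
# unicode -> str conversion described above; no socket is involved.
message = u'Word_glücklich'

ascii_error = None
try:
    message.encode('ascii')  # fails: 'ü' has no ASCII representation
except UnicodeEncodeError as e:
    ascii_error = e
    print('ascii failed: %s' % e)

# Naming the encoding explicitly works and yields a bytestring,
# in which 'ü' is stored as the two bytes 0xC3 0xBC.
payload = message.encode('utf-8')
print(repr(payload))
```

The receiver then has to decode those bytes as utf-8 as well, otherwise the two bytes for 'ü' will be shown as two separate characters.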
Cheers,
Sebastiaan
sockExtAPI.send(message.encode('utf-8')) did the trick, at least inside OpenSesame. However, write.log(message.encode('utf-8')) gives this funny mixture of symbols, and so does the exported data from iMotions.
Hm, looks like we will need to encode the stimuli differently.
But thank you very much for your patience. I'm far from being experienced in python... :-)
Cheers,
Stefan
The (very common) mistake that you're making is thinking that the problem lies in how the file is written, whereas it lies in how you're reading it. The OpenSesame log file is fine, but it's utf-8 encoded. If it looks funny, this is because the text editor or spreadsheet has used the wrong encoding to read it, and you'll have to explicitly tell it to use utf-8.
See also:
(I should write a blog about character encoding one of these days. It's really one of the main issues that people struggle with.)
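As a rough illustration of the reading-side problem, the sketch below writes one word as utf-8 and then reads it back twice: once as cp1252 (a typical Windows default, assumed here for the example) and once as utf-8. The file name is hypothetical and this is not how OpenSesame itself writes its log.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

word = u'glücklich'

# Write the word as utf-8 to a throwaway file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), 'log_example.csv')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(word)

# Reading with the wrong encoding: each of the two utf-8 bytes of 'ü'
# is interpreted as a separate cp1252 character.
with io.open(path, 'r', encoding='cp1252') as f:
    wrong = f.read()

# Reading with the encoding that was used for writing restores the text.
with io.open(path, 'r', encoding='utf-8') as f:
    right = f.read()

print(wrong == word)
print(right == word)
```

The bytes on disk are identical in both cases; only the interpretation differs, which is why the fix belongs in the program that opens the file.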
On Windows, many editors assume the default ANSI encoding (CP1252 on US Windows) instead of UTF-8 if there is no byte order mark (BOM) at the start of the file. Files store bytes, which means all Unicode text has to be encoded into bytes before it can be stored in a file. read_csv takes an encoding option to deal with files in different formats, and so does to_csv. So you have to specify an encoding, such as utf-8:
df.to_csv(r'D:\panda.csv', sep='\t', encoding='utf-8')
If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python 2, or utf-8 in Python 3.
Alternatively, you can escape a problematic series first and then decode it back to text:
df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))
This also avoids the error, but note that it replaces non-ASCII characters with escape sequences (ü becomes \xfc) rather than preserving them.
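The point about the byte order mark can be tried out without pandas. The sketch below assumes Python 3 and uses the standard library's 'utf-8-sig' codec, which writes a BOM at the start of the file and strips it again on reading. The file name and the column header are made up for the example.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch using only the standard library.
import csv
import os
import tempfile

words = ['glücklich', 'wütend', 'traurig']
path = os.path.join(tempfile.mkdtemp(), 'words.tsv')  # hypothetical file

# 'utf-8-sig' is utf-8 with a BOM prepended to the file.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(['word'])
    for w in words:
        writer.writerow([w])

# The first three bytes on disk are the utf-8 BOM.
with open(path, 'rb') as f:
    starts_with_bom = f.read(3) == b'\xef\xbb\xbf'

# Reading with the same codec strips the BOM and returns the original text.
with open(path, 'r', encoding='utf-8-sig', newline='') as f:
    rows = [row[0] for row in csv.reader(f, delimiter='\t')]

print(starts_with_bom)
print(rows == ['word'] + words)
```

Whether a particular editor or spreadsheet program honours the BOM is something to check for that program; the sketch only shows what ends up in the file.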
Thanks so much for your contribution @carlhyde !
