UnicodeEncodeError after using ü in loop variable

edited February 2016 in OpenSesame

Hi

I stumbled upon a character-encoding problem when concatenating strings in the following way inside a loop-sequence block:
sendMessage(slideChangeEvent('Word_' + var.word))

It results in the following error message:

Error while executing inline script

item-stack: experiment[run].word_loop[run].word_sequence[run].sendMessageWord[run]
exception type: UnicodeEncodeError
exception message: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)
item: sendMessageWord
time: Mon Feb 01 15:24:05 2016
phase: run

Traceback:
File "dist\libopensesame\inline_script.py", line 102, in run
File "dist\libopensesame\python_workspace.py", line 159, in _exec
File "", line 2, in
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 1: ordinal not in range(128)

These are the possible values of var.word:

  • glücklich
  • wütend
  • traurig
  • erstaunt
  • angeekelt
    Of course the problem only arises with "glücklich" and "wütend".

How can I solve this problem?
Thanks for your suggestions.

Cheers,
Stefan

Comments

  • edited 1:52AM

    Hi Stefan,

    I suspect that this bit:

    'Word_' + var.word
    

    ... works fine, and returns a unicode object (e.g. u'Word_glücklich'). You can check this by doing the concatenation on its own, not embedded in a larger command.

    So the question is: What is slideChangeEvent or sendMessage doing that triggers a UnicodeEncodeError? Without seeing the full code I cannot tell!

    Cheers,
    Sebastiaan

  • edited February 2016

    Hi Sebastiaan

    You're right of course, a simple print('word' + var.word) works fine.

    sendMessage() takes the string provided by slideChangeEvent() and sends it through a TCP socket to the external API of iMotions, where we capture people's facial responses together with markers sent from OpenSesame.

    The relevant code is the following:

    import socket
    
    ### Some global settings/variables used
    lnbr = '\r\n'
    IP = "127.0.0.1"
    EAPI_PORT = 8089        # iMotions external API
    CTRL_PORT = 8087        # iMotions Remote Control
    # iMotions parameters
    studyName = 'GamblingGame'  
    subjectName = str(var.subject_nr)
    
    # setup sockets and connect
    sockExtAPI = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sockExtAPI.connect((IP, EAPI_PORT))
    sockCTRL = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sockCTRL.connect((IP, CTRL_PORT))
    
    # send external API message (mouseEvent or slideChangeEvent)    
    def sendMessage(message):
        sockExtAPI.send(message)
        #response = sockCTRL.recv(4092)
        log.write('ExtAPI message sent: ' + message)
    
    def slideChangeEvent(slideID):
        # discrete header
        # version 2
        header = 'M;2;'
        # field 5: slideID
        # field 7: marker type N
        # (marks the start of the next segment, automatically closing any currently
        # open segment.)
        event = ';;' + slideID + ';;N;I'
        return header + event + lnbr
    

    Edit: nevermind my previous edit

  • edited 1:52AM

    I suspect the bad guy is here:

    sockExtAPI.send(message)
    

    In your case, message is a unicode object, which will be automatically converted to a str object. Kind of like this:

    _message = str(message)
    sockExtAPI.send(_message)
    

    And this goes wrong, because the default encoding is assumed to be ascii, which doesn't contain special characters such as ü. To make this unicode-safe, you need to explicitly say which character encoding you want to use. For example:

    _message = message.encode('utf-8')
    sockExtAPI.send(_message)
    

    This will send the message as a utf-8 encoded bytestring. Whether the receiver will take kindly to that is, of course, another matter.

    Cheers,
    Sebastiaan
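The explicit-encoding fix suggested above can be tried outside OpenSesame. The following is only a minimal sketch, assuming Python 3; the names `slide_change_event` and `send_message` are hypothetical stand-ins for the functions in the thread, and a local `socket.socketpair()` stands in for the real iMotions connection. It uses `sendall` rather than `send` so the whole payload is written.

```python
import socket

LNBR = '\r\n'


def slide_change_event(slide_id):
    """Build the slide-change message as text (same field layout as in the thread)."""
    return 'M;2;' + ';;' + slide_id + ';;N;I' + LNBR


def send_message(sock, message, encoding='utf-8'):
    """Encode the text explicitly before sending; return the bytes that were sent."""
    payload = message.encode(encoding)
    sock.sendall(payload)
    return payload


# Encoding with ascii reproduces the kind of error reported in the thread.
try:
    'Word_glücklich'.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# A connected local socket pair stands in for the iMotions API (assumption).
sender, receiver = socket.socketpair()
sent = send_message(sender, slide_change_event('Word_glücklich'))
received = receiver.recv(4096)
sender.close()
receiver.close()
```

Whether the receiving application expects UTF-8 is a separate question; the sketch only shows that the bytes arrive unchanged and can be decoded again with the same encoding.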

  • edited 1:52AM

    sockExtAPI.send(message.encode('utf-8')) did the trick, at least inside OpenSesame. However, log.write(message.encode('utf-8')) gives this funny mixture of symbols, and the same appears in the data exported from iMotions.

    Hm, looks like we will need to encode the stimuli differently.
    But thank you very much for your patience. I'm far from being experienced in python... :-)

    Cheers,
    Stefan

  • edited 1:52AM

    However, log.write(message.encode('utf-8')) gives this funny mixture of symbols, and the same appears in the data exported from iMotions.

    The (very common) mistake that you're making is thinking that the problem lies in how the file is written, whereas it lies in how you're reading it. The OpenSesame log file is fine, but it's utf-8 encoded. If it looks funny, this is because the text editor or spreadsheet has used the wrong encoding to read it, and you'll have to explicitly tell it to use utf-8.

    (I should write a blog about character encoding one of these days. It's really one of the main issues that people struggle with.)
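The "funny mixture of symbols" can be reproduced by reading UTF-8 bytes with a different encoding. This is a small illustration, assuming Python 3 and assuming the reader guessed cp1252 (a common Windows default); the file name is made up for the example.

```python
import os
import tempfile

# Write a word to a file as UTF-8 bytes (file name is hypothetical).
path = os.path.join(tempfile.mkdtemp(), 'log_example.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('glücklich')

# Reading with the wrong encoding garbles the ü, because its two UTF-8 bytes
# (0xC3 0xBC) are interpreted as two separate cp1252 characters.
with open(path, 'r', encoding='cp1252') as f:
    wrong = f.read()

# Reading with the encoding the file was written in recovers the text.
with open(path, 'r', encoding='utf-8') as f:
    right = f.read()

print(repr(wrong))  # 'glÃ¼cklich'
print(repr(right))  # 'glücklich'
```

The bytes on disk are identical in both cases; only the encoding chosen when reading differs.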

  • edited April 2021

    On Windows, many editors assume the default ANSI encoding (CP1252 on US Windows) instead of UTF-8 if there is no byte order mark (BOM) at the start of the file. Files store bytes, which means all unicode text has to be encoded into bytes before it can be stored in a file. pandas' read_csv and to_csv both take an encoding option to deal with files in different encodings, so you should specify one explicitly, such as utf-8.

     df.to_csv(r'D:\panda.csv', sep='\t', encoding='utf-8')

    If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python 2, or utf-8 in Python 3.

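    Alternatively, you can escape a problematic series first and then decode the result back to text:

    df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

    This also avoids the error, but note that it replaces non-ASCII characters with ASCII escape sequences (ü becomes \xfc), so the stored text is no longer the original characters.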
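Since the comment above mentions the byte order mark, here is a hedged sketch of writing a tab-separated file with a BOM so that BOM-aware Windows programs are more likely to detect UTF-8. It uses only the standard library `csv` module rather than pandas, assumes Python 3, and the file name and column header are invented for the example. The relevant piece is Python's `utf-8-sig` codec, which writes the BOM on output and strips it on input.

```python
import csv
import os
import tempfile

rows = [['word'], ['glücklich'], ['wütend']]
path = os.path.join(tempfile.mkdtemp(), 'words.tsv')  # hypothetical file name

# 'utf-8-sig' prepends the UTF-8 BOM (EF BB BF) to the file.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    csv.writer(f, delimiter='\t').writerows(rows)

# Inspect the raw bytes to confirm the BOM is there.
with open(path, 'rb') as f:
    raw = f.read()

# Reading with 'utf-8-sig' removes the BOM again, so the first cell is clean.
with open(path, 'r', encoding='utf-8-sig', newline='') as f:
    read_back = list(csv.reader(f, delimiter='\t'))
```

Not every program wants a BOM, so this is a choice to make per target application rather than a general default.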

  • Thanks so much for your contribution @carlhyde !
