Does var automatically decode in UTF-8?
I have encountered some strange behavior when reading and decoding text files and using var. When using var.variable instead of variable, decoding with UTF-8 produces an error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
I have reproduced this using a txt file containing only the word 'text' and the following inline script in OpenSesame with the xpyriment backend:
    import string
    #this works
    with open(pool[u'text.txt']) as file:
        myString = file.read()
    var.myString=myString.decode(u'UTF-8-sig')
    print(var.myString)
    #this does not work
    with open(pool[u'text.txt']) as file:
        var.myString = file.read()
    var.myString=var.myString.decode(u'UTF-8-sig') #error is here
    print(var.myString)
    #some testing without line 10
    print(myString) #works
    #print(myString.encode(u'UTF-8')) #error
    print(var.myString) #works
    print(var.myString.encode(u'UTF-8')) #works
The error message indicates that the error is located in line 10.
And indeed, the script works without line 10. Moreover, myString cannot be encoded again using UTF-8, whereas var.myString can be. Without line 10, the script also works when non-ASCII characters are added to the txt file. This seems to indicate that assigning the file contents to var.myString automatically decodes them from UTF-8.
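One way to check this hypothesis outside OpenSesame is the small sketch below. It assumes (not confirmed here) that var stores the value as already-decoded text that still contains the BOM. In Python 2, calling .decode() on a unicode string first encodes it implicitly with the ASCII codec, and that step would fail on u'\ufeff'. The sketch is written for Python 3 and only emulates that implicit step; the byte string stands in for the contents of text.txt, and all names are hypothetical.

```python
# Stand-in for the file contents: UTF-8 BOM followed by the word 'text'.
raw = b'\xef\xbb\xbftext'

# Plain UTF-8 decoding keeps the BOM as U+FEFF; 'utf-8-sig' strips it.
with_bom = raw.decode('utf-8')
without_bom = raw.decode('utf-8-sig')

# Assumption: var holds text that was already decoded, like `with_bom`.
# Python 2 would encode such a unicode string with the ASCII codec before
# decoding it again; this line emulates that implicit encode.
try:
    with_bom.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(repr(with_bom))
print(repr(without_bom))
print(error)
```

If the assumption holds, the error printed here should match the one reported for line 10, which would mean the second decode is simply applied to text that is no longer a byte string.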
Is there some automatic decoding/encoding when using var and is this intended? And which way should be used in a program?