Does var automatically decode in UTF-8?

LeoLeo
edited January 15 in OpenSesame

Hi,


I have encountered some strange behavior when reading and decoding text files and using var. When using var.variable instead of variable, decoding with UTF-8 produces the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

I have reproduced this with a txt file containing only the word 'text' and the following inline script in OpenSesame, using the xpyriment backend:

import string
#this works
with open(pool[u'text.txt']) as file:
   myString = file.read()
var.myString=myString.decode(u'UTF-8-sig')
print(var.myString)
#this does not work
with open(pool[u'text.txt']) as file:
   var.myString = file.read()
var.myString=var.myString.decode(u'UTF-8-sig') #error is here
print(var.myString)

#some testing without line 10
print(myString) #works
#print(myString.encode(u'UTF-8')) #error
print(var.myString) #works
print(var.myString.encode(u'UTF-8')) #works

The error message indicates that the error is located in line 10.

And indeed, this works without line 10. Moreover, myString cannot be encoded again using UTF-8, whereas var.myString can be. Without line 10, the script also works when adding non-ASCII characters to the txt file. This seems to indicate that reading the file into var.myString automatically decodes it as UTF-8.


Is there some automatic decoding/encoding when using var and is this intended? And which way should be used in a program?


Best

Leo

Comments

  • Hi Leo,

    Yes, you're correct. In Python 2, the var_store automatically decodes str objects to unicode objects. So this:

    var.myString.decode(u'UTF-8-sig')
    

    is calling decode() on a unicode object. What happens in that case is a bit strange: Python 2 first implicitly calls encode() to create a str object, assuming ASCII encoding, and only then calls decode(). The encode step is where it goes wrong.

    In any case, it's not necessary to decode, because everything is already unicode!

    Cheers!

    Sebastiaan
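The implicit encode step described above can be made visible with a small sketch. Python 3 no longer has decode() on text objects, so the helper below (py2_style_decode, a hypothetical name, not part of OpenSesame or Python) spells out what Python 2 is understood to do behind the scenes. It assumes var has already decoded the file as plain UTF-8, so the BOM is still present as U+FEFF.

```python
# Sketch (Python 3) of what Python 2 does implicitly when .decode()
# is called on an object that is already unicode.

def py2_style_decode(unicode_text, encoding):
    """Hypothetical helper that mimics Python 2's unicode.decode()."""
    # Implicit step in Python 2: encode with the default ASCII codec.
    as_bytes = unicode_text.encode('ascii')
    # Only then is the requested decoding applied.
    return as_bytes.decode(encoding)

# Text that was decoded as plain UTF-8, so the BOM survives as U+FEFF.
text_with_bom = u'\ufefftext'

try:
    py2_style_decode(text_with_bom, 'utf-8-sig')
    error_name = None
except UnicodeEncodeError as e:
    # The ASCII encode step fails on U+FEFF at position 0.
    error_name = type(e).__name__
    print(e)

# Pure-ASCII text survives the round trip, which is why the problem
# only shows up with a BOM or other non-ASCII characters.
plain = py2_style_decode(u'text', 'utf-8-sig')
```

The failure comes from the ASCII encode, not from the 'utf-8-sig' decode that was actually requested, which matches the UnicodeEncodeError in the original post.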


  • LeoLeo
    edited January 16

    Hi Sebastiaan,


    Thank you, that explains a lot. But it seems var always converts using UTF-8, so I guess it's a little safer to decode manually (since, sadly, some applications automatically use UTF-8-BOM). See e.g. this example using a UTF-8-BOM encoded file. The second part also doesn't work for ANSI-encoded files, and the error message suggests that the decoding in var is always performed using UTF-8.

    import string
    #reading to variable
    with open(pool[u'text.txt']) as file:
      myString = file.read()  
    var.myString=myString.decode(u'UTF-8')
    print(var.myString[0]==u't') #false
    var.myString=myString.decode(u'UTF-8-sig')
    print(var.myString[0]==u't') #true
    #reading to var.variable
    with open(pool[u'text.txt']) as file:
      var.myString = file.read()
    print(var.myString[0]==u't') #false
    

    Error message using ANSI:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 4: invalid continuation byte
    

    Best

    Leo
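The difference between the two codecs, and the ANSI failure, can be reproduced without OpenSesame. This is a minimal sketch on raw bytes; it assumes that "ANSI" here means Windows-1252 (cp1252) and that the file contained 'text' followed by 'ä' and a newline, which is a guess based on the byte and position in the error message.

```python
import codecs

# Bytes as an editor would save them when using "UTF-8 with BOM".
raw = codecs.BOM_UTF8 + b'text'

# Plain UTF-8 keeps the BOM as the first character (U+FEFF).
with_bom = raw.decode('utf-8')
print(with_bom[0] == u't')       # False

# UTF-8-sig strips a leading BOM if there is one.
without_bom = raw.decode('utf-8-sig')
print(without_bom[0] == u't')    # True

# Assumption: "ANSI" means cp1252, where 'ä' is the single byte 0xe4.
ansi_raw = u'text\xe4\n'.encode('cp1252')

try:
    ansi_raw.decode('utf-8')
    ansi_error = None
except UnicodeDecodeError as e:
    # 0xe4 announces a multi-byte UTF-8 sequence that never follows.
    ansi_error = e
    print(e)

# Decoding with the codec the file was actually saved in works.
ansi_text = ansi_raw.decode('cp1252')
```

A BOM-encoded file therefore decodes without error either way, but leaves a stray first character unless 'utf-8-sig' is used, whereas an ANSI file fails outright under UTF-8.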

  • Hi Leo,

    Thanks for bringing this to my attention, because this definitely should be clarified in the documentation. But yes, it's as you say: the var object automatically decodes byte strings (str) to unicode, assuming UTF-8 encoding.

    Cheers!

    Sebastiaan
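One way to sidestep the automatic decoding is to hand var text that is already unicode, by decoding while reading. The sketch below uses io.open, which exists in both Python 2 and Python 3; read_text is a hypothetical helper name, and the temporary file merely stands in for a file from the pool.

```python
import codecs
import io
import os
import tempfile

def read_text(path, encoding='utf-8-sig'):
    """Hypothetical helper: return the file content as decoded text.

    'utf-8-sig' strips a leading BOM when present and behaves like
    plain UTF-8 when there is none.
    """
    with io.open(path, encoding=encoding) as f:
        return f.read()

# Stand-in for a pool file saved as UTF-8 with BOM.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(codecs.BOM_UTF8 + b'text')

content = read_text(path)
os.remove(path)

print(content[0] == u't')
```

In an inline script this would presumably be used as var.myString = read_text(pool[u'text.txt']). This assumes that var stores a value that is already unicode unchanged, which the thread implies but does not state explicitly. For ANSI files, a different encoding such as 'cp1252' would have to be passed.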
