Accents and mask conversion: ANSI or UTF-8 format?
Hello everyone,
I'm editing an OpenSesame (OS) script and I have a problem.
In this task we present 4 French words followed by a mask.
The material is in a pool directory containing 4 .txt files.
The mask depends on the size of the words so that there are as many '#' as letters when masking:
mask = ''
mask = len(word1)*'#'+' '+len(word2)*'#'+' '+len(word3)*'#'+' '+len(word4)*'#'
exp.input_canvas.text(mask)
It works for words without accents.
For words with accents, the mask gets one extra character, e.g. 'poussé' -> '#######'
So I made a list of the accented words:
mask = ''
acc_word = ["poussé","....","...."]
if word1 in acc_word:
    mask = (len(word1)-1)*'#'+' '+len(word2)*'#'+' '+len(word3)*'#'+' '+len(word4)*'#'
[...]
That works well!
The remaining problem concerns the first word of each file.
The word is displayed correctly, but its mask has 3 extra '#'
e.g., 'alan' -> '#######'
This only occurs for the word at the beginning of a file.
Since there are 4 files, it affects 4 trials in the task.
So I retyped the material myself in a txt file saved in ANSI format.
That works well!
The first word of the file is now masked correctly
e.g. 'alan' -> '####'
But now this message is displayed for words with accents:
exception type: UnicodeDecodeError
exception message: 'utf8' codec can't decode byte 0xea in position 15: invalid continuation byte
Finally, I tried csv files, but I get the same message.
Could anyone help me?
Thanks,
Comments
Salut Chris,
The devil lies in the details when it comes to character encoding. You mention that you read from a text file. To make things easy, I would ensure that the text file is saved in utf-8 encoding. (If possible, also indicate that there should not be a BOM [byte order mark], which often shows up as an extraneous invisible character at the start of the file.)
Then, when you read in the text, convert everything to unicode as soon as possible and only then do stuff with it. The following script will do this in Python 2, which is what OpenSesame uses by default. (In Python 3, things are easier.)
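A minimal sketch of that flow, offered as an illustration rather than as the exact script meant here. The file name words.txt and the helper read_words are hypothetical. It assumes one word per line and uses io.open with the 'utf-8-sig' codec, which decodes UTF-8 and drops a leading BOM if one is present. It is written to run in both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_words(path):
    """Return the non-empty lines of a UTF-8 text file as unicode strings.

    The 'utf-8-sig' codec decodes UTF-8 and silently drops a leading BOM,
    so the first word of the file does not gain extra characters.
    """
    with io.open(path, encoding='utf-8-sig') as f:
        return [line.strip() for line in f if line.strip()]


# Demo only: write a hypothetical word file *with* a BOM, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'words.txt')
with io.open(demo_path, 'w', encoding='utf-8-sig') as f:
    f.write(u'alan\npouss\u00e9\n')

words = read_words(demo_path)
lengths = [len(w) for w in words]
print(lengths)  # [4, 6]: character counts, no BOM and no extra byte for the accent
```

Once the words are unicode strings, len() counts characters, so no special list of accented words should be needed for the mask.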
So in general, that's the flow you want to use. That's also what OpenSesame will do for you if you use a .csv file as a source for the loop table.
Cheers,
Sebastiaan
Check out SigmundAI.eu for our OpenSesame AI assistant!
And if you gaze long enough into an abyss, the abyss will gaze back into you.
Thank you Sebastiaan! It was indeed a txt file in UTF-8 with BOM.
I downloaded a good converter/editor that let me convert the files to plain UTF-8 (without BOM).
Once that was done, the first word of each file no longer had three extra characters, but
accented words were still coded with one character too many.
But the situation was clearer, so I made a list of the accented words:
Now it works!
Thanks again
Christophe
Salut Sebastiaan,
First of all thank you for answering.
They were indeed text files saved in utf-8 with BOM (which adds 3 bytes to the beginning of the file).
So I re-saved the text files as utf-8 without BOM using the Sublime Text editor, and the 3 extra characters at the beginning of the file were gone.
However, the problem with accented words persisted. It was now easier to handle: I created a list of accented words and simply coded the masks accordingly:
It works now!
Thank you again, it was a good clue.
Chris
Hi Chris,
When defining literal text in an inline_script, it's best to define unicode strings directly by prefixing a u. For unicode strings, the length indicates the number of characters. Otherwise (as in your case) they will be byte strings by default, and the length will indicate the number of bytes, which doesn't need to match the number of characters!
This, by the way, is only true for Python 2.
Cheers,
Sebastiaan
Hi Sebastiaan,
The people working on this are reporting problems to me.
Could it have something to do with your last message?
It's the same message that was displayed when I tried your suggestion with:
The program would stop and display this message about an empty list, as if a file were empty.
When I tried it myself, the error displayed was: "Python seems to have crashed. This should not happen. If Python crashes often, please report it on the OpenSesame forum."
Details : item-stack: ``
That seems to happen at the end of the task.
Could it be related to the file format?
Or to the way the stimuli are gathered?
For information: the file "sentences1" was edited with Sublime Text 3, and I did not want to rename it with a ".txt" extension because it worked well.
Thanks,
Chris
Problem solved!
The number of cycle repetitions was set incorrectly in the loop item.
There was no problem with the file format.
Have a good day,
Chris