"Segmentation" of eye-tracking data in Python (e.g. pandas)
A more general question from me this time. I typically used OpenSesame to present my stimuli whilst using the eye tracker (usually Tobii Glasses or, like this time, the SMI red-500). In my current experiment, I am sending string messages to the eye tracker at the start of each trial (numbered), at the onset of the stimulus of interest, and at its offset.*
One thing that I've never managed to do efficiently is select all the data in the dataframe for each trial between (and including) the two messages. Then I could simply run aggregate functions on each period of interest (e.g. total dwell time on an AOI) and merge the result with the OS output for further data processing.
I was thinking of using the .loc indexer in pandas. However, what I find the most tricky with something like pandas compared to, say, Matlab, is iterating through the dataframe (i.e. "find the next instance and index it, then find the next instance and index it, etc." instead of "find all and aggregate", if that explanation makes sense).
I am asking here first rather than on Stack Overflow, because this type of data is quite common in psych experiments. In short, just wondering if there's anyone who used Python to process similar data, and/or whether I am missing something obvious.
- PS @eduard, if you're reading along - According to your script, this function is not supported in SMI, but there is an easy solution. Remind me to commit to your Git.
(I can't find an edit button. The post scriptum was meant for @Edwin. Bloody auto-fill...)
I wouldn't say that there's a standard way to do this, and the initial parsing process (i.e. going from the text data to some Python data structure) always depends on the specifics of the eye tracker. However, perhaps this tutorial (for EyeLink data) will give you some idea of how you could do this.
I had a quick look at the eyelinkparser (and the included smiparser) and it looks VERY useful indeed, thanks!
Just to wrap up this discussion, I'm actually using a custom parser that outputs a pandas compatible (uniform) .csv. I do like how DataMatrix works with both categorical and series data, but I'm sticking to pandas for now because of familiarity and a tight conference paper deadline.
To answer my own iteration question, itertuples() works quite well. The selection of relevant phases is quite straightforward too, as I can just slice the whole DataFrame per trial, get the indices of the relevant messages, and use the range between these indices.
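For anyone finding this later, here is a minimal sketch of the per-trial slicing described above. It is not my actual parser output: the column names (trial, message, aoi, dt_ms), the message strings (stim_onset, stim_offset) and the 2 ms sample duration are all made up for illustration, so adapt them to whatever your own .csv looks like.

```python
import pandas as pd

# Toy sample-level data. Every column name and message string here is
# hypothetical; real tracker exports will look different.
samples = pd.DataFrame({
    "trial":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "message": [None, "stim_onset", None, "stim_offset", None,
                "stim_onset", None, None, "stim_offset", None],
    "aoi":     ["none", "face", "face", "text", "face",
                "text", "face", "face", "face", "text"],
    "dt_ms":   [2] * 10,  # assumed duration of each sample in ms
})


def stimulus_period(trial_df, start="stim_onset", end="stim_offset"):
    """Return the rows from the start message to the end message, inclusive."""
    first = trial_df.index[trial_df["message"] == start][0]
    last = trial_df.index[trial_df["message"] == end][0]
    # Label-based .loc slicing includes both endpoints.
    return trial_df.loc[first:last]


# Slice each trial separately, then stack the selected periods again.
periods = pd.concat(
    [stimulus_period(trial_df) for _, trial_df in samples.groupby("trial")]
)

# "Find all and aggregate": total dwell time per trial and AOI.
dwell = periods.groupby(["trial", "aoi"])["dt_ms"].sum().unstack(fill_value=0)
print(dwell)
```

The resulting table has one row per trial and one column per AOI, so something like dwell.reset_index() should merge with the OS output on the trial number, assuming both use the same trial numbering. Note that the sketch assumes every trial contains both messages exactly once; a trial with a missing message would raise an IndexError.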
Bottom line: all is well.