[solved] Voicekey input / Triggering responses through sound input

sebastiaan · August 2011

The example linked to below is no longer available. See http://forum.cogsci.nl/index.php?p=/discussion/1772/ for an updated solution.

In response to the following comment on the OpenSesame page:

thanks for the great program!
Is it possible to program a voicekey_response?
I need to measure the first sound the participants make after the stimulus is shown.

Hi Inga,

Yes, that's most certainly possible! Based on your question, I'm not sure how you measure sound, though.

a) If you use a voicekey box, it probably sends a signal through the serial/parallel port when the loudness exceeds a certain threshold. You can use this type of signal as a response with the port_reader plug-in (included with the Windows version).

b) If you use a microphone, you need to record sounds and do a bit of processing to determine the loudness. This is actually not that difficult. I made a simple example, see the link below. I included an inline_script in the example that does all the hard stuff and gives you a response_time and loudness. You may have to tweak some of the parameters in the script, such as the TAP_THRESHOLD (the loudness threshold that should be exceeded) and possibly the input device_index (on line 43), which specifies the device that you want to use (in case there are multiple).

Example: http://forum.cogsci.nl/index.php?p=/discussion/1772/
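To give an idea of the processing involved in option b), here is a minimal sketch that runs on synthetic data instead of a live microphone. It assumes signed 16-bit samples; the names `rms` and `first_loud_block` and the test blocks are made up for illustration and are not taken from the example experiment.

```python
import math
import struct

SHORT_NORMALIZE = 1.0 / 32768.0  # scales signed 16-bit samples to -1 .. 1


def rms(block):
    """Root mean square of a block of signed 16-bit samples (as bytes)."""
    count = len(block) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, block[:count * 2])
    return math.sqrt(sum((s * SHORT_NORMALIZE) ** 2 for s in samples) / count)


def first_loud_block(blocks, threshold):
    """Return (index, loudness) of the first block above threshold, or None."""
    for index, block in enumerate(blocks):
        loudness = rms(block)
        if loudness > threshold:
            return index, loudness
    return None


# Synthetic input (hypothetical): two silent blocks, then a loud square wave
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 16384, -16384, 16384, -16384)
result = first_loud_block([silence, silence, loud], threshold=0.05)
```

With a real microphone, the blocks would come from the input stream one at a time, and the time at which the first loud block arrives would serve as the response time.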

I hope this gets you started and please let me know of any problems you run into!

Kindest regards,
Sebastiaan

petteri · August 2011

Hi Sebastiaan,

thank you very very much!
I found the problem: my microphone was causing trouble.
But your program works perfectly and is very useful for my experiments.

Greetings,
Inga

sebastiaan · August 2011

Good to hear it works for you!

christod · October 2011

Do you think it would be easy to modify your script for the microphone to start recording at the beginning of the trial, detect the onset of speech and keep recording until the end of the trial or until no sound is detected, then save the soundfile as .wav?

Alex

sebastiaan · October 2011

Hi Christod,

Sure, that's actually quite a minor change. I changed the script from the voicekey experiment (see above) as shown below. Now it records sound for a pre-specified duration, notes the response (the moment that the loudness exceeds the threshold), and writes the entire clip to a file. Please note that the sound file is 'raw', so you need to import it as raw data in a sound editor. Audacity is a free tool that can do this.

For more information and examples, please refer to the pyaudio docs.

Hope this helps!

Kindest regards,

Sebastiaan

...

start_response_interval = self.time()
end_response_interval = None
trial_duration = 3000
output = open("output.raw", "wb")
while self.time() - start_response_interval < trial_duration:
    try:
        block = stream.read(INPUT_FRAMES_PER_BLOCK)
    except IOError as e:
        # Skip this block if the read failed
        print(e)
        continue
    output.write(block)
    loudness = get_rms( block )
    # Only the first threshold crossing counts as the response
    if end_response_interval is None and loudness > TAP_THRESHOLD:
        end_response_interval = self.time()
        response_loudness = loudness

if end_response_interval is None:
    self.experiment.set("response_time", "timeout")
    self.experiment.set("loudness", "timeout")
else:
    self.experiment.set("response_time", end_response_interval - start_response_interval)
    self.experiment.set("loudness", response_loudness)

# Close the output file and the mic
output.close()
stream.close()
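If you would rather not import the raw file by hand, the standard wave module can wrap it in a WAV header afterwards. This is only a sketch: it assumes the recording is mono, 16-bit, 44100 Hz (adjust these if your stream is opened with other settings), and `raw_to_wav` is an illustrative helper name, not part of the experiment.

```python
import os
import tempfile
import wave


def raw_to_wav(raw_path, wav_path, channels=1, sample_width=2, rate=44100):
    """Wrap headerless PCM data in a WAV container.

    The defaults (mono, 2 bytes per sample, 44100 Hz) are assumptions and
    must match the settings that were used for recording.
    """
    with open(raw_path, "rb") as raw_file:
        frames = raw_file.read()
    wf = wave.open(wav_path, "wb")
    try:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(frames)
    finally:
        wf.close()


# Demo on one second of silence written to a temporary folder
tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "output.raw")
wav_path = os.path.join(tmp_dir, "output.wav")
with open(raw_path, "wb") as f:
    f.write(b"\x00\x00" * 44100)
raw_to_wav(raw_path, wav_path)
```

The resulting file should open directly in any sound editor, without the raw-import step.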

christod · October 2011

Hi Sebastiaan,
Thanks so much for your help. I ended up fusing your script with a script I found on the pyaudio website to record in wav. I am pasting my script at the end of the message.
However, I have a question concerning the variable response time. The reported time is much longer than what the actual difference between the onset of the stimulus and the onset of speech should be. Any ideas where this issue might be coming from?
Also, do you think I could make the sound recording end on a keypress? I tried the set duration command, but it wasn't working. Any idea whether this is feasible?
Thanks again for your help!

I apologize for the formatting. I put line break html formatting but that is the extent of my html knowledge.

Alex

Edit Sebastiaan: Fixed formatting

# Measure the start of the response interval



import pyaudio
import wave
import sys


TAP_THRESHOLD = 0.050
FORMAT = pyaudio.paInt16 
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 1
RATE = 44100  
INPUT_BLOCK_TIME = 0.01
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
chunk=1024
RECORD_SECONDS=5

def get_rms( block ):

    """Get root mean square as a measure of loudness"""

    import struct
    import math

    # Each sample is a signed 16-bit integer, i.e. 2 bytes
    count = len(block) // 2
    if count == 0:
        return 0.0
    fmt = "%dh" % (count)
    shorts = struct.unpack( fmt, block )
    sum_squares = 0.0
    for sample in shorts:
        # Normalize the sample to the range -1 .. 1
        n = sample * SHORT_NORMALIZE
        sum_squares += n*n
    return math.sqrt( sum_squares / count )

# Open the mic (keep one PyAudio instance so that it can be terminated later)
pa = pyaudio.PyAudio()
stream = pa.open(format = FORMAT,
    channels = CHANNELS,
    rate = RATE,
    input = True,
    input_device_index = 0,
    frames_per_buffer = INPUT_FRAMES_PER_BLOCK)

# Record for the whole trial and note when a sound is first detected
start_response_interval = self.time()
end_response_interval = None
trial_duration = 4000
left = self.get("Left")
right = self.get("Right")
Sub = self.get("subject_nr")
item = self.get("Item")
list_nr = self.get("List")
cond = self.get("Cond")
syl = self.get("Syls")
output = r"C:\Documents and Settings\Desktop/"+str(Sub)+"_"+str(item)+"_"+str(list_nr)+"_"+str(cond)+"_"+str(syl)+".wav"
frames = []
while self.time() - start_response_interval < trial_duration:
    try:
        block = stream.read(chunk)
    except IOError as e:
        # Skip this block if the read failed
        print(e)
        continue
    frames.append(block)
    loudness = get_rms( block )
    # Only the first threshold crossing counts as the response
    if end_response_interval is None and loudness > TAP_THRESHOLD:
        end_response_interval = self.time()
        response_loudness = loudness
if end_response_interval is None:
    self.experiment.set("response_time", "timeout")
    self.experiment.set("loudness", "timeout")

else:
    self.experiment.set("response_time", end_response_interval - start_response_interval)
    self.experiment.set("loudness", response_loudness)


# Close the mic
stream.close()
sample_width = pa.get_sample_size(FORMAT)
pa.terminate()


# Write all recorded blocks to a wav file
wf = wave.open(output, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(sample_width)
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

sebastiaan · October 2011

Good to hear you got it working (mostly)! The increased response times probably come from a number of sources.

the playback latency (timing of sound playback is notoriously sloppy on regular hardware)
the recording latency (same story)
the processing latency (larger chunks = slower)

It may not be feasible to get accurate absolute response times, although you could see what happens if you use smaller chunks and reduce the TAP_THRESHOLD. But I imagine that the delay will be a more or less fixed factor, so it should nevertheless be possible to see relative RT differences between conditions.
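To put a rough number on the processing latency: the loop checks the loudness only once per block, so a sound onset can be time-stamped only after the block that contains it has been read in full. The sketch below works out the block duration for the 1024-frame chunk and the 44100 Hz sample rate from the script above, compared with a 441-frame block (10 ms). `block_duration_ms` is an illustrative helper, and this covers only the block-size component; playback and recording latency come on top.

```python
RATE = 44100  # samples per second, as in the script above


def block_duration_ms(frames_per_block, rate=RATE):
    """Duration of one block of audio in milliseconds."""
    return 1000.0 * frames_per_block / rate


# A 1024-frame chunk lasts roughly 23 ms, a 441-frame block lasts 10 ms
large_block_ms = block_duration_ms(1024)
small_block_ms = block_duration_ms(441)
```

So reading 441 frames at a time instead of 1024 should shave a bit more than 10 ms off the worst-case delay from this source alone.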

With regard to using the keyboard, something like this should do the trick:

from openexp.keyboard import keyboard
...
my_keyboard = keyboard(self.experiment, timeout=1)
while True:
    resp, time = my_keyboard.get_key()
    if resp != None:
        break
    ...

For more info, see openexp.keyboard.

(For code formatting, you can use the <pre> .. </pre> tags.)

christod · October 2011

Your keypress code works like a charm. Thanks! As far as the response time goes, as long as the added time is fixed I should be able to implement my manipulation. I am trying to give feedback to the participants any time they take too long to start speaking.
Thanks again for your help!

Alex

Svetlognev · May 2015

Hello, colleagues!

For my study, I used the inline script posted above by christod. The task was to respond verbally to pictures.

1) I see that you acknowledge some problems with latencies. I ran some tests on different computers (Win8 and Win7, with different microphones but the same OpenSesame version, 2.8.1), and the mean and minimum reaction times differed: on one PC the minimum was 500 ms (when we tried to respond to the task), on the other 900 ms. I understand that the problem may be due to different Windows versions, microphones, audio cards, and processors, is that right? Hence, we cannot compare the results of different studies conducted on different computers? Also, how do we know that the reaction times to keypresses are constant across different computers? I look forward to an answer; thank you in advance.

2) The design was as follows: a series of 12 pictures, then the inline script with the verbal response. As the task was not easy, the great majority of responses were made after the last picture, once the inline script had been launched. But I had to delete some responses, because participants answered during the series of pictures, that is, before the inline script with the verbal response was launched. Can we, in OpenSesame, start presenting pictures and simultaneously launch an inline script (the microphone, in this case)?

Vladimir Kosonogov, Rostov-on-Don and Murcia

sebastiaan · May 2015

I understand that the problem may be due to different Windows versions, microphones, audio cards, and processors, is that right? Hence, we cannot compare the results of different studies conducted on different computers?

Not if you're interested in absolute response times, no. But then again, you're usually interested in effects between groups and conditions, which should be comparable, even if the response times are offset by different amounts. You can just see the set-up as an additional 'random effect', just like the participant.
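A toy illustration of why a fixed offset does not matter for between-condition effects. All numbers below are made up for the sake of the example, and the condition names and `measured_effect` are hypothetical; the sketch also assumes that the added latency is constant within a set-up.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / float(len(values))


# Hypothetical response times (ms) for two conditions, without hardware delay
true_rts = {"easy": [520, 540, 560], "hard": [600, 620, 640]}


def measured_effect(offset):
    """Difference between condition means when every RT is shifted by offset."""
    easy = mean([rt + offset for rt in true_rts["easy"]])
    hard = mean([rt + offset for rt in true_rts["hard"]])
    return hard - easy


effect_pc_a = measured_effect(100)  # set-up that adds 100 ms to every RT
effect_pc_b = measured_effect(400)  # set-up that adds 400 ms to every RT
```

Both set-ups give the same difference between conditions, even though the absolute response times differ by 300 ms.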

Also, how do we know that the reaction times to keypresses are constant across different computers?

These differ between set-ups too, although to a lesser extent than audio latencies.

Can we, in OpenSesame, start presenting pictures and simultaneously launch an inline script (the microphone, in this case)?

This wouldn't be easy to do with the script that you used. But the soundrecorder plug-ins do run in the background, so they might be a good solution in your case:

http://osdoc.cogsci.nl/devices/soundrecorder/

Cheers,
Sebastiaan

Klaudius · November 2015

Dear Sebastiaan,

In my working group, we are trying to shed light on dual-task processes and theories regarding dual-task phenomena.
We are currently working out several methods to obtain latencies of verbal responses to auditory stimuli.
To do so, we want to record from the onset of the stimulus tone to the subsequent verbal response, for each single trial.

I have seen that you provided a link to Inga at the very beginning of this discussion (August 2011). I imagine that this link would be very helpful.
Unfortunately, it does not seem to work anymore. Would it be possible to provide me with a new link?

Best regards,
Klaudius
