Author Topic: pocketsphinx  (Read 4830 times)

Offline Sorunome

  • Fox Fox Fox Fox Fox Fox Fox!
  • Support Staff
  • LV13 Extreme Addict (Next: 9001)
  • *************
  • Posts: 7920
  • Rating: +374/-13
  • Derpy Hooves
    • View Profile
    • My website! (You might lose the game)
pocketsphinx
« on: July 04, 2014, 11:08:58 am »
Anyone with pocketsphinx experience here?
The problem is that it only recognizes gibberish for me :(

THE GAME
Also, check out my website
If OmnomIRC is screwed up, blame me!
Click here to give me an internet!

Offline ElementCoder

  • LV7 Elite (Next: 700)
  • *******
  • Posts: 611
  • Rating: +42/-2
    • View Profile
Re: pocketsphinx
« Reply #1 on: July 07, 2014, 12:55:23 pm »
I've never worked with pocketsphinx, but maybe http://cmusphinx.sourceforge.net/wiki/tutorialam can be of help. Are you also recording your voice as 16-bit, 16 kHz, single-channel (mono) audio as stated?
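
A quick way to answer that question for an existing recording is to read the WAV header. The sketch below uses only Python's standard `wave` module; the helper name `wav_format_problems` is hypothetical, and the expected values (16000 Hz, 1 channel, 2 bytes per sample) are taken from the format mentioned above.

```python
import wave


def wav_format_problems(path, rate=16000, channels=1, sample_width=2):
    """Return a list of mismatches between a WAV file and the expected format.

    An empty list means the file is `sample_width`-byte PCM at `rate` Hz
    with `channels` channels (defaults: 16-bit, 16 kHz, mono).
    """
    problems = []
    wf = wave.open(path, 'rb')
    try:
        if wf.getframerate() != rate:
            problems.append('sample rate is %d Hz, expected %d Hz'
                            % (wf.getframerate(), rate))
        if wf.getnchannels() != channels:
            problems.append('file has %d channel(s), expected %d'
                            % (wf.getnchannels(), channels))
        if wf.getsampwidth() != sample_width:
            problems.append('sample width is %d byte(s), expected %d'
                            % (wf.getsampwidth(), sample_width))
    finally:
        wf.close()
    return problems
```

Calling `wav_format_problems('output.wav')` on a recording made with the wrong settings lists each mismatch, which can help rule out the audio format before looking at the models.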
« Last Edit: July 07, 2014, 02:00:47 pm by ElementCoder »

Some people need a high five in the face... with a chair.
~EC

Offline Sorunome

Re: pocketsphinx
« Reply #2 on: July 07, 2014, 01:00:29 pm »
How can I set the sample rate (in kHz) that I record at, using pyaudio?


Offline ElementCoder

Re: pocketsphinx
« Reply #3 on: July 07, 2014, 01:11:59 pm »
I found an example at http://people.csail.mit.edu/hubert/pyaudio/#examples which I think has the necessary options. I guess you'd have to change CHANNELS to 1 and RATE to 16000 (the value is in Hz, so 16 kHz is 16000).
Code: [Select]
"""PyAudio example: Record a few seconds of audio and save to a WAVE file."""

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()

Again, I've never worked with this or audio in general so I'm making some guesses based on the documentation and interwebs.


Offline Sorunome

Re: pocketsphinx
« Reply #4 on: July 07, 2014, 01:16:07 pm »
I don't see any kHz setting; all I see is the RATE setting (which I already use here :) )
EDIT: Setting RATE crashes the recording :( But on playback it tells me the rate is 44100 Hz, which is what I set the RATE variable to.
« Last Edit: July 07, 2014, 01:19:17 pm by Sorunome »


Offline ElementCoder

Re: pocketsphinx
« Reply #5 on: July 07, 2014, 01:19:51 pm »
That seems like the kHz setting to me. Have you tried setting it to 16000? That's all I can think of. What are you trying to make btw, a secure Skype clone? :P
j/k really though, what are you making? :)
« Last Edit: July 07, 2014, 02:00:35 pm by ElementCoder »


Offline Sorunome

Re: pocketsphinx
« Reply #6 on: July 07, 2014, 01:24:53 pm »
Setting it to 16kHz gives me this:

Code: [Select]
Expression 'r' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2018
Expression 'PaAlsaStreamComponent_FinishConfigure( &self->capture, hwParamsCapture, inParams, self->primeBuffers, realSr, inputLatency )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2655
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2767
Traceback (most recent call last):
  File "speechcontrol.py", line 33, in <module>
    inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 714, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/pymodules/python2.7/pyaudio.py", line 396, in __init__
    self._stream = pa.open(**arguments)
IOError: [Errno Unanticipated host error] -9999

And I'm just messing with voice control. I mean, how epic would it be if you could enter your room and say "ok pi, turn on my computer"?
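
The ALSA error above suggests the capture device may not accept 16000 Hz directly. One possible workaround, sketched here under that assumption, is to record at the device's default rate and downsample in software afterwards. The helper name `resample_int16_mono` is hypothetical. It uses plain linear interpolation with no anti-aliasing filter, so a dedicated resampler (for example sox) should give cleaner audio. It is written for Python 3 and assumes 16-bit mono samples in the machine's native byte order, which matches WAV data on little-endian hosts.

```python
import array


def resample_int16_mono(raw, src_rate, dst_rate):
    """Resample 16-bit mono PCM bytes by linear interpolation.

    No low-pass filtering is applied, so downsampling can introduce
    aliasing; treat this as a rough sketch, not a production resampler.
    """
    samples = array.array('h')
    samples.frombytes(raw)
    if not samples:
        return b''
    last = len(samples) - 1
    n_out = int(len(samples) * dst_rate / src_rate)
    out = array.array('h')
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = min(int(pos), last)           # sample at or before pos
        hi = min(lo + 1, last)             # next sample, clamped at the end
        frac = pos - lo
        out.append(int(round(samples[lo] * (1 - frac) + samples[hi] * frac)))
    return out.tobytes()
```

For example, `resample_int16_mono(b''.join(frames), 44100, 16000)` would convert frames captured at 44100 Hz into data that can be written to a WAV file with `setframerate(16000)`.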


Offline ElementCoder

Re: pocketsphinx
« Reply #7 on: July 07, 2014, 01:45:09 pm »
That would be epic indeed. The error seems to have something to do with the stream configuration itself; for example, opening a surround stream on a stereo device won't work.
I don't know what could be wrong though. Could you maybe paste your script?
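
One way to narrow this down might be to ask PyAudio which sample rates the input device reports as supported. The sketch below relies on `PyAudio.is_format_supported`, which returns True for a supported configuration and raises ValueError otherwise. The helper name `supported_input_rates` and the list of candidate rates are assumptions for illustration, and a driver may still reject a rate that it reports as supported.

```python
def supported_input_rates(pa, device_index, sample_format, channels=1,
                          candidates=(8000, 16000, 22050, 44100, 48000)):
    """Return the candidate sample rates `pa` reports as usable for input.

    `pa` is expected to be a pyaudio.PyAudio instance and `sample_format`
    a PyAudio format constant such as pyaudio.paInt16.
    """
    supported = []
    for rate in candidates:
        try:
            if pa.is_format_supported(rate,
                                      input_device=device_index,
                                      input_channels=channels,
                                      input_format=sample_format):
                supported.append(rate)
        except ValueError:
            # PyAudio signals an unsupported configuration with ValueError.
            pass
    return supported


# Usage with real hardware (not run here):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(supported_input_rates(pa, 0, pyaudio.paInt16))
#   pa.terminate()
```

If 16000 is missing from the result, recording at a supported rate and converting afterwards is probably the simpler route.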
« Last Edit: July 07, 2014, 02:00:25 pm by ElementCoder »


Offline Sorunome

Re: pocketsphinx
« Reply #8 on: July 07, 2014, 02:55:25 pm »
Code: [Select]
###!/usr/bin/python2
import pocketsphinx as ps
import sphinxbase, pyaudio, wave

hmmd = '/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k'
lmdir = '/usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic'

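# These reassignments override the WSJ paths above, so the tidigits (spoken digits) models are the ones actually used.
hmmd = '/usr/local/share/pocketsphinx/model/hmm/en/tidigits'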
lmdir = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.DMP'
dictp = '/usr/local/share/pocketsphinx/model/lm/en/tidigits.dic'

#lmdir = '/home/sorunome/languagemodel_persona.lm'
#dictp = '/home/sorunome/dictionary_persona.dic'

p = pyaudio.PyAudio()

device = p.get_device_info_by_index(0)


CHUNK = 5750
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = int(device['defaultSampleRate'])

inputStream = p.open(format=FORMAT,channels=CHANNELS,rate=RATE,input=True,output=False,frames_per_buffer=CHUNK)

frames = []
for i in range(RATE/CHUNK * 5):
        frames.append(inputStream.read(CHUNK))

inputStream.stop_stream()
inputStream.close()
p.terminate()
write_frames = wave.open('tmp.wav','wb')
write_frames.setnchannels(CHANNELS)
write_frames.setsampwidth(p.get_sample_size(FORMAT))
write_frames.setframerate(RATE)
write_frames.writeframes(''.join(frames))
write_frames.close()


wavFile = file('tmp.wav','rb')
wavFile.seek(44)
#speechRec =  ps.Decoder(lm='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.DMP',dict='/usr/local/share/pocketsphinx/model/lm/en_US/hub4.5000.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
#speechRec =  ps.Decoder(lm='/home/sorunome/languagemodel_persona.lm',dict='/home/sorunome/dictionary_persona.dic',hmm='/usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k')
speechRec = ps.Decoder(hmm = hmmd,lm = lmdir,dict = dictp)
speechRec.decode_raw(wavFile)
print 'EPIC output',speechRec.get_hyp()

Lol, how did the extra hashes end up on the first line of code :P
