Speech recognition can be CPU-intensive, and it may be keeping your tracking code from running on time. Recognition works by continuously analyzing incoming audio, so you pay that cost whether or not anything is recognized as a command.
I'd start by making sure the two tasks run on separate threads, so they can use multiple cores where available.
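
As a rough illustration of that split, here is a minimal Python sketch. I don't know which recognizer or tracking API you're using, so `recognize`, `recognition_worker`, and `run_tracking` are hypothetical names with placeholder logic. The idea is only that recognition blocks on its own thread while the tracking loop polls for results without ever waiting on them.

```python
import queue
import threading


def recognize(audio_chunk):
    # Hypothetical stand-in for a real, CPU-heavy recognizer call.
    # Returns a command string, or None when nothing was recognized.
    return audio_chunk.upper() if audio_chunk.startswith("cmd:") else None


def recognition_worker(audio_in, commands_out):
    # Runs on its own thread: blocks waiting for audio, so the tracking
    # loop never has to wait on recognition.
    while True:
        chunk = audio_in.get()
        if chunk is None:  # sentinel value: shut the worker down
            break
        command = recognize(chunk)
        if command is not None:
            commands_out.put(command)


def drain(q, into):
    # Move everything currently in the queue into a list, without blocking.
    while True:
        try:
            into.append(q.get_nowait())
        except queue.Empty:
            return


def run_tracking(frames, audio_chunks):
    audio_in = queue.Queue()
    commands_out = queue.Queue()
    worker = threading.Thread(
        target=recognition_worker, args=(audio_in, commands_out), daemon=True
    )
    worker.start()

    # In a real program the audio would arrive from a capture callback.
    for chunk in audio_chunks:
        audio_in.put(chunk)

    tracked, commands = [], []
    for frame in frames:
        tracked.append(frame * 2)      # placeholder for the real tracking update
        drain(commands_out, commands)  # pick up any commands, never wait for them

    audio_in.put(None)  # ask the worker to stop
    worker.join()
    drain(commands_out, commands)  # collect anything finished after the loop
    return tracked, commands


if __name__ == "__main__":
    print(run_tracking([1, 2, 3], ["noise", "cmd:stop", "hum"]))
```

One caveat specific to this sketch: in CPython, threads only spread across cores when the heavy work happens in native code that releases the global interpreter lock. If the recognizer runs in pure Python, a separate process would be needed to get the same effect. In languages with unrestricted native threads, the same two-queue structure applies directly.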