Speech silicon AM: an FPGA-based acoustic modeling pipeline for hidden Markov model based speech recognition

Abstract

This paper presents the design of a FPGA-based hardware co-processor, based on the SPHINX 3 speech recognition engine from CMU; capable of performing acoustic modeling (AM) for medium sized vocabularies in real-time. By creating an input-driven pipeline for performing the calculations, we were able to maximize the throughput of the system while simultaneously minimizing the number of pipeline stalls. Use advanced placement techniques enabled post place-and-route speeds even greater than those necessary for real-time operation while operating at maximum workload. Further, by using input control vectors all FSMs were removed from the design, greatly increasing the flexibility of the design. These results combined with the ability to reprogram the system for different recognition tasks serve to create a system capable of in a vast array of environments. Synthesis to both Xilinx Virtex 4 and Spartan 3 FPGAs helps to further characterize the flexibility of the architecture.