A novel noise-robust soft Voice Activity Detector (VAD) operating in
the short-time Fourier domain is presented. A speech energy gain is
obtained by frame-wise processing of a noisy speech signal with a
speech codebook algorithm. This gain can be used for robust voice
detection. A speaker-independent speech codebook, consisting of
spectral envelopes, is created in a training process. At run time,
the codebook is adapted in every frame to the current speaker by
combining the harmonic pitch structure of the current noisy speech
frame with the codebook entries. Soft VAD
values ranging from zero to one are calculated by post-processing
the speech gain, which is obtained using gain-shape vector
quantization. A binary VAD decision is obtained by applying a
threshold to the soft values. The proposed method does not rely on
a priori knowledge of the noise and is robust to highly
non-stationary noise and adverse SNR conditions. In addition, it is
possible to
trade off the detection rate against the false-alarm rate by
varying the threshold without increasing the total number of
misdetections. Compared to state-of-the-art VAD systems, the
proposed method achieves better detection rates at
significantly lower false-alarm rates.
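
The processing chain described above (gain-shape vector quantization, mapping of the gain to a soft value, thresholding) can be illustrated with a minimal sketch. It is not the implementation of the proposed method: the per-frame codebook adaptation with the pitch structure is omitted, and the function names, the logistic mapping, and the parameters `midpoint`, `slope`, and `threshold` are assumptions chosen only for illustration.

```python
import math


def gain_shape_vq(frame, codebook):
    """Return (index, gain) of the codebook shape that best matches the frame.

    For non-negative spectra, the best unit-norm shape is the one with the
    largest projection of the frame onto it; that projection is the gain.
    """
    best_index, best_gain = -1, float("-inf")
    for i, shape in enumerate(codebook):
        norm = math.sqrt(sum(s * s for s in shape))
        # Projection of the frame onto the normalised shape vector.
        gain = sum(x * s for x, s in zip(frame, shape)) / norm
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


def soft_vad(gain, midpoint=1.0, slope=4.0):
    """Map a gain to [0, 1] with a logistic curve (assumed post-processing)."""
    return 1.0 / (1.0 + math.exp(-slope * (gain - midpoint)))


def binary_vad(soft_value, threshold=0.5):
    """Hard decision: speech is declared when the soft value reaches the threshold."""
    return soft_value >= threshold


# Toy codebook with two spectral shapes (hypothetical values).
codebook = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]

loud_frame = [2.0, 2.0, 2.0, 2.0]
quiet_frame = [0.01, 0.01, 0.01, 0.01]

loud_index, loud_gain = gain_shape_vq(loud_frame, codebook)
quiet_index, quiet_gain = gain_shape_vq(quiet_frame, codebook)

loud_decision = binary_vad(soft_vad(loud_gain))
quiet_decision = binary_vad(soft_vad(quiet_gain))
```

In this sketch, raising `threshold` makes the binary decision more conservative, which lowers the false-alarm rate at the cost of the detection rate; lowering it has the opposite effect.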