Compute f0 using the Yet Another Algorithm for Pitch Tracking (YAAPT)
The Yet Another Algorithm for Pitch Tracking (YAAPT; Kasi and Zahorian 2002) computes f0 using the Normalized Cross Correlation Function (NCCF) and builds on the work of Talkin (Talkin and Kleijn 1995) in developing the RAPT algorithm.
Usage
yaapt(
listOfFiles,
beginTime = 0,
endTime = 0,
windowShift = 5,
windowSize = 35,
minF = 70,
maxF = 200,
tda_frame_length = 35,
fft_length = 8192,
bp_forder = 150,
bp_low = 50,
bp_high = 1500,
nlfer_thresh1 = 0.75,
nlfer_thresh2 = 0.1,
shc_numharms = 3,
shc_window = 40,
shc_maxpeaks = 4,
shc_pwidth = 50,
shc_thresh1 = 5,
shc_thresh2 = 1.25,
f0_double = 150,
f0_half = 150,
dp5_k1 = 11,
dec_factor = 1,
nccf_thresh1 = 0.3,
nccf_thresh2 = 0.9,
nccf_maxcands = 3,
nccf_pwidth = 5,
merit_boost = 0.2,
merit_pivot = 0.99,
merit_extra = 0.4,
median_value = 7,
dp_w1 = 0.15,
dp_w2 = 0.5,
dp_w3 = 0.1,
dp_w4 = 0.9,
explicitExt = "yf0",
outputDirectory = NULL,
toFile = TRUE
)
Arguments
- listOfFiles
A vector of file paths to wav files.
- beginTime
The start time of the section of the sound file that should be processed.
- endTime
The end time of the section of the sound file that should be processed.
- windowShift
The measurement interval (frame shift), in milliseconds (default: 5 ms).
- windowSize
length of each analysis frame (default: 35 ms)
- minF
Candidate f0 frequencies below this frequency will not be considered.
- maxF
Candidates above this frequency will be ignored.
- tda_frame_length
The frame length used in the time domain analysis (default: 35 ms, the same as windowSize).
- fft_length
FFT length (default: 8192 samples)
- bp_forder
order of band-pass filter (default: 150)
- bp_low
low frequency of filter passband (default: 50 Hz)
- bp_high
high frequency of filter passband (default: 1500 Hz)
- nlfer_thresh1
NLFER (Normalized Low Frequency Energy Ratio) boundary for voiced/unvoiced decisions (default: 0.75)
- nlfer_thresh2
threshold for NLFER definitely unvoiced (default: 0.1)
- shc_numharms
number of harmonics in SHC (Spectral Harmonics Correlation) calculation (default: 3)
- shc_window
SHC window length (default: 40 Hz)
- shc_maxpeaks
maximum number of SHC peaks to be found (default: 4)
- shc_pwidth
window width in SHC peak picking (default: 50 Hz)
- shc_thresh1
threshold 1 for SHC peak picking (default: 5)
- shc_thresh2
threshold 2 for SHC peak picking (default: 1.25)
- f0_double
pitch doubling decision threshold (default: 150 Hz)
- f0_half
pitch halving decision threshold (default: 150 Hz)
- dp5_k1
weight used in dynamic program (default: 11)
- dec_factor
factor for signal resampling (default: 1)
- nccf_thresh1
threshold for considering a peak in NCCF (Normalized Cross Correlation Function) (default: 0.3)
- nccf_thresh2
threshold for terminating search in NCCF (default: 0.9)
- nccf_maxcands
maximum number of candidates found (default: 3)
- nccf_pwidth
window width in NCCF peak picking (default: 5)
- merit_boost
boost merit (default: 0.2)
- merit_pivot
merit assigned to unvoiced candidates in definitely unvoiced frames (default: 0.99)
- merit_extra
merit assigned to extra candidates in reducing pitch doubling/halving errors (default: 0.4)
- median_value
order of median filter (default: 7)
- dp_w1
DP (Dynamic Programming) weight factor for voiced-voiced transitions (default: 0.15)
- dp_w2
DP weight factor for voiced-unvoiced or unvoiced-voiced transitions (default: 0.5)
- dp_w3
DP weight factor for unvoiced-unvoiced transitions (default: 0.1)
- dp_w4
Weight factor for local costs (default: 0.9)
- explicitExt
the file extension that should be used.
- outputDirectory
set an explicit directory for where the signal file will be written. If not defined, the file will be written to the same directory as the sound file.
- toFile
write the output to a file? The file will be written in outputDirectory, if defined, or in the same directory as the sound file.
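The nccf_* arguments above control peak picking in the normalized cross-correlation function. As a rough illustration of what that function measures, the following base R sketch computes the NCCF of a single frame using the usual RAPT-style definition and reads an f0 estimate off the strongest peak. The helper name nccf(), the 8 kHz sample rate, and the synthetic 100 Hz tone are assumptions made for the example. This is not the code that yaapt() runs internally.

```r
# NCCF of one analysis frame (illustrative helper, not part of the package).
# x: numeric signal; start: first sample of the frame (1-based);
# n: frame length in samples; lags: integer lags in samples.
nccf <- function(x, start, n, lags) {
  ref <- x[start:(start + n - 1)]
  e_ref <- sum(ref^2)
  vapply(lags, function(k) {
    shifted <- x[(start + k):(start + k + n - 1)]
    # cross-correlation normalized by the energy of both windows
    sum(ref * shifted) / sqrt(e_ref * sum(shifted^2))
  }, numeric(1))
}

fs <- 8000                                   # sample rate in Hz (assumed)
x <- sin(2 * pi * 100 * (0:(fs / 10 - 1)) / fs)  # 0.1 s synthetic 100 Hz tone

min_f <- 70                                  # mirrors the minF default
max_f <- 200                                 # mirrors the maxF default
lags <- ceiling(fs / max_f):floor(fs / min_f)  # lag range implied by the f0 limits

r <- nccf(x, start = 1, n = 0.035 * fs, lags = lags)  # 35 ms frame
best_lag <- lags[which.max(r)]
f0_est <- fs / best_lag                      # lag of the strongest peak -> f0 in Hz
```

In this sketch the search range in lags follows directly from the f0 limits, which is why narrowing minF and maxF reduces the number of candidate peaks that have to be examined.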
Value
An SSFF track object containing two tracks: "f0", which holds the computed pitch values, and "voiced", a binary indication of whether each frame was considered voiced (1) or not (0). The tracks are either returned (toFile = FALSE) or stored on disk (toFile = TRUE).
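A minimal call might look like the sketch below. It assumes that the package providing yaapt() is already attached. The file name is a placeholder, and the widened f0 range is only an example setting. The call is guarded so that the snippet does nothing when either the function or the file is unavailable.

```r
# Placeholder path; replace with a real wav file.
wav_file <- "speech.wav"

# Guarded so the snippet is a no-op when yaapt() or the file is missing.
if (exists("yaapt") && file.exists(wav_file)) {
  # Return the track object instead of writing a file to disk,
  # and widen the f0 search range beyond the 70-200 Hz defaults.
  f0_track <- yaapt(wav_file, minF = 60, maxF = 400, toFile = FALSE)
}
```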
Details
The YAAPT algorithm processes both the original acoustic signal and a non-linearly processed version of it, which partially restores very weak f0 components. It uses intelligent peak picking to select multiple f0 candidates and assign merit factors, and it incorporates highly robust pitch contours obtained from smoothed versions of the low-frequency portions of spectrograms. Dynamic programming is then used to find the “best” pitch track among all the candidates, using both local and transition costs.
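The dynamic programming step can be pictured with a generic Viterbi-style search over per-frame candidates, sketched below in base R. The function name best_track(), the cost definitions (local cost as one minus merit, transition cost as the relative frequency jump), and the toy data are assumptions chosen for illustration. The weights only borrow the default values of dp_w1 and dp_w4, and the actual cost functions used by yaapt() differ.

```r
# Generic dynamic-programming search over f0 candidates (illustrative only).
# cand:  frames x candidates matrix of f0 values in Hz
# merit: matrix of the same shape, higher means more plausible
best_track <- function(cand, merit, w_trans = 0.15, w_local = 0.9) {
  n_frames <- nrow(cand)
  n_cand <- ncol(cand)
  local_cost <- w_local * (1 - merit)       # high merit -> low local cost
  total <- matrix(Inf, n_frames, n_cand)    # cheapest cost to reach each candidate
  back <- matrix(0L, n_frames, n_cand)      # index of the best predecessor
  total[1, ] <- local_cost[1, ]
  for (t in seq_len(n_frames)[-1]) {
    for (j in seq_len(n_cand)) {
      # transition cost: relative jump from each previous candidate
      trans <- w_trans * abs(cand[t, j] - cand[t - 1, ]) / cand[t - 1, ]
      cost <- total[t - 1, ] + trans
      back[t, j] <- which.min(cost)
      total[t, j] <- min(cost) + local_cost[t, j]
    }
  }
  # trace the cheapest path backwards from the last frame
  path <- integer(n_frames)
  path[n_frames] <- which.min(total[n_frames, ])
  for (t in rev(seq_len(n_frames))[-1]) {
    path[t] <- back[t + 1, path[t + 1]]
  }
  cand[cbind(seq_len(n_frames), path)]
}

# Toy data: in frame 2 the octave-doubled candidate has the higher merit.
cand  <- rbind(c(100, 200), c(102, 204), c(101, 202))
merit <- rbind(c(0.9, 0.7), c(0.6, 0.7), c(0.9, 0.5))
track <- best_track(cand, merit)
```

With these toy values, picking the highest-merit candidate frame by frame would jump to 204 Hz in the second frame, whereas the transition cost keeps the tracked contour on the lower candidates throughout.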
References
Kasi K, Zahorian SA (2002). “Yet Another Algorithm for Pitch Tracking.” 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, I-361-I-364. doi:10.1109/icassp.2002.5743729.
Talkin D, Kleijn WB (1995). “A robust algorithm for pitch tracking (RAPT).” Speech coding and synthesis, pp. 495-518.