Computes f0 using the Yet Another Algorithm for Pitch Tracking (YAAPT) algorithm (Kasi and Zahorian 2002), which is based on the Normalized Cross Correlation Function (NCCF) and builds on the work of Talkin (Talkin and Kleijn 1995) in developing the RAPT algorithm.

Usage

yaapt(
  listOfFiles,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 35,
  minF = 70,
  maxF = 200,
  tda_frame_length = 35,
  fft_length = 8192,
  bp_forder = 150,
  bp_low = 50,
  bp_high = 1500,
  nlfer_thresh1 = 0.75,
  nlfer_thresh2 = 0.1,
  shc_numharms = 3,
  shc_window = 40,
  shc_maxpeaks = 4,
  shc_pwidth = 50,
  shc_thresh1 = 5,
  shc_thresh2 = 1.25,
  f0_double = 150,
  f0_half = 150,
  dp5_k1 = 11,
  dec_factor = 1,
  nccf_thresh1 = 0.3,
  nccf_thresh2 = 0.9,
  nccf_maxcands = 3,
  nccf_pwidth = 5,
  merit_boost = 0.2,
  merit_pivot = 0.99,
  merit_extra = 0.4,
  median_value = 7,
  dp_w1 = 0.15,
  dp_w2 = 0.5,
  dp_w3 = 0.1,
  dp_w4 = 0.9,
  explicitExt = "yf0",
  outputDirectory = NULL,
  toFile = TRUE
)
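As a minimal sketch of a call, the following overrides only a few of the defaults from the signature above. The file name is hypothetical, and the call is guarded so that the snippet does nothing unless `yaapt()` is available in the session and the file exists.

```r
# Hypothetical input file; replace with a real path to a wav file.
wav_files <- c("recording01.wav")

# Arguments that differ from the defaults shown in the signature above.
yaapt_args <- list(
  listOfFiles = wav_files,
  minF = 60,        # lower bound of the f0 search range (Hz)
  maxF = 400,       # upper bound, e.g. for higher-pitched voices
  toFile = FALSE    # return the tracks instead of writing a file
)

# Only run when the function is attached and the input file exists.
if (exists("yaapt", mode = "function") && all(file.exists(wav_files))) {
  f0_tracks <- do.call(yaapt, yaapt_args)
}
```

All arguments not listed in `yaapt_args` keep their default values.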

Arguments

listOfFiles

A vector of file paths to wav files.

beginTime

The start time of the section of the sound file that should be processed.

endTime

The end time of the section of the sound file that should be processed.

windowShift

The measurement interval (frame shift), in milliseconds (default: 5 ms).

windowSize

length of each analysis frame (default: 35 ms)

minF

Candidate f0 frequencies below this frequency will not be considered.

maxF

Candidates above this frequency will be ignored.

tda_frame_length

The frame length employed in the time domain analysis (default: 35 ms, the same as the windowSize default).

fft_length

FFT length (default: 8192 samples)

bp_forder

order of band-pass filter (default: 150)

bp_low

low frequency of filter passband (default: 50 Hz)

bp_high

high frequency of filter passband (default: 1500 Hz)

nlfer_thresh1

NLFER (Normalized Low Frequency Energy Ratio) boundary for voiced/unvoiced decisions (default: 0.75)

nlfer_thresh2

threshold for NLFER definitely unvoiced (default: 0.1)

shc_numharms

number of harmonics in SHC (Spectral Harmonics Correlation) calculation (default: 3)

shc_window

SHC window length (default: 40 Hz)

shc_maxpeaks

maximum number of SHC peaks to be found (default: 4)

shc_pwidth

window width in SHC peak picking (default: 50 Hz)

shc_thresh1

threshold 1 for SHC peak picking (default: 5)

shc_thresh2

threshold 2 for SHC peak picking (default: 1.25)

f0_double

pitch doubling decision threshold (default: 150 Hz)

f0_half

pitch halving decision threshold (default: 150 Hz)

dp5_k1

weight used in dynamic program (default: 11)

dec_factor

factor for signal resampling (default: 1)

nccf_thresh1

threshold for considering a peak in NCCF (Normalized Cross Correlation Function) (default: 0.3)

nccf_thresh2

threshold for terminating search in NCCF (default: 0.9)

nccf_maxcands

maximum number of candidates found (default: 3)

nccf_pwidth

window width in NCCF peak picking (default: 5)

merit_boost

boost merit (default: 0.2)

merit_pivot

merit assigned to unvoiced candidates in definitely unvoiced frames (default: 0.99)

merit_extra

merit assigned to extra candidates in reducing pitch doubling/halving errors (default: 0.4)

median_value

order of median filter (default: 7)

dp_w1

DP (Dynamic Programming) weight factor for voiced-voiced transitions (default: 0.15)

dp_w2

DP weight factor for voiced-unvoiced or unvoiced-voiced transitions (default: 0.5)

dp_w3

DP weight factor for unvoiced-unvoiced transitions (default: 0.1)

dp_w4

DP weight factor for local costs (default: 0.9)

explicitExt

the file extension that should be used for the output file (default: "yf0").

outputDirectory

an explicit directory in which the signal file will be written. If not defined, the file will be written to the same directory as the sound file.

toFile

whether to write the output to a file (default: TRUE). The file is written to outputDirectory, if defined, or otherwise to the same directory as the sound file.
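The nccf_* arguments above control peak picking in the NCCF. As a rough illustration of what that function measures, the sketch below computes a textbook NCCF for one frame of a synthetic 100 Hz sine and picks the lag with the highest correlation. The helper `nccf()`, the sample rate `fs`, and the test signal are assumptions made for this example; this is not the code used inside `yaapt()`, and no thresholding or multiple-candidate selection is shown.

```r
# Normalized cross correlation of a reference window with lagged copies
# of itself: sum(s[n] * s[n + k]) / sqrt(e_0 * e_k) for each lag k.
nccf <- function(x, n_win, lags) {
  ref <- x[seq_len(n_win)]
  e0 <- sum(ref^2)                       # energy of the reference window
  vapply(lags, function(k) {
    shifted <- x[k + seq_len(n_win)]     # window delayed by k samples
    sum(ref * shifted) / sqrt(e0 * sum(shifted^2))
  }, numeric(1))
}

fs <- 8000                               # assumed sample rate (Hz)
n <- 0:399
x <- sin(2 * pi * 100 * n / fs)          # synthetic 100 Hz tone

# Lags corresponding to an f0 search range of 70 to 200 Hz.
lags <- ceiling(fs / 200):floor(fs / 70)
r <- nccf(x, n_win = 200, lags = lags)

best_lag <- lags[which.max(r)]           # lag of the strongest peak
f0_est <- fs / best_lag                  # convert lag to frequency (Hz)
```

For this tone the strongest peak falls at a lag of one period (80 samples), so `f0_est` comes out at 100 Hz. Real speech produces several competing peaks, which is why a tracker keeps multiple candidates per frame.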

Value

An SSFF track object containing two tracks, "f0" and "voiced", which contain the computed pitch values and a binary (0 or 1) indication of whether each frame was considered voiced (1) or not (0), respectively. The tracks are either returned (toFile == FALSE) or stored on disk.
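A common next step is to summarise f0 over voiced frames only. The sketch below uses a plain list as a stand-in for the returned object; the values are made up, and accessing the tracks with `$f0` and `$voiced` is an assumption about the object's structure rather than something this page confirms.

```r
# Stand-in for the returned object: a list holding the two tracks
# described above. The numbers are invented for illustration.
tracks <- list(
  f0     = c(0, 0, 118.2, 120.5, 121.0, 0, 0, 97.4, 98.1),
  voiced = c(0, 0, 1, 1, 1, 0, 0, 1, 1)
)

# Keep only the f0 values from frames flagged as voiced.
voiced_f0 <- tracks$f0[tracks$voiced == 1]

mean_f0 <- mean(voiced_f0)             # average pitch over voiced frames
prop_voiced <- mean(tracks$voiced)     # share of frames flagged as voiced
```

Filtering on the "voiced" track first keeps unvoiced frames from pulling the average towards zero.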

Details

The YAAPT algorithm processes both the original acoustic signal and a non-linearly processed version of the signal, which partially restores very weak f0 components. It uses intelligent peak picking to select multiple f0 candidates and assign merit factors, and it incorporates highly robust pitch contours obtained from smoothed versions of the low-frequency portions of spectrograms. Dynamic programming is then used to find the “best” pitch track among all the candidates, using both local and transition costs.
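The dynamic programming step can be illustrated with a small Viterbi-style search. This is a sketch under simplifying assumptions, not the cost functions YAAPT actually uses: the local cost is taken as one minus the merit, the transition cost is the relative f0 jump between consecutive frames, all candidates are voiced (non-zero), and the function name `best_track()` and its weights are hypothetical.

```r
# cand:  matrix (frames x candidates) of candidate f0 values in Hz
# merit: matrix of the same shape, merit in [0, 1] (higher is better)
# w_local, w_trans: illustrative weights, not yaapt()'s dp_w* arguments
best_track <- function(cand, merit, w_local = 1, w_trans = 1) {
  n_frames <- nrow(cand)
  n_cands <- ncol(cand)
  local_cost <- w_local * (1 - merit)
  total <- matrix(Inf, n_frames, n_cands)  # cheapest path cost so far
  back <- matrix(0L, n_frames, n_cands)    # best predecessor per cell
  total[1, ] <- local_cost[1, ]
  for (t in seq_len(n_frames)[-1]) {
    for (j in seq_len(n_cands)) {
      # Transition cost: relative f0 jump from each previous candidate.
      trans <- w_trans * abs(cand[t, j] - cand[t - 1, ]) / cand[t - 1, ]
      path <- total[t - 1, ] + trans
      back[t, j] <- which.min(path)
      total[t, j] <- min(path) + local_cost[t, j]
    }
  }
  # Trace the cheapest path back from the last frame.
  idx <- integer(n_frames)
  idx[n_frames] <- which.min(total[n_frames, ])
  for (t in rev(seq_len(n_frames))[-1]) {
    idx[t] <- back[t + 1, idx[t + 1]]
  }
  cand[cbind(seq_len(n_frames), idx)]
}

# In frame 3 the pitch-doubled candidate (208 Hz) has the higher merit.
cand  <- rbind(c(100, 200), c(102, 204), c(208, 104), c(105, 210))
merit <- rbind(c(0.9, 0.8), c(0.9, 0.8), c(0.9, 0.8), c(0.9, 0.8))
track <- best_track(cand, merit)
```

Here `track` follows the smooth contour 100, 102, 104, 105 Hz: the transition cost outweighs the slightly better merit of the doubled candidate in frame 3, which is the kind of error the combined local and transition costs are meant to suppress.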

References

Kasi K, Zahorian SA (2002). “Yet Another Algorithm for Pitch Tracking.” 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, I-361–I-364. doi:10.1109/icassp.2002.5743729.

Talkin D, Kleijn WB (1995). “A robust algorithm for pitch tracking (RAPT).” Speech coding and synthesis, 495–518.