Estimate pitch using the probabilistic YIN algorithm

The probabilistic YIN algorithm (Mauch and Dixon 2014) extends YIN (Cheveigné and Kawahara 2002) by considering multiple pitch candidates per frame in a hidden Markov model, which is Viterbi-decoded to obtain the final pitch estimate. In addition to the f0 estimate, the function returns a track indicating whether each frame was considered voiced and a track containing the probability that the frame is voiced.
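As a rough illustration of the YIN stage that pYIN builds on, the base-R sketch below computes the difference function and the cumulative mean normalized difference for a single frame, then picks the first trough that falls below one absolute threshold. The helper names (yin_difference, yin_cmnd, yin_period) are hypothetical and not part of this package. pYIN replaces the single threshold with a distribution over thresholds, and librosa additionally refines candidates with parabolic interpolation, so this is a simplified sketch rather than the implementation used by pyin().

```r
# Hypothetical helpers (not part of this package): classic YIN steps that
# pYIN builds on. Simplified: one fixed threshold, no parabolic interpolation.

# Squared difference function d(tau) for lags 0..max_lag; d[1] is lag 0.
yin_difference <- function(frame, max_lag) {
  n <- length(frame) - max_lag            # samples compared at every lag
  d <- numeric(max_lag + 1)
  for (tau in 0:max_lag) {
    diffs <- frame[1:n] - frame[(1 + tau):(n + tau)]
    d[tau + 1] <- sum(diffs^2)
  }
  d
}

# Cumulative mean normalized difference: 1 at lag 0, then
# d(tau) / ((1 / tau) * sum of d over lags 1..tau).
yin_cmnd <- function(d) {
  lagged <- d[-1]
  running <- cumsum(lagged)
  taus <- seq_along(lagged)
  c(1, ifelse(running > 0, lagged * taus / running, 1))
}

# First lag >= min_lag whose CMND dips below `threshold`, advanced to the
# bottom of that trough; NA if no lag qualifies.
yin_period <- function(cmnd, threshold = 0.1, min_lag = 2) {
  for (i in (min_lag + 1):length(cmnd)) {
    if (cmnd[i] < threshold) {
      while (i < length(cmnd) && cmnd[i + 1] < cmnd[i]) i <- i + 1
      return(i - 1)                        # index i corresponds to lag i - 1
    }
  }
  NA_integer_
}

# Example: a 100 Hz sine sampled at 16 kHz, searching 70-200 Hz.
sr <- 16000
x <- sin(2 * pi * 100 * (0:1023) / sr)
cmnd <- yin_cmnd(yin_difference(x, max_lag = ceiling(sr / 70)))
period <- yin_period(cmnd, threshold = 0.1, min_lag = floor(sr / 200))
sr / period   # period of 160 samples, i.e. 100 Hz
```

The search range in lags follows from the frequency limits: the longest lag is sr / minF and the shortest is sr / maxF, which is how minF and maxF restrict the candidates.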

Usage

pyin(
  listOfFiles,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 30,
  minF = 70,
  maxF = 200,
  max_transition_rate = 35.92,
  beta_parameters = c(2, 18),
  center = TRUE,
  boltzmann_parameter = 2,
  resolution = 0.1,
  thresholds = 100,
  switch_probability = 0.01,
  no_trough_probability = 0.01,
  pad_mode = "constant",
  explicitExt = "pyp",
  outputDirectory = NULL,
  toFile = TRUE
)

Arguments

listOfFiles: A vector of file paths to wav files.
beginTime: The start time of the section of the sound file that should be processed.
endTime: The end time of the section of the sound file that should be processed.
windowShift: The measurement interval (frame shift), in milliseconds.
windowSize: The length of the analysis window, in milliseconds.
minF: The lowest candidate f0, in Hz. Candidates below this frequency are not considered.
maxF: The highest candidate f0, in Hz. Candidates above this frequency are ignored.
max_transition_rate: The maximum pitch transition rate in octaves per second.
beta_parameters: The shape parameters for the beta distribution prior over thresholds.
center: Should analysis windows be centered on the frame's time point (TRUE, the default), or should each window be considered to start at that time point (FALSE)?
boltzmann_parameter: The shape parameter for the Boltzmann distribution prior over troughs. Larger values will assign more mass to smaller periods.
resolution: The resolution of the pitch bins, in semitones. 0.01 corresponds to cents, so the default of 0.1 gives 10-cent bins.
thresholds: The number of thresholds for peak estimation.
switch_probability: The probability of switching from voiced to unvoiced or vice versa.
no_trough_probability: The maximum probability to add to the global minimum if no trough is below the threshold.
pad_mode: The mode used to pad the signal. Ignored unless center is TRUE. Padding is performed by the Python library librosa, so consult the documentation of the NumPy function numpy.pad for the other available modes.
explicitExt: The file extension to use for the output file.
outputDirectory: An explicit directory in which to write the signal file. If not defined, the file is written to the same directory as the sound file.
toFile: Should the output be written to a file? If TRUE, the file is written to outputDirectory, if defined, or otherwise to the same directory as the sound file.
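The beta_parameters and thresholds arguments jointly define the prior over YIN thresholds. The base-R sketch below assumes the scheme used in librosa's pyin: thresholds + 1 evenly spaced bin edges on [0, 1], with each bin receiving the beta probability mass it covers. The helper name threshold_prior is hypothetical and not part of this package.

```r
# Hypothetical helper (not part of this package): prior mass over YIN
# thresholds, assuming librosa's scheme of evenly spaced bin edges on [0, 1]
# with each bin receiving the beta probability mass it covers.
threshold_prior <- function(thresholds = 100, beta_parameters = c(2, 18)) {
  edges <- seq(0, 1, length.out = thresholds + 1)
  cdf <- pbeta(edges, shape1 = beta_parameters[1], shape2 = beta_parameters[2])
  data.frame(threshold = edges[-1],   # upper edge of each bin
             prob = diff(cdf))        # beta mass within the bin
}

prior <- threshold_prior()
sum(prior$prob)     # the bins share the full probability mass of 1
head(prior, 3)      # lowest thresholds and their prior mass
```

With the default c(2, 18), the prior mean is 2 / (2 + 18) = 0.1, so most of the mass falls on low thresholds; increasing the first shape parameter relative to the second shifts the prior toward more permissive thresholds.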

Value

An SSFF track object containing the f0 track together with the voicing decision and voicing probability tracks, which is either returned (toFile = FALSE) or written to disk (toFile = TRUE).

Details

This function calls the Python library librosa (McFee et al. 2022) to load the audio data and compute the pitch-related estimates.
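Because the final estimate is Viterbi-decoded with a hidden Markov model, minF, maxF, resolution, max_transition_rate and switch_probability, together with windowShift, determine the size and transitions of that model. The base-R sketch below reproduces this arithmetic as we understand it from the pyin source of librosa 0.9; other librosa versions may differ. The helper names are hypothetical, and windowShift is assumed to be in milliseconds and converted to a hop in seconds.

```r
# Hypothetical helpers (not part of this package): HMM dimensions implied by
# the pyin() arguments, assuming the arithmetic of librosa 0.9's pyin.
pyin_hmm_sizes <- function(minF = 70, maxF = 200, resolution = 0.1,
                           max_transition_rate = 35.92,
                           windowShift = 5) {   # windowShift assumed in ms
  bins_per_semitone <- ceiling(1 / resolution)
  n_pitch_bins <- floor(12 * bins_per_semitone * log2(maxF / minF)) + 1
  hop_seconds <- windowShift / 1000
  # largest pitch jump allowed between consecutive frames, in semitones
  max_semitones_per_frame <- round(max_transition_rate * 12 * hop_seconds)
  list(
    n_pitch_bins = n_pitch_bins,
    n_states = 2 * n_pitch_bins,  # a voiced and an unvoiced state per pitch bin
    transition_width = max_semitones_per_frame * bins_per_semitone + 1
  )
}

# 2 x 2 voicing switch matrix: stay with probability 1 - p, switch with p.
voicing_transitions <- function(switch_probability = 0.01) {
  p <- switch_probability
  matrix(c(1 - p, p, p, 1 - p), nrow = 2, byrow = TRUE,
         dimnames = list(c("voiced", "unvoiced"), c("voiced", "unvoiced")))
}

sizes <- pyin_hmm_sizes()
str(sizes)
voicing_transitions(0.01)
```

With the defaults this gives 182 pitch bins (364 voiced and unvoiced states), and the pitch may move by at most 2 semitones between consecutive 5 ms frames, i.e. within a local transition window 21 bins wide. In librosa the voicing matrix is combined with the local pitch transitions through a Kronecker product, which R provides as kronecker().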

References

de Cheveigné A, Kawahara H (2002). “YIN, a fundamental frequency estimator for speech and music.” The Journal of the Acoustical Society of America, 111(4), 1917–1930. ISSN 0001-4966, doi:10.1121/1.1458024, http://www.ncbi.nlm.nih.gov/pubmed/12002874.

Mauch M, Dixon S (2014). “PYIN: A Fundamental Frequency Estimator using Probabilistic Threshold Distributions.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 659–663. doi:10.1109/icassp.2014.6853678.

McFee B, Metsai A, McVicar M, Balke S, Thomé C, Raffel C, Zalkow F, Malek A, Dana, Lee K, Nieto O, Ellis D, Mason J, Battenberg E, Seyfarth S, Yamamoto R, viktorandreevichmorozov, Choi K, Moore J, Bittner R, Hidaka S, Wei Z, nullmightybofo, Weiss A, Hereñú D, Stöter F, Friesch P, Vollrath M, Kim T, Thassilo (2022). “librosa/librosa: 0.9.1.” doi:10.5281/zenodo.6097378.