The probabilistic YIN algorithm (Mauch and Dixon 2014) is an extension of YIN (Cheveigné and Kawahara 2002) that considers multiple pitch candidates in a hidden Markov model, which is Viterbi-decoded to produce the final pitch estimate. The function also returns a track encoding whether each analysis frame was considered voiced, and a track containing the probability of voicing in each analysis frame.

Usage

pyin(
  listOfFiles,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 30,
  minF = 70,
  maxF = 200,
  max_transition_rate = 35.92,
  beta_parameters = c(2, 18),
  center = TRUE,
  boltzmann_parameter = 2,
  resolution = 0.1,
  thresholds = 100,
  switch_probability = 0.01,
  no_trough_probability = 0.01,
  pad_mode = "constant",
  explicitExt = "pyp",
  outputDirectory = NULL,
  toFile = TRUE
)
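A call might look like the sketch below. The file path is hypothetical, and the example assumes that the package providing pyin is already attached and that its Python dependencies are installed. The guard only keeps the snippet from failing when either assumption does not hold.

```r
# Hypothetical input file; replace with real paths to wav files.
wav_files <- c("recordings/speaker1.wav")

# Run only if pyin() is available and the file exists,
# so the sketch can be pasted without raising an error.
if (exists("pyin", mode = "function") && all(file.exists(wav_files))) {
  # Widen the search range relative to the defaults (70 to 200 Hz)
  # and return the result instead of writing it to disk.
  f0_tracks <- pyin(wav_files, minF = 60, maxF = 400, toFile = FALSE)
  str(f0_tracks)
}
```

Setting toFile = FALSE returns the result to the session rather than writing a file next to the sound file.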

Arguments

listOfFiles

A vector of file paths to wav files.

beginTime

The start time of the section of the sound file that should be processed.

endTime

The end time of the section of the sound file that should be processed.

windowShift

The measurement interval (frame shift), in milliseconds.

minF

Candidate f0 frequencies below this frequency (in Hz) will not be considered.

maxF

Candidate f0 frequencies above this frequency (in Hz) will not be considered.

max_transition_rate

The maximum pitch transition rate in octaves per second.
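To get a feel for this value, the rate can be converted into the largest pitch jump allowed between two consecutive frames. The sketch below follows the conversion librosa appears to use (octaves per second times 12 semitones times the frame shift in seconds, rounded). It assumes that the default windowShift of 5 denotes 5 ms; the variable names are illustrative only.

```r
max_transition_rate <- 35.92   # octaves per second (the default)
window_shift_s <- 5 / 1000     # ASSUMPTION: windowShift = 5 means 5 ms

# Octaves/second -> semitones/second -> semitones per frame
semitones_per_frame <- max_transition_rate * 12 * window_shift_s

# Largest jump (in whole semitones) the decoder would allow between frames
max_jump <- round(semitones_per_frame)

print(semitones_per_frame)
print(max_jump)
```

Under these assumptions the default permits jumps of roughly two semitones from one frame to the next; a longer frame shift allows proportionally larger jumps.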

beta_parameters

The shape parameters for the beta distribution prior over thresholds.
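The effect of the shape parameters can be inspected by discretizing the beta distribution over a grid of candidate thresholds on [0, 1]. The sketch below takes the difference of the beta CDF at the bin edges, which is how librosa is understood to build the prior. The grid size corresponds to the thresholds argument, and the variable names are illustrative.

```r
n_thresholds <- 100            # the 'thresholds' argument (default)
beta_parameters <- c(2, 18)    # the default shape parameters

# Edges of the threshold bins on [0, 1]
edges <- seq(0, 1, length.out = n_thresholds + 1)

# Prior mass of each bin = difference of the beta CDF at its edges
beta_cdf <- pbeta(edges, shape1 = beta_parameters[1], shape2 = beta_parameters[2])
threshold_prior <- diff(beta_cdf)

# Mean of a Beta(a, b) distribution is a / (a + b)
prior_mean <- beta_parameters[1] / sum(beta_parameters)

print(sum(threshold_prior))
print(prior_mean)
```

With the defaults the prior mean is 0.1, so most of the probability mass sits on low thresholds.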

center

Should analysis windows be centered around the time of the window (TRUE, the default) or should the window be considered to have started at the indicated time point (FALSE).

boltzmann_parameter

The shape parameter for the Boltzmann distribution prior over troughs. Larger values will assign more mass to smaller periods.
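The claim about larger values can be checked numerically. The sketch below implements the truncated discrete exponential (Boltzmann) probability mass function over trough positions, ordered from the shortest period to the longest. The helper boltzmann_pmf and the choice of four troughs are illustrative assumptions, not part of the package.

```r
# Boltzmann pmf over positions k = 0, ..., n - 1 with shape parameter lambda
boltzmann_pmf <- function(k, lambda, n) {
  (1 - exp(-lambda)) * exp(-lambda * k) / (1 - exp(-lambda * n))
}

n_troughs <- 4                       # hypothetical number of troughs in a frame
positions <- 0:(n_troughs - 1)       # 0 = trough with the shortest period

prior_default <- boltzmann_pmf(positions, lambda = 2, n = n_troughs)
prior_flatter <- boltzmann_pmf(positions, lambda = 0.5, n = n_troughs)

print(round(prior_default, 3))
print(round(prior_flatter, 3))
```

Both priors sum to one, and the one with the larger parameter puts a greater share of its mass on the first (shortest-period) trough.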

resolution

The resolution of the pitch bins, in semitones. A value of 0.01 corresponds to one cent.
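The resolution, together with minF and maxF, determines how many pitch states the hidden Markov model has. The sketch below reproduces the bin count as librosa is understood to compute it (bins per semitone rounded up, then spread over the searched range). Variable names are illustrative.

```r
resolution <- 0.1   # default: a tenth of a semitone per bin
min_f <- 70         # default minF, in Hz
max_f <- 200        # default maxF, in Hz

# Number of pitch bins inside one semitone
bins_per_semitone <- ceiling(1 / resolution)

# Number of pitch bins covering the range from min_f to max_f
n_pitch_bins <- floor(12 * bins_per_semitone * log2(max_f / min_f)) + 1

# Width of one bin in cents (100 cents per semitone)
cents_per_bin <- 100 / bins_per_semitone

print(c(bins_per_semitone, n_pitch_bins, cents_per_bin))
```

A finer resolution or a wider frequency range increases the number of states and, with it, the cost of decoding.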

thresholds

The number of thresholds for peak estimation.

switch_probability

The probability of switching from voiced to unvoiced or vice versa.
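One way to read this parameter is as the off-diagonal entry of a two-state transition matrix between voiced and unvoiced frames. The sketch below builds that matrix and derives the expected number of consecutive frames spent in one state. The conversion to seconds assumes a 5 ms frame shift, and all names are illustrative.

```r
switch_probability <- 0.01   # the default
states <- c("voiced", "unvoiced")

# Rows: current state; columns: next state
transition <- matrix(
  c(1 - switch_probability, switch_probability,
    switch_probability, 1 - switch_probability),
  nrow = 2, byrow = TRUE,
  dimnames = list(from = states, to = states)
)

# Run lengths are geometric, so the expected run is 1 / p frames
expected_run_frames <- 1 / switch_probability
expected_run_s <- expected_run_frames * 5 / 1000   # ASSUMPTION: 5 ms frame shift

print(transition)
print(expected_run_frames)
```

Lower values make the voicing decision stickier; higher values let it flip more readily between frames.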

no_trough_probability

The maximum probability to add to the global minimum if no trough is below the threshold.

pad_mode

The padding mode. Ignored if center is not TRUE. Padding is performed by the Python library librosa, which passes this value to the NumPy function numpy.pad; consult the numpy.pad documentation for the other available modes.

explicitExt

The file extension that should be used for the output file.

outputDirectory

An explicit directory to which the signal file will be written. If not defined, the file will be written to the same directory as the sound file.

toFile

Should the output be written to a file? The file will be written to outputDirectory, if defined, or to the same directory as the sound file.

Value

An SSFF track object containing two tracks (f0 and pitch), which is either returned (toFile == FALSE) or stored on disk.

Details

This function calls the librosa (McFee et al. 2022) Python library to load the audio data and make pitch-related estimates.

References

Cheveigné Ad, Kawahara H (2002). “YIN, a fundamental frequency estimator for speech and music.” The Journal of the Acoustical Society of America, 111(4), 1917–1930. ISSN 0001-4966, doi:10.1121/1.1458024, http://www.ncbi.nlm.nih.gov/pubmed/12002874.

Mauch M, Dixon S (2014). “PYIN: A Fundamental Frequency Estimator using Probabilistic Threshold Distributions.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 659–663. doi:10.1109/icassp.2014.6853678.

McFee B, Metsai A, McVicar M, Balke S, Thomé C, Raffel C, Zalkow F, Malek A, Dana, Lee K, Nieto O, Ellis D, Mason J, Battenberg E, Seyfarth S, Yamamoto R, viktorandreevichmorozov, Choi K, Moore J, Bittner R, Hidaka S, Wei Z, nullmightybofo, Weiss A, Hereñú D, Stöter F, Friesch P, Vollrath M, Kim T, Thassilo (2022). “librosa/librosa: 0.9.1.” doi:10.5281/zenodo.6097378.