Estimate pitch using the probabilistic YIN algorithm
The probabilistic YIN algorithm (pYIN; Mauch and Dixon 2014) is an extension of YIN (de Cheveigné and Kawahara 2002) that considers multiple pitch candidates per frame and chooses among them with a hidden Markov model, which is Viterbi-decoded to obtain the final pitch estimate. In addition to the f0 track, the function returns a track encoding whether each analysis frame was considered voiced or not, and a track containing the probability that the frame is voiced.
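For orientation, the sketch below implements the basic YIN steps that pYIN builds on: the difference function, its cumulative mean normalized form, and a single absolute threshold followed by a descent to the nearest local minimum. It is a plain R illustration and not the code path used by this function, which delegates the computation to librosa. The helper names `yin_cmndf()` and `yin_f0()` are hypothetical, and YIN's parabolic interpolation and best-local-estimate steps are omitted. pYIN replaces the single threshold with a distribution over many thresholds and lets the hidden Markov model choose among the resulting candidates.

```r
# Hypothetical helper: cumulative mean normalized difference function d'(tau).
# Returns a vector whose element tau + 1 holds d'(tau); d'(0) is defined as 1.
yin_cmndf <- function(x, max_lag) {
  n <- length(x) - max_lag  # number of samples compared at every lag
  d <- vapply(1:max_lag, function(tau) {
    sum((x[1:n] - x[(1 + tau):(n + tau)])^2)  # difference function d(tau)
  }, numeric(1))
  d_prime <- d * seq_len(max_lag) / cumsum(d)  # divide by the running mean of d
  c(1, d_prime)
}

# Hypothetical helper: single-threshold YIN estimate for one frame, in Hz.
yin_f0 <- function(x, sr, min_f, max_f, threshold = 0.1) {
  min_lag <- floor(sr / max_f)
  max_lag <- ceiling(sr / min_f)
  dp <- yin_cmndf(x, max_lag)
  lags <- min_lag:max_lag
  below <- lags[dp[lags + 1] < threshold]
  if (length(below) == 0) return(NA_real_)  # no dip below threshold: unvoiced
  tau <- below[1]
  # Follow the first dip below the threshold down to its local minimum
  while (tau < max_lag && dp[tau + 2] < dp[tau + 1]) tau <- tau + 1
  sr / tau  # period in samples -> frequency in Hz
}

sr <- 16000
x <- sin(2 * pi * 100 * (0:639) / sr)  # 40 ms of a 100 Hz sine
f0 <- yin_f0(x, sr, min_f = 70, max_f = 200)
```

On this synthetic sine the period is exactly 160 samples at 16 kHz, so the estimate is 100 Hz.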
Usage
pyin(
listOfFiles,
beginTime = 0,
endTime = 0,
windowShift = 5,
windowSize = 30,
minF = 70,
maxF = 200,
max_transition_rate = 35.92,
beta_parameters = c(2, 18),
center = TRUE,
boltzmann_parameter = 2,
resolution = 0.1,
thresholds = 100,
switch_probability = 0.01,
no_trough_probability = 0.01,
pad_mode = "constant",
explicitExt = "pyp",
outputDirectory = NULL,
toFile = TRUE
)
Arguments
- listOfFiles
A vector of file paths to wav files.
- beginTime
The start time of the section of the sound file that should be processed.
- endTime
The end time of the section of the sound file that should be processed.
- windowShift
The measurement interval (frame shift), in milliseconds.
- minF
Candidate f0 frequencies below this frequency (in Hz) will not be considered.
- maxF
Candidate f0 frequencies above this frequency (in Hz) will not be considered.
- max_transition_rate
The maximum pitch transition rate in octaves per second.
- beta_parameters
The shape parameters for the beta distribution prior over thresholds.
- center
Should analysis windows be centered around the time of the window (TRUE, the default) or should the window be considered to have started at the indicated time point (FALSE)?
- boltzmann_parameter
The shape parameter for the Boltzmann distribution prior over troughs. Larger values will assign more mass to smaller periods.
- resolution
The resolution of the pitch bins, in semitones; 0.01 corresponds to cents.
- thresholds
The number of thresholds for peak estimation.
- switch_probability
The probability of switching from voiced to unvoiced or vice versa.
- no_trough_probability
The maximum probability to add to the global minimum if no trough is below the threshold.
- pad_mode
The mode in which padding occurs. Ignored if center is not TRUE. Padding occurs in the Python library librosa, and the user should therefore consult the manual of the NumPy library function numpy.pad for other options.
- explicitExt
The file extension that should be used.
- outputDirectory
Set an explicit directory for where the signal file will be written. If not defined, the file will be written to the same directory as the sound file.
- toFile
Should the output be written to a file? The file will be written in outputDirectory, if defined, or in the same directory as the sound file.
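The beta_parameters and boltzmann_parameter arguments define the two priors used by pYIN. The sketch below assumes that the thresholds are spaced evenly on [0, 1], with one interval per threshold (as in the pYIN paper and, as far as we know, in librosa). It shows how the beta prior becomes a probability mass for each threshold interval, and how a truncated Boltzmann (discrete exponential) distribution weights the troughs, ordered from the smallest to the largest period. The helper names `threshold_prior()` and `trough_prior()` are hypothetical.

```r
# Hypothetical helper: beta prior discretized into n_thresholds intervals on [0, 1].
threshold_prior <- function(n_thresholds = 100, beta_parameters = c(2, 18)) {
  edges <- seq(0, 1, length.out = n_thresholds + 1)
  cdf <- pbeta(edges, shape1 = beta_parameters[1], shape2 = beta_parameters[2])
  diff(cdf)  # probability mass of each threshold interval
}

# Hypothetical helper: truncated Boltzmann pmf over n_troughs troughs.
# k = 0 is the trough with the smallest period; larger boltzmann_parameter
# values move more mass towards small k (i.e. towards smaller periods).
trough_prior <- function(n_troughs, boltzmann_parameter = 2) {
  k <- seq_len(n_troughs) - 1
  lambda <- boltzmann_parameter
  (1 - exp(-lambda)) * exp(-lambda * k) / (1 - exp(-lambda * n_troughs))
}

w <- threshold_prior()   # 100 weights, one per threshold interval
p <- trough_prior(5)     # weights for 5 candidate troughs
```

With the default c(2, 18), nearly all of the threshold mass lies below 0.3, and increasing boltzmann_parameter makes p fall off faster, favouring the shortest-period trough.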
Value
An SSFF track object containing the f0 estimates together with the voicing decision and voicing probability tracks, which is either returned (toFile == FALSE) or stored on disk.
Details
This function calls the librosa Python library (McFee et al. 2022) to load the audio data and make pitch-related estimates.
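As a rough guide to how the arguments shape the hidden Markov model, the sketch below computes the size of the pitch grid and the width of the window within which the pitch may move between consecutive frames. It assumes that windowShift is given in milliseconds and that the formulas follow librosa's pyin implementation as we understand it (bins per semitone = ceiling(1 / resolution); a maximum jump of round(max_transition_rate × 12 × hop) semitones per frame). It should be read as an approximation, not as a description of the exact internals. Following Mauch and Dixon (2014), each pitch bin has a voiced and an unvoiced state, and switch_probability fills a 2 × 2 voicing transition matrix. The helper name `pyin_state_space()` is hypothetical.

```r
# Hypothetical helper: approximate pYIN state-space quantities from the arguments.
pyin_state_space <- function(minF = 70, maxF = 200, resolution = 0.1,
                             max_transition_rate = 35.92, windowShift = 5,
                             switch_probability = 0.01) {
  bins_per_semitone <- ceiling(1 / resolution)
  n_pitch_bins <- floor(12 * bins_per_semitone * log2(maxF / minF)) + 1
  hop_s <- windowShift / 1000  # assumption: windowShift is in milliseconds
  max_semitones_per_frame <- round(max_transition_rate * 12 * hop_s)
  transition_width <- max_semitones_per_frame * bins_per_semitone + 1
  states <- c("voiced", "unvoiced")
  voicing_transition <- matrix(
    c(1 - switch_probability, switch_probability,
      switch_probability, 1 - switch_probability),
    nrow = 2, byrow = TRUE, dimnames = list(states, states)
  )
  list(n_pitch_bins = n_pitch_bins,
       n_states = 2 * n_pitch_bins,  # a voiced and an unvoiced copy of each bin
       transition_width = transition_width,
       voicing_transition = voicing_transition)
}

s <- pyin_state_space()
```

With the default arguments this gives 182 pitch bins (364 states) and a local transition window 21 bins wide, which is about two semitones at 10 bins per semitone.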
References
de Cheveigné A, Kawahara H (2002).
“YIN, a fundamental frequency estimator for speech and music.”
The Journal of the Acoustical Society of America, 111(4), 1917–1930.
ISSN 0001-4966, doi:10.1121/1.1458024, http://www.ncbi.nlm.nih.gov/pubmed/12002874.
Mauch M, Dixon S (2014).
“PYIN: A Fundamental Frequency Estimator using Probabilistic Threshold Distributions.”
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 659–663.
doi:10.1109/icassp.2014.6853678.
McFee B, Metsai A, McVicar M, Balke S, Thomé C, Raffel C, Zalkow F, Malek A, Dana, Lee K, Nieto O, Ellis D, Mason J, Battenberg E, Seyfarth S, Yamamoto R, viktorandreevichmorozov, Choi K, Moore J, Bittner R, Hidaka S, Wei Z, nullmightybofo, Weiss A, Hereñú D, Stöter F, Friesch P, Vollrath M, Kim T, Thassilo (2022).
“librosa/librosa: 0.9.1.”
doi:10.5281/zenodo.6097378.