

I have a load of 3 hour MP3 files, and every ~15 minutes a distinct 1 second sound effect is played, which signals the beginning of a new chapter. Is it possible to identify each time this sound effect is played, so I can note the time offsets? The sound effect is similar every time, but because it has been encoded in a lossy file format, there will be a small amount of variation.

The time offsets will be stored in the ID3 Chapter Frame metadata.

Example source, where the sound effect plays twice. The two occurrences were extracted with:

```
ffmpeg -ss 0.9 -i source.mp3 -t 0.95 -acodec copy -y sample1.mp3
ffmpeg -ss 4.5 -i source.mp3 -t 0.95 -acodec copy -y sample2.mp3
```

I'm very new to audio processing, but my initial thought was to extract a sample of the 1 second sound effect, then use librosa in Python to extract a floating point time series for both files, round the floating point numbers, and try to get a match.

```python
import numpy
import librosa

source_series, source_rate = librosa.load('source.mp3')  # 3 hour file
sample_series, sample_rate = librosa.load('sample.mp3')  # 1 second file

source_series = numpy.around(source_series, decimals=5)
sample_series = numpy.around(sample_series, decimals=5)

for source_id, source_sample in enumerate(source_series):
    ...  # loop body not included in the original post
```

This does not work with the MP3 files above, but it did work with an MP4 version: it was able to find the sample I extracted, but only that one sample (not all 12).

I should also note that this script takes just over 1 minute to process the 3 hour file (which contains 237,426,624 samples), so I can imagine that doing some kind of averaging on every loop iteration would make it take considerably longer.

Trying to directly match waveform samples in the time domain is not a good idea. The MP3 signal will preserve the perceptual properties, but it is quite likely that the phases of the frequency components will be shifted, so the sample values will not match.

You could try matching the volume envelopes of your effect and your sample instead. This is less likely to be affected by the MP3 encoding process.

First, normalise your sample so the embedded effects are at the same level as your reference effect. Then construct new waveforms from the effect and the sample by taking the average of the peak values over time frames that are just short enough to capture the relevant features, and use cross-correlation in the time domain to find where they line up.

If this does not work, you could analyse each frame using an FFT; this gives you a feature vector for each frame. You then try to find matches of the sequence of features in your effect within the sample.
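The volume-envelope approach described above might look something like the following minimal NumPy sketch. It is one interpretation, not the answerer's code: the "average of the peak values" step is approximated by the peak absolute amplitude of each non-overlapping frame, and the explicit normalisation step is replaced by a normalised (Pearson) cross-correlation, which ignores overall level differences. The function names, the 512-sample `frame_length` and the 0.8 score threshold are illustrative assumptions to be tuned.

```python
import numpy as np


def frame_peak_envelope(signal: np.ndarray, frame_length: int) -> np.ndarray:
    """Peak absolute amplitude of each non-overlapping frame: a coarse volume envelope."""
    n_frames = len(signal) // frame_length
    frames = np.abs(np.asarray(signal[: n_frames * frame_length], dtype=np.float64))
    return frames.reshape(n_frames, frame_length).max(axis=1)


def normalised_xcorr(template: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """Pearson correlation between `template` and every equally long window of `signal`.

    Centring and scaling each window makes the score independent of loudness,
    standing in for the "normalise your sample" step (an assumption of this sketch).
    """
    m = len(template)
    if len(signal) < m:
        return np.zeros(0)
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    # Because t has zero mean, sum((w - mean(w)) * t) equals sum(w * t) for every window w.
    numerator = np.correlate(signal, t, mode="valid")
    # Window sums and sums of squares from cumulative sums, so no window is copied.
    csum = np.concatenate(([0.0], np.cumsum(signal)))
    csum2 = np.concatenate(([0.0], np.cumsum(signal * signal)))
    win_sum = csum[m:] - csum[:-m]
    win_sum2 = csum2[m:] - csum2[:-m]
    # m * variance of each window, clipped at zero to absorb rounding error.
    win_energy = np.maximum(win_sum2 - win_sum**2 / m, 0.0)
    denom = t_norm * np.sqrt(win_energy)
    scores = np.zeros(len(numerator))
    np.divide(numerator, denom, out=scores, where=denom > 1e-12)
    return scores


def pick_peaks(scores: np.ndarray, min_distance: int, threshold: float) -> list[int]:
    """Greedy non-maximum suppression: keep the best lags that are more than min_distance apart."""
    kept: list[int] = []
    for idx in np.argsort(scores)[::-1]:
        if scores[idx] < threshold:
            break  # scores are visited in descending order
        if all(abs(int(idx) - k) > min_distance for k in kept):
            kept.append(int(idx))
    return sorted(kept)


def find_effect_times(effect, source, rate, frame_length=512, threshold=0.8):
    """Estimated start times (seconds) of `effect` inside `source`; both must share `rate`."""
    effect_env = frame_peak_envelope(effect, frame_length)
    source_env = frame_peak_envelope(source, frame_length)
    scores = normalised_xcorr(effect_env, source_env)
    # Occurrences are minutes apart, so suppressing peaks within one effect length is safe.
    lags = pick_peaks(scores, min_distance=len(effect_env), threshold=threshold)
    return [lag * frame_length / rate for lag in lags]
```

With real files, the unrounded series from the question's `librosa.load` calls could be passed as `find_effect_times(sample_series, source_series, source_rate)`; `librosa.load` resamples both files to the same default rate, which this sketch relies on. Because the matching runs on a few hundred thousand envelope values instead of every raw sample, it should be considerably cheaper than a per-sample loop. The resolution of the reported times is one frame (`frame_length / rate` seconds).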

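For the FFT fallback described above, one possible sketch, again using NumPy only and with hypothetical function names: each non-overlapping frame is Hann-windowed and turned into a log-magnitude spectrum, which serves as that frame's feature vector, and the effect's sequence of feature vectors is compared with every equally long run of source frames using a Pearson correlation over the whole block. The log scaling, the window, the frame size and the 0.5 threshold are assumptions, not something the answer prescribes.

```python
import numpy as np


def frame_log_spectra(signal: np.ndarray, frame_length: int) -> np.ndarray:
    """Log-magnitude FFT of each non-overlapping, Hann-windowed frame (one feature vector per frame)."""
    n_frames = len(signal) // frame_length
    frames = np.asarray(signal[: n_frames * frame_length], dtype=np.float64)
    frames = frames.reshape(n_frames, frame_length) * np.hanning(frame_length)
    # Keeping only the magnitude discards the phase, which the MP3 encoder may have shifted.
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def sequence_match_scores(effect_feats: np.ndarray, source_feats: np.ndarray) -> np.ndarray:
    """Pearson correlation between the effect's feature sequence and each equally long run of source frames."""
    m, d = effect_feats.shape
    n_lags = len(source_feats) - m + 1
    if n_lags <= 0:
        return np.zeros(0)
    e = effect_feats - effect_feats.mean()
    e_norm = np.linalg.norm(e)
    # Because e has zero mean, correlating against the raw windows gives the centred numerator.
    numerator = np.zeros(n_lags)
    for k in range(m):
        numerator += source_feats[k : k + n_lags] @ e[k]
    # Per-window sum and sum of squares over all m * d values, via cumulative sums over frames.
    csum = np.concatenate(([0.0], np.cumsum(source_feats.sum(axis=1))))
    csum2 = np.concatenate(([0.0], np.cumsum((source_feats**2).sum(axis=1))))
    win_sum = csum[m:] - csum[:-m]
    win_sum2 = csum2[m:] - csum2[:-m]
    win_energy = np.maximum(win_sum2 - win_sum**2 / (m * d), 0.0)
    denom = e_norm * np.sqrt(win_energy)
    scores = np.zeros(n_lags)
    np.divide(numerator, denom, out=scores, where=denom > 1e-12)
    return scores


def pick_peaks(scores: np.ndarray, min_distance: int, threshold: float) -> list[int]:
    """Greedy non-maximum suppression: keep the best lags that are more than min_distance apart."""
    kept: list[int] = []
    for idx in np.argsort(scores)[::-1]:
        if scores[idx] < threshold:
            break  # scores are visited in descending order
        if all(abs(int(idx) - k) > min_distance for k in kept):
            kept.append(int(idx))
    return sorted(kept)


def find_effect_times_spectral(effect, source, rate, frame_length=512, threshold=0.5):
    """Estimated start times (seconds) of `effect` inside `source`, matched on per-frame spectra."""
    effect_feats = frame_log_spectra(effect, frame_length)
    source_feats = frame_log_spectra(source, frame_length)
    scores = sequence_match_scores(effect_feats, source_feats)
    lags = pick_peaks(scores, min_distance=len(effect_feats), threshold=threshold)
    return [lag * frame_length / rate for lag in lags]
```

Discarding the phase is what makes this tolerant of the phase shifts mentioned above. Overlapping frames (a hop smaller than `frame_length`) would give finer time resolution, at the cost of more frames to compare.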