
I'm writing an app in which I'm trying to change the pitch of the audio while recording a movie (.m4v), or alternatively by modifying the audio pitch of the movie afterwards. I want the end result to be a movie (.m4v) that has the original length (i.e. the same visuals as the original) but with modified sound pitch, e.g. a "chipmunk voice". A real-time conversion is preferable if possible.

I've read a lot about changing audio pitch in iOS, but most examples focus on playback, i.e. playing the sound with a different pitch.

In my app I'm recording a movie (.m4v / AVFileTypeQuickTimeMovie) and saving it using a standard AVAssetWriter. When saving the movie, I have access to the following elements, where I've tried to manipulate the audio (e.g. modify the pitch):

  • audio buffer (CMSampleBufferRef)
  • audio input writer (AVAssetWriterAudioInput)
  • audio input writer options (e.g. AVNumberOfChannelsKey, AVSampleRateKey, AVChannelLayoutKey)
  • asset writer (AVAssetWriter)

I've tried to hook into the above objects to modify the audio pitch, but without success.

I've also tried Dirac, as described here: Real Time Pitch Change In iPhone Using Dirac
And OpenAL with AL_PITCH as described here: Piping output from OpenAL into a buffer
And the "BASS" library from un4seen: Change Pitch/Tempo In Realtime

I haven't had success with any of the above libs, most likely because I don't really know how to use them, or where to hook them into the audio-saving code.

There seem to be a lot of libraries with similar effects, but they focus on playback or custom recording code. I want to manipulate the audio stream I've already got (AVAssetWriterAudioInput) or modify the saved movie clip (.m4v). I want the video to be unmodified visually, i.e. played at the same speed, but I want the audio to sound higher-pitched (like a chipmunk) or lower-pitched (like a ... monster?).

Do you have any suggestions how I can modify the pitch in either real time (when recording the movie) or afterwards by converting the entire movie (.m4v file)? Should I look further into Dirac, OpenAL, SoundTouch, BASS or some other library?

I want to be able to share the movie to others with modified audio, that's the reason I can't rely on modifying the pitch for playback only.


2 Answers


Okay, I can safely say that Dirac will definitely do the trick. I have used it and it does work.

I don't have much experience with video processing, but if at some point you can isolate the audio track, it is a piece of cake.

  1. If you can do that, just save it to a file and use Dirac's sample code for time stretching. It doesn't say so, but it also does pitch shifting: you set three parameters to transform your audio (time-stretching factor, pitch shift in cents/tones, and formant shift).

  2. If you don't feel like saving it to a file, convert it to PCM and do some DSP with Audio Units. To be honest, that requires serious knowledge of mathematics and audio processing, but there are a bunch of good sample projects out there (on GitHub, e.g. AudioGraph by Tom Zic) that will give you what you need. Don't forget to credit those developers' code in your work.

Furthermore, if you can convert to PCM, you can alternatively apply Dirac to the uncompressed audio at this stage, either live in the Audio Units graph or by using their sample code: instead of the EAFReader that Dirac uses, just pass your buffer data to the buffer it uses to perform the pitch shift. You might need to do a little magic there, but nothing as dramatic as writing your own DSP implementation of pitch shifting.

Bottom line: if you can ask AVFoundation to take care of the video only, then you can do the audio processing live with Audio Units and set a callback, so every time a buffer is processed you can pass the processed data to a file, or possibly to your AVAssetWriter. I am not quite sure whether this last part is possible. If it is not, the solution is to synchronise and save the video and audio separately, although I can imagine that being a huge issue, as they will both try to write to disk at the same time. Please let me know how it goes; I am intrigued now.

  • First, you need to demux the audio from the MP4 stream. You will need a demultiplexer (demuxer for short) to do this. Have a look at the MainConcept SDKs; they support a bunch of formats.
  • Second, you need to decode the compressed audio from whatever format it is in to raw PCM.
  • Then use some library to pitch shift the raw audio.
  • Next, you need to encode the pitch-shifted audio back to a compressed format.
  • Finally, mux it back into the MP4.
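Between the decode and encode steps, pitch-shifting code typically wants separate float buffers per channel, while decoders commonly hand back interleaved signed 16-bit PCM (L R L R ...). A minimal conversion sketch, assuming that interleaved int16 layout (check what your decoder actually produces); the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved 16-bit PCM (L R L R ...) into per-channel float
   buffers scaled to [-1, 1). planes[c] must hold `frames` floats. */
void deinterleave_s16(const int16_t *in, size_t frames, size_t channels,
                      float **planes) {
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++)
            planes[c][f] = (float)in[f * channels + c] / 32768.0f;
}

/* Inverse: per-channel float back to interleaved 16-bit for the encoder.
   Values are clamped, because some pitch shifters can push peaks past
   full scale, and rounded to the nearest integer. */
void interleave_s16(float **planes, size_t frames, size_t channels,
                    int16_t *out) {
    for (size_t f = 0; f < frames; f++)
        for (size_t c = 0; c < channels; c++) {
            float s = planes[c][f] * 32768.0f;
            if (s > 32767.0f) s = 32767.0f;
            if (s < -32768.0f) s = -32768.0f;
            out[f * channels + c] = (int16_t)(s < 0.0f ? s - 0.5f : s + 0.5f);
        }
}
```

The round trip is lossless for unmodified samples; the quality loss mentioned below comes from the lossy decode/re-encode itself, not from this conversion.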

You will lose some audio quality in the process because of the decode/re-encode step. Your video will stay the same.