
I have a question regarding a sound synthesis app that I'm working on. I am trying to read in an audio file, create randomized 'grains' using granular synthesis techniques, place them into an output buffer and then be able to play that back to the user using OpenAL. For testing purposes, I am simply writing the output buffer to a file that I can then listen back to.

Judging by my results, I am on the right track but am getting some aliasing issues and playback sounds that just don't seem quite right. There is usually a rather loud pop in the middle of the output file and volume levels are VERY loud at times.

Here are the steps I have taken to get the results I need, but I'm a little confused about a couple of things, namely the formats I am specifying for my AudioStreamBasicDescriptions.

  1. Read in an audio file from my mainBundle, which is a mono file in .aiff format:

    ExtAudioFileRef extAudioFile;
    CheckError(ExtAudioFileOpenURL(loopFileURL,
                               &extAudioFile),
           "couldn't open extaudiofile for reading");
    memset(&player->dataFormat, 0, sizeof(player->dataFormat));
    
    player->dataFormat.mFormatID = kAudioFormatLinearPCM;
    player->dataFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    player->dataFormat.mSampleRate = S_RATE;
    player->dataFormat.mChannelsPerFrame = 1;
    player->dataFormat.mFramesPerPacket = 1;
    player->dataFormat.mBitsPerChannel = 16;
    player->dataFormat.mBytesPerFrame = 2;
    player->dataFormat.mBytesPerPacket = 2;
    
    // tell extaudiofile about our format
    CheckError(ExtAudioFileSetProperty(extAudioFile,
                                   kExtAudioFileProperty_ClientDataFormat,
                                   sizeof(AudioStreamBasicDescription),
                                   &player->dataFormat),
           "couldnt set client format on extaudiofile");
    
    SInt64 fileLengthFrames;
    UInt32 propSize = sizeof(fileLengthFrames);
    CheckError(ExtAudioFileGetProperty(extAudioFile,
                        kExtAudioFileProperty_FileLengthFrames,
                        &propSize,
                        &fileLengthFrames),
           "couldn't get file length");
    
    player->bufferSizeBytes = fileLengthFrames * player->dataFormat.mBytesPerFrame;
    
  2. Next I declare my AudioBufferList and set some more properties:

    AudioBufferList *buffers;
    UInt32 ablSize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * 1);
    buffers = (AudioBufferList *)malloc(ablSize);
    
    player->sampleBuffer = (SInt16 *)malloc(player->bufferSizeBytes); // bufferSizeBytes is already a byte count
    
    buffers->mNumberBuffers = 1;
    buffers->mBuffers[0].mNumberChannels = 1;
    buffers->mBuffers[0].mDataByteSize = player->bufferSizeBytes;
    buffers->mBuffers[0].mData = player->sampleBuffer;
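
The read call that actually fills this buffer (and produces the framesRead count used in step 3) is omitted above; assuming the whole file is pulled in with a single ExtAudioFileRead, it looks roughly like this:

    UInt32 framesRead = (UInt32)fileLengthFrames;
    CheckError(ExtAudioFileRead(extAudioFile,
                            &framesRead,   // in: frames requested, out: frames actually delivered
                            buffers),
           "couldn't read extaudiofile");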
    
  3. My understanding is that .mData will contain whatever was specified by the format flags (in this case, SInt16). Since it is of type void *, I want to convert it to float data, which is more natural for audio manipulation. Previously I set up a for loop that just iterated through the buffer and cast each sample to a float. This seemed unnecessary, so now I pass my .mData buffer into a function I created, which then granularizes the audio:

        float *theOutBuffer = [self granularizeWithData:(float *)buffers->mBuffers[0].mData with:framesRead];
    
  4. In this function, I dynamically allocate some buffers, create grains of random sizes, window each grain with a Hamming window, place them into my output buffer, and return that buffer (which is float data). Everything is cool up to this point.
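
For context, the windowing/overlap-add inside that function is conceptually like the following sketch (illustrative names and signature, not my exact code; needs <math.h>):

    // Apply a Hamming window to one grain of grainLen samples starting at
    // srcPos in the source, and mix it into the output buffer at dstPos.
    void addGrain(const float *src, float *dst, int srcPos, int dstPos, int grainLen)
    {
        for (int n = 0; n < grainLen; n++) {
            float w = 0.54f - 0.46f * cosf(2.0f * (float)M_PI * n / (grainLen - 1));
            dst[dstPos + n] += src[srcPos + n] * w;   // overlap-add into the output
        }
    }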

  5. Next I set up the output file's ASBD and related objects:

    AudioStreamBasicDescription outputFileFormat;
    
    memset(&outputFileFormat, 0, sizeof(outputFileFormat));
    
    outputFileFormat.mFormatID = kAudioFormatLinearPCM;
    outputFileFormat.mSampleRate = 44100.0;
    outputFileFormat.mChannelsPerFrame = numChannels;
    outputFileFormat.mBytesPerPacket = 2 * numChannels;
    outputFileFormat.mFramesPerPacket = 1;
    outputFileFormat.mBytesPerFrame = 2 * numChannels;
    outputFileFormat.mBitsPerChannel = 16;
    outputFileFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    
    UInt32 flags = kAudioFileFlags_EraseFile;
    ExtAudioFileRef outputAudioFileRef = NULL;
    NSString *tmpDir = NSTemporaryDirectory();
    NSString *outFilename = @"Decomp.caf";
    NSString *outPath = [tmpDir stringByAppendingPathComponent:outFilename];
    NSURL *outURL = [NSURL fileURLWithPath:outPath];
    
    
    AudioBufferList *outBuff;
    UInt32 abSize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * 1);
    outBuff = (AudioBufferList *)malloc(abSize);
    
    outBuff->mNumberBuffers = 1;
    outBuff->mBuffers[0].mNumberChannels = 1;
    outBuff->mBuffers[0].mDataByteSize = framesRead * sizeof(float); // size of the sample data, not of the list struct
    outBuff->mBuffers[0].mData = theOutBuffer;
    
    CheckError(ExtAudioFileCreateWithURL((__bridge CFURLRef)outURL,
                                     kAudioFileCAFType,
                                     &outputFileFormat,
                                     NULL,
                                     flags,
                                     &outputAudioFileRef),
           "ErrorCreatingURL_For_EXTAUDIOFILE");
    
    CheckError(ExtAudioFileSetProperty(outputAudioFileRef,
                                   kExtAudioFileProperty_ClientDataFormat,
                                   sizeof(outputFileFormat),
                                   &outputFileFormat),
           "ErrorSettingProperty_For_EXTAUDIOFILE");
    
    CheckError(ExtAudioFileWrite(outputAudioFileRef,
                             framesRead,
                             outBuff),
           "ErrorWritingFile");
    

The file is written correctly, in CAF format. My question is this: am I handling the .mData buffer correctly when I cast the samples to float data, manipulate (granularize) various window sizes, and then write it to a file using ExtAudioFileWrite (in CAF format)? Is there a more elegant way to do this, such as declaring my ASBD format flag as kAudioFormatFlagIsFloat? My output CAF file has some clicks in it, and when I open it in Logic, it looks like there is a lot of aliasing. This makes sense if I am sending it float data while some kind of conversion that I am unaware of is happening.

Thanks in advance for any advice on the matter! I have been an avid reader of pretty much all the source material online, including the Core Audio book, various blogs, tutorials, etc. The ultimate goal of my app is to play the granularized audio in real time to a user wearing headphones, so writing to a file is just for testing at the moment.

I noticed your tweet, so I think you got this sorted out already, but I'd recommend using Novocaine in combination with NVDSP if you need effects. It's a whole lot easier than trying to learn CoreAudio/AudioUnits; just skip that and start creating! ;) – sougonde
Thanks, @bartolsthoorn, for the reply. Great job implementing that! – manderson

1 Answer


What you say about step 3 suggests to me that you are interpreting an array of shorts as an array of floats. If that is so, we have found the reason for your trouble. Can you assign the short values one by one into an array of floats? That should fix it.

It looks like mData is a void * pointing to an array of shorts. Casting this pointer to a float * doesn't change the underlying data into floats, but your audio processing function will treat the bytes as if it did. However, float and short values are stored in totally different ways, so the math in that function will operate on values that have nothing to do with your true input signal. To investigate this experimentally, try the following:

short data[4] = {-27158, 16825, 23024, 15};
void *pData = data;

The void pointer doesn't indicate what kind of data it points to, so one can falsely assume it points to float values. Note that a short is 2 bytes wide while a float is 4 bytes wide. It is a coincidence that your code did not crash with an access violation: interpreted as floats, the array above is only long enough for two values. Let's just look at the first one:

float *pfData = (float *)pData;
printf("%d == %f\n", data[0], pfData[0]);

The output of this will be -27158 == 23.198200, illustrating how instead of the expected -27158.0f you obtain roughly 23.2f. Two problematic things happened. First, sizeof(short) is not sizeof(float). Second, the "ones and zeros" of a floating-point number are stored very differently from those of an integer. See http://en.wikipedia.org/wiki/Single_precision_floating-point_format.
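
You can also inspect the reinterpretation directly by dumping the raw bits (this sketch assumes a little-endian CPU, as on Intel and ARM, and needs <string.h>, <stdint.h> and <stdio.h>):

uint32_t bits;
memcpy(&bits, data, sizeof(bits)); // the first 4 bytes hold the two shorts -27158 and 16825
printf("0x%08X\n", bits);          // prints 0x41B995EA, the IEEE 754 encoding of ~23.1982f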

How to solve the problem? There are at least two simple solutions. First, you could convert each element of the array before you feed it into your audio processor:

int k;
float *pfBuf = (float *)malloc(n_data * sizeof(float)); // n_data: number of samples in the buffer
short *psiBuf = (short *)buffers->mBuffers[0].mData;
for (k = 0; k < n_data; k ++)
{
    pfBuf[k] = psiBuf[k]; // each short is converted to the equivalent float
}
[self granularizeWithData:pfBuf with:framesRead];
for (k = 0; k < n_data; k ++)
{
    psiBuf[k] = pfBuf[k]; // implicitly truncated back to short
}
free(pfBuf);

You see that most likely you will have to convert everything back to short after your call to granularizeWithData:with:. So the second solution would be to do all the processing in short, although from what you write, I imagine you would not like that approach.
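
Two closing remarks on the first solution. If granularizeWithData:with: expects samples in the usual [-1.0, 1.0] range, you must also scale during the conversion (divide by 32768.0f on the way in, multiply and clip on the way out), which may be one source of the extreme volume levels you describe. Alternatively, you can sidestep manual conversion entirely and let Core Audio convert for you: declare the ExtAudioFile client data format as 32-bit float, and ExtAudioFileRead will hand you float samples in the [-1, 1] range directly in mData. A sketch, reusing S_RATE and CheckError from your question:

AudioStreamBasicDescription clientFormat = {0};
clientFormat.mFormatID         = kAudioFormatLinearPCM;
clientFormat.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
clientFormat.mSampleRate       = S_RATE;
clientFormat.mChannelsPerFrame = 1;
clientFormat.mFramesPerPacket  = 1;
clientFormat.mBitsPerChannel   = 32; // a float is 4 bytes, not 16 bits
clientFormat.mBytesPerFrame    = 4;
clientFormat.mBytesPerPacket   = 4;
CheckError(ExtAudioFileSetProperty(extAudioFile,
                                   kExtAudioFileProperty_ClientDataFormat,
                                   sizeof(clientFormat),
                                   &clientFormat),
           "couldn't set float client format");

Remember to size your buffers accordingly (4 bytes per frame instead of 2), and to use a matching float ASBD as the client format when writing the output file.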