I'm having trouble understanding a concept my teacher taught us, and I'm getting pretty inconsistent answers from my classmates. Could anyone help clarify this idea?

It's known as "num chunking" in audio processing. We only have to deal with WAV files, so we assume our audio data is uncompressed.

Chunking has to do with applying a Fourier transform to an audio sine wave.

To determine the number of chunks (numChunks), we do something like this:

Method 1:

`int numChunks = totalNumOfSamples / chunkSize;`

But a few others I've spoken with have said:

Method 2:

`int numChunks = totalNumOfSamples / binSize;`

The difference is that chunkSize is just a number we specify or hardcode, like 1024 or 2000. binSize, on the other hand, is the number of frequencies we draw after applying the Fourier transform to our samples. We generally draw about 50-100 bins (the frequency bars on our panel); any more and it takes forever to display.

In method 1, we compute numChunks as, for example, 47988 samples / 2000 = 23 chunks (integer division). Then, in a for loop, we add the samples of the sound data into each chunk, so the 23 chunks together hold almost the entire sound; the last 47988 - 23 × 2000 = 1988 samples are lost because they don't fill a whole chunk. We then store the chunks in an array or ArrayList, later send each one to the Discrete Fourier Transform (our forward Fourier function), place the results in our bins, and plot/draw them as bars.
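Here is a minimal sketch of the method 1 split, assuming the WAV data has already been decoded into a mono `double[]` (decoding is not shown). The class name `Method1Chunker` is hypothetical. Integer division decides how many whole chunks fit, and the tail that doesn't fill a chunk is skipped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits mono samples into fixed-size chunks (method 1). */
final class Method1Chunker {

    static List<double[]> split(double[] samples, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Integer division: only whole chunks are kept, the leftover tail is dropped.
        int numChunks = samples.length / chunkSize;
        List<double[]> chunks = new ArrayList<>(numChunks);
        for (int c = 0; c < numChunks; c++) {
            int start = c * chunkSize;
            // Each chunk is its own copy of samples[start, start + chunkSize).
            chunks.add(Arrays.copyOfRange(samples, start, start + chunkSize));
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[] samples = new double[47988]; // stand-in for decoded WAV samples
        List<double[]> chunks = split(samples, 2000);
        int dropped = samples.length - chunks.size() * 2000;
        System.out.println(chunks.size() + " chunks, " + dropped + " samples dropped");
    }
}
```

Running `main` prints `23 chunks, 1988 samples dropped`; each element of the list is what would then be handed to the forward DFT.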

One last detail I'm unsure of: does the Fourier transform divide by the TOTAL number of samples in the entire sound, or just by the number of samples in one chunk?

Method 2 works like this: numChunks = total samples / binSize. Using the same example, 47988 / 30 bins = 1599 chunks. My classmate explained that numChunks is the number of sub-arrays, so I made a 2D array of 1599 arrays, each with a length of binSize. When we process each sub-array through the forward Fourier transform, we get our resulting amplitude (frequency) values. In this method, we run each sub-array through the forward Fourier transform and divide each value by the TOTAL number of samples in the entire sound.
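For comparison, here is a sketch of the 2D-array version under the same assumption (mono samples already decoded into a `double[]`; `Method2Chunker` is a hypothetical name). Structurally it is the same split as method 1; what changes is that each row, and therefore each DFT input, holds only `binSize` samples (30 here) instead of 2000:

```java
/** Hypothetical helper: method 2, one row of binSize samples per chunk. */
final class Method2Chunker {

    static double[][] split(double[] samples, int binSize) {
        if (binSize <= 0) {
            throw new IllegalArgumentException("binSize must be positive");
        }
        int numChunks = samples.length / binSize; // e.g. 47988 / 30 = 1599
        double[][] chunks = new double[numChunks][binSize];
        for (int c = 0; c < numChunks; c++) {
            // Copy the c-th run of binSize consecutive samples into row c.
            System.arraycopy(samples, c * binSize, chunks[c], 0, binSize);
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[][] chunks = split(new double[47988], 30);
        System.out.println(chunks.length + " rows of " + chunks[0].length + " samples");
    }
}
```

Running `main` prints `1599 rows of 30 samples`; the 18 samples left over after 1599 × 30 = 47970 are not used.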

The results from the two ideas are wildly different: method 1 yields about 37.5 for the first value, while method 2 yields 3689. The values that follow all seem generally correct, so I'm not sure which method is correct, or whether either one is.

This is being coded in Java.

As you can tell, I'm very bewildered myself, so this question is quite confusing. I hope someone can clarify which method is correct or incorrect, or explain the concept.

1 Answer


Wow, you are a bit confused. I'll do my best to help explain.

What you're doing with a DFT (Discrete Fourier Transform) is taking a number of samples N and converting them from the time domain to the frequency domain. The array you get out in the frequency domain is the same size as the array you put in in the time domain. So you can set your chunk size to whatever you want, depending on what frequency resolution you want in the output arrays, because the frequency resolution is going to be sampleRate/chunkSize. So if you want, say, 80 frequency values out, use a chunk size of 80. Divide your sound file into totalNumSamples/80 chunks and you'll get the frequency content of each chunk in turn.
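A naive O(N^2) DFT makes the size relationship concrete. This is a textbook sketch, not the forward Fourier function from your course (its name and signature aren't known here), and the 44100 Hz sample rate used in `main` is only an assumed example: N real samples go in, N complex values come out, and bin k sits at k × sampleRate / N Hz.

```java
/** Textbook O(N^2) DFT sketch: N real samples in, N complex bins out. */
final class NaiveDft {

    /** Returns {re, im}; both arrays have the same length as the input chunk. */
    static double[][] forward(double[] chunk) {
        int n = chunk.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re[k] += chunk[t] * Math.cos(angle);
                im[k] += chunk[t] * Math.sin(angle);
            }
        }
        return new double[][] {re, im};
    }

    /** Centre frequency of bin k in Hz; neighbouring bins are sampleRate / chunkSize apart. */
    static double binFrequency(int k, double sampleRate, int chunkSize) {
        return k * sampleRate / chunkSize;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0; // assumed example rate
        double[][] bins = forward(new double[80]);
        System.out.println(bins[0].length + " bins, spacing "
                + binFrequency(1, sampleRate, 80) + " Hz");
    }
}
```

With the assumed 44100 Hz rate, `main` prints `80 bins, spacing 551.25 Hz`. One caveat: for real-valued audio, the bins above N/2 are complex-conjugate mirrors of the ones below, so a chunk of 80 samples yields 41 distinct frequencies (bins 0 through N/2).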

To answer your other question, the total number of samples in the entire sound doesn't matter. Each DFT you do is entirely independent, so all that matters is the number of samples you've put into each DFT.
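To illustrate that only the chunk's own length matters, here is a sketch of one common normalization (an assumption; your course's forward Fourier function may scale differently). For a sine that completes a whole number of cycles k within a chunk of N samples, with 0 < k < N/2, 2 × |X[k]| / N recovers the sine's amplitude, where N is the length of that chunk, not of the whole file. `ChunkAmplitude`, `amplitude`, and `sine` are hypothetical names.

```java
/** Sketch: normalise each chunk's DFT by that chunk's length N, not the file length. */
final class ChunkAmplitude {

    /** Amplitude estimate for bin k of one chunk: 2 * |X[k]| / N (valid for 0 < k < N/2). */
    static double amplitude(double[] chunk, int k) {
        int n = chunk.length;
        double re = 0.0;
        double im = 0.0;
        for (int t = 0; t < n; t++) {
            double angle = -2.0 * Math.PI * k * t / n;
            re += chunk[t] * Math.cos(angle);
            im += chunk[t] * Math.sin(angle);
        }
        return 2.0 * Math.hypot(re, im) / n; // divide by the samples in THIS chunk
    }

    /** Test tone: a sine of the given amplitude completing `cycles` periods in n samples. */
    static double[] sine(int n, int cycles, double amplitude) {
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = amplitude * Math.sin(2.0 * Math.PI * cycles * t / n);
        }
        return x;
    }

    public static void main(String[] args) {
        // The same tone (1/16 cycle per sample) analysed with chunks of 64 and 256 samples.
        System.out.println(amplitude(sine(64, 4, 0.8), 4));
        System.out.println(amplitude(sine(256, 16, 0.8), 16));
    }
}
```

Both calls in `main` print approximately 0.8, and nothing in the calculation refers to how long the file the chunk came from is.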

I hope that helps a little.