I'm having trouble understanding a concept my teacher taught me, and I'm getting pretty inconsistent answers from classmates. Could anyone help clarify it?
It's known as num chunking in audio processing. We only deal with WAV files, so we assume our audio data is uncompressed.
The numChunking has to do with applying the Fourier Transform to an audio sine wave.
To determine the number of chunks (numChunks) we do something like
method1:
int numChunks = totalNumOfSamples / chunkSize;
but a few others I've spoken with have said
method2:
int numChunks = totalNumOfSamples / binSize;
The difference is that chunkSize is just some number that is specified or hardcoded, like 1024 or 2000. binSize, on the other hand, is the number of frequencies we draw after applying the Fourier Transform to our samples. The number of bins (the frequency bars on our panel) we generally draw is about 50-100; any more than that and drawing them on screen takes forever.
In method 1 we get numChunks from, for example, 47988 samples / 2000 = 23 (integer division). We then loop over the chunks and copy the samples of the sound data into each one, so if we put all 23 chunks together we have pretty much our entire sound data, minus the leftover samples the division cannot accommodate (1988 of them in this example). We then add the chunks to an array or ArrayList, later send them to the Discrete Fourier Transform (the forward Fourier function), place the results in our bins, and plot/draw them as bars.
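To make method 1 concrete, here is a minimal sketch of the chunking step as I understand it. The class and method names (ChunkSketch, splitIntoChunks) are made up for illustration, and the all-zero sample array just stands in for real WAV data.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSketch {
    // Splits samples into consecutive chunks of chunkSize samples each.
    // Integer division truncates, so leftover samples at the end are dropped.
    static List<double[]> splitIntoChunks(double[] samples, int chunkSize) {
        int numChunks = samples.length / chunkSize;
        List<double[]> chunks = new ArrayList<>(numChunks);
        for (int c = 0; c < numChunks; c++) {
            double[] chunk = new double[chunkSize];
            System.arraycopy(samples, c * chunkSize, chunk, 0, chunkSize);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[] samples = new double[47988]; // placeholder for decoded WAV samples
        List<double[]> chunks = splitIntoChunks(samples, 2000);
        int dropped = samples.length - chunks.size() * 2000;
        // prints "23 chunks, 1988 samples dropped"
        System.out.println(chunks.size() + " chunks, " + dropped + " samples dropped");
    }
}
```

Each chunk would then be passed to the forward transform on its own.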
One detail I'm unsure of is whether the Fourier transform divides by the total number of samples in the entire sound or just by the number of samples in one chunk.
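To show what the two choices of divisor do, here is a minimal sketch of a single-bin forward DFT with the divisor passed in as a parameter. The names (DftSketch, binMagnitude, divisor) are hypothetical, and the test signal is an assumed unit-amplitude sine with exactly 5 cycles in a 2000-sample chunk. It does not say which divisor the assignment intends; it only shows that the two choices differ by a constant scale factor.

```java
final class DftSketch {
    // Magnitude of DFT bin k computed over one chunk, divided by `divisor`.
    // Pass chunk.length to normalize per chunk, or the total sample count
    // of the whole sound to normalize by the entire recording.
    static double binMagnitude(double[] chunk, int k, int divisor) {
        int n = chunk.length;
        double re = 0.0;
        double im = 0.0;
        for (int t = 0; t < n; t++) {
            double angle = 2 * Math.PI * k * t / n;
            re += chunk[t] * Math.cos(angle);
            im -= chunk[t] * Math.sin(angle);
        }
        return Math.sqrt(re * re + im * im) / divisor;
    }

    public static void main(String[] args) {
        int chunkSize = 2000;
        int totalSamples = 47988;
        double[] chunk = new double[chunkSize];
        for (int t = 0; t < chunkSize; t++) {
            chunk[t] = Math.sin(2 * Math.PI * 5 * t / chunkSize); // 5 cycles per chunk
        }
        // Dividing by the chunk length gives about 0.5 (half the sine amplitude).
        System.out.println(binMagnitude(chunk, 5, chunkSize));
        // Dividing by the total sample count gives a value smaller by the
        // factor totalSamples / chunkSize.
        System.out.println(binMagnitude(chunk, 5, totalSamples));
    }
}
```

With this sketch, the divisor never changes which bin is largest; it only rescales every bin by the same amount.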
Method 2 works with numChunks = total samples / binSize. Using the same example: 47988 / 30 bins = 1599 numChunks. My classmate explained to me that in this idea numChunks is the number of subarrays. So I made a 2D array: 1599 arrays, each with a length of binSize. We then run each subarray through the forward Fourier transform to get our resulting amplitude or frequency values, dividing each value by the total sample count of the entire sound.
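For comparison, here is a minimal sketch of the 2D-array layout from method 2. The names (SubArraySketch, splitIntoSubArrays) are made up for illustration, and the zero-filled array again stands in for real sample data.

```java
final class SubArraySketch {
    // Builds a 2D array with samples.length / binSize rows of binSize samples each.
    // Integer division truncates, so leftover samples at the end are dropped.
    static double[][] splitIntoSubArrays(double[] samples, int binSize) {
        int numChunks = samples.length / binSize;
        double[][] subArrays = new double[numChunks][binSize];
        for (int c = 0; c < numChunks; c++) {
            System.arraycopy(samples, c * binSize, subArrays[c], 0, binSize);
        }
        return subArrays;
    }

    public static void main(String[] args) {
        double[] samples = new double[47988]; // placeholder for decoded WAV samples
        double[][] subArrays = splitIntoSubArrays(samples, 30);
        int dropped = samples.length - subArrays.length * 30;
        // prints "1599 subarrays of 30 samples, 18 samples dropped"
        System.out.println(subArrays.length + " subarrays of " + subArrays[0].length
                + " samples, " + dropped + " samples dropped");
    }
}
```

Here each transform only ever sees 30 samples at a time, compared with 2000 per chunk in method 1.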
The results from the two ideas are wildly different. Method 1 yields about 37.5 for the first value and method 2 yields 3689. The values that follow all seem generally correct in both cases, so I'm not sure which method is correct, or if either is correct at all.
This is being coded in Java.
This question is quite confusing; as you can tell, I'm very bewildered myself. I hope someone can help clarify which method is correct, or explain the concept.