3 votes

I have a university assignment which consists of displaying the waveform of an audio file using C++/Qt. We should be able to change the scale used to display it (expressed in audio samples per screen pixel).

So far, I am able to:

  • open the audio file
  • read the samples
  • plot the samples at a given scale

To plot the samples at a given scale, I have tried two strategies (sketched in code after this list). Let's assume that N is the value of the scale:

  • for i going from 0 to the width of my window, plot the i * Nth audio sample at screen pixel i. This is very fast and constant in time, because we always access the same number of audio data points.
    However, it does not represent the waveform correctly, as we use the value of only one point to represent N points.

  • for i going from 0 to N * width, plot the ith audio sample at screen position i / (N * width) and let Qt figure out how to map that onto physical screen pixels.
    That plots very beautiful waveforms, but accessing the data takes far too long. For instance, if I want to display 500 samples per pixel and the width of my window is 100px, I have to access 50,000 points, which Qt then plots as 100 physical points (pixels).
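Roughly, the two strategies look like this in code. This is only a sketch: toY() is a placeholder that maps a sample value to a y coordinate, and samples, widgetWidth and N stand in for my actual data, window width and scale:

#include <QPainter>
#include <vector>

// Placeholder: map a sample in [-1, 1] to a y pixel coordinate.
static int toY(float sample, int widgetHeight = 200)
{
    return int((1.0f - sample) * 0.5f * widgetHeight);
}

// Strategy 1: one sample per pixel -- fast, but ignores N-1 out of every N samples.
void drawDecimated(QPainter &p, const std::vector<float> &samples,
                   int widgetWidth, int N)
{
    for (int i = 0; i < widgetWidth && i * N < (int)samples.size(); ++i)
        p.drawPoint(i, toY(samples[i * N]));
}

// Strategy 2: plot every sample and let Qt map it onto pixels -- accurate,
// but touches N * widgetWidth samples on every repaint.
void drawAllSamples(QPainter &p, const std::vector<float> &samples,
                    int widgetWidth, int N)
{
    const int count = N * widgetWidth;
    for (int i = 0; i < count && i < (int)samples.size(); ++i)
        p.drawPoint(QPointF(double(i) / N, toY(samples[i])));
}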

So, how can I get a correct plot of my audio data that can be computed quickly? Should I calculate the average of N samples for each physical pixel? Should I do some curve fitting?
In other words, what kind of operation is involved when Qt/Matplotlib/MATLAB/etc. plot thousands of data points onto a very limited number of physical pixels?

Subsampling is the normal way to reduce the frequency of an audio signal by a large amount. (By a small amount is more difficult, because of aliasing.) Averaging over a window or a sliding window changes the data. – Malcolm McLean
This is very similar to a question I asked. You might find the discussion there interesting: stackoverflow.com/questions/37554058/… – Andrew Bainbridge

2 Answers

6 votes

Since I know how to do this and have already asked something similar on Stack Overflow, I'll reference that here. I'll provide code below.

Drawing waveforms is a real problem. I tried to figure this out for more than half a year! To sum it up:

According to the Audacity Documentation:

[Screenshot from the Audacity documentation showing the waveform view]

The waveform view uses two shades of blue, one darker and one lighter.

  • The dark blue part of the waveform displays the tallest peak in the area that pixel represents. At default zoom level Audacity will display many samples within that pixel width, so this pixel represents the value of the loudest sample in the group.
  • The light blue part of the waveform displays the average RMS (Root Mean Square) value for the same group of samples. This is a rough guide to how loud this area might sound, but there is no way to extract or use this RMS part of the waveform separately.

So you simply extract the important information from a chunk of data. If you do this over and over, you end up with multiple reduction stages that can be used for drawing at different zoom levels.

I'll provide some code here; please bear with me, it's still in development:

#include <cmath>
#include <stdexcept>
#include <vector>

// One std::vector<T> per reduction stage.
template<typename T>
using vector2d = std::vector<std::vector<T>>;

template<typename T>
class CacheHandler {
public:

    std::vector<T> data;
    int sampleSizeInBits = 0;
    vector2d<T> min, max, rms;   // per-stage minima, maxima and RMS values

    // May throw std::exception if the data cannot be processed.
    CacheHandler(std::vector<T>& data, int sampleSizeInBits);

    void addData(std::vector<T>& samples);

    /*
    Irreversibly removes data.
    Fails if the end index is greater than the data length.
    */
    void removeData(int endIndex);

    void removeData(int startIndex, int endIndex);
};

using this:

template<typename T>
inline CacheHandler<T>::CacheHandler(std::vector<T>& data, int sampleSizeInBits)
{
    this->data = data;
    this->sampleSizeInBits = sampleSizeInBits;
    // Number of reduction stages: keep halving until only one value is left.
    int N = log(data.size()) / log(2);
    rms.resize(N); min.resize(N); max.resize(N);
    rms[0] = calcRMSSegments(data, 2);
    min[0] = getMinPitchSegments(data, 2);
    max[0] = getMaxPitchSegments(data, 2);
    for (int i = 1; i < N; i++) {
        rms[i] = calcRMSSegments(rms[i - 1], 2);
        min[i] = getMinPitchSegments(min[i - 1], 2);
        max[i] = getMaxPitchSegments(max[i - 1], 2);
    }
}
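
The reduction helpers aren't shown above. As a rough sketch of what they might do (each call collapses every segmentSize consecutive values of the input into one value; this is a guess at their behaviour, not the exact implementation), something along these lines would fit the calls in the constructor:

#include <algorithm>
#include <cmath>
#include <vector>

// Sketch only: collapse every 'segmentSize' consecutive values into their RMS.
template<typename T>
std::vector<T> calcRMSSegments(const std::vector<T>& in, size_t segmentSize)
{
    std::vector<T> out;
    for (size_t i = 0; i < in.size(); i += segmentSize) {
        const size_t n = std::min(segmentSize, in.size() - i);
        double sumSq = 0.0;
        for (size_t j = 0; j < n; ++j)
            sumSq += double(in[i + j]) * double(in[i + j]);
        out.push_back(T(std::sqrt(sumSq / n)));
    }
    return out;
}

// Sketch only: keep the minimum of every 'segmentSize' consecutive values.
template<typename T>
std::vector<T> getMinPitchSegments(const std::vector<T>& in, size_t segmentSize)
{
    std::vector<T> out;
    for (size_t i = 0; i < in.size(); i += segmentSize)
        out.push_back(*std::min_element(in.begin() + i,
                       in.begin() + std::min(i + segmentSize, in.size())));
    return out;
}

// getMaxPitchSegments is the same with std::max_element.

Drawing at a given zoom level then just means picking the stage whose reduction factor is closest to the requested samples-per-pixel and painting one min/max/RMS triple per pixel.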
3 votes

What I'd suggest is something like this:

Given totalNumSamples audio samples in your audio file, and widgetWidth pixels of width in your display widget, you can calculate which samples are to be represented by each pixel:

// Given an x value (in pixels), returns the appropriate corresponding
// offset into the audio-samples array that represents the
// first sample that should be included in that pixel.
int GetFirstSampleIndexForPixel(int x, int widgetWidth, int totalNumSamples)
{
   // 64-bit math so (totalNumSamples * x) can't overflow for long files
   return (int)(((long long)totalNumSamples*x)/widgetWidth);
}

virtual void paintEvent(QPaintEvent * e)
{
   QPainter p(this);
   for (int x=0; x<widgetWidth; x++)
   {
      const int firstSampleIndexForPixel = GetFirstSampleIndexForPixel(x, widgetWidth, totalNumSamples);
      const int lastSampleIndexForPixel = GetFirstSampleIndexForPixel(x+1, widgetWidth, totalNumSamples)-1;
      const int largestSampleValueForPixel = GetMaximumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);
      const int smallestSampleValueForPixel = GetMinimumSampleValueInRange(firstSampleIndexForPixel, lastSampleIndexForPixel);

      // draw a vertical line spanning all sample values that are contained in this pixel
      p.drawLine(x, GetYValueForSampleValue(largestSampleValueForPixel), x, GetYValueForSampleValue(smallestSampleValueForPixel));
   }
}

Note that I didn't include source code for GetMinimumSampleValueInRange(), GetMaximumSampleValueInRange(), or GetYValueForSampleValue(), since hopefully what they do is obvious from their names, but if not, let me know and I can explain them.
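
For reference, minimal versions of those helpers could look roughly like this (assuming the widget keeps its samples in a member std::vector<short> named _samples and maps the full 16-bit range onto its height; those details are assumptions, not part of the code above):

// Sketches only -- the real implementations depend on how you store samples.
int GetMaximumSampleValueInRange(int firstIndex, int lastIndex) const
{
   int maxVal = -32768;
   for (int i = firstIndex; i <= lastIndex && i < (int)_samples.size(); i++)
      if (_samples[i] > maxVal) maxVal = _samples[i];
   return maxVal;
}

int GetMinimumSampleValueInRange(int firstIndex, int lastIndex) const
{
   int minVal = 32767;
   for (int i = firstIndex; i <= lastIndex && i < (int)_samples.size(); i++)
      if (_samples[i] < minVal) minVal = _samples[i];
   return minVal;
}

// Map a sample value (-32768..32767) onto a y pixel, 0 being the top edge
int GetYValueForSampleValue(int sampleValue) const
{
   return ((32767 - sampleValue) * height()) / 65536;
}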

Once you have the above working reasonably well (i.e. drawing a waveform that shows the entire file in your widget), you can start adding zoom-and-pan functionality. Horizontal zoom can be implemented by modifying the behavior of GetFirstSampleIndexForPixel(), e.g.:

int GetFirstSampleIndexForPixel(int x, int widgetWidth, int sampleIndexAtLeftEdgeOfWidget, int sampleIndexAfterRightEdgeOfWidget)
{
   int numSamplesToDisplay = sampleIndexAfterRightEdgeOfWidget-sampleIndexAtLeftEdgeOfWidget;
   return sampleIndexAtLeftEdgeOfWidget+(int)(((long long)numSamplesToDisplay*x)/widgetWidth);
}

With that, you can zoom/pan simply by passing in different values for sampleIndexAtLeftEdgeOfWidget and sampleIndexAfterRightEdgeOfWidget that together indicate the subrange of the file you want to display.
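
For example, a zoom helper driven by a zoom factor and a center sample might look like this (setZoom and everything except the two member variables named above is purely illustrative):

// Illustrative only: zoomFactor 1.0 shows the whole file, 2.0 shows half, etc.
void setZoom(double zoomFactor, int centerSample)
{
   const int numVisible = (int)(totalNumSamples / zoomFactor);
   int left = centerSample - numVisible / 2;
   if (left < 0) left = 0;
   int right = left + numVisible;
   if (right > totalNumSamples) right = totalNumSamples;

   sampleIndexAtLeftEdgeOfWidget = left;
   sampleIndexAfterRightEdgeOfWidget = right;
   update();   // QWidget::update() schedules a repaint with the new range
}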