
I'm working on a source filter which feeds video/audio captured by our software through a DirectShow graph. I got the video working relatively painlessly, but adding an audio output pin is proving to be quite a challenge. The specific question I have is: Does an audio renderer modify the actual reference clock as it is playing sound?

I'm seeing very jerky video playback. Attached below is a chunk of a log file, and it looks like once in a while the reference clock just "stops" while the system time keeps ticking. Does that make sense?

One thing I should mention is that the audio samples are u-Law, 8 kHz, 8-bit, and each packet is exactly 120 ms long. Here's the complication: when we receive audio data from the network, it doesn't come with time information, so our software assigns each sample a timestamp at the moment the packet is received. Video samples are stamped by the original source, so they are accurate. If I ignore the audio sample times and simply assign timestamps 120 ms apart, the video plays smoothly. The problem is that I still don't fully understand the relationship between the reference clock and the audio/video renderers. What really puzzles me is that we have another, similar source filter which plays the same data without jerky video (it doesn't have logging, and I haven't had a chance to add any to see whether the reference clock is also modified in that case).

This is that piece of the log:

Sys Clock    (delta)  StreamTime  (delta)   Drift between clocks:
15:54:40.755 (0.005)  1.838       (0.005)   0.000
15:54:40.761 (0.006)  1.844       (0.006)   0.000
15:54:40.889 (0.128)  1.972       (0.128)   0.000
15:54:40.894 (0.005)  1.977       (0.005)   0.000
15:54:40.899 (0.005)  1.982       (0.005)   0.000
15:54:40.903 (0.004)  1.986       (0.004)   0.000
15:54:40.931 (0.028)  2.014       (0.028)   0.000
15:54:40.936 (0.005)  2.019       (0.005)   0.000
15:54:41.019 (0.083)  2.080       (0.061)   0.022
15:54:41.175 (0.156)  2.080       (0.000)   0.178
15:54:41.181 (0.006)  2.080       (0.000)   0.184
15:54:41.190 (0.009)  2.080       (0.000)   0.193
15:54:41.197 (0.007)  2.080       (0.000)   0.200
15:54:41.202 (0.005)  2.080       (0.000)   0.205
15:54:41.210 (0.008)  2.080       (0.000)   0.213
15:54:41.216 (0.006)  2.080       (0.000)   0.219
15:54:41.220 (0.004)  2.080       (0.000)   0.223
15:54:41.313 (0.093)  2.080       (0.000)   0.316
15:54:41.317 (0.004)  2.080       (0.000)   0.320
15:54:41.408 (0.091)  2.116       (0.036)   0.375
15:54:41.412 (0.004)  2.120       (0.004)   0.375
15:54:41.432 (0.020)  2.140       (0.020)   0.375
15:54:41.436 (0.004)  2.144       (0.004)   0.375
15:54:41.439 (0.003)  2.147       (0.003)   0.375

2 Answers


When an audio renderer (sound card) is in the graph, it is usually selected as the reference clock. Other filters, including the video renderer, use it to determine when to show their samples. Using the system clock in parallel is not a good idea; you should use the same reference clock to stay in sync.

If you know the real length of your audio samples, and you're sure you don't lose any of them (for example, you use TCP, not UDP), then just assigning sequential 120 ms time intervals is a good solution. Taking timestamps from the system clock when a sample arrives from the network is a bad idea because it introduces random time shifts caused by network behavior: you never really know how long it will take for a network packet to arrive.
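As a rough illustration of the sequential 120 ms approach, here is a minimal sketch that derives each packet's start and stop time from a running byte count instead of from the arrival time. It assumes mono audio (so 8 kHz, 8-bit u-Law is 8000 bytes per second and a 120 ms packet is 960 bytes) and that no packets are lost. `AudioTimestamper`, `SampleTimes` and `RefTime` are hypothetical names made up for this example; `RefTime` is a plain 64-bit integer standing in for DirectShow's `REFERENCE_TIME` (100 ns units) so the snippet builds without the DirectShow headers.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for DirectShow's REFERENCE_TIME, which counts 100 ns units.
using RefTime = std::int64_t;

constexpr RefTime kUnitsPerSecond = 10000000;   // 100 ns units per second
constexpr std::int64_t kBytesPerSecond = 8000;  // assumption: 8 kHz, 8-bit, mono

// Start/stop stream times for one audio packet.
struct SampleTimes {
    RefTime start;
    RefTime stop;
};

// Hypothetical helper: keeps a running total of delivered audio bytes and
// computes every timestamp from that total, so rounding errors do not
// accumulate and network jitter never reaches the timestamps.
class AudioTimestamper {
public:
    SampleTimes Next(std::int64_t packetBytes) {
        assert(packetBytes > 0);
        const RefTime start = BytesToTime(bytesDelivered_);
        bytesDelivered_ += packetBytes;
        const RefTime stop = BytesToTime(bytesDelivered_);
        return SampleTimes{start, stop};
    }

private:
    static RefTime BytesToTime(std::int64_t bytes) {
        return bytes * kUnitsPerSecond / kBytesPerSecond;
    }

    std::int64_t bytesDelivered_ = 0;
};
```

In a source filter, the returned pair would typically be handed to `IMediaSample::SetTime` when the audio pin fills each sample. Computing the times from the byte total rather than adding a fixed 120 ms each time also keeps the timestamps correct if a packet ever has a different size.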

If you have two filters and want to see how their timing differs, you can install GraphEditPlus, insert a sample grabber before/after your filters, right-click it and select "watch grabbed samples". It will show all the timestamps and other information. You can also right-click the graph window and choose "see event log", which may help as well.


To understand which clock in a graph is being used as the reference clock, and to see the drift of this clock relative to the local CPU clock (via QueryPerformanceCounter), check out the DirectShow filter ShowClk.ax.