We are using Varnish Cache 6.2 in front of our WebAPI backend. The backend sends a Cache-Control header on certain responses, for things that we can cache for a bit longer.

In case the backend goes down and stays down, we also send a stale-while-revalidate of one hour.

So a typical cache-control response header from our backend looks like:

public, max-age=30, stale-while-revalidate=3600

In our Varnish VCL we have added a routine that abandons background fetches on certain errors. This keeps bad responses from the backend out of the cache:

sub vcl_backend_response {
    if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503 || beresp.status == 504)
    {
        if (bereq.is_bgfetch)
        {
            return (abandon);
        }

        set beresp.ttl = 1s;
    }
}

The problem we are facing is simple: Varnish does not update the item in the cache after max-age expires, even though the backend is available and the response has changed. We have seen the "Age" header returned by Varnish exceed 200s alongside a stale response. We have also seen cases where the "Age" header is 1-3s, which would indicate that a background fetch (or a normal fetch) has occurred.

This happens often enough that we notice it, but not on every request.

I have tried a simple "pass", such as the following in Varnish:

sub vcl_recv {
    return(pass);
}

However, this appeared to have no effect.

Could there be anything else in our Varnish setup that could cause the situation above?

EDIT: as requested in the comments, this is a small snippet we add to each sub that handles our request, to see what actually happened:

sub vcl_deliver {
    if (obj.uncacheable) {
        set req.http.x-cache = req.http.x-cache + " uncacheable" ;
    } else {
        set req.http.x-cache = req.http.x-cache + " cached" ;
    }

    set resp.http.x-cache = req.http.x-cache;
}

sub vcl_hit {
    set req.http.x-cache = "hit";
}

1 Answer

That's the expected behavior. Once the object is fetched from the backend for the first time (i.e. t=0), Varnish caches it, setting beresp.ttl to 30s and beresp.grace to 3600s. If you then request the object from Varnish at t=3000, the stale object is delivered to the client (i.e. Age: 3000) and an asynchronous background fetch is triggered to refresh the cached object. If you request the object again at t=3001 and the background fetch has already completed, a fresh object is delivered (i.e. Age: 1). The following test illustrates this behavior:

varnishtest "..."

server s1 {
    rxreq
    txresp -hdr "Cache-Control: public, max-age=1, stale-while-revalidate=60" \
           -hdr "Version: 1"

    rxreq
    txresp -hdr "Cache-Control: public, max-age=1, stale-while-revalidate=60" \
           -hdr "Version: 2"
} -start

varnish v1 -vcl+backend {
} -start

client c1 {
    txreq
    rxresp
    expect resp.http.Version == 1
    expect resp.http.Age == 0

    delay 5.0

    txreq
    rxresp
    expect resp.http.Version == 1
    expect resp.http.Age == 5

    delay 0.1

    txreq
    rxresp
    expect resp.http.Version == 2
    expect resp.http.Age == 0
} -run

varnish v1 -expect client_req == 3
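
To check which lifetimes Varnish actually derived from the Cache-Control header, something like the following debug sketch may help. It copies beresp.ttl and beresp.grace into response headers; the header names X-Debug-TTL and X-Debug-Grace are made up for illustration and are not part of the original setup.

```vcl
sub vcl_backend_response {
    # Debug only: expose the TTL and grace Varnish computed from
    # max-age and stale-while-revalidate. Header names are hypothetical.
    set beresp.http.X-Debug-TTL = beresp.ttl;
    set beresp.http.X-Debug-Grace = beresp.grace;
}
```

These headers are stored with the cached object, so they reflect the values at fetch time, not the remaining lifetime. Remove them once the behavior is understood.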

To refresh the object synchronously once its TTL has expired, you need to adjust req.grace in vcl_recv. You probably want to set it to 0s when the backend is healthy. See https://varnish-cache.org/docs/trunk/users-guide/vcl-grace.html#misbehaving-servers for details.
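
A minimal sketch of that approach, modeled on the example in the linked documentation, could look as follows. The backend address and the probe settings are placeholders assumed for illustration; without a probe, Varnish treats the backend as always healthy.

```vcl
vcl 4.1;

import std;

# Hypothetical backend; host, port and probe values are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";        # assumed health-check URL
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # Healthy backend: do not accept stale objects, so an object
        # past its TTL is fetched synchronously instead of served stale.
        set req.grace = 0s;
    }
    # Unhealthy backend: req.grace is left untouched, so the grace taken
    # from stale-while-revalidate (3600s in the question) still applies.
}
```

With req.grace at 0s no stale object is delivered, so refreshes become ordinary fetches and the bereq.is_bgfetch branch in the question's vcl_backend_response would rarely be reached while the backend is healthy. A small non-zero value, such as the 10s used in the documentation example, keeps some of the background-refresh benefit.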