We're using GCE and getting a very odd error:
"Error loading health status"
We see this error on all of our HTTP load balancers. The health check failure doesn't seem to be having any impact on the actual service. In other words, the server itself works just fine, but the health check is "red."
We use Deployment Manager (DM) to set up our environment, so what we have running now is exactly the same as it's always been. Up until today we had been using the beta APIs for everything. Our theory was that moving everything over to "v1" would resolve the issue, but even with all "v1" bits we still see the same error.
tcpdump -vvvs 1500 -l -A port 80
20:01:21.249194 IP (tos 0x80, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 146)
    22.214.171.124.42560 > master-game-100-tbmatch-us-central1-a-xxxx.c.radiant-cloud.internal.http: Flags [P.], cksum 0xe9bd (correct), seq 1:95, ack 1, win 222, options [nop,nop,TS val 92585061 ecr 407465], length 94
E.....@.@....... ....@.P.Giw............... ...e..7.GET /healthz HTTP/1.1
Host: 10.240.0.30
User-Agent: GoogleHC/1.0
Connection: Keep-alive
We see the health check request coming in...
20:01:21.250109 IP (tos 0x0, ttl 63, id 15146, offset 0, flags [DF], proto TCP (6), length 155)
    master-game-100-tbmatch-us-central1-a-0buf.c.radiant-cloud.internal.http > 126.96.36.199.42560: Flags [P.], cksum 0x905d (incorrect -> 0xb6e0), seq 1:104, ack 95, win 220, options [nop,nop,TS val 407465 ecr 92585061], length 103
E...;*@.?.pc ........P.@.....Gi......]..... ..7....eHTTP/1.1 200 OK
Content-Length: 2
Content-Type: text/plain
Date: Mon, 11 Jan 2016 20:01:21 GMT
ok
And we see our service responding appropriately like it always has in the past. Is anyone else seeing this problem?
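For anyone who wants to compare notes, here is a rough Python sketch of how the same request could be sent by hand, so the response can be checked without waiting for the next probe. It only mirrors the request seen in the capture above (GET /healthz with the GoogleHC/1.0 user agent). The function name probe_health and its defaults are made up for illustration, and this is not how Google's checker is actually implemented.

```python
import http.client


def probe_health(host, port=80, path="/healthz", timeout=5.0):
    """Send a health-check-style GET and return (status_code, body_text).

    Hypothetical helper: it copies the headers seen in the tcpdump capture,
    nothing more. http.client fills in the Host header from `host`.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "GET",
            path,
            headers={
                "User-Agent": "GoogleHC/1.0",
                "Connection": "Keep-alive",
            },
        )
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, body
    finally:
        conn.close()


if __name__ == "__main__":
    # 10.240.0.30 is the Host value from the capture; substitute your own
    # instance address when running this from inside the network.
    status, body = probe_health("10.240.0.30")
    print(status, repr(body))
```

Run from another instance in the same network, this should show the same status line and body that tcpdump shows going back to the checker, which would at least confirm the instance side is behaving.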