Kubernetes nodes set the conntrack_max value proportionally to the size of the RAM on the node. High-load applications (especially on small nodes) can easily exceed conntrack_max, resulting in connection resets and timeouts. There are several options to deal with this issue.

Theory

Conntrack is a feature built on top of the Netfilter framework. It is essential for the performant, complex networking of Kubernetes, where nodes need to track connection information between thousands of pods and services. Connection tracking allows the kernel to keep track of all logical network connections or sessions, and thereby relate all of the packets which may make up that connection. NAT relies on this information to translate all related packets in the same way, and iptables can use this information to act as a stateful firewall.

Conntrack on Linux systems is not unbounded. In Kubernetes, the default value can be found in the node_nf_conntrack_entries_limit Prometheus metric (requires node_exporter), or estimated via:

CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (x / 32)

where x is the number of bits in a pointer (for example, 32 or 64 bits).

The above calculation indicates that the conntrack_max value is directly proportional to the node’s memory. This is not what we’ve observed with GKE nodes: our observation shows that for each MB of memory, we get roughly 5 conntrack_max.

Setup

At loveholidays we run some high-load services. One of them is an aggregation service which acts as a proxy between two other services (each consisting of 100+ instances). This service continuously operates at 10,000+ requests per second and can spike up to 4x its baseload. This means that our aggregation proxy service needs to continuously track a large number of connections.

Shortly after a change in service architecture, node pool sizing (we halved the RAM on the nodes running the aggregation service) and sharding of a downstream service (which doubled the number of connections that need to be tracked), we started noticing various connectivity problems with the service and elevated error rates:

Caused by: : xxxxx failed to respond

HAProxy reported high connection reset errors that kept climbing. We started analysing the available networking metrics and also stumbled across this article. We used the two following Prometheus metrics to understand our current usage: node_nf_conntrack_entries and node_nf_conntrack_entries_limit.
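To keep an eye on headroom against the limit, the two metrics named above can be combined into a single saturation ratio. This is a hypothetical PromQL expression, assuming node_exporter exposes both metrics with matching labels on each node:

```
node_nf_conntrack_entries / node_nf_conntrack_entries_limit
```

A value approaching 1 means the node is close to exhausting its conntrack table, which is when resets and timeouts like the ones described above start to appear.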
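As a rough sanity check, the kernel's default formula above can be expressed in a few lines of Python. This is only a sketch of the calculation as stated (the 16384 divisor and pointer-size adjustment come straight from the formula); the node sizes used are hypothetical examples:

```python
def conntrack_max(ram_bytes: int, pointer_bits: int = 64) -> int:
    """Default kernel estimate: RAMSIZE / 16384 / (pointer_bits / 32)."""
    return int(ram_bytes / 16384 / (pointer_bits / 32))

# Hypothetical 4 GiB node with 64-bit pointers:
print(conntrack_max(4 * 1024**3))  # 131072

# Per MiB of RAM on a 64-bit system:
print(conntrack_max(1024**2))  # 32
```

Note that this formula yields roughly 32 entries per MiB on a 64-bit system, which is considerably more than the ~5 per MB we observed on GKE nodes, so GKE is evidently applying its own limit rather than the kernel default.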