BGP: bird2 (OpenWrt) and K8s MetalLB
Previously, I had been using MetalLB’s Layer 2 IP advertisement for my cluster. This worked out of the box, but over time some cracks appeared. Some devices, especially IoT and other smart devices, did not play nicely with it. They would periodically be unable to route to LoadBalancer IPs. This created random connectivity dropouts which really killed the experience. Resolving it involved flushing ARP caches and/or bouncing the MetalLB speakers. I limped along with this for a while before putting together a path forward with BGP.
This may have been caused by other components of the network, like my router’s setup, or it may be intrinsic to how Layer 2 advertisement works. I’m inclined to believe the latter, since it seemed to affect some devices more than others.
Here is a minimal /etc/bird.conf that should get bird running and logging some activity when peers try to connect, but won’t actually modify the router’s routing table just yet.
log syslog all;
debug protocols all;
router id 192.168.1.1;
protocol kernel kernel4 {
    ipv4 {
        export none;
    };
}
protocol kernel kernel6 {
    ipv6 {
        export none;
    };
}
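Before restarting anything it can help to have bird parse the file first. The sketch below assumes the `birdc` client is installed next to the daemon (on OpenWrt I believe it ships as a separate package) and that the config lives at /etc/bird.conf as in this post. `reload_bird` is a made-up helper name.

```shell
#!/bin/sh
# Parse-check the config before asking the running daemon to load it.
# BIRD_CONF defaults to the path used in this post; override if yours differs.
BIRD_CONF="${BIRD_CONF:-/etc/bird.conf}"

reload_bird() {
    # "bird -p" only parses the file and exits; non-zero means it did not parse.
    if bird -p -c "$BIRD_CONF"; then
        # Ask the running daemon to re-read its configuration.
        birdc configure
    else
        echo "syntax error in $BIRD_CONF, not reloading" >&2
        return 1
    fi
}
```

Calling `reload_bird` after each edit keeps a typo from taking down a working daemon.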
Some log messages with the above configuration:
daemon.debug bird: kernel6: 2600:4040:5786:ee00::/56: [alien] ignored
daemon.err bird: KRT: Received route fd5c:6784:b290::/64 with unknown ifindex 12
daemon.debug bird: kernel6: fd5c:6784:b290::/48: [alien] ignored
daemon.debug bird: kernel6: Pruning table master6
daemon.debug bird: kernel4: Scanning routing table
daemon.err bird: KRT: Received route 0.0.0.0/0 with unknown ifindex 7
daemon.err bird: KRT: Received route 72.85.191.0/24 with unknown ifindex 7
daemon.err bird: KRT: Received route 172.17.0.0/16 with unknown ifindex 16
daemon.err bird: KRT: Received route 192.168.1.0/24 with unknown ifindex 12
daemon.debug bird: kernel4: Pruning table master4
For documentation purposes I will be using the “documentation” ASNs (64496-64511, 16-bit). In reality, the ASNs I use are in the “private” AS space (64512-65534, 16-bit). I don’t know, and didn’t test, whether metallb/bird2 have any problems with the 32-bit ASN space; I’m using 16-bit for the sake of simplicity.
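As a quick sanity check when picking numbers, the two 16-bit ranges mentioned above can be encoded in a small helper. This is only a sketch: `classify_asn` is a made-up name and it deliberately ignores the 32-bit ranges, which I did not test.

```shell
#!/bin/sh
# Classify a 16-bit ASN by the reserved range it falls in.
# Only the two ranges mentioned in this post are covered.
classify_asn() {
    asn="$1"
    # Reject empty or non-numeric input.
    case "$asn" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$asn" -ge 64496 ] && [ "$asn" -le 64511 ]; then
        echo "documentation"
    elif [ "$asn" -ge 64512 ] && [ "$asn" -le 65534 ]; then
        echo "private"
    else
        echo "other"
    fi
}

classify_asn 64496   # prints: documentation
classify_asn 64512   # prints: private
```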
resource "kubernetes_manifest" "bgp_ip_address_pool" {
  manifest = {
    apiVersion = "metallb.io/v1beta1"
    kind       = "IPAddressPool"
    metadata = {
      name      = "bgp-pool"
      namespace = "metallb"
    }
    spec = {
      addresses = [
        "192.168.2.0/24"
      ]
      # not sure if i'll be affected by this but will avoid
      # for now while just trying to get this to work
      avoidBuggyIPs = true
    }
  }
}
resource "kubernetes_manifest" "bgp_peer" {
  manifest = {
    apiVersion = "metallb.io/v1beta1"
    kind       = "BGPPeer"
    metadata = {
      name      = "bgp-router-peer"
      namespace = "metallb"
    }
    spec = {
      peerAddress = "192.168.1.1"
      peerASN     = 64496
      myASN       = 64497
    }
  }
}
resource "kubernetes_manifest" "bgp_advertisemnt" {
  manifest = {
    apiVersion = "metallb.io/v1beta1"
    kind       = "BGPAdvertisement"
    metadata = {
      name      = "bgp-pool-advertisement"
      namespace = "metallb"
    }
    spec = {
      ipAddressPools = [
        kubernetes_manifest.bgp_ip_address_pool.manifest.metadata.name
      ]
      peers = [
        kubernetes_manifest.bgp_peer.manifest.metadata.name
      ]
    }
  }
}
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show ip bgp neighbor"
BGP neighbor is 192.168.1.1, remote AS 64496, local AS 64497, external link
Local Role: undefined
Remote Role: undefined
BGP version 4, remote router ID 0.0.0.0, local router ID 192.168.2.1
BGP state = Active
Last read 23:55:00, Last write never
Hold time is 0 seconds, keepalive interval is 0 seconds
Configured hold time is 0 seconds, keepalive interval is 0 seconds
Configured tcp-mss is 0, synced tcp-mss is 0
Configured conditional advertisements interval is 60 seconds
Graceful restart information:
Local GR Mode: Helper*
Remote GR Mode: NotApplicable
R bit: False
N bit: False
Timers:
Configured Restart Time(sec): 120
Received Restart Time(sec): 0
Configured LLGR Stale Path Time(sec): 0
Message statistics:
Inq depth is 0
Outq depth is 0
Sent Rcvd
Opens: 0 0
Notifications: 0 0
Updates: 0 0
Keepalives: 0 0
Route Refresh: 0 0
Capability: 0 0
Total: 0 0
Minimum time between advertisement runs is 0 seconds
For address family: IPv4 Unicast
Not part of any update group
Community attribute sent to this neighbor(all)
Inbound path policy configured
Outbound path policy configured
Route map for incoming advertisements is *192.168.1.1-in
Route map for outgoing advertisements is *192.168.1.1-out
0 accepted prefixes
For address family: IPv6 Unicast
Not part of any update group
Community attribute sent to this neighbor(all)
Inbound path policy configured
Outbound path policy configured
Route map for incoming advertisements is *192.168.1.1-in
Route map for outgoing advertisements is *192.168.1.1-out
0 accepted prefixes
Connections established 0; dropped 0
Last reset 23:55:00, Waiting for peer OPEN (n/a)
External BGP neighbor may be up to 1 hops away.
BGP Connect Retry Timer in Seconds: 120
Next connect timer due in 65 seconds
Read thread: off Write thread: off FD used: -1
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show running"
Building configuration...
Current configuration:
!
frr version 9.1_git
frr defaults traditional
hostname ...lan
log file /etc/frr/frr.log informational
log timestamp precision 3
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64497
no bgp ebgp-requires-policy
no bgp default ipv4-unicast
bgp graceful-restart preserve-fw-state
no bgp network import-check
neighbor 192.168.1.1 remote-as 64496
neighbor 192.168.1.1 timers 0 0
!
address-family ipv4 unicast
network 192.168.2.1/32
neighbor 192.168.1.1 activate
neighbor 192.168.1.1 route-map 192.168.1.1-in in
neighbor 192.168.1.1 route-map 192.168.1.1-out out
exit-address-family
!
address-family ipv6 unicast
neighbor 192.168.1.1 activate
neighbor 192.168.1.1 route-map 192.168.1.1-in in
neighbor 192.168.1.1 route-map 192.168.1.1-out out
exit-address-family
exit
!
ip prefix-list 192.168.1.1-pl-ipv4 seq 1 permit 192.168.2.1/32
!
ipv6 prefix-list 192.168.1.1-pl-ipv4 seq 2 deny any
!
route-map 192.168.1.1-in deny 20
exit
!
route-map 192.168.1.1-out permit 1
match ip address prefix-list 192.168.1.1-pl-ipv4
exit
!
route-map 192.168.1.1-out permit 2
match ipv6 address prefix-list 192.168.1.1-pl-ipv4
exit
!
end
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show bgp summary"
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.2.1, local AS number 64497 vrf-id 0
BGP table version 5
RIB entries 1, using 96 bytes of memory
Peers 1, using 13 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.1.1 4 64496 0 0 0 0 0 never Active 0 N/A
Total number of neighbors 1
IPv6 Unicast Summary (VRF default):
BGP router identifier 192.168.2.1, local AS number 64497 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 13 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.1.1 4 64496 0 0 0 0 0 never Active 0 N/A
Total number of neighbors 1
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show bfd peers"
BFD Peers:
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show ip route"
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 192.168.1.1, enp1s0, src 192.168.1.40, 1d11h20m
C>* 10.2.7.13/32 is directly connected, cilium_host, 1d11h20m
C>* 192.168.1.0/24 is directly connected, enp1s0, 1d11h20m
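The speaker side looks set up, but the session is stuck in Active and no messages have been exchanged, so something is missing on the router. Most likely it is the block in bird that allows the peer’s AS to be accepted.
# without this, bird doesn't listen on anything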
protocol device {
}
Each node that connects to the router via BGP needs an entry in /etc/bird.conf. I use a template to capture the common settings, so each node only defines the neighbor entry unique to it.
Only nodes running the MetalLB speaker need to be listed, but by default that is all nodes.
template bgp metallb {
    local as 64496;
    ipv4 {
        import all;
        export none;
    };
}
protocol bgp node1 from metallb {
    neighbor 192.168.1.32 as 64497;
}
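With more than a couple of nodes, the per-node stanzas get repetitive. A minimal sketch for generating them from a list follows. The node names, the second address and the `gen_bgp_peers` helper are illustrative and not part of my actual setup. Names are kept to letters, digits and underscores so they are safe as bird protocol names.

```shell
#!/bin/sh
# Emit one "protocol bgp" block per node, each inheriting from the
# "metallb" template. PEER_AS matches myASN on the MetalLB side.
PEER_AS=64497

gen_bgp_peers() {
    # Reads "name ip" pairs from stdin, one per line.
    while read -r name ip; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        printf 'protocol bgp %s from metallb {\n' "$name"
        printf '    neighbor %s as %s;\n' "$ip" "$PEER_AS"
        printf '}\n'
    done
}

# Example input; replace with your own node list.
gen_bgp_peers <<'EOF'
node1 192.168.1.32
node2 192.168.1.37
EOF
```

The output can be appended to /etc/bird.conf below the template, or written to a separate file.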
With this in place, the commands from earlier look a bit more lively.
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show ip bgp neighbor"
BGP neighbor is 192.168.1.1, remote AS 64496, local AS 64497, external link
Local Role: undefined
Remote Role: undefined
BGP version 4, remote router ID 192.168.1.1, local router ID 192.168.2.1
BGP state = Established, up for 00:04:29
Last read 00:04:29, Last write 00:04:28
Hold time is 0 seconds, keepalive interval is 0 seconds
Configured hold time is 0 seconds, keepalive interval is 0 seconds
Configured tcp-mss is 0, synced tcp-mss is 1448
Configured conditional advertisements interval is 60 seconds
Neighbor capabilities:
4 Byte AS: advertised and received
Extended Message: advertised
AddPath:
IPv4 Unicast: RX advertised
IPv6 Unicast: RX advertised
Long-lived Graceful Restart: advertised and received
Address families by peer:
Route refresh: advertised and received
Enhanced Route Refresh: advertised and received
Address Family IPv4 Unicast: advertised and received
Address Family IPv6 Unicast: advertised
Hostname Capability: advertised (name: ....lan,domain name: n/a) not received
Version Capability: not advertised not received
Graceful Restart Capability: advertised and received
Remote Restart timer is 120 seconds
Address families by peer:
none
Graceful restart information:
End-of-RIB send: IPv4 Unicast
End-of-RIB received: IPv4 Unicast
Local GR Mode: Helper*
Remote GR Mode: Helper
R bit: False
N bit: False
Timers:
Configured Restart Time(sec): 120
Received Restart Time(sec): 120
Configured LLGR Stale Path Time(sec): 0
IPv4 Unicast:
F bit: False
End-of-RIB sent: Yes
End-of-RIB sent after update: Yes
End-of-RIB received: Yes
Timers:
Configured Stale Path Time(sec): 360
LLGR Stale Path Time(sec): 0
IPv6 Unicast:
F bit: False
End-of-RIB sent: No
End-of-RIB sent after update: No
End-of-RIB received: No
Timers:
Configured Stale Path Time(sec): 360
LLGR Stale Path Time(sec): 0
Message statistics:
Inq depth is 0
Outq depth is 0
Sent Rcvd
Opens: 6 1
Notifications: 0 0
Updates: 2 1
Keepalives: 1 1
Route Refresh: 0 0
Capability: 0 0
Total: 9 3
Minimum time between advertisement runs is 0 seconds
For address family: IPv4 Unicast
Update group 1, subgroup 1
Packet Queue length 0
Community attribute sent to this neighbor(all)
Inbound path policy configured
Outbound path policy configured
Route map for incoming advertisements is *192.168.1.1-in
Route map for outgoing advertisements is *192.168.1.1-out
0 accepted prefixes
For address family: IPv6 Unicast
Not part of any update group
Community attribute sent to this neighbor(all)
Inbound path policy configured
Outbound path policy configured
Route map for incoming advertisements is *192.168.1.1-in
Route map for outgoing advertisements is *192.168.1.1-out
0 accepted prefixes
Connections established 1; dropped 0
Last reset 00:24:34, Waiting for peer OPEN (n/a)
External BGP neighbor may be up to 1 hops away.
Local host: 192.168.1.37, Local port: 39172
Foreign host: 192.168.1.1, Foreign port: 179
Nexthop: 192.168.1.37
Nexthop global: fd5c:6784:b290::2a1
Nexthop local: fe80::b8b8:2e23:6184:9711
BGP connection: shared network
BGP Connect Retry Timer in Seconds: 120
Estimated round trip time: 0 ms
Read thread: on Write thread: on FD used: 21
$ kubectl exec -it -n metallb daemonset.apps/metallb-speaker -c frr -- vtysh -c "show bgp summary"
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.2.1, local AS number 64497 vrf-id 0
BGP table version 1
RIB entries 1, using 96 bytes of memory
Peers 1, using 13 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.1.1 4 64496 3 9 1 0 0 00:06:18 0 1 N/A
Total number of neighbors 1
IPv6 Unicast Summary (VRF default):
BGP router identifier 192.168.2.1, local AS number 64497 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 13 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.1.1 4 64496 3 9 0 0 0 00:06:18 NoNeg NoNeg N/A
Total number of neighbors 1
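The difference between the two summary outputs is in the State/PfxRcd column: it holds a state name such as Active while the session is down and a prefix count once it is Established. A rough sketch for spotting peers that are not up, assuming the column layout shown above (`bgp_down_peers` is a made-up helper):

```shell
#!/bin/sh
# Print peers that are not Established, reading "show bgp summary"
# output on stdin. In a peer row the 10th column (State/PfxRcd) is a
# number once the session is up and a state name otherwise.
bgp_down_peers() {
    awk '
        # Section header, e.g. "IPv4 Unicast Summary (VRF default):"
        / Summary / { family = $1 " " $2; next }
        # Peer rows start with an IPv4 address and have at least 10 fields.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && NF >= 10 {
            if ($10 !~ /^[0-9]+$/) print family ": " $1 " " $10
        }
    '
}

# Demo using the row from the first summary output in this post.
bgp_down_peers <<'EOF'
IPv4 Unicast Summary (VRF default):
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
192.168.1.1 4 64496 0 0 0 0 0 never Active 0 N/A
EOF
# prints: IPv4 Unicast: 192.168.1.1 Active
```

In practice the input would come from piping the `kubectl exec ... vtysh -c "show bgp summary"` command into `bgp_down_peers`. The IPv6 row will still be reported as NoNeg here, which I take to mean the address family was not negotiated with the router.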
With all this in place, it’s time to let bird put routes into the kernel’s routing table.
protocol kernel kernel4 {
    ipv4 {
        export all;
    };
}
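To confirm the routes actually landed in the kernel, the router's routing table can be filtered for the pool range. A small sketch, assuming the 192.168.2.0/24 pool from this post (`pool_routes` is a made-up helper):

```shell
#!/bin/sh
# List kernel routes that fall inside the MetalLB pool (192.168.2.0/24
# in this post). Each advertised service IP should appear as a host
# route with a cluster node as its next hop.
pool_routes() {
    ip -4 route show | grep -E '^192\.168\.2\.[0-9]+'
}
```

Running `pool_routes` on the router should print one line per advertised LoadBalancer IP. It prints nothing and returns non-zero if none have been installed yet. `birdc show route` gives bird's own view of the same routes.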
And with that, it works!
I’m sure there are ways to lock this down further, but it does at least require that peer nodes are explicitly allowed.
Switching LoadBalancer k8s services to the BGP IP pool is straightforward. Adding the annotation
annotations:
  metallb.io/address-pool: "bgp-pool"
will cause MetalLB to re-allocate the IP address for that LoadBalancer.
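For a service that already exists, the same annotation can be applied from the command line instead of editing the manifest. A sketch, where the service name and namespace are placeholders and `use_bgp_pool` is a made-up helper:

```shell
#!/bin/sh
# Move a LoadBalancer Service onto the BGP pool by setting the
# metallb.io/address-pool annotation. The namespace defaults to "default".
use_bgp_pool() {
    svc="$1"
    ns="${2:-default}"
    kubectl annotate service "$svc" -n "$ns" \
        metallb.io/address-pool=bgp-pool --overwrite
}
```

For example, `use_bgp_pool my-service my-namespace`, followed by `kubectl get service my-service -n my-namespace` to check that the external IP is now in the 192.168.2.0/24 range.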
Links
- https://dustinspecker.com/posts/kubernetes-networking-from-scratch-bgp-bird-advertise-pod-routes/
- https://www.redhat.com/en/blog/metallb-in-bgp-mode
- https://docs.okd.io/4.16/networking/metallb/about-advertising-ipaddresspool.html
- https://metallb.universe.tf/configuration/_advanced_ipaddresspool_configuration/
- https://en.wikipedia.org/wiki/Autonomous_system_(Internet)
- https://github.com/eloicaso/bird-openwrt
- https://skyenet.tech/bird/
- https://www.monotux.tech/posts/2025/07/bird-kubernetes/