k8s (3): kube-router, a BGP Direct-Routing CNI Solution for K8S

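Preface

The previous article covered using traefik to proxy services inside the cluster, flexibly injecting proxy configuration to expose Layer 7 services externally (see: k8s (2), External Services). This article uses the kube-router CNI to give the cluster direct Layer 3 connectivity with the networks outside it.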

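Environment

The test environment uses the GNS3 IOU Cisco emulator plus VMware virtual machines to build the BGP network. The host (the client) reaches the virtual machines and routers through VMware bridged networking. The routers and the kube-router CNI on the k8s nodes are then configured to form BGP neighbor relationships, and kube-router advertises the cluster's pod CIDR ranges. Once the advertisements are in place, an external client can reach pods in the cluster directly over Layer 3, so the cluster's internal network and the external network are joined through BGP.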

# Client
client: 192.168.0.158/24    # physical machine

# Transit (router)
AS number: 64513
router1: 192.168.0.101/24   # GNS3 + VMware VM environment
router2: 192.168.0.102/24   # GNS3 + VMware VM environment
router vip: 192.168.0.100   # VRRP virtual IP shared by the two routers

# k8s cluster
AS number: 64512
node1: 192.168.0.170/24     # VMware VM environment
node2: 192.168.0.171/24     # VMware VM environment
pod CIDR: 10.244.0.0/16
svc CIDR: 10.96.0.0/12
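Before configuring anything, the address plan can be sanity-checked. The following is a minimal sketch using Python's standard `ipaddress` module; the helper name `shared_subnet` is made up for illustration. It confirms that the client, routers, and nodes share one LAN subnet, that the VRRP VIP sits inside it, and that the pod and service CIDRs overlap neither each other nor the LAN.

```python
import ipaddress

# Addresses copied from the environment list above.
lan_hosts = {
    "client": "192.168.0.158/24",
    "router1": "192.168.0.101/24",
    "router2": "192.168.0.102/24",
    "node1": "192.168.0.170/24",
    "node2": "192.168.0.171/24",
}
vip = ipaddress.ip_address("192.168.0.100")
pod_cidr = ipaddress.ip_network("10.244.0.0/16")
svc_cidr = ipaddress.ip_network("10.96.0.0/12")


def shared_subnet(hosts):
    """Return the single subnet all hosts sit in, or None if they differ."""
    nets = {ipaddress.ip_interface(addr).network for addr in hosts.values()}
    return nets.pop() if len(nets) == 1 else None


lan = shared_subnet(lan_hosts)
print("LAN subnet:", lan)
print("VIP inside LAN:", lan is not None and vip in lan)
print("pod/svc CIDRs overlap:", pod_cidr.overlaps(svc_cidr))
print("pod CIDR overlaps LAN:", lan is not None and pod_cidr.overlaps(lan))
```

Keeping every BGP speaker in one subnet matters here because the lab relies on directly connected peers.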

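Target result: a client outside the cluster accesses pods and services inside the cluster, with data-plane traffic flowing as client --> router --> k8s node --> pod IP / svc IP, giving direct Layer 3 routing.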

Topology diagram:

[Figure: network topology]

Router configuration

#### router1:
IOU1#show running-config
interface Ethernet0/0
 ip address 192.168.0.101 255.255.255.0
 vrrp 1 ip 192.168.0.100
router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.102 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary

#### router2:
IOU2#show running-config
interface Ethernet0/0
 ip address 192.168.0.102 255.255.255.0
 vrrp 1 ip 192.168.0.100
 vrrp 1 priority 100
router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.101 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary
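In the configuration above, a neighbor whose remote-as equals the router's own AS is an iBGP peer (the other router), and a neighbor with a different AS is an eBGP peer (the k8s nodes). The sketch below illustrates that rule by parsing router1's BGP stanza with a regular expression. The function names are hypothetical, and the private-use range check assumes the 16-bit range reserved by RFC 6996.

```python
import re

# router1's BGP stanza, copied from the configuration above.
ROUTER1_BGP = """\
router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.102 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary
"""


def classify_neighbors(config):
    """Map each neighbor IP to 'iBGP' or 'eBGP' based on its remote-as."""
    local_as = int(re.search(r"^router bgp (\d+)", config, re.M).group(1))
    sessions = {}
    pattern = r"^\s*neighbor (\S+) remote-as (\d+)"
    for ip, remote_as in re.findall(pattern, config, re.M):
        sessions[ip] = "iBGP" if int(remote_as) == local_as else "eBGP"
    return sessions


def is_private_asn(asn):
    """True for the 16-bit private-use AS range (64512 to 65534)."""
    return 64512 <= asn <= 65534


for ip, kind in classify_neighbors(ROUTER1_BGP).items():
    print(ip, kind)
```

Both AS numbers used in this lab (64512 and 64513) fall inside the private-use range, so they are safe for an isolated test network.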

kube-router configuration:

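For the k8s cluster setup steps, see k8s (1), Offline Local Deployment of a 1.9.0 High-Availability Cluster.
In the kube-router YAML file configured earlier, add the following arguments, which specify the local BGP AS and the peer's AS and IP.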

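- --advertise-cluster-ip=true      # advertise service cluster IPs
- --advertise-external-ip=true     # advertise service external IPs; takes effect only if the svc specifies an external IP
- --cluster-asn=64512              # local BGP AS number of the cluster
- --peer-router-ips=192.168.0.100  # peer IP; several may be given, separated by ','. The routers in this lab run VRRP, so the VIP alone is enough
- --peer-router-asns=64513         # peer AS number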
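When several peers are listed, the IP list and the ASN list have to line up. The sketch below is only an illustration, assuming that each entry in --peer-router-ips is paired positionally with the entry at the same index in --peer-router-asns. The helpers `parse_flags` and `peer_pairs` are hypothetical and not part of kube-router.

```python
import ipaddress

# The arguments added above, as they appear in the container's args list.
args = [
    "--advertise-cluster-ip=true",
    "--advertise-external-ip=true",
    "--cluster-asn=64512",
    "--peer-router-ips=192.168.0.100",
    "--peer-router-asns=64513",
]


def parse_flags(arg_list):
    """Turn ['--key=value', ...] into {'key': 'value', ...}."""
    flags = {}
    for arg in arg_list:
        key, _, value = arg.lstrip("-").partition("=")
        flags[key] = value
    return flags


def peer_pairs(flags):
    """Pair each peer IP with the ASN at the same list position."""
    ips = [ipaddress.ip_address(ip) for ip in flags["peer-router-ips"].split(",")]
    asns = [int(asn) for asn in flags["peer-router-asns"].split(",")]
    if len(ips) != len(asns):
        raise ValueError("peer IP and ASN lists differ in length")
    return list(zip(ips, asns))


flags = parse_flags(args)
for ip, asn in peer_pairs(flags):
    print(f"peer {ip} in AS {asn}")
```

Running a check like this before applying the manifest catches a mistyped address or a missing ASN early.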

The complete kube-router.yaml file is as follows:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-router-cfg
  namespace: kube-system
  labels:
    tier: node
    k8s-app: kube-router
data:
  cni-conf.json: |
    {
      "name":"kubernetes",
      "type":"bridge",
      "bridge":"kube-bridge",
      "isDefaultGateway":true,
      "ipam": {
        "type":"host-local"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: kube-router
    tier: node
  name: kube-router
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        k8s-app: kube-router
        tier: node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: kube-router
      serviceAccount: kube-router
      containers:
      - name: kube-router
        image: cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=true
        - --advertise-cluster-ip=true
        - --advertise-external-ip=true
        - --cluster-asn=64512
        - --peer-router-ips=192.168.0.100
        - --peer-router-asns=64513
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        livenessProbe:
          httpGet:
            path: /healthz
            port: 20244
          initialDelaySeconds: 10
          periodSeconds: 3
        resources:
          requests:
            cpu: 250m
            memory: 250Mi
        securityContext:
          privileged: true
        volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: cni-conf-dir
          mountPath: /etc/cni/net.d
        - name: kubeconfig
          mountPath: /var/lib/kube-router/kubeconfig
          readOnly: true
      initContainers:
      - name: install-cni
        image: busybox
        imagePullPolicy: Always
        command:
        - /bin/sh
        - -c
        - set -e -x;
          if [ ! -f /etc/cni/net.d/10-kuberouter.conf ]; then
            TMP=/etc/cni/net.d/.tmp-kuberouter-cfg;
            cp /etc/kube-router/cni-conf.json ${TMP};
            mv ${TMP} /etc/cni/net.d/10-kuberouter.conf;
          fi
        volumeMounts:
        - mountPath: /etc/cni/net.d
          name: cni-conf-dir
        - mountPath: /etc/kube-router
          name: kube-router-cfg
      hostNetwork: true
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: cni-conf-dir
        hostPath:
          path: /etc/cni/net.d
      - name: kube-router-cfg
        configMap:
          name: kube-router-cfg
      - name: kubeconfig
        hostPath:
          path: /var/lib/kube-router/kubeconfig
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-router
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
  namespace: kube-system
rules:
  - apiGroups:
    - ""
    resources:
      - namespaces
      - pods
      - services
      - nodes
      - endpoints
    verbs:
      - list
      - get
      - watch
  - apiGroups:
    - "networking.k8s.io"
    resources:
      - networkpolicies
    verbs:
      - list
      - get
      - watch
  - apiGroups:
    - extensions
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-router
subjects:
- kind: ServiceAccount
  name: kube-router
  namespace: kube-system

Delete the previous kube-router deployment, then re-create the resources from this YAML file:

kubectl delete -f kubeadm-kuberouter.yaml  # run before changing the arguments
kubectl create -f kubeadm-kuberouter.yaml  # run after changing the arguments

Next, deploy an nginx pod for testing. The YAML file is as follows:

[root@171 nginx]# cat nginx-deploy.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2018-04-09T04:02:02Z
  generation: 4
  labels:
    app: nginx
  name: nginx-deploy
  namespace: default
  resourceVersion: "111504"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/nginx-deploy
  uid: c28090c0-3baa-11e8-b75a-000c29858eab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.9.1
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

Check the pod IP and test it locally with curl:

[root@171 nginx]# kubectl get pods -o wide 
NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE
nginx-deploy-5964dfd755-lv2kb   1/1       Running   0          29m       10.244.1.35   171
[root@171 nginx]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>body {width: 35em;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p><p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@171 nginx]# 

Verifying the results

Check the BGP neighbors on the router:

IOU1#show ip bgp summary 
BGP router identifier 192.168.0.101, local AS number 64513
BGP table version is 19, main routing table version 19
4 network entries using 560 bytes of memory
6 path entries using 480 bytes of memory
2 multipath network entries and 4 multipath paths
1/1 BGP path/bestpath attribute entries using 144 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1208 total bytes of memory
BGP activity 5/1 prefixes, 12/6 paths, scan interval 60 secs

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.0.102   4        64513     100     106       19    0    0 01:27:20        0
192.168.0.170   4        64512     107      86       19    0    0 00:37:19        3
192.168.0.171   4        64512      98      85       19    0    0 00:37:21        3
IOU1#

Check the BGP route entries on the router:

IOU1#show ip route bgp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override

Gateway of last resort is 192.168.0.1 to network 0.0.0.0

      10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
B        10.96.0.1/32 [20/0] via 192.168.0.171, 00:38:36
                      [20/0] via 192.168.0.170, 00:38:36
B        10.96.0.10/32 [20/0] via 192.168.0.171, 00:38:36
                       [20/0] via 192.168.0.170, 00:38:36
B        10.244.0.0/24 [20/0] via 192.168.0.170, 00:38:36
B        10.244.1.0/24 [20/0] via 192.168.0.171, 00:38:39

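As shown, the BGP neighbor relationships are established and the router has learned the cluster's internal routes. Each service cluster IP (/32) is reachable through both nodes as equal-cost paths, while each pod /24 points at the node that owns it.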
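To see how the router picks a next hop for a given destination, the learned routes can be replayed with a longest-prefix-match lookup. This is a simplified sketch, not how IOS implements its forwarding table; the `lookup` function is hypothetical and the route table is copied from the `show ip route bgp` output.

```python
import ipaddress

# Prefix -> next hops, copied from the BGP routes the router learned.
bgp_routes = {
    "10.96.0.1/32": ["192.168.0.171", "192.168.0.170"],
    "10.96.0.10/32": ["192.168.0.171", "192.168.0.170"],
    "10.244.0.0/24": ["192.168.0.170"],
    "10.244.1.0/24": ["192.168.0.171"],
}


def lookup(destination, routes):
    """Return (prefix, next_hops) for the longest matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        # A real router would fall back to its default route here.
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), routes[str(best)]


print(lookup("10.244.1.35", bgp_routes))  # the nginx pod
print(lookup("10.96.0.1", bgp_routes))    # a service cluster IP
print(lookup("10.244.2.7", bgp_routes))   # no BGP route covers this address
```

The nginx pod at 10.244.1.35 resolves to the 10.244.1.0/24 route via 192.168.0.171, which matches the node the pod is scheduled on.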

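Bring up a test machine to check.
Change the test machine's default gateway to point at the router, simulating an ordinary PC on the external network: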

[root@python ~]# ip route
default via 192.168.0.100 dev eth3  proto static
[root@python ~]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>body {width: 35em;margin: 0 auto;font-family: Tahoma, Verdana, Arial, sans-serif;}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p><p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p><p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@python ~]# 

The test succeeds. At this point, the cluster's internal BGP network is connected to the external BGP network.

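Friendly reminder: don't forget to save the router configuration to NVRAM (copy running-config startup-config), otherwise it is lost on reboot. I hadn't touched network gear in a long time, forgot this step, and got bitten once.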


9-22: a pitfall

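Today I tried adding a new node (192.168.20.x/24) that is not in the same subnet as the original cluster nodes (192.168.9.x/24). Once it was created, something very strange happened: the peering between kube-router on the new node and the core network device outside the cluster kept flapping between established and idle and was very unstable. While the session was established, packet loss between the new node and the original nodes exceeded 70%; while it was idle, pings between the nodes were normal.
Screenshot of the problem:
[Figure: screenshot of the flapping peering]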

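After entering the kube-router container and running gobgp neighbor, I saw that the new node's eBGP session with the external network device and its iBGP sessions with the original nodes kept changing state and never stabilized. I was baffled, until I finally found the only similar kube-router issue on GitHub, where a project member explained the cause:
GitHub issue link
[Figure: screenshot of the project member's reply]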

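My understanding:
kube-router only supports iBGP peering between nodes in the same subnet; nodes in different subnets cannot form a peering. In addition, the eBGP max hop attribute defaults to 1. Routers generally let you change this value manually, but earlier kube-router versions supported only the default of 1 and offered no way to configure it, so the eBGP neighbor also had to be in the same subnet as kube-router. Later versions fixed this, and after upgrading, nodes can establish eBGP peerings across subnets without problems.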
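The reasoning can be illustrated with a small model. This is a sketch under stated assumptions: the host addresses below are hypothetical, since only the 192.168.9.x/24 and 192.168.20.x/24 subnets are known, and `survives` is a simplified model in which each forwarding router consumes one unit of TTL.

```python
import ipaddress


def same_subnet(a, b):
    """True when both interfaces (address/prefix) sit in the same IP subnet."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


def survives(ttl, router_hops):
    """A packet arrives only if its TTL outlasts every router that forwards it."""
    return ttl > router_hops


# Hypothetical host addresses inside the two subnets mentioned above.
old_node = "192.168.9.10/24"
new_node = "192.168.20.10/24"

print("same subnet:", same_subnet(old_node, new_node))
# With the default max hop of 1, BGP packets cannot cross even one router.
print("TTL 1, directly connected:", survives(ttl=1, router_hops=0))
print("TTL 1, one router between:", survives(ttl=1, router_hops=1))
print("TTL 2, one router between:", survives(ttl=2, router_hops=1))
```

In this model, raising the allowed hop count is what lets a session cross a subnet boundary, which fits the description of the fix in later versions.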

