

Postgres on Kubernetes with the Zalando operator
source link: https://vitobotta.com/2020/02/05/postgres-kubernetes-zalando-operator/

When I need a database I usually reach for MySQL, since I have been using it for a while and am most familiar with it. However, I am planning to switch to Postgres for an app, so I was looking for options to run Postgres in Kubernetes. I found three candidates: the Zalando operator, the CrunchyData operator and KubeDB. KubeDB is a cool operator that supports various databases including MySQL, Postgres, Redis and more. I tried it and really liked the features; however, there are many threads on GitHub about data loss and other serious issues, so I am not going to use it for now. Admittedly, I couldn't figure out how to set up and use the CrunchyData operator… maybe it's just me, but I don't find the documentation straightforward in this regard.
So after quite a bit of testing I think I'm going to use the Zalando operator, which AFAIK Zalando has been running in production for a couple of years already, with hundreds of clusters and databases, so in theory it should be reliable. While testing I did hit some issues, because I am using Scaleway object storage for backups instead of AWS S3 (Scaleway is S3-compatible), and I had to figure out some workarounds by reading the code, because the documentation is lacking with regard to some configuration details. A Reddit user helped with some of this.
Installation
The installation is quite straightforward with the Helm chart in the GitHub repo. First clone the repo and check out the latest version/tag (currently v1.3.1), then edit the values file ./charts/postgres-operator/values-crd.yaml at least as follows:
configKubernetes:
  enable_pod_antiaffinity: true
  pod_environment_configmap: "postgres-pod-config"
  watched_namespace: "*"

configAwsOrGcp:
  aws_region: "<s3 region>"
  aws_endpoint: "<s3 endpoint url>"
  wal_s3_bucket: "<s3 bucket>"

configLogicalBackup:
  logical_backup_s3_access_key_id: "<s3 access key id>"
  logical_backup_s3_bucket: "<s3 bucket>"
  logical_backup_s3_endpoint: "<s3 endpoint url>"
  logical_backup_s3_secret_access_key: "<s3 secret access key>"
  logical_backup_s3_sse: "<server side encryption method, e.g. AES256; I leave this blank since it's not supported by Scaleway>"
  logical_backup_schedule: "<backup schedule in cron format>"
pod_environment_configmap specifies the name of a config map where we define some settings for the cluster pods, to enable WAL archiving to object storage and cloning/point-in-time recovery. watched_namespace is set to an empty string by default, which means the operator only watches its own namespace for the custom resources required to create clusters. Setting it to * makes it possible to create clusters in different namespaces, which is likely what you want. I am not 100% sure whether the configAwsOrGcp section is actually needed when using S3-compatible storage. The logical backup settings should be self-explanatory. Please note that there is currently a missing setting for the region, so I have submitted a PR to add the setting logical_backup_s3_region, since this is required with some S3-compatible services like Scaleway; otherwise the upload of logical backups fails. Hopefully it will be merged soon. In the meantime I have built a custom image for the logical backups using the relevant Dockerfile in the repo, adding the --region parameter to the aws CLI command in dump.sh.
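For reference, this is roughly what the S3 settings above might look like when filled in for Scaleway object storage; the endpoint and region follow Scaleway's fr-par naming, and the bucket name and credentials below are made-up placeholders:

```yaml
# Hypothetical values for Scaleway object storage in the fr-par region;
# the bucket name and keys are placeholders, not real credentials.
configAwsOrGcp:
  aws_region: "fr-par"
  aws_endpoint: "https://s3.fr-par.scw.cloud"
  wal_s3_bucket: "my-wal-backups"

configLogicalBackup:
  logical_backup_s3_access_key_id: "SCWXXXXXXXXXXXXXXXXX"
  logical_backup_s3_bucket: "my-wal-backups"
  logical_backup_s3_endpoint: "https://s3.fr-par.scw.cloud"
  logical_backup_s3_secret_access_key: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  logical_backup_s3_sse: ""  # blank: SSE not supported by Scaleway
  logical_backup_schedule: "00 05 * * *"
```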
Once the configuration is ready, we can install the operator with Helm:
kubectl create ns postgres-operator
helm upgrade --install \
  --namespace postgres-operator \
  -f ./charts/postgres-operator/values-crd.yaml \
  postgres-operator \
  ./charts/postgres-operator
I am assuming Helm 3 here so you need to create the namespace first (I use postgres-operator).
PgAdmin
I like to install PgAdmin as well so I can have a handy web interface for my clusters:
helm upgrade --install \
  --namespace postgres-operator \
  --set env.email=<your email used as username for login> \
  --set env.password=<password> \
  --set persistence.enabled=true \
  --set persistence.size=1Gi \
  --set service.type=ClusterIP \
  --set ingress.enabled=false \
  pgadmin \
  stable/pgadmin
You can optionally enable an ingress, but I prefer not to expose PgAdmin to the web and open it with a port-forward instead:
kubectl -n postgres-operator port-forward svc/pgadmin <some local port>:80
Creating a cluster
To create a cluster, first create a namespace, e.g. postgres-cluster. Then you need to create a config map which will be used by the database pods. As I mentioned earlier, this config map includes the information required for both archiving of WAL to S3, and clone/point-in-time recovery from S3 using the WAL archives. To create the config map run
kubectl create ns postgres-cluster
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config # The name must be the same as configured in the values file earlier
  namespace: postgres-cluster
data:
  BACKUP_SCHEDULE: "0 */12 * * *" # Schedule a base backup every 12 hours; customise as you wish
  USE_WALG_BACKUP: "true" # Use the Go backup tool (faster)
  BACKUP_NUM_TO_RETAIN: "14" # With 2 backups per day, keep 7 days of base backups
  AWS_ACCESS_KEY_ID: <s3 access key>
  AWS_SECRET_ACCESS_KEY: <s3 secret key>
  AWS_ENDPOINT: <s3 endpoint url>
  AWS_REGION: <s3 region>
  WALG_DISABLE_S3_SSE: "true" # I disable this because it's not supported by Scaleway
EOF
Next, edit manifests/complete-postgres-manifest.yaml as follows:
metadata:
  name: postgres-cluster
  namespace: postgres-cluster
spec:
  teamId: "postgres"
  volume:
    size: <volume size>
    storageClass: <storage class>
  numberOfInstances: 2
  enableShmVolume: true
  postgresql:
    parameters:
      max_connections: "250"
  enableLogicalBackup: true
  logicalBackupSchedule: "00 05 * * *"
The teamId must be set to the first word of the name of the cluster, if I remember/understand correctly. I specify at least 2 instances so as to have a replica for HA. By default the cluster allows for 10 connections, so I bump this since I usually need more than that. You may be able to get by with fewer connections using something like PgBouncer, but I'm not familiar with it and I'm not sure I would actually need it, since I can just increase the number of connections in the cluster directly. Of course I enable the logical backups as well; I would likely use point-in-time recovery if needed, but it's nice to have daily dumps too. In the manifest, comment out the section about init containers because you will likely not need it. I also comment out the sections about users and databases, since I create them once I have access to the cluster, but you can also let the operator create these if you prefer.
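If you prefer not to start from the example manifest in the repo, a minimal manifest can also work; this is a sketch, with placeholder size and Postgres version:

```yaml
# Minimal cluster manifest sketch; size and version are placeholders.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: postgres-cluster
  namespace: postgres-cluster
spec:
  teamId: "postgres"      # first word of the cluster name
  numberOfInstances: 2    # one master plus a replica for HA
  volume:
    size: 10Gi            # placeholder volume size
  postgresql:
    version: "12"         # placeholder major version
```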
Finally, to create the cluster apply the manifest:
kubectl apply -f manifests/complete-postgres-manifest.yaml
This will usually take a few minutes. Once ready, to access the cluster with a client or from an app you can retrieve the password for the postgres user with the following command:
kubectl -n postgres-cluster get secret postgres.postgres-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Of course change the namespace if needed. That’s it. You now have a working Postgres cluster. You can create as many as you wish and the operator will take care of managing them.
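The secret name used above follows a fixed pattern: database user, then cluster name, then the credentials suffix of the operator's CRD group. A small sketch showing how the name is composed and why the base64 decoding step is needed; the encoded password here is a made-up example:

```shell
# Compose the secret name: <username>.<cluster name>.credentials.postgresql.acid.zalan.do
DB_USER="postgres"
CLUSTER_NAME="postgres-cluster"
SECRET_NAME="${DB_USER}.${CLUSTER_NAME}.credentials.postgresql.acid.zalan.do"
echo "${SECRET_NAME}"
# → postgres.postgres-cluster.credentials.postgresql.acid.zalan.do

# Kubernetes secrets store values base64-encoded, and kubectl's jsonpath
# returns the raw encoded value, hence the base64 -d in the command above.
ENCODED="c3VwZXJzZWNyZXQ="   # base64 for the made-up password "supersecret"
PASSWORD="$(echo -n "${ENCODED}" | base64 -d)"
echo "${PASSWORD}"
# → supersecret
```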
Logical backups
Together with the cluster, the operator will create a cron job in the same namespace for logical backups/dumps, if you have enabled them. You can trigger a logical backup manually at any time with this command:
kubectl -n postgres-cluster create job manual-logical-backup --from=cronjob/logical-backup-postgres-cluster
You should see that a path like /spilo///logical-backups has been created in the s3 bucket and it includes a compressed dump.
Restoring a logical backup
As far as I know, the operator doesn't yet support restoring from logical backups in an automated way, but restoring a logical backup is pretty straightforward anyway. What we need to do is: create a new cluster (unless we want to restore the backup into an existing cluster), deploy a temporary container with e.g. the Ubuntu image, then download and import the desired dump into the cluster - download and import happen in one step by streaming, so there's no need to download the dump first. If you are restoring into an existing cluster, you should first drop the databases you are restoring; perhaps it's cleaner to create a new cluster. To download/stream the backup we can use s3cmd, a simple S3 client that we can install with pip. The whole process is done with the following commands, run inside the temporary container:
apt update
apt install -y python3-pip postgresql-client
pip3 install s3cmd

cat > ~/.pgpass <<EOF
postgres-cluster:5432:postgres:postgres:<password for the postgres user in the target cluster>
EOF
chmod 0600 ~/.pgpass

DUMP_URL=s3://..... (URL of the .sql.gz dump)

/usr/local/bin/s3cmd \
  --no-check-certificate \
  --access_key=<s3 access key> \
  --secret_key=<s3 secret key> \
  --host=<s3 endpoint url> \
  --host-bucket="<s3 host bucket - see the s3cmd reference for your object storage service>" \
  get --no-progress ${DUMP_URL} - | gunzip | psql -U postgres -h postgres-cluster
The hostname for the import with psql must be the name of the target cluster. This process will take some time depending on the size of the backup of course.
Point in time recovery with WAL
Logical backups are handy to have, but even nicer is the ability to restore the Postgres cluster to (almost) any point in time within the X days configured in the config map we created earlier, using the WAL archive. If I am not mistaken, by default the WAL archive in the bucket is updated every 16 MB of changes or so, which is why you can recover to "almost" any point in time; I am sure this is customisable but I haven't looked into it. For point-in-time recovery, you create a cluster almost the same way as we created a new cluster earlier. The only difference is that we add a section to the config map which specifies where the WAL archive is located in S3 and the date/time we want to restore to. So, like we did for a brand new cluster, you need to create the namespace first and then the config map as follows:
kubectl create ns postgres-cluster
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config # The name must be the same as configured in the values file earlier
  namespace: postgres-cluster
data:
  BACKUP_SCHEDULE: "0 */12 * * *" # Schedule a base backup every 12 hours; customise as you wish
  USE_WALG_BACKUP: "true" # Use the Go backup tool (faster)
  BACKUP_NUM_TO_RETAIN: "14" # With 2 backups per day, keep 7 days of base backups
  AWS_ACCESS_KEY_ID: <s3 access key>
  AWS_SECRET_ACCESS_KEY: <s3 secret key>
  AWS_ENDPOINT: <s3 endpoint url>
  AWS_REGION: <s3 region>
  WALG_DISABLE_S3_SSE: "true" # I disable this because it's not supported by Scaleway
  # Optional, for point in time recovery/restore
  USE_WALG_RESTORE: "true"
  CLONE_METHOD: CLONE_WITH_WALE
  CLONE_AWS_ACCESS_KEY_ID: <s3 access key>
  CLONE_AWS_SECRET_ACCESS_KEY: <s3 secret key>
  CLONE_AWS_ENDPOINT: <s3 endpoint url>
  CLONE_AWS_REGION: <s3 region>
  CLONE_WAL_S3_BUCKET: <s3 bucket>
  CLONE_WAL_BUCKET_SCOPE_SUFFIX: "/<id of the original cluster>"
  CLONE_TARGET_TIME: "2020-02-04T12:49:03+00:00" # time in the WAL archive that we want to restore
  CLONE_SCOPE: postgres-cluster # name of the original cluster
EOF
Pay attention to the CLONE_WAL_BUCKET_SCOPE_SUFFIX setting, which should be a string including the ID of the original cluster prefixed with a “/”. Finally, to restore the cluster just edit and apply the manifest like we did for a brand new cluster.
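CLONE_TARGET_TIME uses a timestamp with an explicit UTC offset, as in the example above; a quick way to produce one for the current moment (assuming the standard date utility):

```shell
# Print a timestamp in the CLONE_TARGET_TIME format, e.g. 2020-02-04T12:49:03+00:00
TARGET_TIME="$(date -u +"%Y-%m-%dT%H:%M:%S+00:00")"
echo "${TARGET_TIME}"
```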
Conclusion
I have only occasionally used Postgres in the past and have never looked into it much, but so far I like what I see with this operator from Zalando. Creating a cluster and performing backups and restores isn't complicated, and this setup should be pretty reliable considering that Zalando has been using it in production for a while now with lots of clusters. There's also a nice UI you can optionally install to create clusters and more, if you don't want to work with manifests manually. Hope it was useful!