Upgrading OpenShift via ACM ClusterCurator

End-to-end procedure for upgrading a managed OpenShift cluster from the ACM hub using a ClusterCurator YAML — no console clicks, GitOps-friendly, and reusable across clusters.

This document is written against the live setup:

  • Hub cluster local-cluster (api.virt.na-launch.com:6443) — also the KubeVirt host for spoke nodes
  • Spoke cluster anaeem (api.anaeem.na-launch.com:6443)
  • ACM multiclusterhub 2.14.2 in open-cluster-management
  • Curator controller cluster-curator-controller (2 replicas) in multicluster-engine
  • CRD clustercurators.cluster.open-cluster-management.io/v1beta1

Other managed clusters (hybrid, additional spokes) follow the same pattern — only the namespace and CR metadata.name change.


1. Background: how the curator drives an upgrade

ClusterCurator is a hub-side CR that the cluster-curator-controller reconciles. It does not itself talk to the spoke. Instead it spawns a Kubernetes Job in the cluster's namespace (anaeem in our case). That Job runs two stages as init/main containers:

curator-job-<rand>
├── initContainer: upgrade-cluster      (writes the desired version to the spoke)
└── container:     monitor-upgrade      (polls until the spoke reaches it or times out)

Both stages communicate with the spoke through ACM's "work" channel:

  1. upgrade-cluster

    • Creates a ManagedClusterView named after the cluster, pointing at the spoke's ClusterVersion resource — this is how the hub reads spoke state.
    • Creates a short-lived ManagedClusterAction that asks the klusterlet on the spoke to patch ClusterVersion.spec.desiredUpdate (and spec.channel). The klusterlet executes the patch, and the action self-deletes after success or failure; you can watch both objects live with the commands after this list.
    • Marks upgrade-cluster condition True on the curator.
  2. monitor-upgrade

    • Re-reads the ManagedClusterView on a poll loop.
    • Updates monitor-upgrade condition with whatever the spoke's ClusterVersion.status currently says (e.g. Working towards 4.20.11: 119 of 959 done (12% complete), waiting on etcd, kube-apiserver).
    • Exits successfully when the spoke reports the new version Completed.
    • Exits failed when monitorTimeout (minutes) elapses — but the upgrade itself does not roll back; CVO and MCO continue independently.
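
To see this machinery live while the init container runs, both objects sit in the cluster namespace (the view shares the cluster's name):

oc get managedclusterview,managedclusteraction -n anaeem

# the action is short-lived; the view persists, and .status.result mirrors the spoke's ClusterVersion
oc get managedclusterview -n anaeem anaeem -o jsonpath='{.status.result.spec.desiredUpdate.version}{"\n"}'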

The CR also supports prehook / posthook Ansible Tower job specs and an overrideJob that replaces the default Job entirely — out of scope for this doc.

Who does what after desiredUpdate is set

ClusterCurator (hub)
   │ (writes to)
   ▼
ClusterVersion.spec.desiredUpdate (spoke)
   │
   ▼
Cluster Version Operator (CVO)        — sequences cluster-operator updates
   │
   ▼
Each ClusterOperator                   — own controller does its rolling update
   │
   ▼
Machine Config Operator (MCO)          — when needed, generates new rendered MachineConfig
   │
   ▼
MachineConfigPool (master, then worker)
   │ ┌──────────────────────────────────┐
   │ │ Per-node loop:                   │
   │ │   1. cordon                      │
   │ │   2. drain (respects PDBs)       │
   │ │   3. apply rendered MachineConfig│
   │ │   4. reboot                      │
   │ │   5. uncordon                    │
   │ └──────────────────────────────────┘
   ▼
Upgrade complete ⇒ ClusterVersion.status.history[0].state = Completed

Almost every "upgrade is stuck" symptom traces back to one of those five per-node steps failing.


2. Pre-flight checks

Run all of these from the hub. Replace anaeem with your cluster name.

2.1 Confirm the cluster is reachable and healthy

oc get managedcluster anaeem
# HUB ACCEPTED   JOINED   AVAILABLE  → expect true / True / True

AVAILABLE=Unknown (as hybrid currently shows) means the klusterlet is not reporting; the curator will create the Job but the action will never reach the spoke. Fix availability before upgrading.
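
The availability condition's message usually says why the klusterlet is silent; dump all the conditions in one go:

oc get managedcluster anaeem \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'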

2.2 Confirm the curator controller is running

oc -n multicluster-engine get pods -l app=cluster-curator-controller

Without it, the CR sits in pending.

2.3 Inspect what the spoke thinks it can upgrade to

oc get managedclusterinfo -n anaeem anaeem -o json | jq '.status.distributionInfo.ocp |
  {current:.version, channel:.channel, desired:.desiredVersion,
   inChannelUpdates:.availableUpdates,
   conditional:[.versionAvailableUpdates[].version]}'

Two relevant lists:

  • availableUpdates — versions reachable from the current channel and recommended. These do not require the not-recommended annotation.
  • versionAvailableUpdates — the broader graph (other channels, conditional updates). To pick from this set you must:
    • set the annotation cluster.open-cluster-management.io/upgrade-allow-not-recommended-versions: "true",
    • and usually set spec.upgrade.upstream to the OpenShift update service URL (https://api.openshift.com/api/upgrades_info/v1/graph).

Pick a version that is in availableUpdates whenever possible.
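
If you do need a version from the broader graph, you can set the annotation in the CR before applying (§3 shows it set to "false"), or flip it on an existing CR:

oc annotate clustercurator -n anaeem anaeem \
  cluster.open-cluster-management.io/upgrade-allow-not-recommended-versions="true" --overwrite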

2.4 Confirm the spoke is currently healthy

The MCO will refuse to drain a degraded pool, and CVO will refuse to start an upgrade if any operator is Available=False. Spoke-side check:

oc --context anaeem get clusterversion
oc --context anaeem get co | awk '$3!="True" || $4!="False" || $5!="False"'   # lists unhealthy operators; expect header only
oc --context anaeem get mcp
oc --context anaeem get nodes

(oc --context anaeem requires the spoke kubeconfig context to be present; if not, log in: oc login https://api.anaeem.na-launch.com:6443.)
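
If the cluster was provisioned through Hive, the hub usually keeps an admin kubeconfig secret in the cluster namespace; a sketch (the secret's name varies, so find it first):

oc -n anaeem get secrets | grep -i kubeconfig
oc -n anaeem extract secret/<admin-kubeconfig-secret> --keys=kubeconfig --to=- > /tmp/anaeem.kubeconfig
KUBECONFIG=/tmp/anaeem.kubeconfig oc get clusterversion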

If you don't want to leave the hub, mirror these via ManagedClusterView (see §6.1).

2.5 Make sure no other curator is already running

oc get clustercurator -n anaeem
oc get jobs -n anaeem | grep curator

A second ClusterCurator while the first is in flight will race. Delete the old CR (and its Job) before applying a new one.
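
A cleanup sketch that removes the previous CR and its Job, reading spec.curatorJob the same way §4.1 does:

OLD=$(oc get clustercurator -n anaeem anaeem -o jsonpath='{.spec.curatorJob}' 2>/dev/null)
[ -n "$OLD" ] && oc delete job -n anaeem "$OLD" --ignore-not-found
oc delete clustercurator -n anaeem anaeem --ignore-not-found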


3. The CR

/home/anaeem/upgrades/curator-anaeem-4.20.11.yaml:

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  name: anaeem                # MUST match the managed cluster name
  namespace: anaeem           # MUST be the cluster's namespace
  annotations:
    cluster.open-cluster-management.io/upgrade-allow-not-recommended-versions: "false"
spec:
  desiredCuration: upgrade
  upgrade:
    desiredUpdate: "4.20.11"  # quoted to keep it a string
    channel: stable-4.20
    monitorTimeout: 120       # minutes
    # upstream: https://api.openshift.com/api/upgrades_info/v1/graph   # only when crossing channels / conditional
    # intermediateUpdate: "4.20.99"                                    # EUS→EUS hop
    # prehook: [...] / posthook: [...]                                 # Ansible Tower
    # overrideJob: <PodTemplateSpec>                                   # replace default upgrade job

Field reference (verified against the live CRD via oc explain clustercurator.spec.upgrade):

Field                        Type            Notes
desiredCuration              string          Enum: install, upgrade, scale, destroy. "" clears, allowing re-arm.
upgrade.desiredUpdate        string          Target X.Y.Z. Required.
upgrade.channel              string          Update channel. Should match what the spoke is on (or one it can switch to).
upgrade.intermediateUpdate   string          EUS→EUS only. Curator hops through this version first.
upgrade.monitorTimeout       int (minutes)   Default 120. Only affects the monitor-upgrade container; the upgrade keeps going if it expires.
upgrade.upstream             string          Override OSUS URL.
upgrade.prehook / posthook   []obj           Ansible Tower job specs.
upgrade.overrideJob          obj             Full pod template; completely replaces the default Job.
upgrade.towerAuthSecret      string          Tower secret for prehook/posthook.

Server-side validation before applying

oc apply --dry-run=server -f curator-anaeem-4.20.11.yaml

Catches schema mistakes without creating anything.

Apply

oc apply -f curator-anaeem-4.20.11.yaml

The curator controller picks it up within a few seconds, populates spec.curatorJob, and creates the Job.


4. Watching progress

4.1 Curator-side (hub)

# CR conditions — the canonical view of curator state
oc get clustercurator -n anaeem anaeem \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'

# Example output during a healthy upgrade:
#   clustercurator-job=False  curator-job-rtpw6 DesiredCuration: upgrade
#   upgrade-cluster=True      Completed executing init container
#   monitor-upgrade=False     Upgrade status - Working towards 4.20.11: 119 of 959 done (12% complete), waiting on etcd, kube-apiserver

# The job and pod
oc get jobs,pods -n anaeem | grep curator

# Live tail of the monitor container (this is the most useful single command during an upgrade)
JOB=$(oc get clustercurator -n anaeem anaeem -o jsonpath='{.spec.curatorJob}')
oc logs -n anaeem -l job-name=$JOB -c monitor-upgrade -f

Condition meanings:

  • clustercurator-job — overall job lifecycle. False/Job_has_finished here is misleading; it just means a job was launched.
  • upgrade-cluster — True once the init container successfully patched the spoke.
  • monitor-upgrade — carries the live progress message. Becomes True on success.
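
Because these are ordinary status.conditions, oc wait can block on them. A sketch that waits for success (monitor-upgrade flips to True per the list above; older oc builds may insist the condition already exists before waiting):

# returns once monitor-upgrade reports True, or fails at the timeout
oc wait clustercurator/anaeem -n anaeem --for=condition=monitor-upgrade --timeout=120m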

4.2 Spoke-side, viewed through the curator's view

The curator already created ManagedClusterView/anaeem in the anaeem namespace, mirroring the spoke's ClusterVersion:

oc get managedclusterview -n anaeem anaeem \
  -o jsonpath='{.status.result.status.conditions[?(@.type=="Progressing")].message}{"\n"}'
oc get managedclusterview -n anaeem anaeem \
  -o jsonpath='{.status.result.status.history[0].version}{" → state: "}{.status.result.status.history[0].state}{"\n"}'

4.3 Spoke-side, directly

oc --context anaeem get clusterversion
oc --context anaeem get co
oc --context anaeem get mcp
oc --context anaeem get nodes

On newer oc clients, oc adm upgrade status gives a clean summary of the same information.

4.4 Re-arming the curator for the next bump

The CR is single-shot: once the Job ends (success or fail), reconciliation stops. To run another upgrade:

# either delete + re-apply
oc delete clustercurator -n anaeem anaeem
oc apply -f curator-anaeem-4.20.12.yaml

# or patch in place (clear, then set new spec)
oc patch clustercurator -n anaeem anaeem --type=merge -p '{"spec":{"desiredCuration":""}}'
oc patch clustercurator -n anaeem anaeem --type=merge \
  -p '{"spec":{"desiredCuration":"upgrade","upgrade":{"desiredUpdate":"4.20.12","channel":"stable-4.20","monitorTimeout":180}}}'

Either way, a new curator-job-* will be created. Old completed jobs are not auto-cleaned; run oc delete job -n <ns> <old-job> periodically to keep the namespace tidy.


5. Post-upgrade validation

# version landed
oc get managedclusterinfo -n anaeem anaeem \
  -o jsonpath='{.status.distributionInfo.ocp.version}{"\n"}'

# all operators healthy
oc --context anaeem get co | awk '$3!="True" || $4!="False" || $5!="False"'  # expect only the header line

# all pools updated
oc --context anaeem get mcp -o wide
# UPDATED=True UPDATING=False DEGRADED=False for both pools

# nodes ready and on the new RHCOS
oc --context anaeem get nodes -o wide
oc --context anaeem get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.osImage}{"\n"}{end}'
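
The authoritative signal is the history entry from the §1 diagram:

# expect: Completed 4.20.11
oc --context anaeem get clusterversion version -o jsonpath='{.status.history[0].state}{" "}{.status.history[0].version}{"\n"}'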

Once green: optionally delete the curator CR and old job.


6. Stuck-node / stuck-upgrade playbook

An upgrade is "stuck" when the spoke's ClusterVersion stops making progress for >15 minutes, or a ClusterOperator / MachineConfigPool reports Degraded=True. The curator monitor will keep ticking until monitorTimeout then fail — but the failure is informational, not corrective.

6.1 Diagnose from the hub without leaving it

ManagedClusterView lets you read any spoke resource. Apply once, then re-read its .status.result.

# nodes
oc apply -f - <<'EOF'
apiVersion: view.open-cluster-management.io/v1beta1
kind: ManagedClusterView
metadata: { name: anaeem-nodes, namespace: anaeem }
spec:
  scope:
    resource: nodes
EOF

oc get managedclusterview -n anaeem anaeem-nodes -o json |
  jq '.status.result.items[] |
    {name:.metadata.name,
     ready:(.status.conditions[]|select(.type=="Ready")|.status),
     unsched:.spec.unschedulable,
     image:.status.nodeInfo.osImage}'

# machineconfigpools
oc apply -f - <<'EOF'
apiVersion: view.open-cluster-management.io/v1beta1
kind: ManagedClusterView
metadata: { name: anaeem-mcp, namespace: anaeem }
spec:
  scope:
    resource: machineconfigpools
    apiGroup: machineconfiguration.openshift.io
EOF

oc get managedclusterview -n anaeem anaeem-mcp -o json |
  jq '.status.result.items[] |
    {name:.metadata.name,
     desired:.status.configuration.name,
     ready:.status.readyMachineCount,
     updated:.status.updatedMachineCount,
     degraded:.status.degradedMachineCount,
     conditions:[.status.conditions[]|select(.status=="True")|.type]}'

# clusteroperators
oc apply -f - <<'EOF'
apiVersion: view.open-cluster-management.io/v1beta1
kind: ManagedClusterView
metadata: { name: anaeem-co, namespace: anaeem }
spec:
  scope:
    resource: clusteroperators
    apiGroup: config.openshift.io
EOF
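
Read it back the same way as the nodes and MCP views above; a sketch assuming the same list shape in .status.result:

oc get managedclusterview -n anaeem anaeem-co -o json |
  jq '.status.result.items[] |
    {name:.metadata.name,
     conditions:[.status.conditions[]|select(.status=="True")|.type]}'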

For ad-hoc patches without leaving the hub, use ManagedClusterAction (the same mechanism the curator itself uses).
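
A rough shape sketch of such an action; this one would unpause the worker pool from §6.2.G. The field names are an assumption here, so confirm them with oc explain managedclusteraction.spec on your hub, and verify the update semantics (merge vs. replace) against a harmless resource before pointing it at anything important:

oc apply -f - <<'EOF'
apiVersion: action.open-cluster-management.io/v1beta1
kind: ManagedClusterAction
metadata: { name: anaeem-unpause-worker, namespace: anaeem }
spec:
  actionType: Update
  kube:
    resource: machineconfigpools
    name: worker
    template:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker
      spec:
        paused: false
EOF
# the action self-deletes shortly after it runs; read .status.conditions before it does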

6.2 Common failure modes and fixes

A. Drain hangs because of a PodDisruptionBudget

Symptom: MCP Updating=True for a long time on one node; oc describe node <n> shows Drain failed; an MCD log line like error when evicting pod "...": Cannot evict pod as it would violate the pod's disruption budget.

Fix on spoke:

oc get pdb -A -o wide                 # find the offender
oc patch pdb <name> -n <ns> --type=merge -p '{"spec":{"minAvailable":0}}'
# upgrade resumes within seconds

After the upgrade, restore the PDB and (better) fix the workload to tolerate maxUnavailable: 1.
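
For instance, if the PDB originally set minAvailable: 1:

oc patch pdb <name> -n <ns> --type=merge -p '{"spec":{"minAvailable":1}}'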

B. Pod won't terminate (long grace period, missing controller, stuck finalizer)

Symptom: MCD log shows pods with local storage or pod has no controller. oc describe node <n> lists offending pods.

Fix:

# force-delete after confirming the workload tolerates it
oc delete pod <p> -n <ns> --grace-period=0 --force

# stuck finalizer
oc patch pod <p> -n <ns> --type=merge -p '{"metadata":{"finalizers":null}}'

C. emptyDir / hostPath blocks drain

Symptom: MCD log says pods with local storage (use --delete-local-data to override).

The MCO won't pass that flag. Either annotate the pod's owning workload to evict cleanly (controller.kubernetes.io/pod-deletion-cost, accept restart) or recreate the workload elsewhere first.
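
One way to "recreate the workload elsewhere first", assuming it is Deployment-backed (node and names hypothetical):

oc --context anaeem adm cordon <node>                       # keep replacements off the node
oc --context anaeem -n <ns> rollout restart deploy/<name>   # new pods schedule elsewhere
# once the pods have left <node>, the MCO drain can proceed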

D. Node reboots but comes back NotReady

Symptom: oc get nodes shows NotReady,SchedulingDisabled. oc describe node may show kubelet errors, certificate problems, or CNI not ready.

Investigation:

oc --context anaeem get csr | grep -i pending             # approve any pending
oc --context anaeem adm certificate approve <csr-name>
oc --context anaeem -n openshift-machine-config-operator logs $(oc --context anaeem -n openshift-machine-config-operator get pod -o name | grep mcd-on-stuck-node)

If kubelet is dead on the node, you need console / serial access. For our setup the node is a KubeVirt VM on the virt host (see §6.2.F).

E. A ClusterOperator won't progress

Symptom: monitor-upgrade message stays on waiting on <op> for 30+ min, e.g. waiting on etcd, kube-apiserver.

oc --context anaeem get co <op> -o yaml | yq '.status.conditions'
oc --context anaeem -n openshift-<op> get pods
oc --context anaeem -n openshift-<op> logs <pod>

etcd and kube-apiserver are the most common stallers — usually quorum, certs, or a single bad master node. Stabilize that node first.
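
For the etcd case specifically, a quick quorum check from inside an etcd pod (pod name hypothetical; the etcdctl container ships in the etcd pods):

oc --context anaeem -n openshift-etcd rsh -c etcdctl etcd-<master-node> \
  etcdctl endpoint status --cluster -w table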

F. Node is gone / unreachable (virt-host issue)

The anaeem cluster's nodes are KubeVirt VMs on the virt host. From the hub:

oc get vm,vmi -A | grep anaeem
oc -n <vm-ns> describe vmi <vm>
# common host-side problems: VMI Failed, evicted from node, scheduling pressure on the host

# restart the VM (graceful)
virtctl restart <vm> -n <vm-ns>

# force-stop / start if hung
virtctl stop <vm> -n <vm-ns> --force
virtctl start <vm> -n <vm-ns>

If a master VM is the one that died and you've lost etcd quorum, follow the OpenShift "restore etcd quorum" runbook — the upgrade is the least of your problems at that point.

G. Pause the bleed

When in doubt, pause the affected pool so MCO stops touching the next node while you investigate:

oc --context anaeem patch mcp worker --type=merge -p '{"spec":{"paused":true}}'
# ...investigate / fix...
oc --context anaeem patch mcp worker --type=merge -p '{"spec":{"paused":false}}'

While paused, CVO will still report Progressing=True indefinitely; the curator monitor will tick toward its timeout. Pause is safe; do not leave a pool paused for days because it blocks security-critical MachineConfig changes too.

H. Manual drain when MCO refuses

If you've decided the disruption is acceptable and just want the node moved:

oc --context anaeem adm cordon <node>
oc --context anaeem adm drain <node> \
  --ignore-daemonsets --delete-emptydir-data --force --disable-eviction
# MCO will then proceed to apply the new MachineConfig and reboot

--disable-eviction bypasses PDBs — last resort.

I. Abandon a bad target version

CVO doesn't truly downgrade, but you can stop chasing the target by writing the previous version back into desiredUpdate. Operators that already updated stay updated; CVO will stop trying to advance the rest.

oc --context anaeem patch clusterversion version --type=merge \
  -p '{"spec":{"desiredUpdate":{"version":"4.20.8","force":false}}}'
oc delete clustercurator -n anaeem anaeem

This is rare and has consequences (mixed-version cluster); coordinate with whoever owns the cluster.

6.3 What the curator does while the cluster is stuck

State                     monitor-upgrade condition                   curator-job-* Job   Underlying upgrade
Healthy progress          False, message updates each poll            Running             Advancing
Stuck < monitorTimeout    False, message stays the same               Running             Stalled
monitorTimeout reached    False, message ends with "... timed out"    Failed              Still trying; CVO/MCO continue
Spoke reaches target      True, "Cluster has been upgraded"           Complete            Done

So a failed curator job ≠ failed upgrade. Always corroborate with the spoke's ClusterVersion.

A timed-out monitor does not stop the upgrade, so there is no need to re-apply the CR just because the job failed. If you want the curator to resume watching:

oc patch clustercurator -n anaeem anaeem --type=merge -p '{"spec":{"desiredCuration":""}}'
oc patch clustercurator -n anaeem anaeem --type=merge \
  -p '{"spec":{"desiredCuration":"upgrade","upgrade":{"desiredUpdate":"4.20.11","channel":"stable-4.20","monitorTimeout":240}}}'

7. Reference

7.1 Useful commands cheat sheet

# curator state
oc get clustercurator -n anaeem anaeem -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'

# tail upgrade
oc logs -n anaeem -l job-name=$(oc get clustercurator -n anaeem anaeem -o jsonpath='{.spec.curatorJob}') -c monitor-upgrade -f

# spoke version + progress without context switch
oc get managedclusterview -n anaeem anaeem -o json |
  jq '.status.result.status | {desired:.desired.version, history:.history[0], progressing:(.conditions[]|select(.type=="Progressing"))}'

# what versions are reachable
oc get managedclusterinfo -n anaeem anaeem -o json |
  jq '.status.distributionInfo.ocp | {channel,available:.availableUpdates}'

7.2 Annotations the curator understands

  • cluster.open-cluster-management.io/upgrade-allow-not-recommended-versions — allow desiredUpdate to come from versionAvailableUpdates instead of availableUpdates.
  • cluster.open-cluster-management.io/upgrade-clusterversion-backoff-limit — override the Job's backoffLimit.

7.3 Files in this directory

  • README.md — this document
  • curator-anaeem-4.20.11.yaml — applied 2026-04-30 to take anaeem from 4.20.8 to 4.20.11

7.4 Source of truth