Note: this page shows the Feature-Based Change Log for a release
These features were completed when this image was assembled
Currently the "Get started with on-premise host inventory" quickstart gets delivered in the core console. If we are going to keep it here, we need to add the MCE or ACM operator as a prerequisite; otherwise it's very confusing.
OCP console has MarkdownView, which can be used to interpret Markdown. We should expose it so it can be used by our plugin to interpret Markdown-based descriptions of ClusterTemplates.
For users who are using OpenShift but have not yet begun to explore multicluster and what we offer them.
I'm investigating where Learning paths are today and what is required.
As a user I'd like to have a learning path for how to get started with Multicluster.
Install MCE
Create multiple clusters
Use HyperShift
Provide access to cluster creation to devs via templates
Scale up to ACM/ACS (OPP?)
Status
https://github.com/patternfly/patternfly-quickstarts/issues/37#issuecomment-1199840223
Update the cluster-authentication-operator to not go degraded when it can’t determine the console url. This risks masking certain cases where we would want to raise an error to the admin, but the expectation is that this failure mode is rare.
Risk could be avoided by looking at ClusterVersion's enabledCapabilities to decide if missing Console was expected or not (unclear if the risk is high enough to be worth this amount of effort).
AC: Update the cluster-authentication-operator to not go degraded when console config CRD is missing and ClusterVersion config has Console in enabledCapabilities.
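For illustration, a minimal sketch of the ClusterVersion fields such a check could inspect (the capability name "Console" and the status layout are assumptions based on the cluster capabilities API):
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
status:
  capabilities:
    knownCapabilities:
      - Console
      - Insights
    enabledCapabilities:
      - Console   # if "Console" is absent here, a missing console config is expected and should not degrade the operator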
We need to continue to maintain specific areas within storage; this feature exists to capture that effort and track it across releases.
Goals
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Telemetry | | No |
Certification | | No |
API metrics | | No |
Out of Scope
n/a
Background, and strategic fit
With the expected scale of our customer base, we want to keep the load of customer tickets / BZs low
Assumptions
Customer Considerations
Documentation Considerations
Notes
In progress:
High prio:
Unsorted
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
There is a new driver release 5.0.0 since the last rebase that includes snapshot support:
https://github.com/kubernetes-sigs/ibm-vpc-block-csi-driver/releases/tag/v5.0.0
Rebase the driver on v5.0.0 and update the deployments in ibm-vpc-block-csi-driver-operator.
There are no corresponding changes in ibm-vpc-node-label-updater since the last rebase.
Currently in OpenShift we do not support distributing hotfix packages to cluster nodes. In time-sensitive situations, a RHEL hotfix package can be the quickest route to resolving an issue.
Before we ship OCP CoreOS layering in https://issues.redhat.com/browse/MCO-165 we need to switch the format of what is currently `machine-os-content` to be the new base image.
The overall plan is:
We need something in our repo /docs that we can point people to that briefly explains how to use "layering features" via the MCO in OCP (well, and with the understanding that OKD also uses the MCO).
Maybe this ends up in its own repo like https://github.com/coreos/coreos-layering-examples eventually, maybe it doesn't.
I'm thinking something like https://github.com/openshift/machine-config-operator/blob/layering/docs/DemoLayering.md back from when we did the layering branch, but actually matching what we have in our main branch
This is separate but probably related to what Colin started in the Docs Tracker.
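As a starting point for such a doc, a minimal hedged sketch of applying an externally built layered image via a MachineConfig (the osImageURL mechanism and the image reference are assumptions here, not the final documented flow):
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-layered-image   # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  # Points the worker pool at a custom layered RHCOS image built outside the cluster
  osImageURL: quay.io/example/rhcos-custom-layered@sha256:<digest>   # placeholder reference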
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Description of the problem:
BE 2.13.0: in the Nutanix UMN flow, if machine_network = [], bootstrap validation failed.
How reproducible:
Trying to reproduce
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
Assumption
Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit
CNCC was moved to the management cluster and it should use proxy settings defined for the management cluster.
Epic Goal*
Provide a long term solution to SELinux context labeling in OCP.
Why is this important? (mandatory)
As of today, when SELinux is enabled, the PV's files are relabeled when attaching the PV to the pod. This can cause timeouts when the PV contains a lot of files, as well as overloading the storage backend.
https://access.redhat.com/solutions/6221251 provides a few workarounds until the proper fix is implemented. Unfortunately these workarounds are not perfect, and we need a long-term, seamless, optimised solution.
This feature tracks the long-term solution, where the PV filesystem will be mounted with the right SELinux context, thus avoiding relabeling every file.
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
As we are relying on the mount context, there should not be any relabeling (chcon), because all files / folders will inherit the context from the mount context
More on design & scenarios in the KEP and related epic STOR-1173
Dependencies (internal and external) (mandatory)
None for the core feature
However the driver will have to set SELinuxMountSupported to true in the CSIDriverSpec to enable this feature.
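For illustration, a minimal sketch of a driver opting in (upstream the CSIDriver spec field is named seLinuxMount; the exact field name and the driver name below are assumptions):
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com   # hypothetical driver name
spec:
  # Declares that the driver supports mounting volumes with -o context=<selinux-context>
  seLinuxMount: true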
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
This Epic is to track upstream work in the Storage SIG community
This Epic is to track the SELinux specific work required. fsGroup work is not included here.
Goal:
Continue contributing to and help move along the upstream efforts to enable recursive permissions functionality.
Finish current SELinuxMountReadWriteOncePod feature upstream:
The feature is probably going to stay alpha upstream.
Problem:
Recursive permission change takes very long for fsGroup and SELinux. For volumes with many small files, Kubernetes currently does a chown for every file on the volume (due to fsGroup). Similarly, for container runtimes (such as CRI-O), a chcon of every file on the volume is performed due to SCC's SELinux context. Data on the volume may already have the correct GID/SELinux context, so Kubernetes needs a way to detect this automatically to avoid the long delay.
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Customers:
Open questions:
Notes:
As an OCP developer (and as an OCP user in the future), I want all CSI drivers shipped as part of OCP to support mounting with -o context=XYZ, so I can test with CSIDriver.SELinuxMount: true (or so my pods run without CRI-O recursively relabeling my volume).
In detail:
Exit criteria:
In testing dual stack on vSphere we discovered that kubelet will not allow us to specify two IPs on any platform except bare metal. We have a couple of options to deal with that:
GA CgroupV2 in 4.13
Default with RHEL 9
From OCP 4.13, the RHCOS nodes by default come up with the "CGroupsV2" configuration
Command to verify on any OCP cluster node
stat -c %T -f /sys/fs/cgroup/
So, to avoid unexpected complications, if the `cgroupMode` is found to be empty in the `nodes.config` resource, `CGroupsv1` configuration needs to be explicitly set using the `machine-config-operator`
This user story tracks the changes required to remove the TechPreview-related checks in the MCO code to graduate the CGroupsV2 feature to GA.
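For reference, a minimal sketch of explicitly pinning cgroup v1 via the nodes.config resource (the resource name "cluster" and the "v1"/"v2" values are assumed from the config.openshift.io Node API):
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  # Explicitly select cgroup v1 instead of relying on an empty (default) cgroupMode
  cgroupMode: "v1"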
Feature
As an Infrastructure Administrator, I want to deploy OpenShift on vSphere with supervisor (aka Masters) and worker nodes (from a MachineSet) across multiple vSphere data centers and multiple vSphere clusters using full stack automation (IPI) and user provided infrastructure (UPI).
MVP
Install OpenShift on vSphere using IPI / UPI in multiple vSphere data centers (regions) and multiple vSphere clusters in 1 vCenter, all in the same IPv4 subnet (in the same physical location).
Out of scope
Scenarios for consideration:
Acceptance criteria:
References:
As an OpenShift engineer, make changes to various OpenShift components so that vSphere zonal installation is considered GA.
As an OpenShift engineer, deprecate existing vSphere platform spec parameters so that they can eventually be removed in favor of zonal.
As an OpenShift engineer, create an additional UPI Terraform configuration for zonal so that it can be tested in CI.
As an OpenShift engineer, I need to follow the process to move the API from Tech Preview to GA so it can be used by clusters not installed with TechPreviewNoUpgrade.
more to follow...
As an OpenShift engineer, implement a new job for UPI zonal so that this method of installation is tested.
Epic Goal*
We need SPLAT-594 to be reflected in our CSI driver operator to support GA of vSphere storage topology.
Why is this important? (mandatory)
See SPLAT-320.
Scenarios (mandatory)
As a user, I want to edit the Infrastructure object after OCP installation (or upgrade) to update the cluster topology, so all newly provisioned PVs will get the new topology labels.
(With vSphere topology GA, we expect that users will be allowed to edit Infrastructure and change the cluster topology after cluster installation.)
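For illustration, a rough sketch of an Infrastructure edit that adds a failure domain after installation (field names follow the vSphere zonal platform spec as I recall it and should be treated as assumptions):
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: VSphere
    vsphere:
      failureDomains:
        - name: us-east-1a            # hypothetical values throughout
          region: us-east
          zone: us-east-1a
          server: vcenter.example.com
          topology:
            datacenter: dc1
            computeCluster: /dc1/host/cluster1
            datastore: /dc1/datastore/ds1
            networks:
              - vm-segment-1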
Dependencies (internal and external) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
It's possible that Infrastructure will remain read-only. No code on Storage side is expected then.
Done - Checklist (mandatory)
In zonal deployments, it is possible that new failure domains are added to the cluster.
In that case, we will most likely have to discover these new failure domains and tag datastores in them, so that topology-aware provisioning can work.
When STOR-1145 is merged, make sure that these new metrics are reported via telemetry to us.
Exit criteria:
We should create a metric and an alert if both ClusterCSIDriver and Infra object specify a topology.
Although such a configuration is supported and the Infra object will take precedence, it indicates a user error, and hence the user should be alerted about it.
I was thinking we will probably need a detailed metric for topology information about the cluster, such as how many failure domains, how many datacenters, and how many datastores.
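For context, the conflicting configuration such an alert would flag looks roughly like this on the ClusterCSIDriver side (the driverConfig.vSphere.topologyCategories field is assumed from the vSphere topology tech-preview API), with failureDomains set on the Infrastructure object as sketched earlier in this epic:
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  driverConfig:
    driverType: vSphere
    vSphere:
      # Topology defined here while the Infra object also defines failureDomains
      # is the user error we want a metric/alert for
      topologyCategories:
        - openshift-region   # hypothetical vSphere tag categories
        - openshift-zone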
Feature Goal*
What is our purpose in implementing this? What new capability will be available to customers?
The goal of this feature is to provide a consistent, predictable, and deterministic approach to how the default storage class(es) are managed.
Why is this important? (mandatory)
The current default storage class implementation has corner cases which can result in PVs staying in Pending, because either there is no default storage class or multiple default storage classes are defined
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
No default storage class
In some cases there is no default SC defined. This can happen during OCP deployment, where components such as the registry request a PV while the SCs have not been defined yet. It can also happen during a change of the default SC: there won't be any default between the moment the admin unsets the current one and sets the new one.
Another user creates PVC requesting a default SC, by leaving pvc.spec.storageClassName=nil. The default SC does not exist at this point, therefore the admission plugin leaves the PVC untouched with pvc.spec.storageClassName=nil.
The admin marks SC2 as default.
PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new SC2.
PV controller uses the new SC2 when binding / provisioning the PVC.
The installer creates a default SC.
PV controller, when reconciling the PVC, updates pvc.spec.storageClassName=nil to the new default SC.
PV controller uses the new default SC when binding / provisioning the PVC.
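The scenario above, as a minimal sketch (resource names are hypothetical; the is-default-class annotation is the standard Kubernetes mechanism for marking a default SC):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  # storageClassName intentionally omitted (nil) to request the default class
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc2
  annotations:
    # Marking SC2 as default later lets the PV controller retroactively assign it to the pending PVC
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.csi.vendor.com   # hypothetical provisioner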
Multiple Storage Classes
In some cases there are multiple default SCs; this can be an admin mistake (forgetting to unset the old one) or can happen during the period when a new default SC is created but the old one is still present.
New behavior:
-> the PVC will get the default storage class with the newest CreationTimestamp (i.e. B) and no error should show.
-> admin will get an alert that there are multiple default storage classes and they should do something about it.
CSI that are shipped as part of OCP
The CSI drivers we ship as part of OCP are deployed and managed by RH operators. These operators automatically create a default storage class. Some customers don't like this approach and prefer to:
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
No external dependencies.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
This can confuse customers, as there is a change in the default behavior customers are used to. It needs to be carefully documented.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
Much like for core OpenShift operators, a standardized flow exists for OLM-managed operators to interact with the cluster in a specific way to leverage AWS STS authorization when using AWS APIs, as opposed to insecure static, long-lived credentials. OLM-managed operators can implement integration with the CloudCredentialOperator in a well-defined way to support this flow.
Enable customers to easily leverage OpenShift's capabilities around AWS STS with layered products, for an increased security posture. Enable OLM-managed operators to implement support for this in a well-defined pattern.
See Operators & STS slide deck.
The CloudCredentialOperator already provides a powerful API for OpenShift's core cluster operators to request credentials and acquire them via short-lived tokens. This capability should be expanded to OLM-managed operators, specifically to Red Hat layered products that interact with AWS APIs. The process today is cumbersome to non-existent, depending on the operator in question, and is seen as an adoption blocker of OpenShift on AWS.
This is particularly important for ROSA customers. Customers are expected to be asked to pre-create the required IAM roles outside of OpenShift, which is deemed acceptable.
This Section: High-Level description of the Market Problem ie: Executive Summary
This Section: Articulates and defines the value proposition from a users point of view
This Section: Effect is the expected outcome within the market. There are two dimensions of outcomes; growth or retention. This represents part of the “why” statement for a feature.
As an engineer I want the capability to implement CI test cases that run at different intervals, be it daily or weekly, so as to ensure downstream operators that depend on certain capabilities are not negatively impacted if the systems CCO interacts with change behavior.
Acceptance Criteria:
Create a stubbed-out e2e test path in CCO and matching e2e calling code in the release repo, such that there exists a path to tests that verify working in an AWS STS workflow.
oc-mirror is a GA product as of OpenShift 4.11.
The goal of this feature is to address any future customer requests for new features or capabilities in oc-mirror
In the 4.12 release, a new feature was introduced to oc-mirror allowing it to use OCI FBC catalogs as the starting point for mirroring operators.
As an oc-mirror user, I would like the OCI FBC feature to be stable
so that I can use it in a production-ready environment
and so that the new feature and all existing features of oc-mirror work seamlessly
This feature is ring-fenced in the oc-mirror repository; it uses the following flags to achieve this, so as not to cause any breaking changes in the current oc-mirror functionality.
The OCI FBC (file-based catalog) format has been delivered as Tech Preview in 4.12
Tech Enablement slides can be found here https://docs.google.com/presentation/d/1jossypQureBHGUyD-dezHM4JQoTWPYwiVCM3NlANxn0/edit#slide=id.g175a240206d_0_7
Design doc is in https://docs.google.com/document/d/1-TESqErOjxxWVPCbhQUfnT3XezG2898fEREuhGena5Q/edit#heading=h.r57m6kfc2cwt (also contains latest design discussions around the stories of this epic)
Link to previous working epic https://issues.redhat.com/browse/CFE-538
Contacts for the OCI FBC feature
WHAT
Refer to the engineering notes document https://docs.google.com/document/d/1zZ6FVtgmruAeBoUwt4t_FoZH2KEm46fPitUB23ifboY/edit#heading=h.6pw5r5w2r82, steps 2-7
Acceptance Criteria
As IBM, I would like to use oc-mirror with the --use-oci-feature flag and ImageSetConfigs containing OCI FBC operator catalogs to mirror these catalogs to a connected registry,
so that, regarding OCI FBC catalogs:
and so that, regarding releases, additional images, and helm charts:
As an oc-mirror user I want a well documented and intuitive process
so that I can effectively and efficiently deliver image artifacts in both connected and disconnected installs with no impact on my current workflow
Glossary:
References:
Acceptance criteria:
As an IBM user, I'd like to be able to specify the destination of the OCI FBC catalog in the ImageSetConfig,
so that I can control where that image is pushed to on the disconnected destination registry, because the path on disk to that OCI catalog doesn't make sense to use in the component paths of the destination catalog.
Examples provided assume that the current working directory is set to /tmp/cwdtest.
Instead of introducing a targetNamespace which is used in combination with targetName, this counter proposal introduces a targetCatalog field which supersedes the existing targetName field (which would be marked as deprecated). Users should transition from using targetName to targetCatalog, but if both happen to be specified, the targetCatalog is preferred and targetName is ignored. Any ISC that currently uses targetName alone should continue to be used as currently defined.
The rationale for targetCatalog is that some customers will have restrictions on where images can be placed. All IBM images always use a namespace. We therefore need a way to indicate where the CATALOG image is located within the context of the target registry... it can't just be placed in the root, so we need a way to configure this.
The targetCatalog field consists of an optional namespace followed by the target image name, described in extended Backus–Naur form below:
target-catalog = [namespace '/'] target-name
target-name    = path-component
namespace      = path-component ['/' path-component]*
path-component = alpha-numeric [separator alpha-numeric]*
alpha-numeric  = /[a-z0-9]+/
separator      = /[_.]|__|[-]*/
The target-name portion of targetCatalog represents the image name in the final destination registry, and matches the definition/purpose of the targetName field. The namespace is only used for "placement" of the catalog image into the right "hierarchy" in the target registry. The target-name portion will be used in the catalog source metadata name, the file name of the catalog source, and the target image reference.
Examples:
targetCatalog: foo/bar/baz/ibm-zcon-zosconnect-example
targetCatalog: ibm-zcon-zosconnect-example
FBC image from docker registry
oc mirror -c /Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml --dest-skip-tls --dest-use-http docker://localhost:5000
/Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/localstorage
mirror:
  operators:
    - catalog: icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:6f02ecef46020bcd21bdd24a01f435023d5fc3943972ef0d9769d5276e178e76
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/imageContentSourcePolicy.yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - localhost:5000/cpopen
      source: icr.io/cpopen
    - mirrors:
        - localhost:5000/openshift4
      source: registry.redhat.io/openshift4
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/catalogSource-ibm-zcon-zosconnect-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-zcon-zosconnect-catalog
  namespace: openshift-marketplace
spec:
  image: localhost:5000/cpopen/ibm-zcon-zosconnect-catalog:6f02ec
  sourceType: grpc
FBC image from docker registry (putting images into a destination "namespace")
oc mirror -c /Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml --dest-skip-tls --dest-use-http docker://localhost:5000/foo
/Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/localstorage
mirror:
  operators:
    - catalog: icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:6f02ecef46020bcd21bdd24a01f435023d5fc3943972ef0d9769d5276e178e76
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/imageContentSourcePolicy.yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - localhost:5000/foo/cpopen
      source: icr.io/cpopen
    - mirrors:
        - localhost:5000/foo/openshift4
      source: registry.redhat.io/openshift4
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/catalogSource-ibm-zcon-zosconnect-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-zcon-zosconnect-catalog
  namespace: openshift-marketplace
spec:
  image: localhost:5000/foo/cpopen/ibm-zcon-zosconnect-catalog:6f02ec
  sourceType: grpc
FBC image from docker registry (overriding the catalog name and tag)
oc mirror -c /Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml --dest-skip-tls --dest-use-http docker://localhost:5000
/Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/localstorage
mirror:
  operators:
    - catalog: icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:6f02ecef46020bcd21bdd24a01f435023d5fc3943972ef0d9769d5276e178e76
      # NOTE: namespace now has to be provided along with the
      # target catalog name to preserve the namespace in the resulting image
      targetCatalog: cpopen/ibm-zcon-zosconnect-example
      targetTag: v123
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/imageContentSourcePolicy.yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - localhost:5000/cpopen
      source: icr.io/cpopen
    - mirrors:
        - localhost:5000/openshift4
      source: registry.redhat.io/openshift4
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/catalogSource-ibm-zcon-zosconnect-example.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-zcon-zosconnect-example
  namespace: openshift-marketplace
spec:
  image: localhost:5000/cpopen/ibm-zcon-zosconnect-example:v123
  sourceType: grpc
FBC image from OCI path
In this example we're suggesting the use of a targetCatalog field.
oc mirror -c /Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml --dest-skip-tls --dest-use-http --use-oci-feature docker://localhost:5000
/Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/localstorage
mirror:
  operators:
    - catalog: oci:///foo/bar/baz/ibm-zcon-zosconnect-catalog/amd64 # This is just a path to the catalog and has no special meaning
      targetCatalog: foo/bar/baz/ibm-zcon-zosconnect-example # <--- REQUIRED when using OCI and optional for docker images
                                                             # value is used within the context of the target registry
      # targetTag: v123 # <--- OPTIONAL
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/imageContentSourcePolicy.yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - localhost:5000/cpopen
      source: icr.io/cpopen
    - mirrors:
        - localhost:5000/openshift4
      source: registry.redhat.io/openshift4
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/catalogSource-ibm-zcon-zosconnect-example.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-zcon-zosconnect-example
  namespace: openshift-marketplace
spec:
  image: localhost:5000/foo/bar/baz/ibm-zcon-zosconnect-example:6f02ec
  # Example uses "targetCatalog" set to
  # "foo/bar/baz/ibm-zcon-zosconnect-example" at the
  # destination registry localhost:5000
  sourceType: grpc
FBC image from OCI path (putting images into a destination "namespace" named "abc")
oc mirror -c /Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml --dest-skip-tls --dest-use-http --use-oci-feature docker://localhost:5000/abc
/Users/jhunkins/go/src/github.com/jchunkins/oc-mirror/ImageSetConfiguration.yml
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/localstorage
mirror:
  operators:
    - catalog: oci:///foo/bar/baz/ibm-zcon-zosconnect-catalog/amd64 # This is just a path to the catalog and has no special meaning
      targetCatalog: foo/bar/baz/ibm-zcon-zosconnect-example # <--- REQUIRED when using OCI and optional for docker images
                                                             # value is used within the context of the target registry
      # targetTag: v123 # <--- OPTIONAL
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/imageContentSourcePolicy.yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - localhost:5000/abc/cpopen
      source: icr.io/cpopen
    - mirrors:
        - localhost:5000/abc/openshift4
      source: registry.redhat.io/openshift4
/tmp/cwdtest/oc-mirror-workspace/results-1675716807/catalogSource-ibm-zcon-zosconnect-example-catalog.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-zcon-zosconnect-example
  namespace: openshift-marketplace
spec:
  image: localhost:5000/abc/foo/bar/baz/ibm-zcon-zosconnect-example:6f02ec
  # Example uses "targetCatalog" set to
  # "foo/bar/baz/ibm-zcon-zosconnect-example" at the
  # destination registry localhost:5000/abc
  sourceType: grpc
Customers are asking for improvements to the upgrade experience (both over-the-air and disconnected). This is a feature tracking epics required to get that work done.
The CVO README is currently aimed at CVO devs. But there are way more CVO consumers than there are CVO devs. We should aim the README at "what does the CVO do for my clusters?", and push the dev docs down under docs/dev/.
Enable sharing ConfigMap and Secret across namespaces
Requirement | Notes | isMvp? |
---|---|---|
Secrets and ConfigMaps can get shared across namespaces | | YES |
NA
NA
Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model, compared to the node-based (RHEL subscription manager) entitlement model. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces, in order to prevent the need for the cluster admin to copy these entitlements into each namespace, which leads to additional operational challenges for updating and refreshing them.
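For illustration, a hedged sketch of sharing the entitlement certificates with the Shared Resource CSI driver (the v1alpha1 API shape and the etc-pki-entitlement secret name/namespace are assumptions):
apiVersion: sharedresource.openshift.io/v1alpha1
kind: SharedSecret
metadata:
  name: etc-pki-entitlement
spec:
  secretRef:
    # The cluster-wide entitlement secret that consumers in other namespaces can mount via the CSI driver
    name: etc-pki-entitlement
    namespace: openshift-config-managed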
Questions to be addressed:
* What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
* Does this feature have doc impact?
* New Content, Updates to existing content, Release Note, or No Doc Impact
* If unsure and no Technical Writer is available, please contact Content Strategy.
* What concepts do customers need to understand to be successful in [action]?
* How do we expect customers will use the feature? For what purpose(s)?
* What reference material might a customer want/need to complete [action]?
* Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
* What is the doc impact (New Content, Updates to existing content, or Release Note)?
As a user of an OpenShift cluster, I should be restricted from creating Shared Secrets and ConfigMaps with the "openshift-" prefix unless my access level is cluster-admin or above
https://github.com/openshift/csi-driver-shared-resource-operator/pull/71
https://github.com/openshift/cluster-storage-operator/pull/342
https://github.com/openshift/origin/pull/27730
https://github.com/openshift/release/pull/36433
https://github.com/openshift/cluster-storage-operator/pull/343
https://github.com/openshift/openshift-controller-manager/pull/251
https://redhat-internal.slack.com/archives/C01C8502FMM/p1676472369732279
Currently, it looks like we need to merge the SR operator changes first (they have proven to be benign in non-HyperShift environments) so that we can complete testing in the cluster-storage-operator PR
https://github.com/openshift/csi-driver-shared-resource-operator/pull/71
https://github.com/openshift/cluster-storage-operator/pull/342
https://github.com/openshift/origin/pull/27730
https://github.com/openshift/release/pull/36433
https://github.com/openshift/cluster-storage-operator/pull/343
https://github.com/openshift/openshift-controller-manager/pull/251
https://redhat-internal.slack.com/archives/C01C8502FMM/p1676472369732279
Currently, it looks like we need to merge the driver changes first (they have proven to be benign in non-HyperShift environments) so that we can complete testing in the cluster-storage-operator PR
Track goals/requirements for self-managed GA of Hosted control planes on AWS using the AWS Provider.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.
Questions to be addressed:
Today the upstream and more complete documentation of HyperShift lives on https://hypershift-docs.netlify.app/.
However, product documentation today lives under https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes/2.6/html/multicluster_engine/multicluster_engine_overview#hosted-control-planes-intro
The goal of this Epic is to extract important docs and establish parity between what is documented and possible upstream and the product documentation.
Multiple consumers have not realised a newer version of a CPO (spec.release) is not guaranteed to work with an older HO.
This is stated here https://hypershift-docs.netlify.app/reference/versioning-support/
but empirical evidence, such as the OCM integration, is telling us this is not enough.
We already deploy a CM in the HO namespace with the HC supported versions.
Additionally we can add an image label with latest HC version supported by the operator so you can quickly docker inspect...
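For reference, a rough sketch of what that existing ConfigMap looks like (the name, namespace, and data layout are assumptions, for illustration only):
apiVersion: v1
kind: ConfigMap
metadata:
  name: supported-versions
  namespace: hypershift
data:
  # JSON list of HostedCluster versions the installed HyperShift operator supports
  supported-versions: '{"versions":["4.13","4.12","4.11"]}'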
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
User Stories
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
As a developer, I want to be able to:
so that I can achieve
Description of criteria:
All modifications to the openshift-installer are handled in other cards in the epic.
Allow users to interactively adjust the network configuration for a host after booting the agent ISO.
Configure network after host boots
The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.
As a user, I need information about common misconfigurations that may be preventing the automated installation from proceeding.
If we are unable to access the release image from the registry, provide sufficient debugging information to the user to pinpoint the problem. Check for:
Currently the agent-tui always displays the additional checks (nslookup/ping/HTTP get), even when the primary check (pull image) passes. This may cause some confusion to the user, due to the fact that the additional checks do not prevent the agent-tui from completing successfully; they are just informative, to allow better troubleshooting of the issue (so they are not needed in the positive case).
The additional checks should then be shown only when the primary check fails for any reason.
Enhance the openshift-install agent create image command so that the agent-nmtui executable will be embedded in the agent ISO
After having created the agent ISO, the agent-nmtui must be added to the ISO using the following approach:
1. Unpack the agent ISO in a temporary folder
2. Unpack the /images/ignition.img compressed cpio archive in a temporary folder
3. Create a new ignition.img compressed cpio archive by appending the agent-nmtui
4. Create a new agent ISO with the updated ignition.img
Implementation note
Portions of code from a PoC located at https://github.com/andfasano/gasoline could be re-used
When running the openshift-install agent create image command, it first needs to extract the agent-tui executable from the release payload into a temporary folder
In the console service from AGENT-453, check whether we are able to pull the release image, and display this information to the user before prompting to run nmtui.
If we can access the image, then exit the service if there is no user input after some timeout, to allow the installation to proceed in the automation flow.
When the agent-tui is shown during the initial host boot, if the pull release image check fails then an additional checks box is shown along with a details text view.
The content of the details view gets continuously updated with the details of the failed check, but the user cannot move the focus over the details box (using the arrow/tab keys) and thus cannot scroll its content (using the up/down arrow keys)
The openshift-install agent create image command will need to fetch the agent-tui executable so that it can be embedded within the agent ISO. For this reason the agent-tui must be available in the release payload, so that it can be retrieved even when the command is invoked in a disconnected environment.
Create a systemd service that runs at startup prior to the login prompt and takes over the console. This should start after the network-online target, and block the login prompt appearing until it exits.
This should also block, at least temporarily, any services that require pulling an image from the registry (i.e. agent + assisted-service).
Right now all the connectivity checks are executed simultaneously, and this doesn't seem necessary, especially in the positive scenario, i.e. when the release image can be pulled without any issue.
So, the connectivity-related checks should be performed only when the release image is not accessible, to provide further information to the user.
The initial condition for allowing the installation to continue (or not) should then be related just to the result of the primary check (right now, just the image pull) and not the secondary ones (http/dns/ping), which are just informative checks.
Note: this approach will also help to manage those cases where, currently, the release image can be pulled but the host doesn't answer the ping
The node zero ip is currently hard-coded inside set-node-zero.sh.template and in the ServiceBaseURL template string.
ServiceBaseURL is also hard-coded inside:
We need to remove this hard-coding and allow a user to set the node zero IP through the TUI and have it reflected by the agent services and scripts.
The goal of this initiative is to help boost adoption of OpenShift on ppc64le. This can be further broken down into several key objectives.
Some of the technical specifications have been laid out in MULTIARCH-75.
Epic Goal
Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing
Recently, the image registry team decided with this change[1] that major cloud platforms cannot have `emptyDir` as the storage backend. IBM Cloud uses ibmcos, which we would ideally need to do as well. There have been a few issues identified with using ibmcos as-is in the cluster image registry operator, and some solutions identified here[2]. Basically, we would need the PowerVS platform to be supported for ibmcos, and an API-related change to add resourceGroup in the infra API. This only affects 4.13 and is not an issue for 4.12.
[1] https://github.com/openshift/cluster-image-registry-operator/pull/820
As our customers create more and more clusters, it will become vital for us to help them support their fleet of clusters. Currently, our users have to use a different interface (the ACM UI) in order to manage their fleet of clusters. Our goal is to provide our users with a single interface for everything from managing a fleet of clusters to deep diving into a single cluster. This means going to a single URL – your Hub – to interact with your OCP fleet.
The goal of this tech preview update is to improve the experience from the last round of tech preview. The following items will be improved:
Key Objective
Providing our customers with a single, simplified user experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the united Console
As a Dynamic Plugin developer I would like to render the version of my dynamic plugin in the About modal. For that we would need to check the `LoadedDynamicPluginInfo` instances, where we need to check the `metadata.name` and `metadata.version` that we need to surface to the About modal.
AC: Render name and version for each Dynamic Plugin into the About modal.
Original description: When ACM moved to the unified console experience, we lost the ability in our standalone console to display our version information in our own About modal. We would like to be able to add our product and version information into the OCP About modal.
Description of problem:
There is a possible race condition in the console operator where the managed cluster config gets updated after the console deployment and doesn't trigger a rollout.
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Rarely
Steps to Reproduce:
1. Enable multicluster tech preview by adding the TechPreviewNoUpgrade featureSet to the FeatureGate config (see the FeatureGate sketch after this bug report). (NOTE: THIS ACTION IS IRREVERSIBLE AND WILL MAKE THE CLUSTER UNUPGRADEABLE AND UNSUPPORTED)
2. Install ACM 2.5+
3. Import a managed cluster using either the ACM console or the CLI
4. Once that managed cluster is showing in the cluster dropdown, import a second managed cluster
Actual results:
Sometimes the second managed cluster will never show up in the cluster dropdown
Expected results:
The second managed cluster eventually shows up in the cluster dropdown after a page refresh
Additional info:
Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2055415
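For step 1 of the reproduction steps above, this is the FeatureGate change that enables the multicluster tech preview (sketch; note again that setting TechPreviewNoUpgrade is irreversible):
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  # Irreversible: makes the cluster un-upgradeable and unsupported
  featureSet: TechPreviewNoUpgrade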
Description of problem:
When multi-cluster is enabled and the console can display data from other clusters, we should either change or disable how we filter the OperatorHub catalog by arch / OS. We assume that the arch and OS of the pod running the console is the same as the cluster, but for managed clusters, it could be something else, which would cause us to incorrectly filter operators.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
Migrated from https://bugzilla.redhat.com/show_bug.cgi?id=2089939
Installed operators, operator details, operand details, and operand create pages should work as expected in a multicluster environment when copied CSVs are disabled on any cluster in the fleet.
AC:
Mock a multicluster environment in our CI using Cypress, without provisioning multiple clusters, using a combination of cy.intercept and updating window.SERVER flags in the before section of the test scenarios.
Acceptance Criteria:
Without provisioning additional clusters:
In order for hub cluster console OLM screens to behave as expected in a multicluster environment, we need to gather "copiedCSVsDisabled" flags from managed clusters so that the console backend/frontend can consume this information.
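For context, copied CSVs are disabled cluster-wide through the OLMConfig resource; a minimal sketch (OLMConfig API shape as I recall it, treat as an assumption) of the state the console needs to detect on each managed cluster:
apiVersion: operators.coreos.com/v1
kind: OLMConfig
metadata:
  name: cluster
spec:
  features:
    # When true, OLM stops copying CSVs into every namespace, which the console must account for
    disableCopiedCSVs: true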
AC:
Description of problem:
When viewing a resource that exists for multiple clusters, the data may be from the wrong cluster for a short time after switching clusters using the multicluster switcher.
Version-Release number of selected component (if applicable):
4.10.6
How reproducible:
Always
Steps to Reproduce:
1. Install RHACM 2.5 on OCP 4.10 and enable the FeatureGate to get multicluster switching
2. From the local-cluster perspective, view a resource that would exist on all clusters, like /k8s/cluster/config.openshift.io~v1~Infrastructure/cluster/yaml
3. Switch to a different cluster in the cluster switcher
Actual results:
Content for resource may start out correct, but then switch back to the local-cluster version before switching to the correct cluster several moments later.
Expected results:
Content should always be shown from the selected cluster.
Additional info:
Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075657
An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.
As a developer I want a GitHub PR template that allows me to provide:
Allow configuring compute and control plane nodes across multiple subnets for on-premise IPI deployments. When separating nodes into subnets, also allow using an external load balancer, instead of the built-in one (keepalived/haproxy) that the IPI workflow installs, so that the customer can configure their own load balancer with the ingress and API VIPs pointing to nodes in the separate subnets.
I want to install OpenShift with IPI on an on-premise platform (high priority for bare metal and vSphere) and I need to distribute my control plane and compute nodes across multiple subnets.
I want to use IPI automation but I will configure an external load balancer for the API and Ingress VIPs, instead of using the built-in keepalived/haproxy-based load balancer that comes with the on-prem platforms.
Customers require using multiple logical availability zones to define their architecture and topology for their datacenter. OpenShift clusters are expected to fit in this architecture for the high availability and disaster recovery plans of their datacenters.
Customers want the benefits of IPI and automated installations (and avoid UPI) and at the same time when they expect high traffic in their workloads they will design their clusters with external load balancers that will have the VIPs of the OpenShift clusters.
Load balancers can distribute incoming traffic across multiple subnets, which is something our built-in load balancers aren't able to do and which represents a big limitation for the topologies customers are designing.
While this is possible with IPI AWS, this isn't available with on-premise platforms installed with IPI (for the control plane nodes specifically), and customers see this as a gap in OpenShift for on-premise platforms.
Epic | Control Plane with Multiple Subnets | Compute with Multiple Subnets | Doesn't need external LB | Built-in LB |
---|---|---|---|---|
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | ✓ | ✕ | |
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | ✓ | ✓ | |
✓ | ✓ | ✓ | ||
✓ | ✓ | ✓ | ✕ | |
✓ | ✓ | ✓ | ✕ | |
✓ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ | |
✕ | ✓ | ✓ | ✓ |
Workers on separate subnets with IPI documentation
We can already deploy compute nodes on separate subnets by preventing the built-in LBs from running on the compute nodes. This is documented for bare metal only for the Remote Worker Nodes use case: https://docs.openshift.com/container-platform/4.11/installing/installing_bare_metal_ipi/ipi-install-installation-workflow.html#configure-network-components-to-run-on-the-control-plane_ipi-install-installation-workflow
This procedure works on vSphere too, albeit with no QE CI and without documentation.
External load balancer with IPI documentation
As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers
Customers want to use their own load balancers, and IPI comes with built-in LBs based on keepalived and haproxy.
vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409
This is needed once the API patch for External LB has merged.
Test the feature with e2e.
We basically want to check that the Keepalived & HAproxy pods don't run when an external LB is being deployed.
The test has to run on all on-prem platforms, for all the jobs, so we can test the default LB as well.
As an OpenShift installation admin I want to use the Assisted Installer, ZTP and IPI installation workflows to deploy a cluster that has remote worker nodes in subnets different from the local subnet, while my VIPs are handled by the built-in load balancing services (haproxy/keepalived).
While this request is most common with OpenShift on bare metal, any platform using the ingress operator will benefit from this enhancement.
Customers using platform none run external load balancers and they won't need this; this is specific to platforms deployed via AI, ZTP and IPI.
Customers and partners want to install remote worker nodes on day1. Due to the built-in network services we provide with Assisted Installer, ZTP and IPI that manage the VIP for ingress, we need to ensure that they remain in the local subnet where the VIPs are configured.
The bare metal IPI team added a workflow that allows placing the VIPs on the masters. While this isn't an ideal solution, this is the only option documented:
Configuring network components to run on the control plane
oc-mirror is a GA product as of OpenShift 4.11.
The goal of this feature is to address any future customer requests for new features or capabilities in oc-mirror
Overview
This epic is a simple tracker epic for the proposed work and analysis for 4.14 delivery
Goal:
As a cluster administrator, I want OpenShift to include a recent HAProxy version, so that I have the latest available performance and security fixes.
Description:
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.
For OpenShift 4.13, this means bumping to 2.6.
As a cluster administrator,
I want OpenShift to include a recent HAProxy version,
so that I have the latest available performance and security fixes.
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release. This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.
For OpenShift 4.14, this means bumping to 2.6.
OVN IC will be the model used in Hypershift.
OVN IC changes are in progress. A number of metrics have been either moved to a new prometheus subsystem or simply a subsystem change. See this metrics doc for details: https://github.com/martinkennelly/ovn-kubernetes-1/blob/c47ed896d6eef1e78844cc258deafd20502c348b/docs/metrics.md
Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.
This includes (but is not limited to):
Operators:
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
This includes ibm-vpc-node-label-updater!
(Using separate cards for each driver because these updates can be more complicated)
Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories
Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.
This includes update of VolumeSnapshot CRDs in cluster-csi-snapshot-controller- operator assets and client API in go.mod. I.e. copy all snapshot CRDs from upstream to the operator assets + go get -u github.com/kubernetes-csi/external-snapshotter/client/v6 in the operator repo.
Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.
(Using separate cards for each driver because these updates can be more complicated)
As a WMCO developer, I want the kube-proxy submodule to be pointing to sdn-4.13-kubernetes-1.26.0 on the openshift/kubernetes repository so we can pick up the latest kube rebase updates.
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
Description of problem:
The Azure cloud controller manager is currently on kubernetes 1.25 dependencies and should be updated to 1.26
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository
This Epic is here to track the rebase we need to do when kube 1.26 is GA https://www.kubernetes.dev/resources/release/
https://docs.google.com/document/d/1h1XsEt1Iug-W9JRheQas7YRsUJ_NQ8ghEMVmOZ4X-0s/edit --> this is the link for rebase help
Rebase Openshift SDN to use Kube 1.26
The Assisted Installer is used to help streamline and improve the install experience of OpenShift UPI. Given the install footprint of OpenShift on IBM Power and IBM zSystems, we would like to bring the Assisted Installer experience to those platforms and ease the installation experience.
Full support of the Assisted Installer for use by IBM Power and IBM zSystems
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
As a multi-arch development engineer, I would like to evaluate if the assisted installer is a good fit for simplifying UPI deployments on Power and Z.
Acceptance Criteria
For the assisted installer, the nodes boot from CD-ROM (ISO) or netboot (network); after RHCOS is installed to the target disk, the target disk needs to be set as the boot device.
For PowerVM with a SAN disk, multipath is enabled, so the disk should be treated as an FC disk, not an HDD.
We have an RFE where a customer is asking for the option to run must-gather in its own namespace.
It looks like we already have the option, but it's hidden. The request from the customer is to unhide the option:
https://github.com/openshift/oc/blob/master/pkg/cli/admin/mustgather/mustgather.go#L158-L160
Provide a visible flag to run must-gather in its own namespace. Originally an experimental flag.
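A minimal usage sketch, assuming the existing hidden flag keeps its current name (`--run-namespace`, per the code linked above):
$ oc adm must-gather --run-namespace=my-existing-namespace   # run the gather pod in the given namespace instead of a temporary, auto-created one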
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
This description is based on the Google Doc by Rafael Fonseca dos Santos : https://docs.google.com/document/d/1yQt8sbknSmF_hriHyMAKPiztSoRIvntSX9i1wtObSYs
Microsoft is deprecating two APIs: the AD Graph API, which is used by the installer destroy code and by the CCO to mint credentials, and ADAL, which is going EOL. ADAL is used by the installer and all cluster components that authenticate to Azure:
Azure Active Directory Authentication Library (ADAL) Retirement
ADAL end-of-life is December 31, 2022. While ADAL apps may continue to work, no support or security fixes will be provided past end-of-life. In addition, there are no ADAL releases planned prior to end-of-life for features or support for new platform versions. We recommend prioritizing migration to Microsoft Authentication Library (MSAL).
Azure AD Graph API
Azure AD Graph will continue to function until June 30, 2023. This will be three years after the initial deprecation announcement (https://techcommunity.microsoft.com/t5/microsoft-entra-azure-ad-blog/update-your-applications-to-use-microsoft-authentication-library/ba-p/1257363). Based on Azure deprecation guidelines (https://docs.microsoft.com/en-us/lifecycle/), we reserve the right to retire Azure AD Graph at any time after June 30, 2023, without advance notice. Though we reserve the right to turn it off after June 30, 2023, we want to ensure all customers migrate off and discourage applications from taking production dependencies on Azure AD Graph. Investments in new features and functionalities will only be made in Microsoft Graph (https://docs.microsoft.com/en-us/graph/overview). Going forward, we will continue to support Azure AD Graph with security-related fixes. We recommend prioritizing migration to Microsoft Graph.
References:
ADAL is being deprecated. Update to use azidentity.
ADAL is EOL. Update component to use azidentity.
Feature Overview
Upstream Kubernetes is following other SIGs by moving its in-tree cloud providers to an out-of-tree plugin format at some point in a future Kubernetes release. OpenShift needs to be ready to action this change.
Goals
Requirements
Requirement | Notes | isMvp? |
---|---|---|
Plugin framework | Yes | |
AWS out of tree provider | Yes | |
Other Cloud provider plugins | No | |
Out of Scope
n/a
Background, and strategic fit
Assumptions
Customer Considerations
Documentation Considerations
Scenarios
To make the CCM GA, we need to update the switch case in library-go to make sure the vSphere CCM is always considered external.
We then need to update the vendor in KCMO, CCMO, KASO and MCO.
The vSphere CCM has a new YAML based cloud config format. We should build a config transformer into CCMO to load the old ini file, drop any storage/unrelated entries, and convert the existing schema to the new YAML schema, before storing it within the CCM namespace.
This will allow us to use the new features from the YAML config and avoid the old, deprecated ini format.
4.11 MVP Requirements
Out of scope use cases (that are part of the Kubeframe/factory project):
Questions to be addressed:
Epic Goal
Why is this important?
Scenarios
1. …
Acceptance Criteria
Dependencies (internal and external)
1. …
Previous Work (Optional):
1.https://issues.redhat.com/browse/ARMOCP-346
Open questions::
1. …
Done Checklist
As an OCP administrator, I would like to deploy OCP on arm64 BM with the agent installer.
Acceptance Criteria
Dev:
Jira Admin
QE
Docs:
Agent Installer
Update ETCD datastore encryption to use AES-GCM instead of AES-CBC
2. What is the nature and description of the request?
The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to a padding oracle attack. Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate secrets for every 200k writes.
The cipher used is hard coded.
3. Why is this needed? (List the business requirements here).
Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production.
4. List any affected packages or components.
Epic Goal*
What is our purpose in implementing this? What new capability will be available to customers?
The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html
Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type.
RFE-3095 is asking that aesgcm (which is an updated and more recent type) be supported. Furthermore, RFE-3338 is asking for more customizability, which brings us to how we have implemented cipher customization with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html
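For illustration, enabling the new provider on the cluster-wide API server config would look roughly like this (a sketch based on the existing aescbc configuration shape):
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  encryption:
    type: aesgcm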
Why is this important? (mandatory)
AES-CBC is considered a weak cipher
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
AES-GCM encryption was enabled in cluster-openshift-apiserver-operator and cluster-openshift-authentication-operator, but not in cluster-kube-apiserver-operator. When trying to enable aesgcm encryption in the apiserver config, the kas-operator will produce an error saying that the aesgcm provider is not supported.
The new aesgcm encryption provider was added in 4.13 as techpreview, but as part of https://issues.redhat.com/browse/API-1509, the feature needs to be GA in OCP 4.13.
During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history and appears in a "ps" command when run at precisely the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.
Customers/Users would like either the "--web" option or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
High-level list of items that are out of scope. Initial completion during Refinement status.
Provide any additional context is needed to frame the feature. Initial completion during Refinement status.
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
Provide information that needs to be considered and planned so that documentation will meet customer needs. Initial completion during Refinement status.
Which other projects and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
This is a backport from 4.14
Epic Goal*
During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history and appears in a "ps" command when run at precisely the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.
Customers/Users would like either the "--web" option or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
Why is this important? (mandatory)
Pasting the token on the command line with the oc login --token command is insecure
Scenarios (mandatory)
Customers/Users would like the "--web" option. There should be no way to pass a secret on the command line with the --token option.
For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.
Dependencies (internal and external) (mandatory)
What items must be delivered by other teams/groups to enable delivery of this epic.
Contributing Teams(and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
Acceptance Criteria (optional)
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
This is a backport of a 4.14 and 4.15 feature. It requires several PRs to merge; this issue is going to be an umbrella for all the PRs and for documentation tracking.
In order to secure token usage during oc login, we need to add the capability to oc to login using the OAuth2 Authorization Code Grant Flow through a browser. This will be possible by providing a command line option to oc:
oc login --web
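A usage sketch (server URL illustrative):
$ oc login --web https://api.example-cluster.example.com:6443
# oc opens the cluster's OAuth authorization endpoint in the default browser and completes
# the Authorization Code Grant flow; no token ever appears on the command line.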
This top level feature is going to be used as a placeholder for the IBM team who is working on new features for this integration in an effort to keep in sync their existing internal backlog with the corresponding Features/Epics in Red Hat's Jira.
Make machine phases public so they can be used by controller packages (https://github.com/openshift/machine-api-operator/blob/6397450f3464ffb875f43011ed8ad7428e50f881/pkg/controller/machine/controller.go#L75-L92)
With this BYON support:
`networkResourceGroupName` NOT specified ==> non-BYON install scenario
`networkResourceGroupName` specified ==> BYON install scenario (required for BYON scenario)
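A sketch of what the BYON install-config excerpt could look like for IBM Cloud (field placement under platform.ibmcloud and the surrounding fields are assumptions):
platform:
  ibmcloud:
    region: us-south
    resourceGroupName: my-cluster-rg           # assumption: cluster resource group field
    networkResourceGroupName: my-network-rg    # presence of this field selects the BYON scenario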
As a (user persona), I want to be able to:
so that I can achieve
Description of criteria:
Detail about what is specifically not being delivered in the story
This requires/does not require a design proposal.
This requires/does not require a feature gate.
Add support for a NetworkResourceGroup in the MachineProviderSpec, and the logic for performing lookups during machine creation for IBM Cloud.
Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.
Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.
Why is this important: Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.
See slack working group: #wg-ctrl-plane-resize
We want to make sure that if new failure domains are added, the CPMS rebalances the machines.
This should be automatic with the RollingUpdate strategy.
Remove e2e common test suite from the control plane machine set E2Es, and adapt presubmit and periodic test setups and teardowns to account for this change.
For more context see this [GitHub conversation](https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/147#discussion_r1035912969)
For clusters where this appropriate, check that a new CPMS is generated and that it is as expected, ie replicas == updatedReplicas, no errors reported
This will need to be tied to OCPCLOUD-1741 which removes the CPMS, we can run them in an ordered container together
Even if a machine does not need an update, the CPMS should replace it when it's been marked for deletion.
We test that we can replace work machines behind a proxy configuration, but we do not test control plane machines.
It would be good to check that the control plane replacement is not going to be disrupted by the proxy
We want to make sure that the latency added by having a proxy between resources does not affect replacing control plane machines
To validate the deletion process of the CPMS, we need to create a test that deletes the CPMS and checks that the CPMS eventually goes away (it may come back with a different UID), and that when it goes away, there are no owner references on the control plane machines, there are still 3 control plane machines, and the cluster operators are all stable
This is already tested with an integration test, but we should also check this in E2E as it is cheap (assuming it works, no machine changes) and may pick up weird interactions with other components.
Eg in integration we have no GC running.
When Inactive, the CPMS should be updated by the generator controller where applicable.
We should test that we can update the spec of the newest machine, observe the CPMS get updated, and then set it back, and observe the update again.
We expect that a generated CPMS should be able to be activated without causing a rollout, that is, the replicas should be equal to the updated replicas.
This should run after OCPCLOUD-1742 in an ordered container
We expect once active for the CPMS to own the master machines.
We can check this in tandem with the other activation test OCPCLOUD-1746
The console only displays a list of `Pending` or `Failed` plugins in the Cluster Overview Dynamic Plugin Status card item popover when one or more dynamic plugins have a status of `Pending` or `Failed` (added in https://github.com/openshift/console/pull/11664).
https://issues.redhat.com/browse/HAC-1615 will add additional information regarding why a plugin has a `Failed` status.
The aforementioned popover should be updated to include this additional `Failed` information once it is available.
Additionally, the `Console plugins` tab (e.g., k8s/cluster/operator.openshift.io~v1~Console/cluster/console-plugins) only displays a list of `ConsolePlugin` resources augmented with `Version` and `Description` data culled from `Loaded` dynamic plugins. This page should also show dynamic plugins with a `status` of `Pending` or `Failed`, either through the addition of a `status` column or additional tables (design TBD with UXD). The aforementioned additional information regarding why a plugin has `Failed` should be added to this page as well.
Acceptance Criteria: Update the popup on the dashboard and update notification drawer with failure reason.
Key Objective
Providing our customers with a single, simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why do customers want this?
Why do we want this?
Phase 2 Goal: Productization of the united Console
We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.
Open Issues:
We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.
The OpenShift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188
Description/Acceptance Criteria:
Problem
As an Operator author, I want to be able to specify where my Operators run (on infra, master, or worker nodes) so my end-users can easily install them through OperatorHub in the console without special setups.
Acceptance Criteria
Details
The console adds support to take the value field of a CSV annotation as the Namespace YAML template to create a Namespace object for installing the Operator.
CSV Annotation Example
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/node-selector: ""
  name: my-operator-namespace
none for now.
Description of problem:
This is for backporting the feature to 4.13 past Feature freeze, with the approval of Program management.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
1. Proposed title of this feature request
BYOK encrypts root vols AND default storageclass
2. What is the nature and description of the request?
User story
As a customer spinning up managed OpenShift clusters, if I pass a custom AWS KMS key to the installer, I expect it (installer and cluster-storage-operator) to not only encrypt the root volumes for the nodes in the cluster, but also be applied to encrypt the first/default (gp2 in current case) StorageClass, so that my assumptions around passing a custom key are met.
In the current state, if I pass a KMS key to the installer, only root volumes are encrypted with it, and the default AWS-managed key is used for the default StorageClass.
Perhaps this could be offered as a flag to set in the installer to further pass the key to the storage class, or not.
3. Why does the customer need this? (List the business requirements here)
To satisfy customers who wish to encrypt their own volumes with their selected key, instead of having the AWS default account key applied by accident.
4. List any affected packages or components.
Note: this implementation should take effect on AWS, GCP and Azure (any cloud provider) equally.
As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.
Description of criteria:
Re-encryption of existing PVs with a new key. Only newly provisioned PVs will use the new key.
Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.
"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. AWS EBS CSi driver operator should update both the StorageClass it manages (managed-csi) with:
Parameters:
encrypted: "true"
kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
Upstream docs: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/docs/parameters.md
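A sketch of the resulting StorageClass (the provisioner name is the upstream AWS EBS CSI driver's; the key ARN is the example above):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi            # the operator-managed class named above
provisioner: ebs.csi.aws.com
parameters:
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"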
As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.
Description of criteria:
Re-encryption of existing PVs with a new key. Only newly provisioned PVs will use the new key.
Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.
"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. GCP PD CSi driver operator should update both StorageClasses that it manages (standard-csi, standard-ssd) with:
Parameters:
disk-encryption-kms-key: projects/<KEY_PROJECT_ID>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>
Upstream docs: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver#createvolume-parameters (CreateVolume parameters == StorageClass.Parameters)
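An analogous sketch for one of the GCP PD classes (the provisioner name is the upstream GCP PD CSI driver's; the same parameter goes into the other managed class):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-csi
provisioner: pd.csi.storage.gke.io
parameters:
  disk-encryption-kms-key: projects/<KEY_PROJECT_ID>/locations/<LOCATION>/keyRings/<RING_NAME>/cryptoKeys/<KEY_NAME>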
As a cluster admin, I want OCP to provision new volumes with my custom encryption key that I specified during cluster installation in install-config.yaml so all OCP assets (PVs, VMs & their root disks) use the same encryption key.
Description of criteria:
Re-encryption of existing PVs with a new key. Only newly provisioned PVs will use the new key.
Enhancement (incl. TBD API with encryption key reference) will be provided as part of https://issues.redhat.com/browse/CORS-2080.
"Raw meat" of this story is translation of the key reference in TBD API to StorageClass.Parameters. Azure Disk CSi driver operator should update both the StorageClass it manages (managed-csi) with:
Parameters:
diskEncryptionSetID: /subscriptions/<subs-id>/resourceGroups/<rg-name>/providers/Microsoft.Compute/diskEncryptionSets/<diskEncryptionSet-name>
Upstream docs: https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/docs/driver-parameters.md (CreateVolume parameters == StorageClass.Parameters)
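An analogous sketch for Azure Disk (the provisioner name is the upstream Azure Disk CSI driver's):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi
provisioner: disk.csi.azure.com
parameters:
  diskEncryptionSetID: /subscriptions/<subs-id>/resourceGroups/<rg-name>/providers/Microsoft.Compute/diskEncryptionSets/<diskEncryptionSet-name>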
Feature Goal: Unify the management of cluster ingress with a common, open, expressive, and extensible API.
Why is this Important? Gateway API is the evolution of the upstream Kubernetes Ingress APIs. The upstream project is part of Kubernetes, working under SIG-NETWORK. OpenShift is contributing to the development, building a leadership position, and preparing OpenShift to support Gateway API, with Istio as our supported implementation.
The pluggable nature of the Gateway API implementation enables support for additional and optional 3rd-party ingress technologies.
Functional Requirements
Non-Functional Requirements:
Open Questions:
Documentation Considerations:
User Story: As a cluster admin, I want to create a gatewayclass and a gateway, and OpenShift should configure Istio/Envoy with an LB and DNS, so that traffic can reach httproutes attached to the gateway.
The operator will be one of these (or some combination):
Functionality includes DNS (NE-1107), LoadBalancer (NE-1108), and other operations formerly performed by the cluster-ingress-operator for routers.
Requires design document or enhancement proposal, breakdown into more specific stories.
(This probably needs to be an Epic; we will move things around later to accommodate that.)
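For orientation, a minimal sketch of the objects a cluster admin would create (the controllerName value and namespace are illustrative, not the final Istio-based implementation's names):
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: example-gatewayclass
spec:
  controllerName: example.openshift.io/gateway-controller   # illustrative controller name
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
  namespace: openshift-ingress       # illustrative namespace
spec:
  gatewayClassName: example-gatewayclass
  listeners:
  - name: http
    protocol: HTTP
    port: 80
HTTPRoutes attached to this gateway would then receive traffic once the operator wires up the load balancer and DNS for its listeners.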
Out of scope for enhanced dev preview:
Background
Issue
The default instance type the installer currently chooses for Single Node OpenShift clusters doesn't follow our documented minimum requirements.
Solution
When the number of replicas of the ControlPlane pool is 1, the installer will now choose `2xlarge` instead of `xlarge`.
Caveat
`2xlarge` has 32GiB of RAM, which is twice as much as we need, but it's the best we can do to meet the minimum single-node requirements, because AWS doesn't offer a 16GiB RAM instance type with 8 cores.
Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.
Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.
Why is this important: Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.
To enable full support for control plane machine sets on GCP
Any other cloud platforms
Feature created from split of overarching Control Plane Machine Set feature into single release based effort
n/a
Nothing outside documentation that shows the Azure platform is supported as part of Control Plane Machine Sets
n/a
Goal:
Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.
Problem:
There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.
Why is this important:
Lifecycle Information:
Previous Work:
Dependencies:
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
As a developer, I want to be able to:
so that I can achieve
Description of criteria:
This does not require a design proposal.
This does not require a feature gate.
On new installations, we should make the StorageClass created by the CSI operator the default one.
However, we shouldn't do that in an upgrade scenario. The main reason is that users might have set a different quota on the CSI driver StorageClass.
Exit criteria:
We need to audit/review and implement all pending feature gates that are implemented in upstream CSI driver - https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/manifests/vanilla/vsphere-csi-driver.yaml#L151
Some of this, although necessary, could break the driver, so we have to be careful.
Epic Goal*
Kubernetes upstream has chosen to allow users to opt-out from CSI volume migration in Kubernetes 1.26 (1.27 PR, 1.26 backport). It is still GA there, but allows opt-out due to non-trivial risk with late CSI driver availability.
We want a similar capability in OCP - a cluster admin should be able to opt-in to CSI migration on vSphere in 4.13. Once they opt-in, they can't opt-out (at least in this epic).
Why is this important? (mandatory)
See an internal OCP doc if / how we should allow a similar opt-in/opt-out in OCP.
Scenarios (mandatory)
Upgrade
New install
EUS to EUS (4.12 -> 4.14)
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
RHEL CoreOS should be updated to RHEL 9.2 sources to take advantage of newer features, hardware support, and performance improvements.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Questions to be addressed:
This is the Epic to track the work to add RHCOS 9 in OCP 4.13 and to make OCP use it by default.
CURRENT STATUS: Landed in 4.14 and 4.13
Testing with layering
Another option given an existing e.g. 4.12 cluster is to use layering. First, get a digested pull spec for the current build:
$ skopeo inspect --format "{{.Name}}@{{.Digest}}" -n docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev:4.13-9.2
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4cc3995d5fc11e3b22140d8f2f91f78834e86a210325cbf0525a62725f8e099
Create a MachineConfig that looks like this:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-override
spec:
  osImageURL: <digested pull spec>
If you want to also override the control plane, create a similar one for the master role.
We don't yet have auto-generated release images. However, if you want one, you can ask cluster bot to e.g. "launch https://github.com/openshift/machine-config-operator/pull/3485" with options you want (e.g. "azure" etc.) or just "build https://github.com/openshift/machine-config-operator/pull/3485" to get a release image.
Description:
Upstream OKD/FCOS are already using the latest Ignition, which supports [1] writing authorized keys to /home/core/.ssh/authorized_keys.d/ignition. With RHCOS 9, we should also start using the new default path /home/core/.ssh/authorized_keys.d/ignition instead of /home/core/.ssh/authorized_keys.
[1]https://github.com/openshift/machine-config-operator/pull/2688
Acceptance Criteria:
Part of setting CPU load balancing on RHEL 9 involves disabling sched_load_balance on cgroups that contain a cpuset that should be exclusive. The PAO may need to be responsible for this piece.
Done when
As described in the Kubernetes "ephemeral volumes" documentation, this feature tracks GA and improvements in
OCPPLAN-9193 implemented local ephemeral capacity management as well as CSI generic ephemeral volumes. This feature tracks the remaining work to GA CSI ephemeral in-line volumes, especially the admission plugin to make the feature secure and prevent any insecure driver from using it. Ephemeral in-line is required by some CSI drivers as a key feature to operate (e.g. SecretStore CSI); ODF is also planning to GA ephemeral in-line with Ceph CSI.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
This Section:
Goal:
The goal is to provide inline volume support (also known as ephemeral volumes) via a CSI driver/operator. This epic also tracks the development of the new admission plugin required to make inline volumes safe.
Problem:
Why is this important:
Dependencies (internal and external):
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
Previous Work:
Customers:
Open questions:
Notes:
As an OCP user, I want to be able to use in-line CSI volumes in my Pods, so my apps work.
Since we will have an admission plugin to filter out dangerous CSI drivers from restricted namespaces, all users should be able to use CSI volumes in all SCCs.
Exit criteria:
* an unprivileged user + namespace can use in-line CSI volume of a "safe" CSI driver (e.g. SharedResource CSI driver) without any changes in "restricted-v2" SCC.
This flag is currently TechPreviewNoUpgrade:
https://github.com/dobsonj/api/blob/95216a844c16019d4e3aaf396492c95d19bf22c0/config/v1/types_feature.go#L122
Once the admission plugin has had sufficient testing and e2e tests are in place, this can be promoted to GA and the feature gate eventually removed.
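Until then, exercising the plugin requires enabling tech preview on a test cluster, e.g. (note that enabling TechPreviewNoUpgrade cannot be undone):
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade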
Create a validating admission plugin that allows pods to be created if:
Enhancement: https://github.com/openshift/enhancements/blob/master/enhancements/storage/csi-inline-vol-security.md
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.
To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).
Phase 1
Phase 2
Phase 3
Phase 1
Phase 2
Phase 3
As defined in the External platform enhancement, a new platform is being added to OpenShift. To accommodate the phase 2 work, the CIRO should be updated, if necessary, to react to the "External" platform in the same manner as it would for platform "None".
Please see the enhancement and the parent plan OCPBU-5 for more details about this process.
In phase 2 (planned for 4.13 release) of the external platform enhancement, the new platform type will be added to the openshift/api packages. As part of staging the release of this new platform we will need to ensure that all operators react in a neutral way to the platform, as if it were a "None" platform to ensure the continued normal operation of OpenShift.
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Provide some (testable) examples of how we will know if we have achieved the epic goal.
We are working to create an External platform test which will exercise this mechanism, see OCPCLOUD-1782
This will require OCPCLOUD-1777
As defined in the External platform enhancement, a new platform is being added to OpenShift. To accommodate the phase 2 work, the CIO should be updated, if necessary, to react to the "External" platform in the same manner as it would for platform "None".
Please see the enhancement and the parent plan OCPBU-5 for more details about this process.
In phase 2 (planned for 4.13 release) of the external platform enhancement, the new platform type will be added to the openshift/api packages. As part of staging the release of this new platform we will need to ensure that all operators react in a neutral way to the platform, as if it were a "None" platform to ensure the continued normal operation of OpenShift.
We are working to create an External platform test which will exercise this mechanism, see OCPCLOUD-1782
as described in the epic, the CIO should be updated to react to the new "External" platform as it would for a "None" platform.
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Per the spike "Add field in ClusterVersion spec to request the target architecture", create an "oc adm upgrade" sub-command that provides a convenient way to update a cluster from homogeneous -> heterogeneous while maintaining the current version, i.e. there will not be an option to specify a version.
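A sketch of what the invocation could look like; the flag name is an assumption here, and the final UX is what this story defines:
$ oc adm upgrade --to-multi-arch
# requests migration from the single-arch to the multi-arch (heterogeneous) payload
# at the cluster's current version; no target version can be specified.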
Goal:
Support migration from dual-stack IPv6 to single-stack IPv6.
Why is this important?
We have customers who want to deploy a dual stack cluster and then (eventually) migrate to single stack ipv6 once all of their ipv4 dependencies are eliminated. Currently this isn't possible because we only support ipv4-primary dual stack deployments. However, with the implementation of OPNET-1 we addressed many of the limitations that prevented ipv6-primary, so we need to figure out what remains to make this supported.
At the very least, we need to remove the validations in the installer that require ipv4 to be the primary address. There will also be changes needed in dev-scripts to allow testing (an option to make the v6 subnets and addresses primary, for example).
We have customers who want to deploy a dual stack cluster and then migrate to single stack ipv6 once all of their ipv4 dependencies are eliminated. Currently this isn't possible because we only support ipv4-primary dual stack deployments. However, with the implementation of OPNET-1 we addressed many of the limitations that prevented ipv6-primary, so we need to figure out what remains to make this supported. At the very least we need to remove the validations in the installer that require ipv4 to be the primary address. There will also be changes needed in dev-scripts to allow testing (an option to make the v6 subnets and addresses primary, for example).
The installer currently enforces ipv4-primary for dual stack deployments. We will need to remove/modify those validations to allow an ipv6-primary configuration.
In the IPI nodeip-configuration service we always prefer ipv4 as the primary node address. This will need to be made dynamic based on the order of networks configured.
Runtimecfg assumes ipv4-primary in some places today, and we need to make it aware of whether a cluster is v4- or v6-primary.
Goal:
API and implementation work to provide the cluster admin with an option in the IngressController API to use PROXY protocol with IBM Cloud load-balancers.
Description:
This epic extends the IngressController API essentially by copying the option we added in NE-330. In that epic, we added a configuration option to use PROXY protocol when configuring an IngressController to use a NodePort service or host networking. With this epic (NE-1090), the same configuration option is added to use PROXY protocol when configuring an IngressController to use a LoadBalancer service on IBM Cloud.
This epic tracks the API and implementation work to provide the cluster admin with an option in the IngressController API to use PROXY protocol with IBM Cloud load-balancers.
This epic extends the IngressController API essentially by copying the option we added in NE-330. In that epic, we added a configuration option to use PROXY protocol when configuring an IngressController to use a NodePort service or host networking. With this epic (NE-1090), the same configuration option is added to use PROXY protocol when configuring an IngressController to use a LoadBalancer service on IBM Cloud.
Create an Azure cloud-specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in Azure) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.
Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.
Due to the ongoing in-tree/out-of-tree split of the cloud and CSI providers, this should not apply to clusters with in-tree providers (!= "external").
Once confident we have all components updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.
Goals
Requirement | Notes | isMvp? |
---|---|---|
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
List any affected packages or components.
This epic covers the work to apply user defined tags to Azure created for openshift cluster available as tech preview.
The user should be able to define the Azure tags to be applied to the resources created during cluster creation by the installer and by the other operators which manage the specific resources. The user will be able to define the required tags in install-config.yaml while preparing the user inputs for cluster creation; these will then be made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but will be available for user reference and will be used by the in-cluster operators for tagging when the resources are created.
Updating/deleting of tags added during cluster creation or adding new tags as Day-2 operation is out of scope of this epic.
List any affected packages or components.
Reference - https://issues.redhat.com/browse/RFE-2017
The installer creates the below list of resources during the create-cluster phase, and these resources should have the user-defined tags and the default OCP tag kubernetes.io/cluster/<cluster_name>:owned applied.
Resources List
Resource | Terraform API |
---|---|
Resource group | azurerm_resource_group |
Image | azurerm_image |
Load Balancer | azurerm_lb |
Network Security Group | azurerm_network_security_group |
Storage Account | azurerm_storage_account |
Managed Identity | azurerm_user_assigned_identity |
Virtual network | azurerm_virtual_network |
Virtual machine | azurerm_linux_virtual_machine |
Network Interface | azurerm_network_interface |
Private DNS Zone | azurerm_private_dns_zone |
DNS Record | azurerm_dns_cname_record |
Acceptance Criteria:
Issues found by the QE team during pre-merge tests are reported in the QE Tracker and should be fixed.
Acceptance criteria:
The enhancement proposed for Azure tags support in OCP requires machine-api-provider-azure to apply the Azure userTags available in the status sub-resource of the Infrastructure CR to the Azure virtual machine resource and the sub-resources it creates.
machine-api-provider-azure has a method CreateMachine() which creates the below resources, and tags should be applied to them:
Acceptance Criteria
The installer generates the Infrastructure CR in the manifests-creation step of the cluster creation process, based on the user-provided input recorded in install-config.yaml. While generating the Infrastructure CR, platformStatus.azure.resourceTags should be populated with the user-provided tags (installconfig.platform.azure.userTags).
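A sketch of the mapping (the resourceTags list shape is an assumption based on the openshift/api types; treat the exact nesting as illustrative):
# install-config.yaml (user input)
platform:
  azure:
    userTags:
      costCenter: "1234"
      owner: team-a
---
# generated Infrastructure CR (status is populated by the installer, not user-editable)
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  platformStatus:
    type: Azure
    azure:
      resourceTags:
      - key: costCenter
        value: "1234"
      - key: owner
        value: team-a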
Acceptance Criteria
cluster-config-operator makes the Infrastructure CRD available for the installer; the CRD is included in its container image from the openshift/api package, and the package needs to be updated to include the latest CRD.
The enhancement proposed for Azure tags support in OCP requires cluster-ingress-operator to apply the Azure userTags available in the status sub-resource of the Infrastructure CR to the Azure DNS resources it creates.
cluster-ingress-operator should add tags to the DNS records it creates.
Note: dnsrecords.ingress.operator.openshift.io and openshift-ingress CRD, usage to be identified.
Acceptance Criteria
HyperShift came to life to serve multiple goals, some are main near-term, some are secondary that serve well long-term.
HyperShift opens up doors to penetrate the market. HyperShift enables true hybrid (CP and Workers decoupled, mixed IaaS, mixed Arch,...). An architecture that opens up more options to target new opportunities in the cloud space. For more details on this one check: Hosted Control Planes (aka HyperShift) Strategy [Live Document]
To bring hosted control planes to our customers, we need the means to ship it. Today MCE is how HyperShift is shipped and installed so that customers can use it. There are two main customers for hosted control planes:
If you have noticed, MCE is the delivery mechanism for both management models. The difference between managed and self-managed is the consumer persona: for self-managed it's the customer SRE, for managed it's the RH SRE.
For us to ship HyperShift in the product (as hosted control planes) in either management model, there is a necessary readiness checklist that we need to satisfy. Below are the high-level requirements needed before GA:
Please also have a look at our What are we missing in Core HyperShift for GA Readiness? doc.
Multi-cluster is becoming an industry need today not because that is where the trend is going, but because it's the only viable path today to solve many of our customers' use cases. Below is some reasoning why multi-cluster is a NEED:
As a result, multi-cluster management is a defining category in the market where Red Hat plays a key role. Today Red Hat solves for multi-cluster via RHACM and MCE. The goal is to simplify fleet management complexity by providing a single pane of glass to observe, secure, police, govern, configure a fleet. I.e., the operand is no longer one cluster but a set, a fleet of clusters.
HyperShift's logically centralized architecture, as well as its native separation of concerns and superior cluster lifecycle management experience, makes it a great fit as the foundation of our multi-cluster management story.
Thus the following stories are important for HyperShift:
Refs:
HyperShift is the core engine that will be used to provide hosted control-planes for consumption in managed and self-managed.
Main user story: When life cycling clusters as a cluster service consumer via HyperShift core APIs, I want to use a stable/backward compatible API that is less susceptible to future changes so I can provide availability guarantees.
Ref: What are we missing in Core HyperShift for GA Readiness?
Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.
Assumptions:
HyperShift - proposed cuts from data plane
When operating OpenShift clusters (for any OpenShift form factor) from MCE/ACM/OCM/CLI as a Cluster Service Consumer (RH managed SRE, or self-manage SRE/admin) I want to be able to migrate CPs from one hosting service cluster to another:
More information:
To understand usage patterns and inform our decision making for the product. We need to be able to measure adoption and assess usage.
See Hosted Control Planes (aka HyperShift) Strategy [Live Document]
Whether it's managed or self-managed, it's pertinent to report health metrics to be able to create meaningful Service Level Objectives (SLOs) and alert on failure to meet our availability guarantees. This is especially important for our managed services path.
https://issues.redhat.com/browse/OCPPLAN-8901
HyperShift for managed services is a strategic company goal as it improves usability, feature, and cost competitiveness against other managed solutions, and because managed services/consumption-based cloud services is where we see the market growing (customers are looking to delegate platform overhead).
We should make sure our SD milestones are unblocked by the core team.
This feature reflects HyperShift core readiness to be consumed. When all related EPICs and stories in this EPIC are complete HyperShift can be considered ready to be consumed in GA form. This does not describe a date but rather the readiness of core HyperShift to be consumed in GA form NOT the GA itself.
- GA date for self-managed will be factoring in other inputs such as adoption, customer interest/commitment, and other factors.
- GA dates for ROSA-HyperShift are on track, tracked in milestones M1-7 (have a look at https://issues.redhat.com/browse/OCPPLAN-5771)
Epic Goal*
The goal is to split client certificate trust chains from the global Hypershift root CA.
Why is this important? (mandatory)
This is important to:
Scenarios (mandatory)
Provide details for user scenarios including actions to be performed, platform specifications, and user personas.
Dependencies (internal and external) (mandatory)
Hypershift team needs to provide us with code reviews and merge the changes we are to deliver
Contributing Teams(and contacts) (mandatory)
Acceptance Criteria (optional)
The serviceaccount CA bundle automatically injected to all pods cannot be used to authenticate any client certificate generated by the control-plane.
Drawbacks or Risk (optional)
Risk: there is significant time pressure, as this should be delivered before the first stable HyperShift release.
Done - Checklist (mandatory)
AUTH-311 introduced an enhancement. Implement the signer separation described there.
The Agent-based Installer requires booting the generated ISO on the target nodes manually. Support for PXE booting will allow customers to automate their installations via their DHCP/PXE infrastructure.
This feature allows generating installation ISOs ready to add to a customer-provided DHCP/PXE infrastructure.
As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand
We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.
As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand
We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.
Only parts of the epic AGENT-356 have landed (in particular, AGENT-438). We shouldn't ship it in a release in its current state due to lack of testing, as well as missing features like iPXE support (AGENT-491). At the moment, it is likely the PXE artifacts don't work at all because AGENT-510 is not implemented.
The agent create pxe-files subcommand should be disabled until the whole Epic is completed in a release.
Goal: Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.
Problem: There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.
Why is this important: Increased operational simplicity and scale flexibility of the cluster’s control plane deployment.
To enable full support for control plane machine sets on Azure
Any other cloud platforms
Feature created from split of overarching Control Plane Machine Set feature into single release based effort
n/a
Nothing outside documentation that shows the Azure platform is supported as part of Control Plane Machine Sets
n/a
Goal:
Control plane nodes in the cluster can be scaled up or down, lost and recovered, with no more importance or special procedure than that of a data plane node.
Problem:
There is a lengthy special procedure to recover from a failed control plane node (or majority of nodes) and to add new control plane nodes.
Why is this important:
Lifecycle Information:
Previous Work:
Dependencies:
Prioritized epics + deliverables (in scope / not in scope):
Estimate (XS, S, M, L, XL, XXL):
As a developer, I want to be able to:
so that I can achieve
Description of criteria:
This does not require a design proposal.
This does not require a feature gate.
The ovn-k manifests in CNO are not up to date; we want to sync them with the manifests in the MicroShift repo.
Track goals/requirements for self-managed GA of Hosted control planes on BM using the agent provider. Mainly make sure:
This Section:
Customers are looking at HyperShift to deploy self-managed clusters on bare metal. We have positioned the Agent flow as the way to get BM clusters due to its ease of use (it automates many of the rather mundane tasks required to set up BM clusters), and it's planned for GA with MCE 2.3 (in the OCP 4.13 timeframe).
Questions to be addressed:
To run a HyperShift management cluster in disconnected mode we need to document which images need to be mirrored and potentially modify the images we use for OLM catalogs.
ICSP mapping only happens for image references with a digest, not a regular tag. We need to address this for images we reference by tag:
CAPI, CAPI provider, OLM catalogs
As a user of Hosted Control Planes, I would like the HCP Specification API to support both ICSP & IDMS.
IDMS is replacing ICSP in OCP 4.13+. hcp.Spec.ImageContentSources was updated in OCPBUGS-11939 to replace ICSP with IDMS. This needs to be reverted, and something new added to support IDMS in addition to ICSP.
This is a clone of issue HOSTEDCP-1046. The following is the description of the original issue:
—
As a user of Hosted Control Planes, I would like the HCP Specification API to support both ICSP & IDMS.
IDMS is replacing ICSP in OCP 4.13+. hcp.Spec.ImageContentSources was updated in OCPBUGS-11939 to replace ICSP with IDMS. This needs to be reverted, and something new added to support IDMS in addition to ICSP.
With the enablement of OpenShift clusters with mixed-architecture-capable compute nodes, it is necessary to support manifest-listed images so the correct images/binaries can be pulled onto the relevant nodes.
Included in this should be the ability to
ACCEPTANCE CRITERIA
ACCEPTANCE CRITERIA
Notes:
ACCEPTANCE CRITERIA
DOCUMENTATION
OPEN QUESTIONS
ACCEPTANCE CRITERIA
Notes:
Notes:
Nothing new to document.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Resource measurements: https://docs.google.com/spreadsheets/d/1eLBkG4HhlKlxRlB9H8kjuXs5F3zzVOPfkOOhFIOrfzU/edit#gid=0
LVMS requirements are being reduced, so this should be reflected in the Assisted Installer.
Cloned from OCPSTRAT-377 to represent the backport to 4.13
Backport questions:
1) What's the impact/cost to any other critical items on the next release?
Installer and edge are mostly focused on activation/retention and working the list top-to-bottom without release blockers. This is an activation item highly coveted by SD and applicable in existing versions.
2) Is it a breaking change to the existing fleet?
No.
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Backport of 4.13 AWS Shared VPC Feature
Backport of 4.13 AWS Shared VPC Feature
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->
Enhancement PR: https://github.com/openshift/enhancements/pull/1397
API PR: https://github.com/openshift/api/pull/1460
Ingress Operator PR: https://github.com/openshift/cluster-ingress-operator/pull/928
Feature Goal: Support OpenShift installation in AWS Shared VPC scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.
The ingress operator is responsible for creating DNS records in AWS Route 53 for cluster ingress. Prior to the implementation of this epic, the ingress operator did not have the capability to add DNS records to an existing Route 53 hosted zone in the shared VPC.
As described in the WIP PR https://github.com/openshift/cluster-ingress-operator/pull/928, the ingress operator will consume a new API field that contains the IAM Role ARN for configuring DNS records in the private hosted zone. If this field is present, then the ingress operator will use this account to create all private hosted zone records. The API fields will be described in the Enhancement PR.
The ingress operator code will accomplish this by defining a new provider implementation that wraps two other DNS providers, using one of them to publish records to the public zone and the other to publish records to the private zone.
See NE-1299
See NE-1299
cgroup v2 is GA as of OCP 4.13.
RHEL 9 defaults to v2, and we want to make sure we are in sync.
v1 support in systemd will end by the end of 2023.
What needs to be done:
https://docs.google.com/document/d/1i6IEGjaM0-NeMqzm0ZnVm0UcfcVmZqfQRG1m5BjKzbo/edit
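For reference, the cluster-wide cgroup mode is surfaced through the Node config CR; a sketch of setting it (field name per current OCP configuration, treat as illustrative):
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v2"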
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
<--- Cut-n-Paste the entire contents of this description into your new Epic --->